GSI2015
About
LIX Colloquium 2015 conferences

Provide an overview of the most recent state of the art

Exchange mathematical information/knowledge/expertise in the area

Identify research areas/applications for future collaboration

Identify academic and industry lab expertise for further collaboration
This conference will be an interdisciplinary event and will unify skills from Geometry, Probability and Information Theory. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.
Authors will be invited to submit a paper to a special issue, "Differential Geometrical Theory of Statistics", of the journal Entropy, an international and interdisciplinary open access journal of entropy and information studies published monthly online by MDPI.
Provisional Topics of Special Sessions:

Manifold/Topology Learning

Riemannian Geometry in Manifold Learning

Optimal Transport theory and applications in Imagery/Statistics

Shape Space & Diffeomorphic mappings

Geometry of distributed optimization

Random Geometry/Homology

Hessian Information Geometry

Topology and Information

Information Geometry Optimization

Divergence Geometry

Optimization on Manifold

Lie Groups and Geometric Mechanics/Thermodynamics

Quantum Information Geometry

Infinite Dimensional Shape spaces

Geometry on Graphs

Bayesian and Information geometry for inverse problems

Geometry of Time Series and Linear Dynamical Systems

Geometric structure of Audio Processing

Lie groups in Structural Biology

Computational Information Geometry
Committees
Secretary
 Valérie Alidor  SEE, France https://www.see.asso.fr
Webmaster
 Jean Vieille  SyntropicFactory http://www.syntropicfactory.com
Program chairs
 Frédéric Barbaresco  Thales, France http://www.thalesgroup.com
 Frank Nielsen  Ecole Polytechnique, France http://www.lix.polytechnique.fr/~nielsen/
Scientific committee
 Pierre-Antoine Absil  University of Louvain, Belgium http://sites.uclouvain.be/absil/
 Bijan Afsari  Johns Hopkins University, USA http://www.cis.jhu.edu/~bijan/
 Stéphanie Allassonnière  Ecole Polytechnique, France https://sites.google.com/site/stephanieallassonniere/home
 Shun-ichi Amari  RIKEN, Japan http://www.brain.riken.jp/labs/mns/amari/homeE.html
 Jesus Angulo  Mines ParisTech, France http://cmm.ensmp.fr/~angulo/
 Jean-Philippe Anker  Université d'Orléans, France http://www.univorleans.fr/mapmo/membres/anker/
 Sylvain Arguillère  Johns Hopkins University, USA http://www.cis.jhu.edu/~arguille/
 Marc Arnaudon  Université de Bordeaux, France http://www.math.ubordeaux1.fr/~marnaudo/
 Dena Asta  Carnegie Mellon University, USA http://www.stat.cmu.edu/~dasta/
 Michael Aupetit  Qatar Computing Research Institute, Qatar http://michael.aupetit.free.fr/
 Roger Balian  Academy of Sciences, France https://en.wikipedia.org/wiki/Roger_Balian
 Barbara Trivellato  Politecnico di Torino, Italy http://calvino.polito.it/~trivellato/
 Frédéric Barbaresco  Thales, France http://www.thalesgroup.com
 Michèle Basseville  IRISA, France http://people.irisa.fr/Michele.Basseville/
 Pierre Baudot  Max Planck Institute for Mathematics in the Sciences, Germany http://www.mis.mpg.de/jjost/members/pierrebaudot.html
 Martin Bauer  University of Vienna, Austria http://mat.univie.ac.at/~bauerm/Home_Page_of_Martin_Bauer/Home.html
 Roman Belavkin  Middlesex University, UK http://www.eis.mdx.ac.uk/staffpages/rvb/
 Daniel Bennequin  Paris-Diderot University, France http://webusers.imjprg.fr/~daniel.bennequin/
 Jérémy Bensadon  LRI, France https://www.lri.fr/~bensadon/
 Jean-François Bercher  ESIEE, France http://perso.esiee.fr/~bercherj/
 Yannick Berthoumieu  IMS Université de Bordeaux, France https://sites.google.com/site/berthoumieuims/
 Jérémie Bigot  Université de Bordeaux, France https://sites.google.com/site/webpagejbigot/
 Michael Blum  IMAG, France http://membrestimc.imag.fr/Michael.Blum/
 Lionel Bombrun  IMS, Université de Bordeaux, France https://www.imsbordeaux.fr/fr/annuaire/4158bombrunlionel
 Silvère Bonnabel  Mines ParisTech, France http://www.silverebonnabel.com/
 Ugo Boscain  Ecole polytechnique, France http://www.cmapx.polytechnique.fr/~boscain/
 Nicolas Boumal  Inria & ENS Paris, France http://perso.uclouvain.be/nicolas.boumal/
 Charles Bouveyron  University Paris Descartes, France http://w3.mi.parisdescartes.fr/~cbouveyr/
 Michel Boyom  Université de Montpellier, France http://www.i3m.univmontp2.fr/
 Michel Broniatowski  University of Pierre and Marie Curie, France http://www.lsta.upmc.fr/Broniatowski/
 Martins Bruveris  Brunel University London, UK http://www.brunel.ac.uk/~mastmmb/
 Olivier Cappé  Telecom Paris, France http://perso.telecomparistech.fr/~cappe/
 Charles Cavalcante  Federal University of Ceará, Brazil http://www.ppgeti.ufc.br/charles/
 Antonin Chambolle  Ecole Polytechnique, France http://www.cmap.polytechnique.fr/~antonin/
 Frédéric Chazal  INRIA, France http://geometrica.saclay.inria.fr/team/Fred.Chazal/
 Emmanuel Chevallier  Mines ParisTech, France http://cmm.ensmp.fr/~chevallier/
 Sylvain Chevallier  IUT de Vélizy, France https://sites.google.com/site/sylvchev/
 Arshia Cont  Ircam, France http://repmus.ircam.fr/arshiacont
 Benjamin Couéraud  LAREMA Université d'Angers, France
 Philippe Cuvillier  Ircam, France http://repmus.ircam.fr/cuvillier
 Laurent Decreusefond  Telecom ParisTech, France http://www.infres.enst.fr/~decreuse/
 Alexis Decurninge  Huawei Technologies, Paris, France http://www.huawei.com/en/
 Michel Deza  Ecole Normale Supérieure Paris, CNRS, France http://www.liga.ens.fr/~deza/
 Stanley Durrleman  INRIA, France https://who.rocq.inria.fr/Stanley.Durrleman/index.html
 Patrizio Frosini  Università di Bologna, Italy http://www.dm.unibo.it/~frosini/
 Alfred Galichon  New York University, USA http://alfredgalichon.com/
 Jean-Paul Gauthier  University of Toulon, France http://www.lsis.org/gauthierjp/
 Alexis Glaunès  Mines ParisTech, France http://www.mi.parisdescartes.fr/~glaunes/
 Pierre-Yves Gousenbourger  Ecole Polytechnique de Louvain, Belgium http://www.uclouvain.be/pierreyves.gousenbourger
 Piotr Graczyk  University of Angers, France math.univangers.fr
 Peter Grunwald  CWI, Amsterdam, The Netherlands http://homepages.cwi.nl/~pdg/
 Nikolaus Hansen  INRIA, France www.lri.fr
 K V Harsha  Indian Institute of Space Science & Technology, India http://www.iist.ac.in/departments/
 Susan Holmes  Stanford University, USA http://statweb.stanford.edu/~susan/
 Wen Huang  University of Louvain, Belgium
 Stephan Huckemann  Institut für Mathematische Stochastik, Göttingen, Germany http://www.stochastik.math.unigoettingen.de/index.php?id=huckemann
 Shiro Ikeda  ISM, Japan http://www.ism.ac.jp/~shiro/
 Alexander Ivanov  Lomonosov Moscow State University, Russia  Imperial College, UK http://www.imperial.ac.uk/people/a.ivanov
 Jérémie Jakubowicz  Institut Mines Telecom, France http://wwwpublic.itsudparis.eu/~jakubowi/
 Martin Kleinsteuber  Technische Universität München, Germany http://www.professoren.tum.de/en/kleinsteubermartin/
 Ryszard Kostecki  Perimeter Institute for Theoretical Physics, Canada http://www.fuw.edu.pl/~kostecki/
 Hong Van Le  Mathematical Institute of ASCR, Czech Republic http://users.math.cas.cz/~hvle/
 Nicolas Le Bihan  Université de Grenoble, CNRS, France  University of Melbourne, Australia http://www.gipsalab.grenobleinp.fr/~nicolas.lebihan/
 Christian Léonard  Ecole Polytechnique, France http://www.cmap.polytechnique.fr/~leonard/
 Hervé Lombaert  INRIA, France http://step.polymtl.ca/~rv101/
 Jean-Michel Loubes  Toulouse University, France http://perso.math.univtoulouse.fr/loubes/
 Luigi Malagò  Shinshu University, Japan http://malago.di.unimi.it/
 Jonathan Manton  The University of Melbourne http://people.eng.unimelb.edu.au/jmanton/
 Matilde Marcolli  Caltech, USA http://www.its.caltech.edu/~matilde/
 Jean-François Marcotorchino  Thales, France https://www.thalesgroup.com/
 Charles-Michel Marle  Université Pierre et Marie Curie, France http://charlesmichel.marle.pagespersoorange.fr/
 Juliette Mattioli  THALES, France https://www.thalesgroup.com/en
 Bertrand Maury  Université Paris Sud, France http://www.math.upsud.fr/~maury/
 Quentin Mérigot  Université Paris-Dauphine / CNRS, France http://quentin.mrgt.fr/
 Fernand Meyer  Mines ParisTech, France fernandmeyer
 Klas Modin  Chalmers University of Technology, Göteborg, Sweden https://klasmodin.wordpress.com/
 Ali Mohammad-Djafari  Supelec, CNRS, France http://djafari.free.fr/
 Guido Montufar  Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany http://personalhomepages.mis.mpg.de/montufar/
 Subrahamanian Moosath  Indian Institute of Space Science and Technology, India http://www.iist.ac.in
 Eric Moulines  Telecom ParisTech, France http://perso.telecomparistech.fr/~moulines/
 Jan Naudts  Universiteit Antwerpen, Belgium https://www.uantwerpen.be/en/staff/jannaudts/mywebsite/
 Frank Nielsen  Ecole Polytechnique, France http://www.lix.polytechnique.fr/~nielsen/
 Richard Nock  Université des Antilles et de la Guyane, France  NICTA, Australia http://www.univag.fr/rnock/index.html
 Yann Ollivier  Université Paris Sud, France http://www.yannollivier.org/
 Jean-Philippe Ovarlez  ONERA & SONDRA Lab, France http://www.jeanphilippeovarlez.com
 Bruno Pelletier  University of Rennes, France http://pelletierb.perso.math.cnrs.fr/
 Xavier Pennec  INRIA, France http://wwwsop.inria.fr/members/Xavier.Pennec/
 Michel Petitjean  Université Paris Diderot, CNRS, France http://petitjeanmichel.free.fr/itoweb.petitjean.html
 Gabriel Peyre  Université Paris Dauphine, CNRS, France http://gpeyre.github.io/
 Giovanni Pistone  Collegio Carlo Alberto, Moncalieri, Italy http://www.giannidiorestino.it/
 Julien Rabin  Université de Caen, France https://sites.google.com/site/rabinjulien/
 Tudor Ratiu  Ecole Polytechnique Federale de Lausanne, Switzerland http://cag.epfl.ch/page39504en.html
 Johannes Rauh  Leibniz Universität Hannover, Germany http://www2.iag.unihannover.de/~jrauh/index.php
 Olivier Rioul  Telecom ParisTech, France http://perso.telecomparistech.fr/~rioul/
 Said Salem  Université de Bordeaux, France https://www.imsbordeaux.fr/fr/annuaire/4069saidsalem
 Alessandro Sarti  Ecole des hautes études en sciences sociales, France http://cams.ehess.fr/document.php?id=1194
 Gery de Saxcé  Université des Sciences et des Technologies de Lille, France http://www.univlille1.fr/
 Olivier Schwander  Ecole Polytechnique, France http://www.lix.polytechnique.fr/~schwander/en/
 Rodolphe Sepulchre  Cambridge University, Department of Engineering, UK http://wwwcontrol.eng.cam.ac.uk/Main/RodolpheSepulchre
 Hichem Snoussi  Université de Technologie de Troyes, France http://h.snoussi.free.fr/
 Anuj Srivastava  Florida State University, USA http://stat.fsu.edu/~anuj/
 Udo von Toussaint  Max-Planck-Institut fuer Plasmaphysik, Garching, Germany http://home.rzg.mpg.de/~udt/
 Emmanuel Trelat  UPMC, France https://www.ljll.math.upmc.fr/trelat/
 Alain Trouvé  ENS Cachan, France http://atrouve.perso.math.cnrs.fr/
 Corinne Vachier  Université Paris Est Créteil, France www.upec.fr
 Claude Vallée  Poitiers University, France http://www.univpoitiers.fr/
 Geert Verdoolaege  Ghent University, Belgium http://www.ugent.be/ea/appliedphysics/en/research/fusion/personal_pages.htm/verdoolaege.htm
 Jean-Philippe Vert  Mines ParisTech, France http://cbio.ensmp.fr/~jvert/
 François-Xavier Vialard  Ceremade, Paris, France https://www.ceremade.dauphine.fr/~vialard/
 Rui Vigelis  Universidade Federal do Ceará, Brazil
 Stephan Weis  Unicamp, Brazil http://www.stephanweis.info
 Laurent Younes  Johns Hopkins University, USA http://www.cis.jhu.edu
 Jun Zhang  University of Michigan, Ann Arbor, USA http://www.lsa.umich.edu/psych/junz/
Links
Documents
Opening Session (chaired by Frédéric Barbaresco)
Geometric Science of Information
SEE/SMAI GSI’15 Conference, LIX Colloquium 2015
Frédéric BARBARESCO* & Frank NIELSEN**, GSI’15 General Chairmen
(*) President of SEE ISIC Club (Ingénierie des Systèmes d’Information et de Communications)
(**) LIX Department, Ecole Polytechnique
Société de l'électricité, de l'électronique et des technologies de l'information et de la communication

Flashback GSI’13, Ecole des Mines de Paris
Hirohiko Shima, Jean-Louis Koszul, Shun-ichi Amari

SEE at a glance
• Meeting place for science, industry and society
• An officially recognised non-profit organisation
• About 2000 members and 5000 individuals involved
• Large participation from industry (~50%)
• 19 «Clubs techniques» and 12 «Groupes régionaux»
• Organizes conferences and seminars
• Initiates/attracts International Conferences in France
• Institutional French member of IFAC and IFIP
• Awards (Glavieux/Brillouin Prize, Général Ferrié Prize, Néel Prize, Jerphagnon Prize, Blanc-Lapierre Prize, Thévenin Prize), grades and medals (Blondel, Ampère)
• Publishes 3 periodical publications (REE, …) & 3 monographs each year
• Web: http://www.see.asso.fr and LinkedIn SEE group
• SEE Presidents: Louis de Broglie, Paul Langevin, …

1883–2015: From SIE & SFE to SEE: 132 years of Sciences
1881: Exposition Internationale d’Electricité
1883: SIE, Société Internationale des Electriciens
1886: SFE, Société Française des Electriciens
2013: SEE, 17 rue de l'Amiral Hamelin, 75783 Paris Cedex 16
Louis de Broglie, Paul Langevin

GSI’15 Sponsors

GSI Logo: Adelard of Bath
• He left England toward the end of the 11th century for Tours in France
• Adelard taught for a time at Laon, leaving Laon for travel no later than 1109.
• After Laon, he travelled to Southern Italy and Sicily no later than 1116.
• Adelard also travelled extensively throughout the "lands of the Crusades": Greece, West Asia, Sicily, Spain, and potentially Palestine.
The frontispiece of an Adelard of Bath Latin translation of Euclid's Elements, c. 1309–1316; the oldest surviving Latin translation of the Elements is a 12th-century translation by Adelard from an Arabic version
Adelard of Bath was the first to translate Euclid’s Elements into Latin
Adelard of Bath introduced the word « Algorismus » into Latin after his translation of Al Khuwarizmi

SMAI/SEE GSI’15
• More than 150 attendees from 15 different countries
• 85 scientific presentations over 3 days
• 3 keynote speakers
  • Matilde MARCOLLI (Caltech): “From Geometry and Physics to Computational Linguistics”
  • Tudor RATIU (EPFL): “Symmetry methods in geometric mechanics”
  • Marc ARNAUDON (Bordeaux University): “Stochastic Euler-Poincaré reduction”
• 1 Short Course
  • Chaired by Roger BALIAN
  • Dominique SPEHNER (Grenoble University): “Geometry on the set of quantum states and quantum correlations”
• 1 Guest speaker
  • Charles-Michel MARLE (UPMC): “Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds.
Application to Hamiltonian systems”
• Social events:
  • Welcome cocktail at Ecole Polytechnique
  • Dinner in Versailles Palace Gardens

GSI’15 Topics
• GSI’15 federates skills from Geometry, Probability and Information Theory:
  • Dimension reduction on Riemannian manifolds
  • Optimal Transport and applications in Imagery/Statistics
  • Shape Space & Diffeomorphic mappings
  • Random Geometry/Homology
  • Hessian Information Geometry
  • Topological forms and Information
  • Information Geometry Optimization
  • Information Geometry in Image Analysis
  • Divergence Geometry
  • Optimization on Manifold
  • Lie Groups and Geometric Mechanics/Thermodynamics
  • Computational Information Geometry
  • Lie Groups: Novel Statistical and Computational Frontiers
  • Geometry of Time Series and Linear Dynamical systems
  • Bayesian and Information Geometry for Inverse Problems
  • Probability Density Estimation

GSI’15 Program

GSI’15 Proceedings
• Publication by SPRINGER in « Lecture Notes in Computer Science » LNCS vol. 9389 (800 pages), ISBN 9783319250397
• http://www.springer.com/us/book/9783319250397

GSI’15 Special Issue
• Authors will be solicited to submit a paper in a special Issue “Differential Geometrical Theory of Statistics” in ENTROPY Journal, an international and interdisciplinary open access journal of entropy and information studies published monthly online by MDPI
• http://www.mdpi.com/journal/entropy/special_issues/entropystatistics
• A book could be edited by MDPI: e.g.

Ecole Polytechnique
• Special thanks to « LIX » Department
A product of the French Revolution and the Age of Enlightenment, École Polytechnique has a rich history that spans over 220 years.
https://www.polytechnique.edu/en/history
Henri Poincaré – X1873
Paris-Saclay University in Top 8 World Innovation Hubs
http://www.technologyreview.com/news/517626/infographictheworldstechnologyhubs/

A new Grammar of Information
“Mathematics is the art of giving the same name to different things” – Henri Poincaré
GROUP EVERYWHERE: Elie Cartan, Henri Poincaré
METRIC EVERYWHERE: Maurice Fréchet, Misha Gromov

Elie Cartan: Group Everywhere (Henri Poincaré review of Cartan’s Works)
“The problems addressed by Elie Cartan are among the most important, most abstract and most general dealt with in mathematics; group theory is, so to speak, the whole of mathematics, stripped of its material and reduced to pure form. This extreme level of abstraction has probably made my presentation a little dry; to assess each of the results, I would have had, so to speak, to give back to it the material of which it had been stripped; but this restitution can be made in a thousand different ways; and it is this single form, found beneath a host of various garments, that is the common link between mathematical theories that one is often surprised to find so near” H. Poincaré

Maurice Fréchet: Metric Everywhere
• Maurice Fréchet made major contributions to the topology of point sets and introduced the entire concept of metric spaces.
• His dissertation opened the entire field of functionals on metric spaces and introduced the notion of compactness.
• He extended probability to metric spaces in 1948 (Annales de l’IHP): “Les éléments aléatoires de nature quelconque dans un espace distancié” (random elements of arbitrary nature in a metric space)
Extension of Probability/Statistics to abstract/metric spaces

GSI’15 & Geometric Mechanics
• The master of geometry during the last century, Elie Cartan, was the son of Joseph Cartan, who was the village blacksmith.
• Elie recalled that his childhood had passed under “blows of the anvil, which started every morning from dawn”.
• We can easily imagine the child, Elie Cartan, watching his father Joseph “coding curvature” on metal between the hammer and the anvil, a sight that insidiously influenced Elie’s mind with germinal intuition of fundamental geometric concepts.
• The word “forge” comes from the late XIV century, “a smithy”, from Old French forge “forge, smithy” (XII century), earlier faverge, from Latin fabrica “workshop, smith’s shop”, from faber (genitive fabri) “workman in hard materials, smith”.
HAMMER = The Coder; ANVIL = Curvature Libraries
Bigorne, Bicorne
Venus at the Forge of Vulcan, Le Nain Brothers, Musée Saint-Denis, Reims

From Homo Sapiens to Homo Faber
“Intelligence is the faculty of manufacturing artificial objects, especially tools to make tools, and of indefinitely varying the manufacture.” Henri Bergson
Into the Flaming Forge of Vulcan, Diego Velázquez, Museo Nacional del Prado

Geometric Thermodynamics & Statistical Physics

Enjoy all « Geometries » (Dinner at Versailles Palace Gardens)
Restaurant of GSI’15 Gala Dinner
André Le Nôtre, Landscape Geometer of Versailles, the Apex of “Le Jardin à la française”
Louis XIV, Patron of Science
The Royal Academy of Sciences was established in 1666
On 1st September 1715, 300 years ago, Louis XIV passed away at the age of 77, having reigned for 72 years

Keynote Speakers
Prof. Matilde MARCOLLI (CALTECH, USA)
From Geometry and Physics to Computational Linguistics
Abstract: I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework.
Biography:
• Laurea in Physics, University of Milano, 1993
• Master of Science, Mathematics, University of Chicago, 1994
• PhD, Mathematics, University of Chicago, 1997
• Moore Instructor, Massachusetts Institute of Technology, 1997–2000
• Associate Professor (C3), Max Planck Institute for Mathematics, 2000–2008
• Professor, California Institute of Technology, 2008–present
• Distinguished Visiting Research Chair, Perimeter Institute for Theoretical Physics, 2013–present
Talk chaired by Daniel Bennequin

Keynote Speakers
Prof. Marc ARNAUDON (Bordeaux University, France)
Stochastic Euler-Poincaré reduction
Abstract: We will prove an Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation.
Biography: Marc Arnaudon was born in France in 1965. He graduated from Ecole Normale Supérieure de Paris, France, in 1991. He received the PhD degree in mathematics and the Habilitation à diriger des Recherches degree from Strasbourg University, France, in January 1994 and January 1998 respectively. After postdoctoral research and teaching at Strasbourg, he took up a full professor position in September 1999 in the Department of Mathematics at Poitiers University, France, where he was the head of the Probability Research Group. In January 2013 he left Poitiers and joined the Department of Mathematics of Bordeaux University, France, where he is a full professor in mathematics.
Talk chaired by Frank Nielsen

Keynote Speakers
Prof. Tudor RATIU (EPFL, Switzerland)
Symmetry methods in geometric mechanics
Abstract: The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics.
Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena for Hamiltonian systems is based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity.
Biography:
• BA in Mathematics, University of Timisoara, Romania, 1973
• MA in Applied Mathematics, University of Timisoara, Romania, 1974
• Ph.D. in Mathematics, University of California, Berkeley, 1980
• T.H. Hildebrandt Research Assistant Professor, University of Michigan, Ann Arbor, USA, 1980–1983
• Associate Professor of Mathematics, University of Arizona, Tucson, USA, 1983–1988
• Professor of Mathematics, University of California, Santa Cruz, USA, 1988–2001
• Chaired Professor of Mathematics, Ecole Polytechnique Federale de Lausanne, Switzerland, 1998–present
• Professor of Mathematics, Skolkovo Institute of Science and Technology, Moscow, Russia, 2014–present
Talk chaired by Xavier Pennec

Short Course
Prof. Dominique SPEHNER (Grenoble University)
Geometry on the set of quantum states and quantum correlations
Abstract: I will show that the set of states of a quantum system with a finite dimensional Hilbert space can be equipped with various Riemannian distances having nice properties from a quantum information viewpoint, namely they are contractive under all physically allowed operations on the system. The corresponding metrics are quantum analogs of the Fisher metric and have been classified by D. Petz. Two distances are particularly relevant physically: the Bogoliubov-Kubo-Mori distance studied by R. Balian, Y.
Alhassid and H. Reinhardt, and the Bures distance studied by A. Uhlmann and by S.L. Braunstein and C.M. Caves. The latter gives the quantum Fisher information, which plays an important role in quantum metrology. A way to measure the amount of quantum correlations (entanglement or quantum discord) in bipartite systems (that is, systems composed of two parties) with the help of these distances will also be discussed.
Biography:
• Diplôme d'Études Approfondies (DEA) in Theoretical Physics at the École Normale Supérieure de Lyon, 1994
• Civil Service (Service National de la Coopération), Technion Institute of Technology, Haifa, Israel, 1995–1996
• PhD in Theoretical Physics, Université Paul Sabatier, Toulouse, France, 1996–2000
• Postdoctoral fellow, Pontificia Universidad Católica, Santiago, Chile, 2000–2001
• Research Associate, University of Duisburg-Essen, Germany, 2001–2005
• Maître de Conférences, Université Joseph Fourier, Grenoble, France, 2005–present
• Habilitation à diriger des Recherches (HDR), Université Grenoble Alpes, 2015
• Member of the Institut Fourier (since 2005) and the Laboratoire de Physique et Modélisation des Milieux Condensés (since 2013) of the Université Grenoble Alpes, France
Talk chaired by Roger Balian

Guest Speakers
Prof. Charles-Michel MARLE (UPMC, France)
Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. Application to Hamiltonian systems
Abstract: I will present some tools in Symplectic and Poisson Geometry in view of their applications in Geometric Mechanics and Mathematical Physics. Lie group and Lie algebra actions on symplectic and Poisson manifolds, momentum maps and their equivariance properties, and first integrals associated to symmetries of Hamiltonian systems will be discussed. Reduction methods taking advantage of symmetries will be discussed.
Biography: Charles-Michel Marle was born in 1934. He studied at Ecole Polytechnique (1953–1955), Ecole Nationale Supérieure des Mines de Paris (1957–1958) and Ecole Nationale Supérieure du Pétrole et des Moteurs (1957–1958). He obtained a doctor's degree in Mathematics at the University of Paris in 1968. From 1959 to 1969 he worked as a research engineer at the Institut Français du Pétrole. He joined the Université de Besançon as Associate Professor in 1969, and the Université Pierre et Marie Curie, first as Associate Professor (1975) and then as full Professor (1981). His research work first concerned fluid flows through porous media, then Differential Geometry, Hamiltonian systems and applications in Mechanics and Mathematical Physics.
Talk chaired by Frédéric Barbaresco
Keynote speech: Matilde Marcolli (chaired by Daniel Bennequin)
Abstract
From Geometry and Physics to Computational Linguistics
Matilde Marcolli
Geometric Science of Information, Paris, October 2015
Matilde Marcolli Geometry, Physics, Linguistics

A Mathematical Physicist’s adventures in Linguistics
Based on:
1 Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
2 Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
3 Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
4 Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation
...coming soon to an arXiv near you

What is Linguistics?
• Linguistics is the scientific study of language
  – What is Language? (langage, lenguaje, ...)
  – What is a Language? (langue, lengua, ...)
  Similar to ‘What is Life?’ or ‘What is an organism?’ in biology
• natural language as opposed to artificial (formal, programming, ...) languages
• The point of view we will focus on: Language is a kind of Structure
  – It can be approached mathematically and computationally, like many other kinds of structures
  – The main purpose of mathematics is the understanding of structures

• How are different languages related? What does it mean that they come in families?
• How do languages evolve in time? Phylogenetics, Historical Linguistics, Etymology
• How does the process of language acquisition work? (Neuroscience)
• Semiotic viewpoint (mathematical theory of communication)
• Discrete versus Continuum (probabilistic methods, versus discrete structures)
• Descriptive or Predictive?
to be predictive, a science needs good mathematical models

A language exists at many different levels of structure
An Analogy: Physics looks very different at different scales:
General Relativity and Cosmology ($10^{10}$ m)
Classical Physics ($\sim 1$ m)
Quantum Physics ($10^{-10}$ m)
Quantum Gravity ($10^{-35}$ m)
Despite dreams of a Unified Theory, we deal with different mathematical models for different levels of structure

Similarly, we view language at different “scales”:
units of sound (phonology)
words (morphology)
sentences (syntax)
global meaning (semantics)
We expect to be dealing with different mathematical structures and different models at these various levels
Main level I will focus on: Syntax

Linguistics view of syntax kind of looks like this...
Alexander Calder, Mobile, 1960

Modern Syntactic Theory:
• grammaticality: judgement on whether a sentence is well formed (grammatical) in a given language; I-language gives people the capacity to decide on grammaticality
• generative grammar: produce a set of rules that correctly predict grammaticality of sentences
• universal grammar: ability to learn grammar is built into the human brain, e.g. properties like the distinction between nouns and verbs are universal
... is universal grammar a falsifiable theory?
Principles and Parameters (Government and Binding) (Chomsky, 1981)
• principles: general rules of grammar
• parameters: binary variables (on/off switches) that distinguish languages in terms of syntactic structures
• Example of parameter: head-directionality (head-initial versus head-final)
English is head-initial, Japanese is head-final
VP = verb phrase, TP = tense phrase, DP = determiner phrase

...but not always so clear-cut: German can use both structures
auf seine Kinder stolze Vater [father proud of his children] (head-final) or er ist stolz auf seine Kinder [he is proud of his children] (head-initial)
AP = adjective phrase, PP = prepositional phrase
• Corpora based statistical analysis of head-directionality (Haitao Liu, 2010): a continuum between head-initial and head-final

Examples of Parameters
Head-directionality
Subject-side
Pro-drop
Null-subject
Problems
• Interdependencies between parameters
• Diachronic changes of parameters in language evolution

Dependent parameters
• null-subject parameter: can drop subject
Example: among Latin languages, Italian and Spanish have null-subject (+), French does not (−)
it rains, piove, llueve, il pleut
• pro-drop parameter: can drop pronouns in sentences
• Pro-drop controls Null-subject
How many independent parameters? Geometry of the space of syntactic parameters?
Persistent Topology of Syntax
• Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
Databases of Syntactic Parameters of World Languages:
1 Syntactic Structures of World Languages (SSWL) http://sswl.railsplayground.net/
2 TerraLing http://www.terraling.com/
3 World Atlas of Language Structures (WALS) http://wals.info/

Persistent Topology of Data Sets
how data cluster around topological shapes at different scales

Vietoris–Rips complexes
• set $X = \{x_\alpha\}$ of points in Euclidean space $E^N$, distance $d(x, y) = \|x - y\| = \left(\sum_{j=1}^{N} (x_j - y_j)^2\right)^{1/2}$
• Vietoris–Rips complex $R(X, \epsilon)$ of scale $\epsilon$ over field $K$: $R_n(X, \epsilon)$ is the $K$-vector space spanned by all unordered $(n+1)$-tuples of points $\{x_{\alpha_0}, x_{\alpha_1}, \ldots, x_{\alpha_n}\}$ in $X$ where all pairs have distances $d(x_{\alpha_i}, x_{\alpha_j}) \le \epsilon$

• inclusion maps $R(X, \epsilon_1) \hookrightarrow R(X, \epsilon_2)$ for $\epsilon_1 < \epsilon_2$ induce maps in homology by functoriality: $H_n(X, \epsilon_1) \to H_n(X, \epsilon_2)$
barcode diagrams: births and deaths of persistent generators

Persistent Topology of Syntactic Parameters
• Data: 252 languages from SSWL with 115 parameters
• considering all world languages together gives too much noise in the persistent topology: subdivide by language families
• Principal Component Analysis: reduce dimensionality of data
• compute Vietoris–Rips complex and barcode diagrams
Persistent H0: clustering of data in components – language subfamilies
Persistent H1: clustering of data along closed curves (circles) – linguistic meaning?
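The Vietoris–Rips construction and the persistent H0 barcode can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the pipeline of arXiv:1507.05134: the function names (`rips_simplices`, `h0_barcode`) and the toy point cloud are hypothetical, simplices are enumerated by brute force up to dimension 2, and H0 bars are obtained by a union-find pass over edges sorted by length (connected components of the Rips complex merge exactly when an edge of that length appears). Persistent H1 is not computed here; it would need a boundary-matrix reduction or a dedicated library.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance d(x, y) = (sum_j (x_j - y_j)^2)^(1/2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rips_simplices(points, eps, max_dim=2):
    """Simplices (tuples of point indices) of the Vietoris-Rips complex at scale eps.

    A set of k+1 points spans a k-simplex when all pairwise distances are <= eps.
    Brute-force enumeration: only suitable for small toy examples.
    """
    n = len(points)
    close = [[euclidean(points[i], points[j]) <= eps for j in range(n)] for i in range(n)]
    simplices = []
    for size in range(1, max_dim + 2):  # size vertices -> (size - 1)-simplex
        for combo in combinations(range(n), size):
            if all(close[i][j] for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


def h0_barcode(points):
    """Persistent H0 bars (birth, death) of the Rips filtration.

    Every point is born at scale 0; a component dies at the length of the
    edge that first merges it into another component.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (euclidean(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one bar ends here
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated clusters in the plane.
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
    print(rips_simplices(cloud, eps=1.5))
    print(h0_barcode(cloud))
```

In such a toy cloud, long H0 bars correspond to clusters that stay separate over a wide range of scales, which is the sense in which persistent H0 is read as clustering into components.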
Sources of Persistent $H_1$
• "Hopf bifurcation" type phenomenon
• two different branches of a tree closing up in a loop
two different types of phenomena of historical linguistic development within a language family

Persistent Topology of Indo-European Languages
• Two persistent generators of $H_0$ (Indo-Iranian, European)
• One persistent generator of $H_1$

Persistent Topology of Niger–Congo Languages
• Three persistent components of $H_0$ (Mande, Atlantic-Congo, Kordofanian)
• No persistent $H_1$

The origin of persistent $H_1$ of Indo-European Languages?
Naive guess: the Anglo-Norman bridge ... but lexical not syntactic

Answer: No, it is not the Anglo-Norman bridge!
Persistent topology of the Germanic+Latin languages

Answer: It's all because of Ancient Greek!
Persistent topology with Hellenic (and Indo-Iranic) branch removed

Syntactic Parameters as Dynamical Variables
• Example: Word Order: SOV, SVO, VSO, VOS, OVS, OSV
Very uneven distribution across world languages
• Word order distribution: a neuroscience explanation?
– D. Kemmerer, The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca's area, Language and Linguistics Compass, 6 (2012) N.1, 50–66.
• Internal reasons for diachronic switch?
– F. Antinucci, A. Duranti, L. Gebert, Relative clause structure, relative clause perception, and the change from SOV to SVO, Cognition, Vol.7 (1979) N.2, 145–176.

Changes over time in Word Order
• Ancient Greek: switched from Homeric to Classical
– A.
Taylor, The change from SOV to SVO in Ancient Greek, Language Variation and Change, 6 (1994) 1–37
• Sanskrit: different word orders allowed, but the prevalent one in Vedic Sanskrit is SOV (switched at least twice by influence of Dravidian languages)
– F.J. Staal, Word Order in Sanskrit and Universal Grammar, Springer, 1967
• English: switched from Old English (transitional between SOV and SVO) to Middle English (SVO)
– J. McLaughlin, Old English Syntax: a handbook, Walter de Gruyter, 1983.
Syntactic Parameters are Dynamical in Language Evolution

Spin Glass Models of Syntax
• Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
– focus on linguistic change caused by language interactions
– think of syntactic parameters as spin variables
– spin interaction tends to align (ferromagnet)
– strength of interaction proportional to bilingualism (MediaLab)
– role of temperature parameter: probabilistic interpretation of parameters
– not all parameters are independent: entailment relations
– Metropolis–Hastings algorithm: simulate evolution

The Ising Model of spin systems on a graph G
• configurations of spins s : V(G) →
{±1}
• magnetic field $B$ and correlation strength $J$: Hamiltonian
$H(s) = -J \sum_{e \in E(G):\, \partial(e) = \{v, v'\}} s_v s_{v'} - B \sum_{v \in V(G)} s_v$
• first term measures degree of alignment of nearby spins
• second term measures alignment of spins with direction of magnetic field

Equilibrium Probability Distribution
• Partition function $Z_G(\beta)$:
$Z_G(\beta) = \sum_{s : V(G) \to \{\pm 1\}} \exp(-\beta H(s))$
• Probability distribution on the configuration space: Gibbs measure
$P_{G,\beta}(s) = \frac{e^{-\beta H(s)}}{Z_G(\beta)}$
• low energy states weight most
• at low temperature (large $\beta$): ground state dominates; at higher temperature ($\beta$ small) higher energy states also contribute

Average Spin Magnetization
$M_G(\beta) = \frac{1}{\# V(G)} \sum_{s : V(G) \to \{\pm 1\}} \sum_{v \in V(G)} s_v\, P(s)$
• Free energy $F_G(\beta, B) = \log Z_G(\beta, B)$
$M_G(\beta) = \frac{1}{\# V(G)} \, \frac{1}{\beta} \left( \frac{\partial F_G(\beta, B)}{\partial B} \right)\Big|_{B=0}$

Ising Model on a 2-dimensional lattice
• $\exists$ critical temperature $T = T_c$ where phase transition occurs
• for $T > T_c$ equilibrium state has $m(T) = 0$ (computed with respect to the equilibrium Gibbs measure $P_{G,\beta}$)
• demagnetization: on average as many up as down spins
• for $T < T_c$ have $m(T) > 0$: spontaneous magnetization

Syntactic Parameters and Ising/Potts Models
• characterize set of $n = 2^N$ languages $L_i$ by binary strings of $N$ syntactic parameters (Ising model)
• or by ternary strings (Potts model) if take values $\pm 1$ for parameters that are set and $0$ for parameters that are not defined in a certain language
• a system of $n$ interacting languages = graph $G$ with $n = \# V(G)$
• languages $L_i$ = vertices of the graph (e.g.
language that occupies a certain geographic area)
• languages that have interaction with each other = edges $E(G)$ (geographical proximity, or high volume of exchange for other reasons)

Figure: graph of language interaction (detail) from the Global Language Network of MIT MediaLab, with interaction strengths $J_e$ on edges based on number of book translations (or Wikipedia edits)

• if only one syntactic parameter, would have an Ising model on the graph $G$: configurations $s : V(G) \to \{\pm 1\}$ set the parameter at all the locations on the graph
• variable interaction energies along edges (some pairs of languages interact more than others)
• magnetic field $B$ and correlation strength $J$: Hamiltonian
$H(s) = -\sum_{e \in E(G):\, \partial(e) = \{v, v'\}} \sum_{i=1}^{N} J_e\, s_{v,i}\, s_{v',i}$
• if $N$ parameters, configurations $s = (s_1, \ldots, s_N) : V(G) \to \{\pm 1\}^N$
• if all $N$ parameters are independent, then it would be like having $N$ non-interacting copies of an Ising model on the same graph $G$ (or $N$ independent choices of an initial state in an Ising model on $G$)

Metropolis–Hastings
• detailed balance condition $P(s)\, P(s \to s') = P(s')\, P(s' \to s)$ for probabilities of transitioning between states (Markov process)
• transition probabilities $P(s \to s') = \pi_A(s \to s') \cdot \pi(s \to s')$ with $\pi(s \to s')$ the conditional probability of proposing state $s'$ given state $s$ and $\pi_A(s \to s')$ the conditional probability of accepting it
• Metropolis–Hastings choice of acceptance distribution (Gibbs)
$\pi_A(s \to s') = 1$ if $H(s') - H(s) \le 0$, and $\pi_A(s \to s') = \exp(-\beta (H(s') - H(s)))$ if $H(s') - H(s) > 0$,
satisfying detailed balance
• selection probabilities π(s →
s′) single-spin-flip dynamics
• ergodicity of Markov process ⇒ unique stationary distribution

Example: Single parameter dynamics, Subject-Verb parameter
Initial configuration: most languages in SSWL have +1 for Subject-Verb; use interaction energies from MediaLab data

Equilibrium: low temperature: all aligned to +1; high temperature: [figure]
Temperature: fluctuations in bilingual users between different structures ("code-switching" in Linguistics)

Entailment relations among parameters
• Example: $\{p_1, p_2\}$ = {Strong Deixis, Strong Anaphoricity}

        p1    p2
  ℓ1    +1    +1
  ℓ2    −1     0
  ℓ3    +1    +1
  ℓ4    +1    −1

$\{\ell_1, \ell_2, \ell_3, \ell_4\}$ = {English, Welsh, Russian, Bulgarian}

Modeling Entailment
• variables: $S_{\ell,p_1} = \exp(\pi i X_{\ell,p_1}) \in \{\pm 1\}$, $S_{\ell,p_2} \in \{\pm 1, 0\}$ and $Y_{\ell,p_2} = |S_{\ell,p_2}| \in \{0, 1\}$
• Hamiltonian $H = H_E + H_V$
$H_E = H_{p_1} + H_{p_2} = -\sum_{\ell, \ell' \in \text{languages}} J_{\ell\ell'} \left( \delta_{S_{\ell,p_1}, S_{\ell',p_1}} + \delta_{S_{\ell,p_2}, S_{\ell',p_2}} \right)$
$H_V = \sum_{\ell} H_{V,\ell} = \sum_{\ell} J_\ell\, \delta_{X_{\ell,p_1}, Y_{\ell,p_2}}$
$J_\ell > 0$ antiferromagnetic
• two parameters: temperature as before and coupling energy of entailment
• if $p_1$ is frozen and only $p_2$ evolves: Potts model with external magnetic field

Acceptance probabilities
$\pi_A(s \to s \pm 1 \ (\mathrm{mod}\ 3)) = 1$ if $\Delta H \le 0$, and $\exp(-\beta \Delta H)$ if $\Delta H > 0$.
$\Delta H := \min\{H(s + 1 \ (\mathrm{mod}\ 3)),\ H(s - 1 \ (\mathrm{mod}\ 3))\} - H(s)$

Equilibrium configuration $(p_1, p_2)$

        HT/HE       HT/LE       LT/HE       LT/LE
  ℓ1    (+1, 0)     (+1, −1)    (+1, +1)    (+1, −1)
  ℓ2    (+1, −1)    (−1, −1)    (+1, +1)    (+1, −1)
  ℓ3    (−1, 0)     (−1, +1)    (+1, +1)    (−1, 0)
  ℓ4    (+1, +1)    (−1, −1)    (+1, +1)    (−1, 0)

Figure: average value of spin $p_1$ (left) and $p_2$ (right) in the low entailment energy case

Syntactic Parameters in Kanerva Networks
• Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
– Address two issues: relative prevalence of different syntactic parameters and "degree of recoverability" (as sign of underlying relations between parameters)
– If the information about one parameter is corrupted in the data of a group of languages, can it be recovered from the data of the other parameters?
– Answer: different parameters have different degrees of recoverability
– Used 21 parameters and 165 languages from SSWL database

Kanerva networks (sparse distributed memories)
• P. Kanerva, Sparse Distributed Memory, MIT Press, 1988.
• field $\mathbb{F}_2 = \{0, 1\}$, vector space $\mathbb{F}_2^N$, large $N$
• uniform random sample of $2^k$ hard locations with $2^k \ll 2^N$
• median Hamming distance between hard locations
• Hamming spheres of radius slightly larger than median value (access sphere)
• writing to network: storing datum $X \in \mathbb{F}_2^N$, each hard location in the access sphere of $X$ gets its $i$-th coordinate (initialized at zero) incremented depending on the $i$-th entry of $X$
• reading at a location: $i$-th entry determined by majority rule of $i$-th entries of all stored data in hard locations within access sphere
Kanerva networks are good at reconstructing corrupted data

Procedure
• 165 data points (languages) stored in a Kanerva Network in $\mathbb{F}_2^{21}$ (choice of 21 parameters)
• corrupting one parameter at a time: analyze recoverability
• language bit-string with a single corrupted bit used as read location and resulting bit-string compared to original bit-string (Hamming distance)
• resulting average Hamming distance used as score of recoverability (lowest = most easily recoverable parameter)

Parameters and frequencies
01 Subject-Verb (0.64957267)
02 Verb-Subject (0.31623933)
03 Verb-Object (0.61538464)
04 Object-Verb (0.32478634)
05 Subject-Verb-Object (0.56837606)
06 Subject-Object-Verb (0.30769232)
07 Verb-Subject-Object (0.1923077)
08 Verb-Object-Subject (0.15811966)
09 Object-Subject-Verb (0.12393162)
10 Object-Verb-Subject (0.10683761)
11 Adposition-Noun-Phrase (0.58974361)
12 Noun-Phrase-Adposition (0.2905983)
13 Adjective-Noun (0.41025642)
14 Noun-Adjective (0.52564102)
15 Numeral-Noun (0.48290598)
16 Noun-Numeral (0.38034189)
17 Demonstrative-Noun (0.47435898)
18 Noun-Demonstrative (0.38461539)
19 Possessor-Noun (0.38034189)
20 Noun-Possessor (0.49145299)
A01 Attributive-Adjective-Agreement (0.46581197)

Figure: overall effect related to relative prevalence of a parameter
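The write/read rules and the single-bit corruption score described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the code used in the paper: the class name `SparseDistributedMemory`, the helper `recoverability`, the ±1 counter update (a common convention for "incremented depending on the $i$-th entry"), and all sizes are hypothetical choices.

```python
import numpy as np


class SparseDistributedMemory:
    """Toy Kanerva network over F_2^n_bits with randomly chosen hard locations."""

    def __init__(self, n_bits, n_hard, radius, seed=0):
        rng = np.random.default_rng(seed)
        # Hard locations: uniform random bit-strings.
        self.addresses = rng.integers(0, 2, size=(n_hard, n_bits))
        # One integer counter per hard location and per coordinate.
        self.counters = np.zeros((n_hard, n_bits), dtype=int)
        self.radius = radius

    def _access_sphere(self, x):
        # Hard locations within Hamming distance `radius` of x.
        return (self.addresses != x).sum(axis=1) <= self.radius

    def write(self, x):
        x = np.asarray(x)
        # Assumed convention: +1 where the bit is 1, -1 where it is 0.
        self.counters[self._access_sphere(x)] += 2 * x - 1

    def read(self, x):
        x = np.asarray(x)
        total = self.counters[self._access_sphere(x)].sum(axis=0)
        # Majority rule per coordinate (ties read as 0).
        return (total > 0).astype(int)


def recoverability(memory, data, bit):
    """Average Hamming distance between each stored string and what is read
    back when `bit` is flipped in the read address (lower = more recoverable)."""
    total = 0
    for x in data:
        probe = np.array(x)
        probe[bit] ^= 1
        total += int((memory.read(probe) != np.asarray(x)).sum())
    return total / len(data)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.integers(0, 2, size=(165, 21))   # random stand-in for SSWL data
    mem = SparseDistributedMemory(n_bits=21, n_hard=512, radius=11)
    for x in data:
        mem.write(x)
    print([round(recoverability(mem, data, b), 2) for b in range(21)])
```

With random stand-in data the scores only reflect bit frequencies; the comparison made in the slides is between such a random baseline and the scores obtained on the actual SSWL bit-strings.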
Figure: more refined effect after normalizing for prevalence (syntactic dependencies)

• Overall effect relating recoverability in a Kanerva Network to prevalence of a certain parameter among languages (depends only on frequencies: also seen in random data with assigned frequencies)
• Additional effects (that deviate from the random case) which detect possible dependencies among syntactic parameters: increased recoverability beyond what the frequency-based effect predicts
• Possible neuroscience implications? Kanerva Networks as models of human memory (parameter prevalence linked to neuroscience models)
• More refined data if divided by language families?

Phylogenetic Linguistics (WORK IN PROGRESS)
• Constructing family trees for languages (sometimes possibly graphs with loops)
• Main information about subgrouping: shared innovation, a specific change with respect to other languages in the family that only happens in a certain subset of languages
– Example: among Mayan languages: Huastecan branch characterized by initial w becoming voiceless before a vowel and ts becoming t, q becoming k, ... Quichean branch by velar nasal becoming velar fricative, ć becoming č (prepalatal affricate to palato-alveolar)...
Known result by traditional Historical Linguistics methods:
Figure: Mayan Language Tree

Computational Methods for Phylogenetic Linguistics
• Peter Forster, Colin Renfrew, Phylogenetic methods and the prehistory of languages, McDonald Institute Monographs, 2006
• Several computational methods for constructing phylogenetic trees available from mathematical and computational biology
• Phylogeny Programs http://evolution.genetics.washington.edu/phylip/software.html
• Standardized lexical databases: Swadesh list (100 words, or 207 words)
• Use Swadesh lists of languages in a given family to look for cognates:
– without additional etymological information (keep false positives)
– with additional etymological information (remove false positives)
• Two further choices about loan words:
– remove loan words
– keep loan words
• Keeping loan words produces graphs that are not trees
• Without loan words it should produce trees, but small loops still appear due to ambiguities (different possible trees matching same data)
... more precisely: coding of lexical data ...

Coding of lexical data
• After compiling lists of cognate words for pairs of languages within a given family (with/without lexical information and loan words)
• Produce a binary string $S(L_1, L_2) = (s_1, \ldots, s_N)$ for each pair of languages $L_1, L_2$, with entry 0 or 1 at the $i$-th word of the lexical list of $N$ words if cognates for that meaning exist in the two languages or not (important to pay attention to synonyms)
• lexical Hamming distance between two languages
$d(L_1, L_2) = \#\{ i \in \{1, \ldots, N\} \mid s_i = 1 \}$
counts words in the list that do not have cognates in $L_1$ and $L_2$

Distance-matrix method of phylogenetic inference
• after producing a measure of "genetic distance": Hamming metric $d_H(L_a, L_b)$
• hierarchical data clustering: collecting objects in clusters according to their distance
• simplest method of tree construction: neighbor joining
(1) create a (leaf) vertex for each index $a$ (ranging over languages in given family)
(2) given the distance matrix $D = (D_{ab})$ of distances between each pair, $D_{ab} = d_H(L_a, L_b)$, construct a new matrix (Q-test) $Q = (Q_{ab})$ with
$Q_{ab} = (n - 2) D_{ab} - \sum_{k=1}^{n} D_{ak} - \sum_{k=1}^{n} D_{bk}$
this matrix $Q$ decides the first pairs of vertices to join
(3) identify entries $Q_{ab}$ with lowest values: join each such pair $(a, b)$ of leaf vertices to a newly created vertex $v_{ab}$
(4) set distances to the new vertex by
$d(a, v_{ab}) = \frac{1}{2} D_{ab} + \frac{1}{2(n-2)} \left( \sum_{k=1}^{n} D_{ak} - \sum_{k=1}^{n} D_{bk} \right)$
$d(b, v_{ab}) = D_{ab} - d(a, v_{ab})$
$d(k, v_{ab}) = \frac{1}{2} (D_{ak} + D_{bk} - D_{ab})$
(5) remove $a$ and $b$ and keep $v_{ab}$ and all the remaining vertices and the new distances, compute the new $Q$ matrix and repeat until the tree is completed

Figure: Neighbor-Joining Method for Phylogenetic Inference
Figure: example of a neighbor-joining lexical linguistic phylogenetic tree from Delmestri–Cristianini's paper

• N. Saitou, M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol. Biol. Evol. Vol.4 (1987) N.4, 406–425.
• R. Mihaescu, D. Levy, L. Pachter, Why neighbor-joining works, arXiv:cs/0602041v3
• A. Delmestri, N. Cristianini, Linguistic Phylogenetic Inference by PAM-like Matrices, Journal of Quantitative Linguistics, Vol.19 (2012) N.2, 95–120.
• F. Petroni, M. Serva, Language distance and tree reconstruction, J. Stat. Mech.
(2008) P08012

Syntactic Phylogenetic Trees (instead of lexical)
• instead of coding lexical data based on cognate words, use binary variables of syntactic parameters
• Hamming distance between binary strings of parameter values
• shown recently that one gets an accurate reconstruction of the phylogenetic tree of Indo-European languages from syntactic parameters only
• G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin, Towards a syntactic phylogeny of modern Indo-European languages, Journal of Historical Linguistics 3 (2013) N.1, 122–152.
• G. Longobardi, C. Guardiano, Evidence for syntax as a signal of historical relatedness, Lingua 119 (2009) 1679–1706.

Work in Progress
• Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you
– Assembled a phylogenetic tree of world languages using the SSWL database of syntactic parameters
– Ongoing comparison with specific historical linguistic reconstruction of phylogenetic trees
– Comparison with Computational Linguistic reconstructions based on lexical data (Swadesh lists) and on phonetical analysis
– not all linguistic families have syntactic parameters mapped with the same level of completeness... different levels of accuracy in reconstruction
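The distance-matrix pipeline of the preceding slides (Hamming distances between binary strings, then the $Q$ matrix and the distance updates of neighbor joining) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `hamming_matrix` and `neighbor_joining` and the toy distance matrix are assumptions, and the sketch joins one pair per iteration, as in the standard Saitou–Nei formulation, rather than all minimal pairs at once.

```python
import numpy as np


def hamming_matrix(strings):
    """Pairwise Hamming distances between equal-length binary strings."""
    X = np.asarray(strings)
    return (X[:, None, :] != X[None, :, :]).sum(axis=-1).astype(float)


def neighbor_joining(D, labels):
    """Return tree edges (node, node, length) built by neighbor joining
    from a symmetric distance matrix D with zero diagonal."""
    D = np.array(D, dtype=float)
    labels = list(labels)
    edges = []
    while len(labels) > 2:
        n = len(labels)
        row = D.sum(axis=1)
        # Q_ab = (n - 2) D_ab - sum_k D_ak - sum_k D_bk
        Q = (n - 2) * D - row[:, None] - row[None, :]
        np.fill_diagonal(Q, np.inf)
        a, b = (int(i) for i in np.unravel_index(np.argmin(Q), Q.shape))
        # Branch lengths from a and b to the new vertex v_ab.
        d_av = 0.5 * D[a, b] + (row[a] - row[b]) / (2 * (n - 2))
        d_bv = D[a, b] - d_av
        new = f"({labels[a]},{labels[b]})"
        edges.append((labels[a], new, float(d_av)))
        edges.append((labels[b], new, float(d_bv)))
        # d(k, v_ab) = (D_ak + D_bk - D_ab) / 2 for the remaining vertices.
        d_kv = 0.5 * (D[a] + D[b] - D[a, b])
        keep = [k for k in range(n) if k not in (a, b)]
        D_new = np.zeros((n - 1, n - 1))
        D_new[:-1, :-1] = D[np.ix_(keep, keep)]
        D_new[-1, :-1] = D_new[:-1, -1] = d_kv[keep]
        D = D_new
        labels = [labels[k] for k in keep] + [new]
    # Last remaining pair is joined by a single edge.
    edges.append((labels[0], labels[1], float(D[0, 1])))
    return edges


if __name__ == "__main__":
    # Toy additive metric: leaves a, b hang off one internal node (lengths 1, 2),
    # c, d off another (lengths 3, 4), internal edge of length 5.
    D = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
    for edge in neighbor_joining(D, ["a", "b", "c", "d"]):
        print(edge)
```

For syntactic rather than lexical trees, the same `neighbor_joining` call would be fed `hamming_matrix(parameter_strings)`, where each row is the binary string of parameter values of one language.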
Random Geometry/Homology (chaired by Laurent Decreusefond/Frédéric Chazal)
Keywords = Extreme values, Poisson point process, Random tessellations
Abstract
The extremal index for a random tessellation
Nicolas Chenavier, Université Littoral Côte d'Opale
October 28, 2015

Plan
1 Random tessellations
2 Main problem
3 Extremal index

Random tessellations
Definition: A (convex) random tessellation $m$ in $\mathbb{R}^d$ is a partition of the Euclidean space into random polytopes (called cells).
We will only consider the particular cases where $m$ is a:
• Poisson–Voronoi tessellation;
• Poisson–Delaunay tessellation.

Poisson–Voronoi tessellation
• $X$, Poisson point process in $\mathbb{R}^d$;
• $\forall x \in X$, $C_X(x) := \{ y \in \mathbb{R}^d : |y - x| \le |y - x'|,\ x' \in X \}$ (Voronoi cell with nucleus $x$);
• $m_{PVT} := \{ C_X(x),\ x \in X \}$, Poisson–Voronoi tessellation;
• $\forall C_X(x) \in m_{PVT}$, we let $z(C_X(x)) := x$.
Figure: Poisson–Voronoi tessellation.

Poisson–Delaunay tessellation
• $X$, Poisson point process in $\mathbb{R}^d$;
• $\forall x, x' \in X$, $x$ and $x'$ define an edge if $C_X(x) \cap C_X(x') \neq \emptyset$;
• $m_{PDT}$, Poisson–Delaunay tessellation;
• $\forall C \in m_{PDT}$, we let $z(C)$ be the circumcenter of $C$.
Figure: Poisson–Delaunay tessellation.

Typical cell
Definition: Let $m$ be a stationary random tessellation.
The typical cell of $m$ is a random polytope $\mathcal{C}$ in $\mathbb{R}^d$ whose distribution is given as follows: for each bounded translation-invariant function $g : \{\text{polytopes}\} \to \mathbb{R}$, we have
$E[g(\mathcal{C})] := \frac{1}{N(B)}\, E\left[ \sum_{C \in m,\ z(C) \in B} g(C) \right]$,
where:
• $B \subset \mathbb{R}^d$ is any Borel subset with finite and nonzero volume;
• $N(B)$ is the mean number of cells with nucleus in $B$.

Main problem
Framework:
• $m = m_{PVT}, m_{PDT}$;
• $W_\rho := [0, \rho]^d$, with $\rho > 0$;
• $g : \{\text{polytopes}\} \to \mathbb{R}$, geometrical characteristic.
Aim: asymptotic behaviour, when $\rho \to \infty$, of
$M_{g,\rho} = \max_{C \in m,\ z(C) \in W_\rho} g(C)$?
Figure: Voronoi cell maximizing the area in the square.

Objective and applications
Objective: find $a_{g,\rho} > 0$, $b_{g,\rho} \in \mathbb{R}$ s.t. $P\left( M_{g,\rho} \le a_{g,\rho} t + b_{g,\rho} \right)$ converges, as $\rho \to \infty$, for each $t \in \mathbb{R}$.
Applications:
• regularity of the tessellation;
• discrimination of point processes and tessellations;
• Poisson–Voronoi approximation.
Figure: Poisson–Voronoi approximation.

Asymptotics under a local correlation condition
Notation: let $v_\rho := a_{g,\rho} t + b_{g,\rho}$ be a threshold such that
$\rho^d \cdot P(g(\mathcal{C}) > v_\rho) \to \tau$ as $\rho \to \infty$, for some $\tau := \tau(t) \ge 0$.
Local Correlation Condition (LCC):
$\frac{\rho^d}{(\log \rho)^d} \cdot E\left[ \sum_{(C_1, C_2)_{\neq} \in m^2,\ z(C_1), z(C_2) \in [0, \log \rho]^d} 1_{g(C_1) > v_\rho,\ g(C_2) > v_\rho} \right] \to 0$ as $\rho \to \infty$.
Theorem: Under (LCC), we have $P(M_{g,\rho} \le v_\rho) \to e^{-\tau}$ as $\rho \to \infty$.
Extremal index

Definition of the extremal index
Proposition: Assume that for all $\tau \ge 0$, there exists a threshold $v_\rho^{(\tau)}$ depending on $\rho$ such that $\rho^d \cdot P(g(\mathcal{C}) > v_\rho^{(\tau)}) \to \tau$ as $\rho \to \infty$. Then there exists $\theta \in [0, 1]$ such that, for all $\tau \ge 0$,
$\lim_{\rho \to \infty} P(M_{g,\rho} \le v_\rho^{(\tau)}) = e^{-\theta\tau}$,
provided that the limit exists.
Definition: According to Leadbetter, we say that $\theta \in [0, 1]$ is the extremal index if, for each $\tau \ge 0$, we have:
$\rho^d \cdot P\left( g(\mathcal{C}) > v_\rho^{(\tau)} \right) \to \tau$ as $\rho \to \infty$, and $\lim_{\rho \to \infty} P(M_{g,\rho} \le v_\rho^{(\tau)}) = e^{-\theta\tau}$.

Example 1
Framework:
• $m := m_{PVT}$: Poisson–Voronoi tessellation;
• $g(C) := r(C)$: inradius of any cell $C := C_X(x)$ with $x \in X$, i.e. $r(C) := r(C_X(x)) := \max\{ r \in \mathbb{R}_+ : B(x, r) \subset C_X(x) \}$;
• $r_{\min,PVT}(\rho) := \min_{x \in X \cap W_\rho} r(C_X(x))$.
Extremal index: $\theta = 1/2$ for each $d \ge 1$.
Figure: minimum of inradius for a Poisson–Voronoi tessellation; panel (b): typical Poisson–Voronoi cell with a small inradius.

Example 2
Framework:
• $m := m_{PDT}$: Poisson–Delaunay tessellation;
• $g(C) := R(C)$: circumradius of any cell $C$, i.e. $R(C) := \min\{ r \in \mathbb{R}_+ : B(x, r) \supset C \}$;
• $R_{\max,PDT}(\rho) := \max_{C \in m_{PDT} :\ z(C) \in W_\rho} R(C)$.
Extremal index: $\theta = 1;\ 1/2;\ 35/128$ for $d = 1;\ 2;\ 3$.
Figure: maximum of circumradius for a Poisson–Delaunay tessellation; panel (d): typical Poisson–Delaunay cell with a large circumradius.

Work in progress
Joint work with C. Robert (ISFA, Lyon 1):
• new characterization of the extremal index (not based on the classical block and run estimators appearing in classical Extreme Value Theory);
• simulation and estimation for the extremal index and cluster size distribution (for Poisson–Voronoi and Poisson–Delaunay tessellations).
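The quantity $r_{\min,PVT}(\rho)$ of Example 1 is easy to simulate, because the inradius of a Voronoi cell is half the distance from its nucleus to the nearest other nucleus (the closest bisecting hyperplane lies at that distance). The sketch below relies on this fact; the function names, the unit intensity, and the choice to ignore edge effects (nuclei outside $[0, \rho]^d$ are not simulated) are illustrative assumptions, not part of the talk.

```python
import numpy as np


def voronoi_inradii(points):
    """Inradius of each Voronoi cell: half the nearest-neighbour distance
    of its nucleus (computed by brute force, fine for a few thousand points)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # ignore the distance to itself
    return 0.5 * dist.min(axis=1)


def sample_poisson_points(rho, dim, rng, intensity=1.0):
    """Homogeneous Poisson point process on the window [0, rho]^dim."""
    n = rng.poisson(intensity * rho ** dim)
    return rng.uniform(0.0, rho, size=(n, dim))


def min_inradius(rho, dim=2, seed=0):
    """One realization of the minimum inradius over cells with nucleus in W_rho."""
    rng = np.random.default_rng(seed)
    pts = sample_poisson_points(rho, dim, rng)
    return float(voronoi_inradii(pts).min())


if __name__ == "__main__":
    print([round(min_inradius(20.0, seed=s), 4) for s in range(5)])
```

The two nuclei of the closest pair of points always share the same inradius, so small inradii come in pairs; this is consistent with clusters of exceedances of size two and with the value $\theta = 1/2$ quoted in Example 1.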
Keywords =
Abstract
A two-color interacting random balls model for colocalization analysis of proteins.
Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, Nantes; INRIA Rennes, Serpico team
Joint work with C. Kervrann (INRIA Rennes, Serpico team).
GSI'15, 28–30 October 2015.

Introduction: some data
Vesicular trafficking analysis and colocalization quantification by TIRF microscopy (1 px = 100 nanometers) [SERPICO team, INRIA]
Figure: Langerin proteins (left) and Rab11 GTPase proteins (right).
Is there colocalization? ⇔ Are there spatial dependencies between the two types of proteins?

Image preprocessing
Figure: after segmentation, and the superposition of the two images.
Figure: after a Gaussian weights thresholding, and the superposition of the two images.

The problem of colocalization can be described as follows:
We observe two binary images in a domain $\Omega$:
• First image (green): realization of a random set $\Gamma_1 \cap \Omega$
• Second image (red): realization of a random set $\Gamma_2 \cap \Omega$
→ Is there some dependency between $\Gamma_1$ and $\Gamma_2$?
→ If so, can we quantify/model this dependency?

1 A testing procedure
2 A model for colocalization
3 Estimation problem

Testing procedure
Let $o$ be a generic point in $\mathbb{R}^d$ and
$p_1 = P(o \in \Gamma_1)$, $p_2 = P(o \in \Gamma_2)$, $p_{12} = P(o \in \Gamma_1 \cap \Gamma_2)$.
If $\Gamma_1$ and $\Gamma_2$ are independent, then $p_{12} = p_1 p_2$.
A natural measure of departure from independence is $\hat p_{12} - \hat p_1 \hat p_2$, where
$\hat p_1 = |\Omega|^{-1} \sum_{x \in \Omega} 1_{\Gamma_1}(x)$, $\hat p_2 = |\Omega|^{-1} \sum_{x \in \Omega} 1_{\Gamma_2}(x)$, $\hat p_{12} = |\Omega|^{-1} \sum_{x \in \Omega} 1_{\Gamma_1 \cap \Gamma_2}(x)$.

Testing procedure
Assume $\Gamma_1$ and $\Gamma_2$ are $m$-dependent stationary random sets. If $\Gamma_1$ is independent of $\Gamma_2$, then as $\Omega$ tends to infinity,
$T := |\Omega|\, \frac{\hat p_{12} - \hat p_1 \hat p_2}{\sqrt{\sum_{x \in \Omega} \sum_{y \in \Omega} \hat C_1(x - y)\, \hat C_2(x - y)}} \to N(0, 1)$,
where $\hat C_1$ and $\hat C_2$ are the empirical covariance functions of $\Gamma_1 \cap \Omega$ and $\Gamma_2 \cap \Omega$ respectively.
Hence, to test the null hypothesis of independence between $\Gamma_1$ and $\Gamma_2$:
p-value $= 2(1 - \Phi(|T|))$,
where $\Phi$ is the c.d.f. of the standard normal distribution.

Some simulations
Simulations when $\Gamma_1$ and $\Gamma_2$ are unions of random balls. Number of p-values < 0.05 over 100 realizations:
• Independent case (and each color ∼ Poisson): 4.
• Dependent case (see later for the model): 100.
• Independent case, larger radii: 5.
• Dependent case, larger radii and "small" dependence: 97.

Real Data
Depending on the preprocessing:
• T = 9.9, p-value = 0
• T = 17, p-value = 0
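The test statistic can be computed directly from two binary masks of the same shape. The sketch below is one possible reading of the formula, under assumptions that should be checked against the paper: the double sum over pixel pairs is regrouped by lag, the empirical covariances are averaged over the pairs available at each lag (no further edge correction), and lags are truncated at a hypothetical `max_lag` standing in for the $m$-dependence range. All function and parameter names are illustrative.

```python
import math
import numpy as np


def lagged_cov(mask, max_lag):
    """Empirical covariance C(h) and number of pixel pairs N(h) for all
    lags h = (dy, dx) with |dy|, |dx| <= max_lag."""
    a = mask.astype(float) - mask.mean()
    H, W = a.shape
    cov, count = {}, {}
    for dy in range(-max_lag, max_lag + 1):
        for dx in range(-max_lag, max_lag + 1):
            y0, y1 = max(0, dy), min(H, H + dy)
            x0, x1 = max(0, dx), min(W, W + dx)
            u = a[y0:y1, x0:x1]                       # pixels x + h
            v = a[y0 - dy:y1 - dy, x0 - dx:x1 - dx]   # pixels x
            cov[(dy, dx)] = float((u * v).mean())
            count[(dy, dx)] = u.size
    return cov, count


def colocalization_test(mask1, mask2, max_lag=5):
    """Return (T, p_value) for the null hypothesis that the two binary
    images are realizations of independent stationary random sets."""
    mask1 = np.asarray(mask1, dtype=bool)
    mask2 = np.asarray(mask2, dtype=bool)
    n = mask1.size                                    # |Omega| in pixels
    p1, p2 = mask1.mean(), mask2.mean()
    p12 = (mask1 & mask2).mean()
    c1, count = lagged_cov(mask1, max_lag)
    c2, _ = lagged_cov(mask2, max_lag)
    # sum_x sum_y C1(x - y) C2(x - y), regrouped by lag and truncated.
    var_sum = sum(count[h] * c1[h] * c2[h] for h in count)
    if var_sum <= 0:
        raise ValueError("non-positive variance estimate; try another max_lag")
    T = n * (p12 - p1 * p2) / math.sqrt(var_sum)
    # 2 * (1 - Phi(|T|)) written with the complementary error function.
    p_value = math.erfc(abs(T) / math.sqrt(2.0))
    return float(T), p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    green = rng.random((64, 64)) < 0.3
    red = rng.random((64, 64)) < 0.3    # drawn independently of `green`
    print(colocalization_test(green, red))
```

A large positive `T` indicates more overlap than expected under independence (colocalization), a large negative `T` indicates mutual avoidance.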
We view each set $\Gamma_1$ and $\Gamma_2$ as a union of random balls.
We model the superposition of the two images, i.e. $\Gamma_1 \cup \Gamma_2$.
The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution $\mu$ on $[R_{\min}, R_{\max}]$.
Notation:
• $(\xi, R)_i$: ball centered at $\xi$ with radius $R$ and color $i \in \{1, 2\}$ → viewed as a marked point, marked by $R$ and $i$.
• $x_i$: collection of all marked points with color $i$. Hence $\Gamma_i = \bigcup_{(\xi, R)_i \in x_i} (\xi, R)_i$
• $x = x_1 \cup x_2$: collection of all marked points.

Figure: example of three realizations of the reference process.

The model
We consider a density on any bounded domain $\Omega$ with respect to the reference model
$f(x) \propto z_1^{n_1} z_2^{n_2} e^{\theta\, |\Gamma_1 \cap \Gamma_2|}$
where $n_1$: number of green balls and $n_2$: number of red balls.
This density depends on 3 parameters:
• $z_1$: rules the mean number of green balls
• $z_2$: rules the mean number of red balls
• $\theta$: interaction parameter.
If $\theta > 0$: attraction (colocalization) between $\Gamma_1$ and $\Gamma_2$
If $\theta = 0$: back to the reference model, up to the intensities (independence between $\Gamma_1$ and $\Gamma_2$).

Simulation
Realizations can be generated by a standard birth-death Metropolis–Hastings algorithm.
Figure: examples of realizations.

Estimation problem
Aim: Assume that the law $\mu$ of the radii is known.
Given a realization of $\Gamma_1 \cup \Gamma_2$ on $\Omega$, estimate $z_1$, $z_2$ and $\theta$ in
$f(x) = \frac{1}{c(z_1, z_2, \theta)}\, z_1^{n_1} z_2^{n_2} e^{\theta\, |\Gamma_1 \cap \Gamma_2|}$,
where $c(z_1, z_2, \theta)$ is the normalizing constant.
Issue: The numbers of balls $n_1$ and $n_2$ are not observed.
⇒ likelihood or pseudo-likelihood based inference is not feasible.

An equilibrium equation
Consider, for any non-negative function $h$,
$C(z_1, z_2, \theta; h) = S(h) - z_1 I_1(\theta; h) - z_2 I_2(\theta; h)$
where
$S(h) = \sum_{(\xi, R) \in x,\ \xi \in \Omega} h((\xi, R),\ x \setminus (\xi, R))$
and, for $i = 1, 2$,
$I_i(\theta; h) = \int_{R_{\min}}^{R_{\max}} \int_{\Omega} h((\xi, R)_i, x)\, \frac{\lambda((\xi, R)_i, x)}{2 z_i}\, d\xi\, \mu(dR)$.
Denoting by $z_1^*$, $z_2^*$ and $\theta^*$ the true unknown values of the parameters, we know from the Georgii–Nguyen–Zessin equation that for any $h$,
$E(C(z_1^*, z_2^*, \theta^*; h)) = 0$.

Takacs–Fiksel estimator
Given $K$ test functions $(h_k)_{1 \le k \le K}$, the Takacs–Fiksel estimator is defined by
$(\hat z_1, \hat z_2, \hat\theta) := \arg\min_{z_1, z_2, \theta} \sum_{k=1}^{K} C(z_1, z_2, \theta; h_k)^2$. (1)
Consistency and asymptotic normality studied in Coeurjolly et al. 2012.
Recall that $C(z_1, z_2, \theta; h) = S(h) - z_1 I_1(\theta; h) - z_2 I_2(\theta; h)$ where
$S(h) = \sum_{(\xi, R) \in x,\ \xi \in \Omega} h((\xi, R),\ x \setminus (\xi, R))$.
To be able to compute (1), we must find test functions $h_k$ such that $S(h_k)$ is computable.
How many? At least $K = 3$ because there are 3 parameters to estimate.

A first possibility:
$h_1((\xi, R)_i, x) = \mathrm{Length}\left( S(\xi, R) \cap (\Gamma_1)^c \right) 1_{\{i = 1\}}$
where $S(\xi, R)$ is the sphere $\{ y : |y - \xi| = R \}$.

What about $S(h_1) = \sum_{(\xi, R) \in x,\ \xi \in \Omega} h_1((\xi, R),\ x \setminus (\xi, R))$?
⇒ $S(h_1) = P(\Gamma_1)$ (the perimeter of $\Gamma_1$)

So, for $h_1((\xi, R)_i, x) = \mathrm{Length}\left( S(\xi, R) \cap (\Gamma_1)^c \right) 1_{\{i = 1\}}$,
$S(h_1) = P(\Gamma_1)$
and the Takacs–Fiksel contrast function $C(z_1, z_2, \theta; h_1)$ is computable.
Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). Let h3((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1 ∪ Γ2)c then S(h3) = P(Γ1 ∪ Γ2). A testing procedure A model for colocalization Estimation Simulations with test functions h1, h2 and h3 over 100 realizations θ = 0.2 (and small radii) θ = 0.05 (and large radii) Frequency 0.15 0.20 0.25 0.30 05101520 Frequency 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 010203040 A testing procedure A model for colocalization Estimation Real Data We assume the law of the radii is uniform on [Rmin, Rmax]. (each image is embedded in [0, 250] × [0, 280]) Rmin = 0.5, Rmax = 2.5 Rmin = 0.5, Rmax = 10 ˆθ = 0.45 ˆθ = 0.03 A testing procedure A model for colocalization Estimation Conclusion The testing procedure allows to detect colocalization between two binary images is easy and fast to implement does not depend too much on the image preprocessing The model for colocalization relies on geometric features (area of intersection) can be ﬁtted by the TakacsFiksel method allows to compare the degree of colocalization θ between two pairs of images if the laws of radii are similar
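The minimization in (1) can be sketched numerically. Because $C$ is linear in $(z_1, z_2)$ once $\theta$ is fixed, one option is to solve a two-variable least-squares problem for each $\theta$ on a grid and keep the best value. This is only a minimal sketch under assumptions: the function names and the grid search are illustrative choices, not the method used in the talk, and the callables standing in for $I_1(\theta; h_k)$ and $I_2(\theta; h_k)$ are toy placeholders, since the real integrals require numerical integration over $\Omega$ and the radii.

```python
import math

def profile_contrast(S, I1, I2):
    """For a fixed theta, minimise sum_k (S_k - z1*I1_k - z2*I2_k)^2 over (z1, z2).

    Solves the 2x2 normal equations directly; assumes the vectors I1 and I2
    are not collinear (otherwise the determinant below is zero).
    """
    a11 = sum(u * u for u in I1)
    a12 = sum(u * v for u, v in zip(I1, I2))
    a22 = sum(v * v for v in I2)
    b1 = sum(u * s for u, s in zip(I1, S))
    b2 = sum(v * s for v, s in zip(I2, S))
    det = a11 * a22 - a12 * a12
    z1 = (b1 * a22 - a12 * b2) / det
    z2 = (a11 * b2 - a12 * b1) / det
    value = sum((s - z1 * u - z2 * v) ** 2 for s, u, v in zip(S, I1, I2))
    return value, z1, z2

def takacs_fiksel(S, I1_of, I2_of, theta_grid):
    """Grid search over theta; (z1, z2) are profiled out by least squares."""
    best = None
    for theta in theta_grid:
        value, z1, z2 = profile_contrast(S, I1_of(theta), I2_of(theta))
        if best is None or value < best[0]:
            best = (value, z1, z2, theta)
    return best[1], best[2], best[3]

# Toy placeholders for I_1(theta; h_k), I_2(theta; h_k), k = 1, 2, 3.
# These are made-up functions, NOT the integrals of the colocalization model.
def toy_I1(theta):
    return [1.0, 0.0, math.exp(theta)]

def toy_I2(theta):
    return [0.0, 1.0, math.exp(2.0 * theta)]

# Synthetic "observed" sums S(h_k), generated at z1 = 2, z2 = 3, theta = 0.2.
S_obs = [2.0 * u + 3.0 * v for u, v in zip(toy_I1(0.2), toy_I2(0.2))]
grid = [k / 100.0 for k in range(-100, 101)]
z1_hat, z2_hat, theta_hat = takacs_fiksel(S_obs, toy_I1, toy_I2, grid)
```

With $K = 3$ test functions and 3 parameters the contrast can reach zero, which is why the toy data are recovered on the grid; with real images the minimum is generally positive and a finer search in $\theta$ may be needed.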
Keywords = Ginibre point process, Poisson point process, Stein’s method, Stochastic geometry, β-Ginibre point process
Abstract
Asymptotics of some Point Processes Transformations
Aurélien Vasseur, Télécom ParisTech
2nd conference on Geometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28, 2015

Mobile network in Paris - Motivation

[Figure: on the left, positions of all BS in Paris; on the right, locations of BS for one frequency band.]

Table of Contents

I - Generalities on point processes
Correlation function, Papangelou intensity and repulsiveness
Determinantal point processes

II - Kantorovich-Rubinstein distance
Convergence defined by $d_{KR}$
$d_{KR}(\mathrm{PPP}, \Phi) \le$ "nice" upper bound

III - Applications to transformations of point processes
Superposition
Thinning
Rescaling

Framework

$Y$ a locally compact metric space
$\mu$ a diffuse and locally finite measure of reference on $Y$
$N_Y$ the space of configurations on $Y$
$\widehat{N}_Y$ the space of finite configurations on $Y$

Correlation function - Papangelou intensity

Correlation function $\rho$ of a point process $\Phi$:
$$E\Big[\sum_{\alpha \in \widehat{N}_Y,\ \alpha \subset \Phi} f(\alpha)\Big] = \sum_{k=0}^{+\infty} \frac{1}{k!} \int_{Y^k} f \cdot \rho(\{x_1, \dots, x_k\})\, \mu(dx_1) \dots \mu(dx_k)$$
$\rho(\alpha) \approx$ probability of finding a point in at least each point of $\alpha$

Papangelou intensity $c$ of a point process $\Phi$:
$$E\Big[\sum_{x \in \Phi} f(x, \Phi \setminus \{x\})\Big] = \int_Y E[c(x, \Phi) f(x, \Phi)]\, \mu(dx)$$
$c(x, \xi) \approx$ conditional probability of finding a point in $x$ given $\xi$

Point process: properties

Intensity measure: $A \in \mathcal{F}_Y \mapsto \int_A \rho(\{x\})\, \mu(dx)$
$\rho(\{x\}) = E[c(x, \Phi)]$
If $\Phi$ is finite, then: $P(|\Phi| = 1) = \int_Y c(x, \emptyset)\, \mu(dx)\; P(|\Phi| = 0)$.

Poisson point process: properties

$\Phi$ PPP with intensity $M(dy) = m(y)\,dy$
Correlation function: $\rho(\alpha) = \prod_{x \in \alpha} m(x)$
Papangelou intensity: $c(x, \xi) = m(x)$

Repulsive point process

Definition
Point process repulsive if $\varphi \subset \xi \implies c(x, \xi) \le c(x, \varphi)$
Point process weakly repulsive if $c(x, \xi) \le c(x, \emptyset)$

Determinantal point process

Definition
Determinantal point process $\mathrm{DPP}(K, \mu)$:
$$\rho(\{x_1, \dots, x_k\}) = \det\big(K(x_i, x_j)\big)_{1 \le i, j \le k}$$
Proposition
Papangelou intensity of $\mathrm{DPP}(K, \mu)$:
$$c(x_0, \{x_1, \dots, x_k\}) = \frac{\det\big(J(x_i, x_j)\big)_{0 \le i, j \le k}}{\det\big(J(x_i, x_j)\big)_{1 \le i, j \le k}}, \quad \text{where } J = (I - K)^{-1} K.$$

Ginibre point process

Definition
Ginibre point process on $B(0, R)$:
$$K(x, y) = \frac{1}{\pi}\, e^{-\frac{1}{2}(|x|^2 + |y|^2)}\, e^{x \bar{y}}\, 1_{\{x \in B(0,R)\}}\, 1_{\{y \in B(0,R)\}}$$
$\beta$-Ginibre point process on $B(0, R)$:
$$K_\beta(x, y) = \frac{1}{\pi}\, e^{-\frac{1}{2\beta}(|x|^2 + |y|^2)}\, e^{\frac{1}{\beta} x \bar{y}}\, 1_{\{x \in B(0,R)\}}\, 1_{\{y \in B(0,R)\}}$$

[Figure: β-Ginibre point processes.]

Kantorovich-Rubinstein distance

Total variation distance:
$$d_{TV}(\nu_1, \nu_2) := \sup_{A \in \mathcal{F}_Y,\ \nu_1(A), \nu_2(A) < \infty} |\nu_1(A) - \nu_2(A)|$$
$F : N_Y \to \mathbb{R}$ is 1-Lipschitz ($F \in \mathrm{Lip}_1$) if $|F(\varphi_1) - F(\varphi_2)| \le d_{TV}(\varphi_1, \varphi_2)$ for all $\varphi_1, \varphi_2 \in N_Y$
Kantorovich-Rubinstein distance:
$$d_{KR}(P_1, P_2) = \sup_{F \in \mathrm{Lip}_1} \Big| \int_{N_Y} F(\varphi)\, P_1(d\varphi) - \int_{N_Y} F(\varphi)\, P_2(d\varphi) \Big|$$
Convergence in K.R. distance $\implies$ convergence in law (the implication is strict)

Upper bound theorem

Theorem (L. Decreusefond, AV)
$\Phi$ a finite point process on $Y$
$\zeta_M$ a PPP with finite control measure $M(dy) = m(y)\,\mu(dy)$.
Then, we have:
$$d_{KR}(P_\Phi, P_{\zeta_M}) \le \int_Y \int_{N_Y} |m(y) - c(y, \varphi)|\, P_\Phi(d\varphi)\, \mu(dy).$$

Superposition of weakly repulsive point processes

$\Phi_{n,1}, \dots, \Phi_{n,n}$: $n$ independent, finite and weakly repulsive point processes on $Y$
$\Phi_n := \sum_{i=1}^{n} \Phi_{n,i}$
$R_n := \int_Y \big| \sum_{i=1}^{n} \rho_{n,i}(x) - m(x) \big|\, \mu(dx)$
$\zeta_M$ a PPP with control measure $M(dx) = m(x)\,\mu(dx)$

Proposition (LD, AV)
$$d_{KR}(P_{\Phi_n}, P_{\zeta_M}) \le R_n + \max_{1 \le i \le n} \int_Y \rho_{n,i}(x)\, \mu(dx)$$

Consequence

Corollary (LD, AV)
$f$ pdf on $[0; 1]$ such that $f(0^+) := \lim_{x \to 0^+} f(x) \in \mathbb{R}$
$\Lambda$ compact subset of $\mathbb{R}_+$
$X_1, \dots, X_n$ i.i.d. with pdf $f_n = \frac{1}{n} f\big(\frac{1}{n}\,\cdot\big)$
$\Phi_n = \{X_1, \dots, X_n\} \cap \Lambda$
$$d_{KR}(\Phi_n, \zeta) \le \int_\Lambda \Big| f\Big(\frac{1}{n} x\Big) - f(0^+) \Big|\, dx + \frac{1}{n} \int_\Lambda f\Big(\frac{1}{n} x\Big)\, dx$$
where $\zeta$ is the $\mathrm{PPP}(f(0^+))$ reduced to $\Lambda$.

β-Ginibre point processes

Proposition (LD, AV)
$\Phi_n$ the $\beta_n$-Ginibre process reduced to a compact set $\Lambda$
$\zeta$ the PPP with intensity $1/\pi$ on $\Lambda$
$$d_{KR}(P_{\Phi_n}, P_\zeta) \le C \beta_n$$

Kallenberg's theorem

Theorem (O. Kallenberg)
$\Phi_n$ a finite point process on $Y$
$p_n : Y \to [0; 1)$, $p_n \to 0$ uniformly
$\Phi'_n$ the $p_n$-thinning of $\Phi_n$
$\gamma_M$ a Cox process
$$(p_n \Phi_n) \xrightarrow{\text{law}} M \iff (\Phi'_n) \xrightarrow{\text{law}} \gamma_M$$

Polish distance

$(f_n)$ a sequence in the space of real continuous functions with compact support generating $\mathcal{F}_Y$
$$d^*(\nu_1, \nu_2) = \sum_{n \ge 1} \frac{1}{2^n}\, \Psi\big(|\nu_1(f_n) - \nu_2(f_n)|\big) \quad \text{with } \Psi(x) = \frac{x}{1 + x}$$
$d^*_{KR}$ the Kantorovich-Rubinstein distance associated to the distance $d^*$

Thinned point processes

Proposition (LD, AV)
$\Phi_n$ a finite point process on $Y$
$p_n : Y \to [0; 1)$
$\Phi'_n$ the $p_n$-thinning of $\Phi_n$
$\gamma_M$ a Cox process
Then, we have:
$$d^*_{KR}(P_{\Phi'_n}, P_{\gamma_M}) \le 2\, E\Big[\sum_{x \in \Phi_n} p_n^2(x)\Big] + d^*_{KR}(P_M, P_{p_n \Phi_n}).$$

References

L. Decreusefond and A. Vasseur, Asymptotics of superposition of point processes, 2015.
H.-O. Georgii and H. J. Yoo, Conditional intensity and Gibbsianness of determinantal point processes, J. Statist. Phys. (118), January 2004.
J. S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins, and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015.
A. F. Karr, Point Processes and their Statistical Inference, Ann. Probab. 15 (1987), no. 3, 1226-1227.
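The right-hand side of the corollary on i.i.d. points with pdf $f_n = \frac{1}{n} f(\frac{1}{n}\,\cdot)$ is easy to evaluate numerically. The following is a minimal sketch, assuming $\Lambda$ is an interval $[a, b]$ and using a midpoint rule; the function name `kr_bound` and the example density $f(t) = 2(1 - t)$ are illustrative choices, not taken from the talk.

```python
def kr_bound(f, f0, a, b, n, steps=10000):
    """Midpoint-rule value of
        int_a^b |f(x/n) - f0| dx + (1/n) * int_a^b f(x/n) dx,
    i.e. the corollary's upper bound with Lambda = [a, b] and f0 = f(0+).
    """
    h = (b - a) / steps
    gap = 0.0   # accumulates the integral of |f(x/n) - f(0+)|
    mass = 0.0  # accumulates the integral of f(x/n)
    for k in range(steps):
        x = a + (k + 0.5) * h
        v = f(x / n)
        gap += abs(v - f0) * h
        mass += v * h
    return gap + mass / n

def example_pdf(t):
    """Hypothetical pdf on [0, 1] with f(0+) = 2 (chosen only for illustration)."""
    return 2.0 * (1.0 - t) if 0.0 <= t <= 1.0 else 0.0

# Bound on Lambda = [0, 1] for increasing n.
bounds = [kr_bound(example_pdf, 2.0, 0.0, 1.0, n) for n in (10, 100, 1000)]
```

For this example the two integrals can be done by hand on $\Lambda = [0, 1]$: the first equals $1/n$ and the second term equals $2/n - 1/n^2$, so the bound is $3/n - 1/n^2$, which tends to 0 as $n$ grows.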
Keywords =
Abstract
Asymptotic properties of random polytopes
Pierre Calka
2nd conference on Geometric Science of Information, École Polytechnique, Paris-Saclay, 28 October 2015

Joint work with Joseph Yukich (Lehigh University, USA) & Tomasz Schreiber (Toruń University, Poland)

Outline

Random polytopes: an overview
Uniform polytopes
Gaussian polytopes
Expectation asymptotics

Main results: variance asymptotics
Uniform model, K smooth
Uniform model, K polytope
Gaussian model

Sketch of proof: Gaussian case
Calculation of the expectation of $f_k(K_\lambda)$
Calculation of the variance of $f_k(K_\lambda)$
Scaling transform

Uniform polytopes

Binomial model
$K$ := convex body of $\mathbb{R}^d$
$(X_k, k \in \mathbb{N}^*)$ := independent and uniformly distributed in $K$
$K_n := \mathrm{Conv}(X_1, \dots, X_n)$, $n \ge 1$

[Figures: $K_{50}$, $K_{100}$ and $K_{500}$, for $K$ a ball and $K$ a square.]

Poissonian model
$K$ := convex body of $\mathbb{R}^d$
$P_\lambda$, $\lambda > 0$ := Poisson point process of intensity measure $\lambda\,dx$
$K_\lambda := \mathrm{Conv}(P_\lambda \cap K)$

[Figure: $K_{500}$, for $K$ a ball and $K$ a square.]

Gaussian polytopes

Binomial model
$\Phi_d(x) := \frac{1}{(2\pi)^{d/2}}\, e^{-\|x\|^2/2}$, $x \in \mathbb{R}^d$, $d \ge 2$
$(X_k, k \in \mathbb{N}^*)$ := independent and with density $\Phi_d$
$K_n := \mathrm{Conv}(X_1, \dots, X_n)$

Poissonian model
$P_\lambda$, $\lambda > 0$ := Poisson point process of intensity measure $\lambda \Phi_d(x)\,dx$
$K_\lambda := \mathrm{Conv}(P_\lambda)$

[Figures: Gaussian polytopes $K_{50}$, $K_{100}$, $K_{500}$ and their spherical shape.]

Asymptotic spherical shape of the Gaussian polytope

Geffroy (1961):
$$d_H\big(K_n, B(0, \sqrt{2 \log(n)})\big) \xrightarrow[n \to \infty]{} 0 \quad \text{a.s.}$$

[Figure: $K_{50000}$.]

Expectation asymptotics

Considered functionals
$f_k(\cdot)$ := number of $k$-dimensional faces, $0 \le k \le d$
$\mathrm{Vol}(\cdot)$ := volume

B. Efron's relation (1965):
$$E f_0(K_n) = n \left(1 - \frac{E\,\mathrm{Vol}(K_{n-1})}{\mathrm{Vol}(K)}\right)$$

Uniform polytope, $K$ smooth
$$E[f_k(K_\lambda)] \underset{\lambda \to \infty}{\sim} c_{d,k} \int_{\partial K} \kappa_s^{\frac{1}{d+1}}\, ds\ \lambda^{\frac{d-1}{d+1}}$$
$\kappa_s$ := Gaussian curvature of $\partial K$

Uniform polytope, $K$ polytope
$$E[f_k(K_\lambda)] \underset{\lambda \to \infty}{\sim} c'_{d,k}\, F(K) \log^{d-1}(\lambda)$$
$F(K)$ := number of flags of $K$

Gaussian polytope
$$E[f_k(K_\lambda)] \underset{\lambda \to \infty}{\sim} c''_{d,k} \log^{\frac{d-1}{2}}(\lambda)$$

A. Rényi & R. Sulanke (1963), H. Raynaud (1970), R. Schneider & J. Wieacker (1978), F. Affentranger & R. Schneider (1992)

Main results: variance asymptotics

Uniform model, $K$ smooth

$K$ := convex body of $\mathbb{R}^d$ with volume 1 and with a $C^3$ boundary
$\kappa$ := Gaussian curvature of $\partial K$
$$\lim_{\lambda \to \infty} \lambda^{-(d-1)/(d+1)}\, \mathrm{Var}[f_k(K_\lambda)] = c_{k,d} \int_{\partial K} \kappa(z)^{1/(d+1)}\, dz$$
$$\lim_{\lambda \to \infty} \lambda^{(d+3)/(d+1)}\, \mathrm{Var}[\mathrm{Vol}(K_\lambda)] = c'_d \int_{\partial K} \kappa(z)^{1/(d+1)}\, dz$$
($c_{k,d}$, $c'_d$ explicit positive constants)
M. Reitzner (2005): $\mathrm{Var}[f_k(K_\lambda)] = \Theta(\lambda^{(d-1)/(d+1)})$

Uniform model, $K$ polytope

$K$ := simple polytope of $\mathbb{R}^d$ with volume 1, i.e. each vertex of $K$ is included in exactly $d$ facets.
$$\lim_{\lambda \to \infty} \log^{-(d-1)}(\lambda)\, \mathrm{Var}[f_k(K_\lambda)] = c_{d,k}\, f_0(K)$$
$$\lim_{\lambda \to \infty} \lambda^2 \log^{-(d-1)}(\lambda)\, \mathrm{Var}[\mathrm{Vol}(K_\lambda)] = c'_{d,k}\, f_0(K)$$
($c_{k,d}$, $c'_{k,d}$ explicit positive constants)
I. Bárány & M. Reitzner (2010): $\mathrm{Var}[f_k(K_\lambda)] = \Theta(\log^{d-1}(\lambda))$

Gaussian model

$$\lim_{\lambda \to \infty} \log^{-\frac{d-1}{2}}(\lambda)\, \mathrm{Var}[f_k(K_\lambda)] = c_{k,d}$$
$$\lim_{\lambda \to \infty} \log^{-k + \frac{d+3}{2}}(\lambda)\, \mathrm{Var}[\mathrm{Vol}(K_\lambda)] = c'_{k,d}$$
$$E\left[\frac{\mathrm{Vol}(K_\lambda)}{\mathrm{Vol}\big(B(0, \sqrt{2 \log(n)})\big)}\right] \underset{\lambda \to \infty}{=} 1 - \frac{d \log(\log(\lambda))}{4 \log(\lambda)} + O\left(\frac{1}{\log(\lambda)}\right)$$
($c_{k,d}$, $c'_{k,d}$ explicit positive constants)
D. Hug & M. Reitzner (2005), I. Bárány & V. Vu (2007): $\mathrm{Var}[f_k(K_\lambda)] = \Theta(\log^{(d-1)/2}(\lambda))$

Sketch of proof: Gaussian case

Calculation of the expectation of $f_k(K_\lambda)$

1. Decomposition:
$$E[f_k(K_\lambda)] = E\Big[\sum_{x \in P_\lambda} \xi(x, P_\lambda)\Big], \qquad \xi(x, P_\lambda) := \begin{cases} \frac{1}{k+1}\, \#\{k\text{-faces containing } x\} & \text{if } x \text{ extreme} \\ 0 & \text{if not} \end{cases}$$
2. Mecke-Slivnyak formula:
$$E[f_k(K_\lambda)] = \lambda \int E[\xi(x, P_\lambda \cup \{x\})]\, \Phi_d(x)\, dx$$
3. Limit of the expectation of one score

Calculation of the variance of $f_k(K_\lambda)$

$$\mathrm{Var}[f_k(K_\lambda)] = E\Big[\sum_{x \in P_\lambda} \xi^2(x, P_\lambda) + \sum_{x \ne y \in P_\lambda} \xi(x, P_\lambda)\, \xi(y, P_\lambda)\Big] - \big(E[f_k(K_\lambda)]\big)^2$$
$$= \lambda \int E[\xi^2(x, P_\lambda \cup \{x\})]\, \Phi_d(x)\, dx + \lambda^2 \iint E[\xi(x, P_\lambda \cup \{x, y\})\, \xi(y, P_\lambda \cup \{x, y\})]\, \Phi_d(x) \Phi_d(y)\, dx\, dy - \lambda^2 \iint E[\xi(x, P_\lambda \cup \{x\})]\, E[\xi(y, P_\lambda \cup \{y\})]\, \Phi_d(x) \Phi_d(y)\, dx\, dy$$
$$= \lambda \int E[\xi^2(x, P_\lambda \cup \{x\})]\, \Phi_d(x)\, dx + \lambda^2 \iint \text{"Cov"}\big(\xi(x, P_\lambda \cup \{x\}), \xi(y, P_\lambda \cup \{y\})\big)\, \Phi_d(x) \Phi_d(y)\, dx\, dy$$

Scaling transform

Question: limits of $E[\xi(x, P_\lambda)]$ and "Cov"$(\xi(x, P_\lambda), \xi(y, P_\lambda))$?
Answer: definition of limit scores in a new space

Critical radius: $R_\lambda := \sqrt{2 \log \lambda - \log(2 \cdot (2\pi)^d \cdot \log \lambda)}$
Scaling transform:
$$T_\lambda : \mathbb{R}^d \setminus \{0\} \to \mathbb{R}^{d-1} \times \mathbb{R}, \qquad x \mapsto \left(R_\lambda \exp_{d-1}^{-1}\Big(\frac{x}{|x|}\Big),\ R_\lambda^2 \Big(1 - \frac{|x|}{R_\lambda}\Big)\right)$$
$\exp_{d-1} : \mathbb{R}^{d-1} \simeq T_{u_0} S^{d-1} \to S^{d-1}$ exponential map at $u_0 \in S^{d-1}$
Image of a score: $\xi^{(\lambda)}(T_\lambda(x), T_\lambda(P_\lambda)) := \xi(x, P_\lambda)$
Convergence of $P_\lambda$: $T_\lambda(P_\lambda) \xrightarrow{D} P$, where $P$ is a Poisson point process in $\mathbb{R}^{d-1} \times \mathbb{R}$ of intensity measure $e^h\, dv\, dh$

Action of the scaling transform

$\Pi^\uparrow := \{(v, h) \in \mathbb{R}^{d-1} \times \mathbb{R} : h \ge \frac{\|v\|^2}{2}\}$
$\Pi^\downarrow := \{(v, h) \in \mathbb{R}^{d-1} \times \mathbb{R} : h \le -\frac{\|v\|^2}{2}\}$

Halfspace $\to$ Translate of $\Pi^\downarrow$
Sphere containing $O$ $\to$ Translate of $\partial \Pi^\uparrow$
Convexity $\to$ Parabolic convexity
Extreme point $\to$ $(x + \Pi^\uparrow)$ not fully covered
$k$-face of $K_\lambda$ $\to$ Parabolic $k$-face
$R_\lambda \mathrm{Vol}$ $\to$ $\mathrm{Vol}$

Limiting picture

$\Psi := \bigcup_{x \in P} (x + \Pi^\uparrow)$
In red: image of the balls of diameter $[0, x]$ where $x$ is extreme

$\Phi := \bigcup_{x \in \mathbb{R}^{d-1} \times \mathbb{R}\,:\,(x + \Pi^\downarrow) \cap P = \emptyset} (x + \Pi^\downarrow)$
In green: image of the boundary of the convex hull $K_\lambda$
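The binomial model $K_n = \mathrm{Conv}(X_1, \dots, X_n)$ can be simulated directly in dimension $d = 2$. The sketch below is only an illustration under assumptions: it takes $K$ to be the unit square, uses Andrew's monotone chain algorithm for the convex hull (a standard planar algorithm, not something described in the talk), and the helper names are hypothetical. It returns the vertices of $K_n$, from which $f_0(K_n)$ and $\mathrm{Vol}(K_n)$ follow.

```python
import random

def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order.

    Collinear boundary points are dropped, so only extreme points remain.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its ordered vertices."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def uniform_square_polytope(n, rng):
    """Vertices of K_n for n i.i.d. uniform points in K = [0, 1]^2."""
    return convex_hull([(rng.random(), rng.random()) for _ in range(n)])

# One realization: number of vertices f_0(K_n) and area Vol(K_n) for n = 500.
rng = random.Random(0)
hull = uniform_square_polytope(500, rng)
f0, vol = len(hull), polygon_area(hull)
```

Averaging `f0` and `vol` over many independent realizations gives Monte Carlo estimates of $E f_0(K_n)$ and $E\,\mathrm{Vol}(K_n)$, which can then be compared with Efron's relation and with the logarithmic growth stated for polytopes.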
Keywords =
Abstract
Asymmetric Topologies on Statistical Manifolds
Roman V. Belavkin
School of Science and Technology, Middlesex University, London NW4 4BT, UK
GSI2015, October 28, 2015

Outline

Sources and Consequences of Asymmetry
Method: Symmetric Sandwich
Results

Sources and Consequences of Asymmetry

Asymmetric Information Distances

Kullback-Leibler divergence: $D[p, q] = E_p\{\ln(p/q)\}$
Additivity: $D[p_1 \otimes p_2, q_1 \otimes q_2] = D[p_1, q_1] + D[p_2, q_2]$
$\ln : (\mathbb{R}_+, \times) \to (\mathbb{R}, +)$

Asymmetry of the KL-divergence
$D[p, q] \ne D[q, p]$
$D[q + (p - q), q] \ne D[q - (p - q), q]$
$\|p - q\| = \inf\{\alpha^{-1} > 0 : D[q + \alpha(p - q), q] \le 1\}$
$\sup_x \{E_{p-q}\{x\} : E_q\{e^x - 1 - x\} \le 1\}$

Functional Analysis in Asymmetric Spaces

Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982))
Every topological space with a countable base is quasi-pseudometrizable.

An asymmetric seminormed space can be $T_0$, but not $T_1$ (and hence not Hausdorff $T_2$).
Dual quasimetrics $\rho(x, y)$ and $\rho^{-1}(x, y) = \rho(y, x)$ induce two different topologies.
There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy.
This gives 14 notions of completeness (with respect to $\rho$ or $\rho^{-1}$).
Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness.
An asymmetric seminormed space may fail to be a topological vector space, because $y \mapsto \alpha y$ can be discontinuous (Borodin, 2001).
Practically all other results have to be reconsidered (e.g. Baire category theorem, Alaoglu-Bourbaki, etc.) (Cobzas, 2013).

Random Variables as the Source of Asymmetry

Polar set: $M^\circ := \{x : \langle x, y \rangle \le 1,\ \forall y \in M\}$
Minkowski functional: $\mu M^\circ(x) = \inf\{\alpha > 0 : x/\alpha \in M^\circ\}$
Support function: $\mu M^\circ(x) = s M(x) = \sup\{\langle x, y \rangle : y \in M\}$
$M = \{u : D[(1 + u)z, z] \le 1\}$, with $D = \langle (1 + u) \ln(1 + u) - u, z \rangle$
$D^*[x, 0] = \langle e^x - 1 - x, z \rangle$

[Figure: the sets $M$ and $M^\circ$, with the label $\{y : D^*[x, 0] \le 1\}$.]

Examples

Example (St. Petersburg lottery)
$x = 2^n$, $q = 2^{-n}$, $n \in \mathbb{N}$.
$E_q\{x\} = \sum_{n=1}^{\infty} (2^n / 2^n) \to \infty$
$E_p\{x\} < \infty$ for all biased $p = 2^{-(1+\alpha)n}$, $\alpha > 0$.
$2^n \notin \mathrm{dom}\, E_q\{e^x\}$, $-2^n \in \mathrm{dom}\, E_q\{e^x\}$
$0 \notin \mathrm{Int}(\mathrm{dom}\, E_q\{e^x\})$

Example (Error minimization)
Minimize $x = \frac{1}{2}\|a - b\|_2^2$ subject to $D_{KL}[w, q \otimes p] \le \lambda$, $a, b \in \mathbb{R}^n$.
$E_w\{x\} < \infty$ minimized at $w \propto e^{-\beta x}\, q \otimes p$.
Maximization of $x$ has no solution.
$\frac{1}{2}\|a - b\|_2^2 \notin \mathrm{dom}\, E_{q \otimes p}\{e^x\}$, $-\frac{1}{2}\|a - b\|_2^2 \in \mathrm{dom}\, E_{q \otimes p}\{e^x\}$
$0 \notin \mathrm{Int}(\mathrm{dom}\, E_{q \otimes p}\{e^x\})$

Method: Symmetric Sandwich

$s[-A \cap A] \le sA \le s[-A \cup A]$
$\mu\, \mathrm{co}[-A^\circ \cup A^\circ] \le \mu A^\circ \le \mu[-A^\circ \cap A^\circ]$
$s[-A \cap A] = s(-A)\ \mathrm{co}\!\wedge\ sA = \inf\{sA(z) + sA(z - y) : z \in Y\}$
$s[-A \cup A] = s(-A) \vee sA$
$\mu(-M^\circ)\ \mathrm{co}\!\wedge\ \mu M^\circ \le \mu M^\circ \le \mu(-M^\circ) \vee \mu M^\circ$
$\mu(-M)\ \mathrm{co}\!\wedge\ \mu M \le \mu M \le \mu(-M) \vee \mu M$

Lower and upper Luxemburg (Orlicz) norms

[Figure: plots of $\varphi^*$ and $\varphi$ on $[-2, 2]$.]

$\varphi^*(x) = e^x - 1 - x$
$\varphi^*_+(x) = \varphi^*(|x|) \notin \Delta_2$
$\varphi^*_-(x) = \varphi^*(-|x|) \in \Delta_2$
$\|x\|^*_\varphi = \mu\{x : \langle \varphi^*(x), z \rangle \le 1\}$

$\varphi(u) = (1 + u) \ln(1 + u) - u$
$\varphi_+(u) = \varphi(|u|) \in \Delta_2$
$\varphi_-(u) = \varphi(-|u|) \notin \Delta_2$
$\|u\|_\varphi = \mu\{u : \langle \varphi(u), z \rangle \le 1\}$

Proposition
$\|\cdot\|^*_{\varphi+}$, $\|\cdot\|^*_{\varphi-}$ are Luxemburg norms and $\|x\|^*_{\varphi-} \le \|x\|^*_\varphi \le \|x\|^*_{\varphi+}$
$\|\cdot\|_{\varphi+}$, $\|\cdot\|_{\varphi-}$ are Luxemburg norms and $\|u\|_{\varphi+} \le \|u\|_\varphi \le \|u\|_{\varphi-}$

Results

KL Induces Hausdorff ($T_2$) Asymmetric Topology

Theorem
$(Y, \|\cdot\|_\varphi)$ (resp. $(X, \|\cdot\|^*_\varphi)$) is Hausdorff.
Proof.
$\|u\|_{\varphi+} \le \|u\|_\varphi$ (resp. $\|x\|^*_{\varphi-} \le \|x\|^*_\varphi$) implies $(Y, \|\cdot\|_\varphi)$ (resp. $(X, \|\cdot\|^*_\varphi)$) is finer than the normed space $(Y, \|\cdot\|_{\varphi+})$ (resp. $(X, \|\cdot\|^*_{\varphi-})$), which is Hausdorff.

Separable Subspaces

Theorem
$(Y, \|\cdot\|_{\varphi+})$ (resp. $(X, \|\cdot\|^*_{\varphi-})$) is a separable Orlicz subspace of $(Y, \|\cdot\|_\varphi)$ (resp. $(X, \|\cdot\|^*_\varphi)$).
Proof.
$\varphi_+(u) = (1 + |u|) \ln(1 + |u|) - |u| \in \Delta_2$ (resp. $\varphi^*_-(x) = e^{-|x|} - 1 + |x| \in \Delta_2$). Note that $\varphi_- \notin \Delta_2$ and $\varphi^*_+ \notin \Delta_2$.

Completeness

Theorem
$(Y, \|\cdot\|_\varphi)$ (resp. $(X, \|\cdot\|^*_\varphi)$) is
1. Bi-complete: $\rho^s$-Cauchy $y_n \xrightarrow{\rho^s} y$.
2. $\rho$-sequentially complete: $\rho^s$-Cauchy $y_n \xrightarrow{\rho} y$.
3. Right K-sequentially complete: right K-Cauchy $y_n \xrightarrow{\rho} y$.
Proof.
$\rho^s(y, z) = \|z - y\|_\varphi \vee \|y - z\|_\varphi \le \|y - z\|_{\varphi-}$, where $(Y, \|\cdot\|_{\varphi-})$ is Banach. Then use theorems of Reilly et al. (1982) and Chen et al. (2007).

Summary and Further Questions

Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g.
Banach spaces), and therefore many properties have to be reexamined. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Contain a separable Orlicz subspace. 
Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Other asymmetric information distances (e.g. Renyi divergence). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 References Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16 Results Borodin, P. A. (2001). The BanachMazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3–4), 298–305. Chen, S.A., Li, W., Zou, D., & Chen, S.B. (2007, Aug). Fixed point theorems in quasimetric spaces. In Machine learning and cybernetics, 2007 international conference on (Vol. 5, p. 24992504). IEEE. Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkh¨auser. Fletcher, P., & Lindgren, W. F. (1982). Quasiuniform spaces (Vol. 77). New York: Marcel Dekker. Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasipseudometric spaces. 
Monatshefte f¨ur Mathematik, 93, 127–140. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16
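As a rough numerical illustration of the $\Delta_2$ claims used in the separability proof, the sketch below evaluates the doubling ratio $f(2t)/f(t)$ for growing $t$. It is only a sanity check under stated assumptions, not part of the talk: the function names are hypothetical, and a bounded ratio on a few sample points suggests, but does not prove, the $\Delta_2$ condition.

```python
import math


def phi(u):
    """phi(u) = (1 + u) ln(1 + u) - u, defined for u > -1."""
    return (1.0 + u) * math.log1p(u) - u


def phi_star(x):
    """phi*(x) = e^x - 1 - x, the convex conjugate of phi."""
    return math.expm1(x) - x


def phi_plus(u):
    """phi_+(u) = phi(|u|); grows like u ln u."""
    return phi(abs(u))


def phi_star_plus(x):
    """phi*_+(x) = phi*(|x|); grows exponentially."""
    return phi_star(abs(x))


def phi_star_minus(x):
    """phi*_-(x) = phi*(-|x|) = e^{-|x|} - 1 + |x|; grows linearly."""
    return phi_star(-abs(x))


def doubling_ratio(f, t):
    """Ratio f(2t) / f(t); Delta_2 asks that this stay bounded for large t."""
    return f(2.0 * t) / f(t)


if __name__ == "__main__":
    # Print the ratios on a few sample points (illustrative values only).
    for t in (10.0, 50.0, 100.0):
        print(t,
              doubling_ratio(phi_plus, t),
              doubling_ratio(phi_star_minus, t),
              doubling_ratio(phi_star_plus, t))
```

The ratios for `phi_plus` and `phi_star_minus` settle near 2, while the ratio for `phi_star_plus` grows roughly like $e^t$, which matches $\varphi_+, \varphi^*_- \in \Delta_2$ and $\varphi^*_+ \notin \Delta_2$. The function $\varphi_-(u) = \varphi(-|u|)$ is left out because it is finite only for $|u| < 1$.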
Computational Information Geometry (chaired by Frank Nielsen, Paul Marriott)
Keywords =
Abstract
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling
R. Sabolová (1), P. Marriott (2), G. Van Bever (1) & F. Critchley (1)
(1) The Open University (EPSRC grant EP/L010429/1), United Kingdom
(2) University of Waterloo, Canada
GSI 2015, October 28th 2015
Radka Sabolová, Geometry of GOF Testing in HDLSS Modelling

Key points

In CIG, the multinomial model $\Delta^k = \{(\pi_0, \dots, \pi_k) : \pi_i \ge 0, \sum_i \pi_i = 1\}$ provides a universal model.
1. goodness-of-fit testing in large sparse extended multinomial contexts
2. Cressie-Read power divergence λ-family, equivalent to Amari's α-family
   asymptotic properties of two test statistics: Pearson's $\chi^2$ test and deviance
   simulation study for other statistics within power divergence family
3. k-asymptotics instead of N-asymptotics

Outline

1. Introduction
2. Pearson's $\chi^2$ versus the deviance
3. Other test statistics from power divergence family
4. Summary

Big data

Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008):
"... the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. ... Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science."
continuous data: High dimensional low sample size data (HDLSS)
discrete data: databases, image analysis
Sparsity (N << k) changes everything!

Image analysis - example

[Figure: $m_1 = 10$, $m_2 = 10$]
Dimension of a state space: $k = 2^{m_1 m_2} - 1$

Sparsity changes everything

S. Fienberg, A. Rinaldo (2012): Maximum Likelihood Estimation in Log-Linear Models
"Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult."

Extended multinomial distribution

Let $n = (n_i) \sim \mathrm{Mult}(N, (\pi_i))$, $i = 0, 1, \dots, k$, where each $\pi_i \ge 0$.
Goodness-of-fit test: $H_0 : \pi = \pi^*$.
Pearson's $\chi^2$ test (Wald, score statistic):
$W := \sum_{i=0}^{k} \dfrac{(\pi^*_i - n_i/N)^2}{\pi^*_i} \equiv \dfrac{1}{N^2} \sum_{i=0}^{k} \dfrac{n_i^2}{\pi^*_i} - 1.$
Rule of thumb (for accuracy of the $\chi^2_k$ asymptotic approximation): $N\pi_i \ge 5$

Performance of Pearson's $\chi^2$ test on the boundary - example

[Figure: N = 50, k = 200, exponentially decreasing $\pi_i$. Panels: (a) Null distribution (cell probability against rank of cell probability); (b) Sample of Wald Statistic (Wald statistic against index)]

Performance of Pearson's $\chi^2$ test on the boundary - theory

Theorem
For $k > 1$ and $N \ge 6$, the first three moments of $W$ are:
$E(W) = \dfrac{k}{N}$, $\quad \mathrm{var}(W) = \dfrac{\pi^{(-1)} - (k+1)^2 + 2k(N-1)}{N^3}$
and $E[\{W - E(W)\}^3]$ given by
$\dfrac{\pi^{(-2)} - (k+1)^3 - (3k + 25 - 22N)\{\pi^{(-1)} - (k+1)^2\} + g(k, N)}{N^5}$
where $g(k, N) = 4(N-1)k(k + 2N - 5) > 0$ and $\pi^{(a)} := \sum_i \pi_i^a$.
In particular, for fixed $k$ and $N$, as $\pi_{\min} \to 0$,
$\mathrm{var}(W) \to \infty$ and $\gamma(W) \to +\infty$
where $\gamma(W) := E[\{W - E(W)\}^3] / \{\mathrm{var}(W)\}^{3/2}$.

The deviance statistic

Define the deviance $D$ via
$D/2 = \sum_{\{0 \le i \le k : n_i > 0\}} \{n_i \log(n_i/N) - n_i \log(\pi_i)\} = \sum_{\{0 \le i \le k : n_i > 0\}} n_i \log(n_i/N) + \sum_{\{0 \le i \le k : n_i > 0\}} n_i \log\dfrac{1}{\pi_i} = \sum_{\{0 \le i \le k : n_i > 0\}} n_i \log(n_i/\mu_i),$
where $\mu_i := E(n_i) = N\pi_i$.

Distribution of deviance

let $\{n^*_i, i = 0, \dots, k\}$ be mutually independent, with $n^*_i \sim \mathrm{Po}(\mu_i)$
then $N^* := \sum_{i=0}^{k} n^*_i \sim \mathrm{Po}(N)$ and $n_i = (n^*_i \mid N^* = N) \sim \mathrm{Mult}(N, \pi_i)$
define
$S^* := \begin{pmatrix} N^* \\ D^*/2 \end{pmatrix} = \sum_{i=0}^{k} \begin{pmatrix} n^*_i \\ n^*_i \log(n^*_i/\mu_i) \end{pmatrix}$
define $\nu$, $\tau$ and $\rho$ via
$\begin{pmatrix} N \\ \nu \end{pmatrix} := E(S^*) = \begin{pmatrix} N \\ \sum_{i=0}^{k} E(n^*_i \log\{n^*_i/\mu_i\}) \end{pmatrix}, \quad \begin{pmatrix} N & \rho\tau\sqrt{N} \\ \cdot & \tau^2 \end{pmatrix} := \mathrm{cov}(S^*) = \begin{pmatrix} N & \sum_{i=0}^{k} C_i \\ \cdot & \sum_{i=0}^{k} V_i \end{pmatrix},$
where $C_i := \mathrm{Cov}(n^*_i, n^*_i \log(n^*_i/\mu_i))$ and $V_i := \mathrm{Var}(n^*_i \log(n^*_i/\mu_i))$.
Then under equicontinuity
$D/2 \xrightarrow[k \to \infty]{D} N_1(\nu, \tau^2(1 - \rho^2)).$

Uniformity near the boundary

[Figure: Stability of sampling distributions - Pearson's $\chi^2$ and deviance statistic, N = 50, k = 200, exponentially decreasing $\pi_i$. Panels: (a) Null distribution; (b) Sample of Wald Statistic; (c) Sample of Deviance Statistic]

Asymptotic approximations

normal approximation can be improved
$\chi^2$ approximation, correction for skewness
symmetrised deviance statistics
[Figure: Quality of k-asymptotics approximations near the boundary. Panels: Normal Approximation (normal quantiles against deviance quantiles); Chi-squared Approximation (chi-squared quantiles against deviance quantiles); Symmetrised Deviance (normal quantiles against symmetric deviance quantiles)]

Uniformity and higher moments

does the k-asymptotic approximation hold uniformly across the simplex?
rewrite deviance as
$D^*/2 = \sum_{\{0 \le i \le k : n^*_i > 0\}} n^*_i \log(n^*_i/\mu_i) = \Gamma^* + \Delta^*$
where $\Gamma^* := \sum_{i=0}^{k} \alpha_i n^*_i$ and $\Delta^* := \sum_{\{0 \le i \le k : n^*_i > 1\}} n^*_i \log n^*_i \ge 0$ and $\alpha_i := -\log \mu_i$.
how well is the moment generating function of the (standardised) $\Gamma^*$ approximated by that of a (standard) normal?
$M_\gamma(t) = \exp\left(-\dfrac{E(\Gamma^*)\,t}{\sqrt{\mathrm{Var}(\Gamma^*)}}\right) \exp\left[\sum_{i=0}^{k} \sum_{h=1}^{\infty} \dfrac{(-1)^h}{h!}\, \mu_i (\log \mu_i)^h \left(\dfrac{t}{\sqrt{\mathrm{Var}(\Gamma^*)}}\right)^h\right]$
maximise skewness $\sum_{i=0}^{k} \mu_i (\log \mu_i)^3$ for fixed $E(\Gamma^*) = -\sum_{i=0}^{k} \mu_i \log(\mu_i)$ and $\mathrm{Var}(\Gamma^*) = \sum_{i=0}^{k} \mu_i (\log \mu_i)^2$.
solution: distribution with three distinct values for $\mu_i$
[Figure: Worst case solution for normality of $\Gamma^*$. Panels: (a) Null distribution; (b) Sample of Wald Statistic; (c) Sample of Deviance Statistic]

Uniformity and discreteness

Worst case for asymptotic normality?

                    Where?      Why?
Pearson $\chi^2$    boundary    'unstable'
deviance            centre      discreteness

$D^*/2 = \sum_{\{0 \le i \le k : n^*_i > 0\}} n^*_i (\log n^*_i - \log \mu_i) = \Gamma^* + \Delta^*$
For the distribution of any discrete random variable to be well approximated by a continuous one, it is necessary that it have a large number of support points, close together.

Uniformity and discreteness, continued

[Figure: Behaviour at the centre of the simplex, N = 30, k = 200. Panels: (a) Null distribution; (b) Sample of Deviance Statistic; (c) Q-Q plot, Deviance Statistic (standardised deviance against theoretical quantiles)]
[Figure: Behaviour at the centre of the simplex, N = 60, k = 200. Panels: (a) Null distribution; (b) Sample of Deviance Statistic; (c) Q-Q plot, Deviance Statistic (standardised deviance against theoretical quantiles)]

Comparison of performance of different test statistics belonging to power divergence family as we are approaching the boundary (exponentially decreasing values of $\pi$)

$2N I^\lambda(n_i/N, \pi^*) = \dfrac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{k} n_i \left[\left(\dfrac{n_i}{N\pi^*_i}\right)^\lambda - 1\right],$
where $\alpha = 1 + 2\lambda$
$\alpha = 3$: Pearson's $\chi^2$ statistic
$\alpha = 7/3$: Cressie-Read recommendation
$\alpha = 1$: deviance
$\alpha = 0$: Hellinger statistic
$\alpha = -1$: Kullback MDI
$\alpha = -3$: Neyman $\chi^2$

Numerical comparison of test statistics belonging to power divergence family

[Figure: cell probabilities pi.base against index, followed by histograms (frequency) of Pearson's $\chi^2$, $\alpha = 3$; Cressie-Read, $\alpha = 7/3$; deviance, $\alpha = 1$]
[Figure: cell probabilities pi.base against index, followed by histograms (frequency) of Hellinger distance, $\alpha = 0$; Kullback MDI, $\alpha = -1$; Neyman $\chi^2$, $\alpha = -3$]

Summary - key points

1. goodness-of-fit testing in large sparse extended multinomial contexts
2. k-asymptotics instead of N-asymptotics
3. Cressie-Read power divergence λ-family
   asymptotic properties of two test statistics: Pearson's $\chi^2$ statistic and deviance
   simulation study for other statistics within power divergence family
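The power divergence family above can be sketched in a few lines. The code below is an illustrative assumption-laden sketch rather than the authors' implementation: the function names are hypothetical, the sum runs over all supplied cells, and it is restricted to $\lambda > -1$, where cells with $n_i = 0$ contribute nothing (for $\lambda \le -1$ an empty cell makes the statistic infinite). It checks that $\lambda = 1$ reproduces $N \cdot W$ and that $\lambda \to 0$ gives the deviance $D$.

```python
import math


def lambda_from_alpha(alpha):
    """Invert alpha = 1 + 2 * lambda."""
    return (alpha - 1.0) / 2.0


def pearson_w(counts, probs):
    """W = (1 / N^2) * sum_i n_i^2 / pi_i - 1, as in the Pearson slide."""
    n_total = sum(counts)
    return sum(n * n / p for n, p in zip(counts, probs)) / n_total**2 - 1.0


def deviance(counts, probs):
    """D = 2 * sum over cells with n_i > 0 of n_i log(n_i / mu_i), mu_i = N pi_i."""
    n_total = sum(counts)
    return 2.0 * sum(n * math.log(n / (n_total * p))
                     for n, p in zip(counts, probs) if n > 0)


def power_divergence(counts, probs, lam):
    """2 N I^lambda for lam > -1 (hypothetical helper, see lead-in)."""
    if lam <= -1.0:
        raise ValueError("this sketch only handles lam > -1")
    n_total = sum(counts)
    total = 0.0
    for n, p in zip(counts, probs):
        if n == 0:
            continue  # empty cells contribute 0 when lam > -1
        ratio = n / (n_total * p)
        if abs(lam) < 1e-12:
            total += n * math.log(ratio)          # limit as lam -> 0
        else:
            total += n * (ratio**lam - 1.0) / lam
    return 2.0 * total / (lam + 1.0)


if __name__ == "__main__":
    counts = [3, 0, 5, 2]            # toy data, not from the talk
    probs = [0.25, 0.25, 0.25, 0.25]
    for alpha in (3.0, 7.0 / 3.0, 1.0, 0.0):
        print(alpha, power_divergence(counts, probs, lambda_from_alpha(alpha)))
```

For production use, SciPy ships a tested `scipy.stats.power_divergence`; the hand-rolled version here is only meant to make the formula concrete.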
References

A. Agresti (2002): Categorical Data Analysis. Wiley: Hoboken NJ.
K. Anaya-Izquierdo, F. Critchley, and P. Marriott (2014): When are first order asymptotics adequate? A diagnostic. STAT, 3: 17–22.
K. Anaya-Izquierdo, F. Critchley, P. Marriott, and P. Vos (2013): Computational information geometry: foundations. Proceedings of GSI 2013, LNCS.
F. Critchley and P. Marriott (2014): Computational information geometry in statistics: theory and practice. Entropy, 16: 2454–2471.
S.E. Fienberg and A. Rinaldo (2012): Maximum likelihood estimation in log-linear models. Annals of Statistics, 40: 996–1023.
L. Holst (1972): Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika, 59: 137–145.
C. Morris (1975): Central limit theorems for multinomial sums. Annals of Statistics, 3: 165–188.
Keywords = Computational information geometry, Computing boundaries, Embedded manifolds, Local mixture models, Polytopes, Ruled and developable surfaces
Abstract
Computing Boundaries in Local Mixture Models
Vahed Maroufy & Paul Marriott
Department of Statistics and Actuarial Science, University of Waterloo
October 28, GSI 2015, Paris

Outline

1. Influence of boundaries on parameter inference
2. Local mixture models (LMM)
3. Parameter space and boundaries
   Hard boundaries and soft boundaries
4. Computing the boundaries for LMMs
5. Summary and future direction

Boundary influence

When a boundary exists:
MLE does not exist ⟹ find the Extended MLE
MLE exists, but does not satisfy the regular properties
Examples
Binomial distribution, logistic regression, contingency table, log-linear and graphical models
Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013)
Computing the boundary is a hard problem, Fukuda (2004)
Many mathematical results in the literature:
polytope approximation, Boroczky and Fodor (2008), Barvinok (2013)
smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004)

Local Mixture Models

Definition, Marriott (2002)
$g(x; \mu, \lambda) = f(x; \mu) + \sum_{j=2}^{k} \lambda_j f^{(j)}(x; \mu), \quad \lambda \in \Lambda_\mu \subset \mathbb{R}^{k-1}$
Properties, Anaya-Izquierdo and Marriott (2007)
$g$ is identifiable in all parameters and the parametrization $(\mu, \lambda)$ is orthogonal at $\lambda = 0$
The log likelihood function of $g$ is a concave function of $\lambda$ at a fixed $\mu_0$
$\Lambda_\mu$ is convex
Approximate continuous mixture models $\int_M f(x, \mu)\, dQ(\mu)$ when mixing is "small"
The family of LMMs is richer than the family of mixtures

Example and Motivation

Example: LMM of Normal
$f(x; \mu) = \phi(x; \mu, \sigma^2)$ ($\sigma^2$ is known).
$g(x; \mu, \lambda) = \phi(x; \mu, \sigma^2) \left\{1 + \sum_{j=2}^{k} \lambda_j p_j(x)\right\}, \quad \lambda \in \Lambda_\mu$
$p_j(x)$: polynomial of degree $j$.
Why do we care about $\lambda$ and $\Lambda_\mu$?
They are interpretable:
$\mu_g^{(2)} = \sigma^2 + 2\lambda_2$
$\mu_g^{(3)} = 6\lambda_3$
$\mu_g^{(4)} = \mu_\phi^{(4)} + 12\sigma^2\lambda_2 + 24\lambda_4 \qquad (1)$
$\lambda$ represents the mixing distribution $Q$ via its moments in $\int_M f(x, \mu)\, dQ(\mu)$

The costs for all these good properties and flexibility are
Hard boundary ⟹ Positivity (boundary of $\Lambda_\mu$)
Soft boundary ⟹ Mixture behavior
We compute them for two models here: Poisson and Normal
We fix $k = 4$

Boundaries

Hard boundary
$\Lambda_\mu = \left\{\lambda \;\middle|\; 1 + \sum_{j=2}^{k} \lambda_j q_j(x; \mu) \ge 0,\ \forall x \in S\right\},$
$\Lambda_\mu$ is an intersection of half-spaces, so it is convex
The hard boundary is constructed by a set of (hyper)planes
Soft boundary
Definition
For a density function $f(x; \mu)$ with $k$ finite moments let
$M_k(f) := (E_f(X), E_f(X^2), \cdots, E_f(X^k)),$
and for compact $M$ define
$C = \mathrm{convhull}\{M_r(f) \mid \mu \in M\}.$
Then the boundary of $C$ is called the soft boundary.
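The hard-boundary condition above is a positivity constraint that has to hold for every $x$, so a quick way to get a feel for it is to test a candidate $\lambda$ on a finite grid. The sketch below does this for the normal LMM with $k = 4$ under explicit assumptions that are not taken from the slides at this point: $\sigma = 1$, the standardized variable $y = x - \mu$, and $q_j$ equal to the probabilists' Hermite polynomials (which is what differentiating a unit-variance normal density in its mean produces). Function names are hypothetical. A grid check can only reject a $\lambda$ with certainty; passing it is evidence, not proof, of membership in $\Lambda_\mu$.

```python
def hermite_terms(y):
    """Probabilists' Hermite polynomials He_2, He_3, He_4 at y (assumed q_j)."""
    return (y * y - 1.0,
            y**3 - 3.0 * y,
            y**4 - 6.0 * y * y + 3.0)


def positivity_factor(lam, y):
    """1 + sum_{j=2}^{4} lambda_j q_j(y) for lam = (lambda_2, lambda_3, lambda_4)."""
    return 1.0 + sum(l * q for l, q in zip(lam, hermite_terms(y)))


def passes_grid_check(lam, y_max=10.0, steps=4001):
    """True if the factor is non-negative at every grid point in [-y_max, y_max].

    Each grid point y defines one half-space in lambda; the finite grid is a
    polytope-style outer approximation of the hard boundary, nothing more.
    """
    for i in range(steps):
        y = -y_max + 2.0 * y_max * i / (steps - 1)
        if positivity_factor(lam, y) < 0.0:
            return False
    return True


if __name__ == "__main__":
    # Illustrative candidates only; none of these values come from the talk.
    for lam in [(0.0, 0.0, 0.0), (0.1, 0.0, 0.01), (2.0, 0.0, 0.0), (0.0, 0.0, -0.01)]:
        print(lam, passes_grid_check(lam))
```

Because every grid point contributes a linear inequality in $\lambda$, the accepted region is an intersection of half-spaces, which mirrors the convexity remark in the hard-boundary definition.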
Computing Boundaries in Local Mixture Models

Computing hard boundary: Poisson model
$\Lambda_\mu = \{\lambda \mid A_2(x)\lambda_2 + A_3(x)\lambda_3 + A_4(x)\lambda_4 + 1 \ge 0,\ \forall x \in \mathbb{Z}_+\}$
Figure: Left: slice through $\lambda_2 = -0.1$; Right: slice through $\lambda_3 = 0.3$.
Theorem. For an LMM of a Poisson distribution, for each $\mu$, the space $\Lambda_\mu$ can be approximated arbitrarily well (as measured by volume, for example) by a finite polytope.

Computing hard boundary: Normal model
Let $y = \frac{x-\mu}{\sigma^2}$. Then
$\Lambda_\mu = \{\lambda \mid (y^2-1)\lambda_2 + (y^3-3y)\lambda_3 + (y^4-6y^2+3)\lambda_4 + 1 \ge 0,\ \forall y \in \mathbb{R}\}$.
We need more geometric tools to compute this boundary.

Ruled and developable surfaces
Definition.
• Ruled surface: $\Gamma(x,\gamma) = \alpha(x) + \gamma\cdot\beta(x)$, $x \in I \subset \mathbb{R}$, $\gamma \in \mathbb{R}^k$.
• Developable surface: $\beta(x)$, $\alpha'(x)$ and $\beta'(x)$ are coplanar for all $x \in I$.
Definition. The family of planes $A = \{\lambda \in \mathbb{R}^3 \mid a(x)\cdot\lambda + d(x) = 0,\ x \in \mathbb{R}\}$, each determined by an $x \in \mathbb{R}$, is called a one-parameter infinite family of planes. Each element of the set $\{\lambda \in \mathbb{R}^3 \mid a(x)\cdot\lambda + d(x) = 0,\ a'(x)\cdot\lambda + d'(x) = 0,\ x \in \mathbb{R}\}$ is called a characteristic line of the surface at $x$, and the union of these lines is called the envelope of the family.
• A characteristic line is the intersection of two consecutive planes.
• The envelope is a developable surface.

Boundaries for Normal LMM: hard boundary
The hard boundary for the normal LMM is built from the one-parameter family of planes
$(y^2-1)\lambda_2 + (y^3-3y)\lambda_3 + (y^4-6y^2+3)\lambda_4 + 1 = 0,\quad y \in \mathbb{R}$.
Figure (axes $\lambda_2$, $\lambda_3$, $\lambda_4$): Left: the hard boundary for the normal LMM (shaded) as a subset of a self-intersecting ruled surface (unshaded); Right: slice through $\lambda_4 = 0.2$.

Boundaries for Normal LMM: soft boundary
Recap: $M_k(f) := (E_f(X), E_f(X^2), \dots, E_f(X^k))$.
For visualization purposes let $k = 3$ ($\mu \in M$, fix $\sigma$):
$M_3(f) = (\mu,\ \mu^2+\sigma^2,\ \mu^3+3\mu\sigma^2)$,
$M_3(g) = (\mu,\ \mu^2+\sigma^2+2\lambda_2,\ \mu^3+3\mu\sigma^2+6\mu\lambda_2+6\lambda_3)$.
Figure: Left: the 3D curve $\varphi(\mu)$; Middle: the bounding ruled surface $\gamma_a(\mu,u)$; Right: the convex subspace restricted to the soft boundary.

Boundaries for Normal LMM: ruled surface parametrization
There are two boundary surfaces, each constructed from a curve and a set of lines attached to it:
$\gamma_a(\mu,u) = \varphi(\mu) + u\,L_a(\mu)$
$\gamma_b(\mu,u) = \varphi(\mu) + u\,L_b(\mu)$
where, for $M = [a,b]$ and $\varphi(\mu) = M_3(f)$,
• $L_a(\mu)$: lines between $\varphi(a)$ and $\varphi(\mu)$
• $L_b(\mu)$: lines between $\varphi(\mu)$ and $\varphi(b)$

Summary
• Understanding these boundaries is important if we want to exploit the nice statistical properties of LMMs.
• The boundaries described in this paper have both discrete and smooth aspects.
• The two examples discussed represent the structure for almost all exponential family models.
• It is an interesting problem to design optimization algorithms on these boundaries for finding boundary maximizers of the likelihood.

Thank you
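The hard-boundary condition for the normal LMM stated in this talk can be probed numerically. The sketch below is only an illustration under stated assumptions: the function names `constraint` and `inside_hard_boundary` are hypothetical, and the test scans a finite grid of $y$ values, so it can reject a point $\lambda$ with certainty but can only suggest (not prove) that a point lies inside $\Lambda_\mu$.

```python
def constraint(y, lam2, lam3, lam4):
    """Left-hand side of the normal-LMM inequality at the standardized point y."""
    return ((y**2 - 1) * lam2
            + (y**3 - 3 * y) * lam3
            + (y**4 - 6 * y**2 + 3) * lam4
            + 1.0)


def inside_hard_boundary(lam2, lam3, lam4, y_max=10.0, steps=2001, tol=1e-12):
    """Grid check of the inequality on [-y_max, y_max].

    Returns False as soon as one grid point violates the inequality.
    A True result is only evidence: the grid is finite and bounded,
    both of which are assumptions of this sketch.
    """
    for i in range(steps):
        y = -y_max + 2.0 * y_max * i / (steps - 1)
        if constraint(y, lam2, lam3, lam4) < -tol:
            return False
    return True


if __name__ == "__main__":
    # lambda = 0 is the unmixed normal model and always satisfies the inequality.
    print(inside_hard_boundary(0.0, 0.0, 0.0))
    # A pure lambda_4 perturbation of 0.2 is violated near y^2 = 3.
    print(inside_hard_boundary(0.0, 0.0, 0.2))
```

An exact test would instead locate the minimum of the quartic in $y$; the grid version is kept here because it is short and easy to audit.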
Keywords =
Abstract
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry
Frank Nielsen (1), Gaëtan Hadjeres (2)
(1) École Polytechnique; (1,2) Sony Computer Science Laboratories, Inc
Conference on Geometric Science of Information
© 2015 Frank Nielsen, Gaëtan Hadjeres

The Minimum Enclosing Ball problem
Finding the Minimum Enclosing Ball (or the 1-center) of a finite point set $P = \{p_1, \dots, p_n\}$ in the metric space $(X, d_X(\cdot,\cdot))$ consists in finding $c \in X$ such that
$c = \arg\min_{c' \in X} \max_{p \in P} d_X(c', p)$.
Figure: A finite point set $P$ and its minimum enclosing ball MEB($P$).

The approximating minimum enclosing ball problem
In a Euclidean setting, this problem is
• well-defined: the center $c^*$ and radius $R^*$ of the MEB are unique
• computationally intractable in high dimensions.
We fix an $\varepsilon > 0$ and focus on the Approximate Minimum Enclosing Ball problem of finding an approximation $c \in X$ of MEB($P$) such that
$d_X(c, p) \le (1+\varepsilon)R^* \quad \forall p \in P$.

The approximating minimum enclosing ball problem: prior work
Approximate solutions in the Euclidean case are given by Badoiu and Clarkson's algorithm [Badoiu and Clarkson, 2008]:
• Initialize the center $c_1 \in P$
• Repeat $1/\varepsilon^2$ times the following update:
$c_{i+1} = c_i + \frac{f_i - c_i}{i+1}$
where $f_i \in P$ is the farthest point from $c_i$.
How to deal with point sets whose underlying geometry is not Euclidean?

This algorithm has been generalized to
• dually flat manifolds [Nock and Nielsen, 2005]
• Riemannian manifolds [Arnaudon and Nielsen, 2013]
Applying these results to hyperbolic geometry gives the existence and uniqueness of MEB($P$), but
• gives no explicit bounds on the number of iterations
• assumes that we are able to precisely cut geodesics.
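The Euclidean update rule quoted above is short enough to sketch directly. The following is a minimal illustration, not the authors' code: the function name `badoiu_clarkson` is hypothetical, and rounding $1/\varepsilon^2$ up to an integer number of iterations is an assumption of this sketch.

```python
import math


def badoiu_clarkson(points, eps):
    """Approximate Euclidean 1-center by the update c <- c + (f - c)/(i + 1)."""
    c = list(points[0])                      # c_1 is an arbitrary input point
    n_iter = math.ceil(1.0 / eps**2)         # assumption: round 1/eps^2 up
    for i in range(1, n_iter + 1):
        # f_i: the input point farthest from the current center
        f = max(points, key=lambda p: math.dist(c, p))
        c = [ci + (fi - ci) / (i + 1) for ci, fi in zip(c, f)]
    return c


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    center = badoiu_clarkson(square, eps=0.1)
    radius = max(math.dist(center, p) for p in square)
    print(center, radius)
```

Each iteration costs one pass over the points, and the center is always a convex combination of input points, which is what makes the method attractive in high dimensions.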
The approximating minimum enclosing ball problem: our contribution
We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic $\alpha$-midpoints, we obtain
• an intrinsic $(1+\varepsilon)$-approximation algorithm for the approximate minimum enclosing ball problem
• an $O(1/\varepsilon^2)$ convergence time guarantee
• a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric

Model of d-dimensional hyperbolic geometry: the Poincaré ball model
The Poincaré ball model $(B^d, \rho(\cdot,\cdot))$ consists of the open unit ball $B^d = \{x \in \mathbb{R}^d : \|x\| < 1\}$ together with the hyperbolic distance
$\rho(p,q) = \operatorname{arcosh}\left(1 + \frac{2\|p-q\|^2}{(1-\|p\|^2)(1-\|q\|^2)}\right), \quad \forall p, q \in B^d$.
This distance induces on the metric space $(B^d, \rho)$ a Riemannian structure.

Geodesics in the Poincaré ball model
Shortest paths between two points (geodesics) are exactly
• straight (Euclidean) lines passing through the origin
• circle arcs orthogonal to the unit sphere
Figure: "Straight" lines in the Poincaré ball model

Circles in the Poincaré ball model
Circles in the Poincaré ball model look like Euclidean circles but with a different center.
Figure: Difference between the Euclidean MEB (in blue) and the hyperbolic MEB (in red) for the set of blue points in the hyperbolic Poincaré disk (in black). The red cross is the hyperbolic center of the red circle while the pink one is its Euclidean center.
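The distance formula of the Poincaré ball model can be evaluated with the standard library alone. This is a small sketch for illustration; the function name `poincare_distance` is hypothetical, and the input points are assumed to lie strictly inside the unit ball.

```python
import math


def poincare_distance(p, q):
    """Hyperbolic distance rho(p, q) between two points of the open unit ball."""
    sq_norm = lambda v: sum(x * x for x in v)
    diff = [a - b for a, b in zip(p, q)]
    return math.acosh(1.0 + 2.0 * sq_norm(diff)
                      / ((1.0 - sq_norm(p)) * (1.0 - sq_norm(q))))


if __name__ == "__main__":
    # Two pairs with the same Euclidean gap (0.1) but different hyperbolic distances:
    print(poincare_distance((0.0, 0.0), (0.1, 0.0)))
    print(poincare_distance((0.8, 0.0), (0.9, 0.0)))
```

Along a diameter the formula reduces to $\rho(0, (x, 0, \dots, 0)) = \log\frac{1+x}{1-x}$, which is why equal Euclidean steps cost more and more hyperbolic length near the boundary.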
Translations in the Poincaré ball model
$T_p(x) = \frac{(1-\|p\|^2)\,x + (\|x\|^2 + 2\langle x, p\rangle + 1)\,p}{\|p\|^2\|x\|^2 + 2\langle x, p\rangle + 1}$
Figure: Tiling of the hyperbolic plane by squares

Closed-form formula for computing $\alpha$-midpoints
A point $m$ is the $\alpha$-midpoint $p \#_\alpha q$ of two points $p, q$ for $\alpha \in [0,1]$ if
• $m$ belongs to the geodesic joining the two points $p, q$
• $m$ verifies $\rho(p, m) = \alpha\,\rho(p, q)$.
For the special case $p = (0, \dots, 0)$, $q = (x_q, 0, \dots, 0)$, we have
$p \#_\alpha q := (x_\alpha, 0, \dots, 0)$ with $x_\alpha = \frac{c_{\alpha,q} - 1}{c_{\alpha,q} + 1}$,
where $c_{\alpha,q} := e^{\alpha\rho(p,q)} = \left(\frac{1+x_q}{1-x_q}\right)^\alpha$.

Noting that
$p \#_\alpha q = T_p\left(T_{-p}(p) \#_\alpha T_{-p}(q)\right) \quad \forall p, q \in B^d$,
we obtain
• a closed-form formula for computing $p \#_\alpha q$
• a way to compute $p \#_\alpha q$ in linear time $O(d)$
• that these transformations are exact.

$(1+\varepsilon)$-approximation of a hyperbolic enclosing ball of fixed radius
For a fixed radius $r > R^*$, we can find $c \in B^d$ such that $\rho(c, p) \le (1+\varepsilon)r$ for all $p \in P$ with

Algorithm 1: $(1+\varepsilon)$-approximation of EHB($P$, $r$)
    $c_0 := p_1$
    $t := 0$
    while $\exists p \in P$ such that $p \notin B(c_t, (1+\varepsilon)r)$ do
        let $p \in P$ be such a point
        $\alpha := \frac{\rho(c_t, p) - r}{\rho(c_t, p)}$
        $c_{t+1} := c_t \#_\alpha p$
        $t := t + 1$
    end while
    return $c_t$

Idea of the proof
By the hyperbolic law of cosines:
$\operatorname{ch}(\rho_t) \ge \operatorname{ch}(h)\,\operatorname{ch}(\rho_{t+1})$
$\operatorname{ch}(\rho_1) \ge \operatorname{ch}(h)^T \ge \operatorname{ch}(\varepsilon r)^T$.
Figure: Update of $c_t$ (labels: $c_{t+1}$, $c_t$, $c^*$, $p_t$, $h > \varepsilon r$, $\rho_{t+1}$, $\rho_t$, $r$, $\theta$)

$(1+\varepsilon)$-approximation of a hyperbolic enclosing ball of fixed radius
The EHB($P$, $r$) algorithm is an $O(1/\varepsilon^2)$-time algorithm which returns the center of a hyperbolic enclosing ball with radius $(1+\varepsilon)r$ in fewer than $4/\varepsilon^2$ iterations.
Our error with respect to the true MEHB center $c^*$ verifies
$\rho(c, c^*) \le \operatorname{arcosh}\left(\frac{\operatorname{ch}((1+\varepsilon)r)}{\operatorname{ch}(R^*)}\right)$

$(1+\varepsilon+\varepsilon^2/4)$-approximation of MEHB($P$)
In fact, as $R^*$ is unknown in general, the EHB algorithm returns for any $r$:
• a $(1+\varepsilon)$-approximation of EHB($P$) if $r \ge R^*$
• the fact that $r < R^*$ if the result obtained after more than $4/\varepsilon^2$ iterations is not good enough.
This suggests implementing a dichotomic search in order to compute an approximation of the minimal hyperbolic enclosing ball. We obtain a $(1+\varepsilon+\varepsilon^2/4)$-approximation of MEHB($P$) in $O\left(\frac{N}{\varepsilon^2}\log\frac{1}{\varepsilon}\right)$ iterations.
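The fixed-radius algorithm EHB($P$, $r$) discussed above combines the hyperbolic distance, the translation $T_p$ and the $\alpha$-midpoint. The sketch below is one possible reading of those slides, under stated assumptions: all function names (`rho`, `translate`, `alpha_midpoint`, `ehb`) are hypothetical; the special-case midpoint formula is applied along the ray from the origin, which relies on rotations about the origin being isometries; and the cap of $4/\varepsilon^2$ iterations is used as the interruption criterion.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rho(p, q):
    """Poincare-ball distance."""
    diff = [a - b for a, b in zip(p, q)]
    return math.acosh(1.0 + 2.0 * dot(diff, diff)
                      / ((1.0 - dot(p, p)) * (1.0 - dot(q, q))))


def translate(p, x):
    """Hyperbolic translation T_p(x); it sends the origin to p."""
    pp, xx, xp = dot(p, p), dot(x, x), dot(x, p)
    denom = pp * xx + 2.0 * xp + 1.0
    return [((1.0 - pp) * xi + (xx + 2.0 * xp + 1.0) * pi) / denom
            for xi, pi in zip(x, p)]


def alpha_midpoint(p, q, alpha):
    """Point m on the geodesic from p to q with rho(p, m) = alpha * rho(p, q)."""
    q0 = translate([-a for a in p], q)       # move p to the origin
    n = math.sqrt(dot(q0, q0))
    if n == 0.0:                             # p == q: nothing to interpolate
        return list(p)
    c = ((1.0 + n) / (1.0 - n)) ** alpha     # c_{alpha,q} of the special case
    x_alpha = (c - 1.0) / (c + 1.0)
    m0 = [x_alpha * a / n for a in q0]       # midpoint on the ray 0 -> q0
    return translate(p, m0)                  # move back


def ehb(points, r, eps):
    """Algorithm 1 sketch. Returns (center, iterations), center None if interrupted."""
    c, t = list(points[0]), 0
    t_max = math.ceil(4.0 / eps**2)
    while True:
        far = next((p for p in points if rho(c, p) > (1.0 + eps) * r), None)
        if far is None:
            return c, t
        if t >= t_max:                       # suggests r < R*
            return None, t
        d = rho(c, far)
        c = alpha_midpoint(c, far, (d - r) / d)
        t += 1


if __name__ == "__main__":
    pts = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
    print(ehb(pts, r=1.2, eps=0.1))
```

A dichotomic search over $r$, as in the MEHB algorithm, would call `ehb` repeatedly and shrink the interval $[r_{min}, r_{max}]$ depending on whether the call was interrupted.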
$(1+\varepsilon+\varepsilon^2/4)$-approximation of MEHB($P$): algorithm

Algorithm 2: $(1+\varepsilon)$-approximation of MEHB($P$)
    $c := p_1$
    $r_{max} := \rho(c, P)$; $r_{min} := \frac{r_{max}}{2}$; $t_{max} := +\infty$
    $r := r_{max}$
    repeat
        $c_{temp} := \mathrm{Alg1}(P, r, \frac{\varepsilon}{2})$, interrupt if $t > t_{max}$ in Alg1
        if the call of Alg1 has been interrupted then
            $r_{min} := r$
        else
            $r_{max} := r$; $c := c_{temp}$
        end if
        $dr := \frac{r_{max} - r_{min}}{2}$; $r := r_{min} + dr$; $t_{max} := \frac{\log(\operatorname{ch}((1+\varepsilon/2)r)) - \log(\operatorname{ch}(r_{min}))}{\log(\operatorname{ch}(r\varepsilon/2))}$
    until $2\,dr < r_{min}\frac{\varepsilon}{2}$
    return $c$

Experimental results
The number of iterations does not depend on $d$.
Figure: Number of $\alpha$-midpoint calculations as a function of $\varepsilon$ in logarithmic scale for different values of $d$.

The running time is approximately $O\left(\frac{dn}{\varepsilon^2}\right)$ (vertical translation in logarithmic scale).
Figure: Execution time as a function of $\varepsilon$ in logarithmic scale for different values of $d$.

Applications
Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the following subfamilies
• $N(\mu, \sigma^2 I_n)$ of $n$-variate normal distributions with scalar covariance matrix ($I_n$ is the $n \times n$ identity matrix),
• $N(\mu, \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2))$ of $n$-variate normal distributions with diagonal covariance matrix,
• $N(\mu_0, \Sigma)$ of $d$-variate normal distributions with fixed mean $\mu_0$ and arbitrary positive definite covariance matrix $\Sigma$
are statistical manifolds whose Fisher information metric is hyperbolic.

In particular, our results apply to the two-dimensional location-scale subfamily:
Figure: MEHB ($D$) of probability density functions (left) in the $(\mu, \sigma)$ upper half-plane (right). $P = \{A, B, C\}$.
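For the two-dimensional location-scale subfamily mentioned above, the Fisher-Rao distance has a well-known closed form, which makes the hyperbolic structure concrete. The sketch below is not taken from the slides: it assumes the standard univariate Fisher metric $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$, which is $2$ times the upper half-plane metric in the coordinates $(\mu/\sqrt{2}, \sigma)$; the function name `fisher_rao_normal` is hypothetical.

```python
import math


def fisher_rao_normal(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Assumes ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, i.e. sqrt(2) times the
    upper half-plane distance between (mu/sqrt(2), sigma) points.
    """
    num = (mu1 - mu2) ** 2 / 2.0 + (sigma1 - sigma2) ** 2
    return math.sqrt(2.0) * math.acosh(1.0 + num / (2.0 * sigma1 * sigma2))


if __name__ == "__main__":
    # Same mean, different scales: the distance depends only on the ratio of sigmas.
    print(fisher_rao_normal(0.0, 1.0, 0.0, 2.0))
    print(fisher_rao_normal(5.0, 3.0, 5.0, 6.0))
```

Mapping the half-plane to the Poincaré disk would then let the enclosing-ball algorithms of this talk run on sets of normal densities; that mapping is left out of this sketch.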
Perspectives
Plugging the EHB and MEHB algorithms to compute cluster centers into the approximation algorithm of [Gonzalez, 1985], we obtain approximate algorithms for
• covering in hyperbolic spaces
• the $k$-center problem in $O\left(\frac{kNd}{\varepsilon^2}\log\frac{1}{\varepsilon}\right)$

Algorithm 3: Gonzalez farthest-first traversal approximation algorithm
    $C_1 := P$, $i = 0$
    while $i \le k$ do
        $\forall j \le i$, compute $c_j := \mathrm{MEB}(C_j)$
        $\forall j \le i$, set $f_j := \arg\max_{p \in P} \rho(p, c_j)$
        find $f \in \{f_j\}$ whose distance to its cluster center is maximal
        create cluster $C_i$ containing $f$
        add to $C_i$ all points whose distance to $f$ is smaller than the distance to their cluster center
        increment $i$
    end while
    return $\{C_i\}_i$

The computation of the minimum enclosing hyperbolic ball does not necessarily involve all points $p \in P$.
Core-sets in hyperbolic geometry
• the MEHB obtained by the algorithm is an $\varepsilon$-core-set
• differences with the Euclidean setting: core-sets are of size at most $1/\varepsilon$ [Badoiu and Clarkson, 2008]

Thank you!

Bibliography
Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93–104.
Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Computational Geometry, 40(1):14–22.
Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306.
Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649–656. Springer.
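To make the farthest-first idea behind the $k$-center discussion in this talk concrete, here is a sketch of the classic traversal in which every center is an input point. This is a deliberate simplification and not Algorithm 3 of the slides, which recomputes enclosing-ball centers for each cluster; the names `farthest_first` and `rho` are hypothetical.

```python
import math


def rho(p, q):
    """Poincare-ball distance, used here as the metric of the traversal."""
    sq = lambda v: sum(x * x for x in v)
    diff = [a - b for a, b in zip(p, q)]
    return math.acosh(1.0 + 2.0 * sq(diff) / ((1.0 - sq(p)) * (1.0 - sq(q))))


def farthest_first(points, k, dist):
    """Classic farthest-first traversal: returns (centers, covering radius)."""
    centers = [points[0]]                            # arbitrary first center
    nearest = [dist(p, centers[0]) for p in points]  # distance to closest center
    while len(centers) < k:
        i = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[i])                    # farthest point becomes a center
        nearest = [min(d, dist(p, points[i])) for d, p in zip(nearest, points)]
    return centers, max(nearest)


if __name__ == "__main__":
    pts = [(0.1, 0.0), (0.12, 0.01), (-0.8, 0.0), (-0.79, 0.02)]
    print(farthest_first(pts, k=2, dist=rho))
```

Because the routine only needs a distance function, swapping `rho` for another metric (for example a Fisher-Rao distance) requires no other change.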
Keywords = Brain Computer, Information geometry, Interfaces, Riemannian means, Steady State, Visually Evoked Potentials
Abstract
From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification
Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al.
F'SATI, Tshwane University of Technology (South Africa); LISV, Université de Versailles Saint-Quentin (France); Mensia Technologies (France)
sylvain.chevallier@uvsq.fr
28 October 2015
S. Chevallier, 28/10/2015, GSI

Cerebral interfaces
Context
• Rehabilitation and disability compensation
  ⇒ Out-of-the-lab solutions
  ⇒ Open to a wider population
Problem
• Intra-subject variabilities ⇒ Online methods, adaptive algorithms
• Inter-subject variabilities ⇒ Good generalization, fast convergence
Opportunities
• New generation of BCI (Congedo & Barachant)
• Growing interest in the EEG community
• Large community, available datasets
• Challenging situations and problems

Outline
• Brain-Computer Interfaces
• Spatial covariance matrices for BCI
• Experimental assessment of distances

Interaction based on brain activity
Brain-Computer Interface (BCI) for non-muscular communication
• Medical applications
• Possible applications for a wider population
Recording at what scale?
• Neuron → LFP
• Neuronal group → ECoG → SEEG
• Brain → EEG → MEG → fMRI → PET

Interaction loop
BCI loop
1 Acquisition
2 Preprocessing
3 Translation
4 User feedback
Electroencephalography
Most BCI rely on EEG ⇒ Efficient to capture brain waves
• Lightweight system
• Low cost
• Mature technologies
• High temporal resolution
• No trepanation

Origins of EEG
• Local field potentials
• Electric potential difference between dendrite and soma
• Maxwell's equation
• Quasi-static approximation
• Volume conduction effect
• Sensitive to conductivity of brain skull
• Sensitive to tissue anisotropies

Experimental paradigms
Different brain signals for BCI:
• Motor imagery: (de)synchronization in premotor cortex
• Evoked responses: low amplitude potentials induced by stimulus
Steady-State Visually Evoked Potentials
Figure: 8 electrodes in occipital region; SSVEP stimulation LEDs at 13 Hz, 17 Hz and 21 Hz
• Neural synchronization with visual stimulation
• No learning required, based on visual attention
• Strong induced activation

BCI Challenges
Limitations
• Data scarcity ⇒ A few sources are non-linearly mixed on all electrodes
• Individual variabilities ⇒ Effect of mental fatigue
• Inter-session variabilities ⇒ Electronic impedances, localizations of electrodes
• Inter-individual variabilities ⇒ State-of-the-art approaches fail with 20% of subjects
Desired properties:
• Online systems ⇒ Continuously adapt to the user's variations
• No calibration phase ⇒ Non-negligible cognitive load, raises fatigue
• Generic model classifiers and transfer learning ⇒ Use data from one subject to enhance the results for another
Spatial covariance matrices
Common approach: spatial filtering
• Efficient on clean datasets
• Specific to each user and session ⇒ Require user calibration
• Two-step training with feature selection ⇒ Overfitting risk, curse of dimensionality
Working with covariance matrices
• Good generalization across subjects
• Fast convergence
• Existing online algorithms
• Efficient implementations

Covariance matrices for EEG
• An EEG trial: $X \in \mathbb{R}^{C \times N}$, $C$ electrodes, $N$ time samples
• Assuming that $X \sim N(0, \Sigma)$
• Covariance matrices $\Sigma$ belong to $M_C = \{\Sigma \in \mathbb{R}^{C \times C} : \Sigma = \Sigma^\top \text{ and } x^\top \Sigma x > 0,\ \forall x \in \mathbb{R}^C \setminus 0\}$
• Mean of the set $\{\Sigma_i\}_{i=1,\dots,I}$ is $\bar\Sigma = \arg\min_{\Sigma \in M_C} \sum_{i=1}^{I} d^m(\Sigma_i, \Sigma)$
• Each EEG class is represented by its mean
• Classification based on those means
• How to obtain a robust and efficient algorithm?
Congedo, 2013

Minimum distance to Riemannian mean
Simple and robust classifier
• Compute the center $\Sigma_E^{(k)}$ of each of the $K$ classes
• Assign a given unlabelled $\hat\Sigma$ to the closest class: $k^* = \arg\min_k \delta(\hat\Sigma, \Sigma_E^{(k)})$
Figure: Trajectories on the tangent space at the mean of all trials $\bar\Sigma_\mu$ (legend: resting class, 13 Hz class, 21 Hz class, 17 Hz class, delay)

Riemannian potato
Removing outliers and artifacts
Reject any $\Sigma_i$ that lies too far from the mean of all trials $\bar\Sigma_\mu$:
$z(\delta_i) = \frac{\delta_i - \mu}{\sigma} > z_{th}$,
where $\delta_i$ is $d(\Sigma_i, \bar\Sigma)$, and $\mu$ and $\sigma$ are the mean and standard deviation of the distances $\{\delta_i\}_{i=1}^{I}$.
Figure: Raw matrices; Riemannian potato filtering
Covariance matrices for EEG-based BCI
Riemannian approaches in BCI:
• Achieve state-of-the-art results
  → performing like spatial filtering or sensor-space methods
• Rely on simpler algorithms
  → less error-prone, computationally efficient
What are the reasons for this success?
• Invariances embedded with Riemannian distances
  → invariance to rescaling, normalization, whitening
  → invariance to electrode permutation or positioning
• Equivalent to working in an optimal source space
  → spatial filtering is sensitive to outliers and user-specific
  → no question on "sensors or sources" methods
⇒ What are the most desirable invariances for EEG?

Considered distances and divergences
• Euclidean: $d_E(\Sigma_1, \Sigma_2) = \|\Sigma_1 - \Sigma_2\|_F$
• Log-Euclidean: $d_{LE}(\Sigma_1, \Sigma_2) = \|\log(\Sigma_1) - \log(\Sigma_2)\|_F$ (V. Arsigny et al., 2006, 2007)
• Affine-invariant: $d_{AI}(\Sigma_1, \Sigma_2) = \|\log(\Sigma_1^{-1}\Sigma_2)\|_F$ (T. Fletcher & S. Joshi, 2004; M. Moakher, 2005)
• $\alpha$-divergence, for $-1 < \alpha < 1$: $d_D^{\alpha}(\Sigma_1, \Sigma_2) = \frac{4}{1-\alpha^2}\log\frac{\det\left(\frac{1-\alpha}{2}\Sigma_1 + \frac{1+\alpha}{2}\Sigma_2\right)}{\det(\Sigma_1)^{\frac{1-\alpha}{2}}\det(\Sigma_2)^{\frac{1+\alpha}{2}}}$ (Z. Chebbi & M. Moakher, 2012)
• Bhattacharyya: $d_B(\Sigma_1, \Sigma_2) = \left(\log\frac{\det\frac{1}{2}(\Sigma_1+\Sigma_2)}{(\det(\Sigma_1)\det(\Sigma_2))^{1/2}}\right)^{1/2}$ (Z. Chebbi & M. Moakher, 2012)

Experimental results
• Euclidean distances yield the lowest results
  → Usually attributed to the invariance under inversion that is not guaranteed
  → Displays swelling effect
• Riemannian approaches outperform state-of-the-art methods (CCA+SVM)
• $\alpha$-divergence shows the best performance
  → but requires a costly optimisation to find the best $\alpha$ value
• Bhattacharyya has the lowest computational cost and a good accuracy
Figure: Accuracy (%) and CPU time (s) as functions of the $\alpha$ value, for $\alpha$ between $-1$ and $1$

Conclusion
Working with covariance matrices in BCI
• Achieves very good results
• Simple algorithms work well: MDM, Riemannian potato
• Need for robust and online methods
Interesting applications for IG:
• Many freely available datasets
• Several competitions
• Many open source toolboxes for manipulating EEG
Several open questions:
• Handling electrode misplacements and other artifacts
• Missing data and covariance matrices of lower rank
• Inter- and intra-individual variabilities

Thank you!

Interaction loop
BCI loop
1 Acquisition
2 Preprocessing
3 Translation
4 User feedback
First systems in the early '70s
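The distances compared in this talk can be tried on small examples. The sketch below is restricted to $2 \times 2$ matrices so that it needs only the standard library; that restriction, and all function names, are assumptions made for illustration. The affine-invariant distance is computed from the eigenvalues $\lambda_i$ of $\Sigma_1^{-1}\Sigma_2$ as $\sqrt{\sum_i \log^2 \lambda_i}$, the usual way of evaluating it. A minimum-distance-to-mean rule is included, with the class centers supplied by the caller rather than estimated.

```python
import math


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def inv(a):
    d = det(a)
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]


def d_euclid(s1, s2):
    """Frobenius norm of the difference."""
    return math.sqrt(sum((s1[i][j] - s2[i][j]) ** 2
                         for i in range(2) for j in range(2)))


def d_affine_invariant(s1, s2):
    """sqrt(sum log^2 lambda_i), lambda_i eigenvalues of inv(s1) @ s2 (2x2 only)."""
    m = matmul(inv(s1), s2)
    tr, dt = m[0][0] + m[1][1], det(m)
    disc = math.sqrt(max(tr * tr - 4.0 * dt, 0.0))   # clamp rounding noise
    lam1, lam2 = (tr + disc) / 2.0, (tr - disc) / 2.0
    return math.sqrt(math.log(lam1) ** 2 + math.log(lam2) ** 2)


def d_bhattacharyya(s1, s2):
    mean = [[(s1[i][j] + s2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    value = math.log(det(mean) / math.sqrt(det(s1) * det(s2)))
    return math.sqrt(max(value, 0.0))                # clamp rounding noise


def mdm_classify(sigma, centers, dist):
    """Index of the class center closest to sigma (centers given by the caller)."""
    return min(range(len(centers)), key=lambda k: dist(sigma, centers[k]))


if __name__ == "__main__":
    s1 = [[2.0, 0.5], [0.5, 1.0]]
    s2 = [[1.0, 0.2], [0.2, 3.0]]
    a = [[2.0, 1.0], [0.0, 1.0]]                     # an invertible mixing matrix
    t1 = matmul(matmul(a, s1), transpose(a))
    t2 = matmul(matmul(a, s2), transpose(a))
    print(d_affine_invariant(s1, s2), d_affine_invariant(t1, t2))
    print(d_euclid(s1, s2), d_euclid(t1, t2))
```

The two affine-invariant values printed by the example agree while the Euclidean ones do not, which is the invariance property the talk attributes to Riemannian distances.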
Keywords =
Abstract
Group Theoretical Study on Geodesics for the Elliptical Models
Hiroto Inoue, Kyushu University, Japan
October 28, 2015
GSI2015, École Polytechnique, Paris-Saclay, France
Hiroto Inoue (Kyushu Uni.), Group Theoretical Study on Geodesics, October 28, 2015

Overview
1 Eriksen's construction of geodesics on normal model
  Problem
2 Reconsideration of Eriksen's argument
  Embedding $N_n \to \mathrm{Sym}^+_{n+1}(\mathbb{R})$
3 Geodesic equation on Elliptical model
4 Future work

Eriksen's construction of geodesics on normal model
Let $\mathrm{Sym}^+_n(\mathbb{R})$ be the set of $n$-dimensional positive-definite matrices. The normal model $N_n = (M, ds^2)$ is a Riemannian manifold defined by
$M = \{(\mu, \Sigma) \in \mathbb{R}^n \times \mathrm{Sym}^+_n(\mathbb{R})\}$, $\quad ds^2 = ({}^t d\mu)\Sigma^{-1}(d\mu) + \frac{1}{2}\operatorname{tr}((\Sigma^{-1}d\Sigma)^2)$.
The geodesic equation on $N_n$ is
$\ddot\mu - \dot\Sigma\Sigma^{-1}\dot\mu = 0$, $\quad \ddot\Sigma + \dot\mu\,{}^t\dot\mu - \dot\Sigma\Sigma^{-1}\dot\Sigma = 0$. (1)
The solution of this geodesic equation has been obtained by Eriksen.

Theorem ([Eriksen 1987])
For any $x \in \mathbb{R}^n$, $B \in \mathrm{Sym}_n(\mathbb{R})$, define a matrix exponential $\Lambda(t)$ by
$\Lambda(t) = \begin{pmatrix} \Delta & \delta & \Phi \\ {}^t\delta & \varepsilon & {}^t\gamma \\ {}^t\Phi & \gamma & \Gamma \end{pmatrix} := \exp(-tA)$, $\quad A := \begin{pmatrix} B & x & 0 \\ {}^t x & 0 & -{}^t x \\ 0 & -x & -B \end{pmatrix} \in \mathrm{Mat}_{2n+1}$. (2)
Then the curve $(\mu(t), \Sigma(t)) := (-\Delta^{-1}\delta, \Delta^{-1})$ is the geodesic on $N_n$ satisfying the initial condition $(\mu(0), \Sigma(0)) = (0, I_n)$, $(\dot\mu(0), \dot\Sigma(0)) = (x, B)$.
(Proof) We see that, by the definition, $(\mu(t), \Sigma(t))$ satisfies the geodesic equation.

Problem
1 Explain Eriksen's theorem, to clarify the relation between the normal model and symmetric spaces.
2 Extend Eriksen's theorem to the elliptical model.

Reconsideration of Eriksen's argument: $\mathrm{Sym}^+_{n+1}(\mathbb{R})$
Notice that the set of positive-definite symmetric matrices $\mathrm{Sym}^+_{n+1}(\mathbb{R})$ is a symmetric space via
$G/K \cong \mathrm{Sym}^+_{n+1}(\mathbb{R})$, $\quad gK \mapsto g \cdot {}^t g$,
where $G = GL_{n+1}(\mathbb{R})$, $K = O(n+1)$.
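This space $G/K$ has the $G$-invariant Riemannian metric $ds^2 = \frac{1}{2}\operatorname{tr}\left((S^{-1}dS)^2\right)$.

Embedding $N_n \to \mathrm{Sym}^+_{n+1}(\mathbb{R})$
Put an affine subgroup
$G_A := \left\{\begin{pmatrix} P & \mu \\ 0 & 1 \end{pmatrix} \,\middle|\, P \in GL_n(\mathbb{R}),\ \mu \in \mathbb{R}^n\right\} \subset GL_{n+1}(\mathbb{R})$.
Define a Riemannian submanifold as the orbit
$G_A \cdot I_{n+1} = \{g \cdot {}^t g \mid g \in G_A\} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$.
Theorem (Ref. [Calvo, Oller 2001])
We have the following isometry:
$N_n \xrightarrow{\ \sim\ } G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$, $\quad (\Sigma, \mu) \mapsto \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}$. (3)

By using the above embedding, we get a simpler expression of the metric and the geodesic equation, comparing $N_n$ with $G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$:
• coordinate: $(\Sigma, \mu) \mapsto S = \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}$
• metric: $ds^2 = ({}^t d\mu)\Sigma^{-1}(d\mu) + \frac{1}{2}\operatorname{tr}((\Sigma^{-1}d\Sigma)^2) \iff ds^2 = \frac{1}{2}\operatorname{tr}\left((S^{-1}dS)^2\right)$
• geodesic eq.: $\ddot\mu - \dot\Sigma\Sigma^{-1}\dot\mu = 0$, $\ddot\Sigma + \dot\mu\,{}^t\dot\mu - \dot\Sigma\Sigma^{-1}\dot\Sigma = 0 \iff (I_n, 0)(S^{-1}\dot S) = (B, x)$

Reconsideration of Eriksen's argument
We can interpret Eriksen's argument as follows.
• Differential equation → geodesic equation: $\Lambda^{-1}\dot\Lambda = -A \ \longrightarrow\ (I_n, 0)(S^{-1}\dot S) = (B, x)$
• $A = \begin{pmatrix} B & x & 0 \\ {}^t x & 0 & -{}^t x \\ 0 & -x & -B \end{pmatrix} \ \longrightarrow\ e^{-tA} = \begin{pmatrix} \Delta & \delta & * \\ {}^t\delta & \varepsilon & * \\ * & * & * \end{pmatrix} \ \longrightarrow\ S := \begin{pmatrix} \Delta & \delta \\ {}^t\delta & \varepsilon \end{pmatrix}^{-1}$
• $A \in \{A : JAJ = -A\} \subset \mathrm{sym}_{2n+1}(\mathbb{R}) \ \xrightarrow{\exp}\ e^{-tA} \in \{\Lambda : J\Lambda J = \Lambda^{-1}\} \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R}) \ \xrightarrow{\text{projection}}\ S \in N_n \cong G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$
The step from $\{\Lambda : J\Lambda J = \Lambda^{-1}\}$ to $N_n \cong G_A \cdot I_{n+1}$ is marked "Essential!".
Here $J = \begin{pmatrix} & & I_n \\ & 1 & \\ I_n & & \end{pmatrix}$.

Geodesic equation on Elliptical model
Definition
Let us define a Riemannian manifold $E_n(\alpha) = (M, ds^2)$ by
$M = \{(\mu, \Sigma) \in \mathbb{R}^n \times \mathrm{Sym}^+_n(\mathbb{R})\}$,
$ds^2 = ({}^t d\mu)\Sigma^{-1}(d\mu) + \frac{1}{2}\operatorname{tr}((\Sigma^{-1}d\Sigma)^2) + \frac{1}{2}d_\alpha\left(\operatorname{tr}(\Sigma^{-1}d\Sigma)\right)^2$, (4)
where $d_\alpha = (n+1)\alpha^2 + 2\alpha$, $\alpha \in \mathbb{C}$. Then $E_n(0) = N_n$.
The geodesic equation on $E_n(\alpha)$ is
$\ddot\mu - \dot\Sigma\Sigma^{-1}\dot\mu = 0$, $\quad \ddot\Sigma + \dot\mu\,{}^t\dot\mu - \dot\Sigma\Sigma^{-1}\dot\Sigma - \frac{d_\alpha}{n d_\alpha + 1}\left({}^t\dot\mu\,\Sigma^{-1}\dot\mu\right)\Sigma = 0$. (5)
This is equivalent to the geodesic equation on the elliptical model.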
Geodesic equation on Elliptical model
The manifold $E_n(\alpha)$ is also embedded into the positive-definite symmetric matrices $\mathrm{Sym}^+_{n+1}(\mathbb{R})$, ref. [Calvo, Oller 2001], and we have a simpler expression of the geodesic equation, comparing $E_n(\alpha)$ with $\exists\, G_A(\alpha) \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$:
• coordinate: $(\Sigma, \mu) \mapsto S = |\Sigma|^{\alpha}\begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}$
• metric: (4) $\iff ds^2 = \frac{1}{2}\operatorname{tr}\left((S^{-1}dS)^2\right)$
• geodesic eq.: (5) $\iff (I_n, 0)(S^{-1}\dot S) = (C, x) - \alpha(\log|S|)'\,(I_n, 0)$
Here $|A| = \det A$.

But, in general, we have not been able to construct a submanifold $N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$ whose projection is $E_n(\alpha)$:
• Differential equation → geodesic equation: $\Lambda^{-1}\dot\Lambda = -A \ \longrightarrow\ (I_n, 0)(S^{-1}\dot S) = (C, x) - \alpha(\log|S|)'\,(I_n, 0)$
• $\Lambda(t) \in N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R}) \ \xrightarrow{\text{projection}}\ S(t) \in E_n(\alpha) \cong G_A(\alpha) \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$
The geodesic equation on the elliptical model has not been solved.

Future work
1 Extend Eriksen's theorem to elliptical models (ongoing)
2 Find an Eriksen-type theorem for general symmetric spaces $G/K$
Sketch of the problem: for a projection $p : G/K \to G/K$, find a geodesic submanifold $N \subset G/K$ such that $p|_N$ maps all the geodesics to geodesics:
• $\forall \Lambda(t)$: geodesic $\ \longrightarrow\ p(\Lambda(t))$: geodesic
• $\Lambda(t) \in N \subset G/K \ \xrightarrow{\ p|_N\ }\ p(\Lambda(t)) \in p(N) \subset G/K$, with $p$: projection

References
Calvo, M., Oller, J.M. A distance between elliptical distributions based in an embedding into the Siegel group, J. Comput. Appl. Math. 145, 319–334 (2002).
Eriksen, P.S. Geodesics connected with the Fisher metric on the multivariate normal manifold, pp. 225–229. Proceedings of the GST Workshop, Lancaster (1987).
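Eriksen's construction from this talk can be checked numerically in the simplest case $n = 1$, where $A$ is a $3 \times 3$ matrix and $\Delta$, $\delta$ are scalars. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the matrix exponential is a truncated Taylor series (adequate only for moderate $t\|A\|$), and the restriction to $n = 1$ is made to avoid any dependency beyond the standard library.

```python
import math


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(a, terms=40):
    """Truncated Taylor series of the matrix exponential (small matrices only)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]   # a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def eriksen_geodesic_1d(x, b, t):
    """Geodesic (mu(t), Sigma(t)) on N_1 from (0, 1) with velocity (x, b)."""
    a = [[b, x, 0.0],
         [x, 0.0, -x],
         [0.0, -x, -b]]
    lam = expm_taylor([[-t * v for v in row] for row in a])   # Lambda(t) = exp(-tA)
    big_delta, small_delta = lam[0][0], lam[0][1]
    return -small_delta / big_delta, 1.0 / big_delta


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0):
        print(t, eriksen_geodesic_1d(x=1.0, b=0.0, t=t))
```

For $B = 0$ the result can be compared with the closed form $\mu(t) = \sqrt{2}\tanh(tx/\sqrt{2})$, $\Sigma(t) = \cosh^{-2}(tx/\sqrt{2})$, obtained by expanding $\exp(-tA)$ by hand for $n = 1$; for $x = 0$ the variance evolves as $\Sigma(t) = e^{tB}$.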
Keywords =
Abstract
Path connectedness on a space of probability density functions
Osamu Komori (1), Shinto Eguchi (2)
(1) University of Fukui, Japan; (2) The Institute of Statistical Mathematics, Japan
Ecole Polytechnique, Paris-Saclay (France), October 28, 2015
Komori, O. (University of Fukui), GSI2015, October 28, 2015

Contents
1 Kolmogorov-Nagumo (KN) average
2 Parallel displacement $A_t^{(\phi)}$ characterizing $\phi$-path
3 U-divergence and its associated geodesic

Setting
Terminology
• $X$: data space
• $P$: probability measure on $X$
• $F_P$: space of probability density functions associated with $P$
We consider a path connecting $f$ and $g$, where $f, g \in F_P$, and investigate its properties from the viewpoint of information geometry.

Kolmogorov-Nagumo (KN) average
Let $\phi : (0, \infty) \to \mathbb{R}$ be a monotonically increasing, concave and continuous function. Then, for $f$ and $g$ in $F_P$, the Kolmogorov-Nagumo (KN) average is
$\phi^{-1}\left((1-t)\phi(f(x)) + t\phi(g(x))\right)$ for $0 \le t \le 1$.
Remark 1: $\phi^{-1}$ is monotone increasing, convex and continuous on $(0, \infty)$.

$\phi$-path
Based on the KN average, we consider the $\phi$-path connecting $f$ and $g$ in $F_P$:
$f_t(x, \phi) = \phi^{-1}\left((1-t)\phi(f(x)) + t\phi(g(x)) - \kappa_t\right)$,
where $\kappa_t \le 0$ is a normalizing factor, with equality if $t = 0$ or $t = 1$.

Existence of $\kappa_t$
Theorem 1. There uniquely exists $\kappa_t$ such that
$\int_X \phi^{-1}\left((1-t)\phi(f(x)) + t\phi(g(x)) - \kappa_t\right) dP(x) = 1$.
Proof. From the convexity of $\phi^{-1}$, we have
$0 \le \int \phi^{-1}\left((1-t)\phi(f(x)) + t\phi(g(x))\right) dP(x) \le \int \{(1-t)f(x) + tg(x)\}\, dP(x) \le 1$.
We also observe that $\lim_{c \to \infty} \phi^{-1}(c) = +\infty$, since $\phi^{-1}$ is monotone increasing and convex. Hence the continuity of $\phi^{-1}$ leads to the existence of $\kappa_t$ satisfying the equation above.

Illustration of $\phi$-path
Examples of $\phi$-path
Example 1
1 $\phi_0(x) = \log(x)$. The $\phi_0$-path is given by
$f_t(x, \phi_0) = \exp\left((1-t)\log f(x) + t\log g(x) - \kappa_t\right)$,
where $\kappa_t = \log\int \exp\left((1-t)\log f(x) + t\log g(x)\right) dP(x)$.
2 $\phi_\eta(x) = \log(x + \eta)$ with $\eta \ge 0$. The $\phi_\eta$-path is given by
$f_t(x, \phi_\eta) = \exp\left[(1-t)\log\{f(x)+\eta\} + t\log\{g(x)+\eta\} - \kappa_t\right]$,
where $\kappa_t = \log\left[\int \exp\{(1-t)\log\{f(x)+\eta\} + t\log\{g(x)+\eta\}\}\, dP(x) - \eta\right]$.
3 $\phi_\beta(x) = (x^\beta - 1)/\beta$ with $\beta \le 1$. The $\phi_\beta$-path is given by
$f_t(x, \phi_\beta) = \{(1-t)f(x)^\beta + tg(x)^\beta - \kappa_t\}^{1/\beta}$,
where $\kappa_t$ does not have an explicit form.

Extended expectation
For a function $a(x) : X \to \mathbb{R}$, we consider the extended expectation
$E_f^{(\phi)}\{a(X)\} = \frac{\int_X \frac{1}{\phi'(f(x))}\, a(x)\, dP(x)}{\int_X \frac{1}{\phi'(f(x))}\, dP(x)}$,
where $\phi : (0, \infty) \to \mathbb{R}$ is a generator function.
Remark 2: If $\phi(t) = \log t$, then $E^{(\phi)}$ reduces to the usual expectation.

Properties of extended expectation
We note that
1 $E_f^{(\phi)}(c) = c$ for any constant $c$.
2 $E_f^{(\phi)}\{c\,a(X)\} = c\,E_f^{(\phi)}\{a(X)\}$ for any constant $c$.
3 $E_f^{(\phi)}\{a(X) + b(X)\} = E_f^{(\phi)}\{a(X)\} + E_f^{(\phi)}\{b(X)\}$.
4 $E_f^{(\phi)}\{a(X)^2\} \ge 0$, with equality if and only if $a(x) = 0$ for $P$-almost every $x$ in $X$.
Remark 3: If we define $f^{(\phi)}(x) = \frac{1/\phi'(f(x))}{\int_X 1/\phi'(f(x))\, dP(x)}$, then $E_f^{(\phi)}\{a(X)\} = E_{f^{(\phi)}}\{a(X)\}$.

Tangent space of $F_P$
Let $H_f$ be a Hilbert space with the inner product defined by $\langle a, b\rangle_f = E_f^{(\phi)}\{a(X)b(X)\}$, and define the tangent space associated with the extended expectation as
$T_f = \{a \in H_f : \langle a, 1\rangle_f = 0\}$.
For a statistical model $M = \{f_\theta(x)\}_{\theta \in \Theta}$ we have $E_{f_\theta}^{(\phi)}\{\partial_i \phi(f_\theta(X))\} = 0$ for all $\theta \in \Theta$, where $\partial_i = \partial/\partial\theta_i$ with $\theta = (\theta_i)_{i=1,\dots,p}$.
Further,
$E_{f_\theta}^{(\phi)}\{\partial_i\partial_j \phi(f_\theta(X))\} = E_{f_\theta}^{(\phi)}\left\{\frac{\phi''(f_\theta(X))}{\phi'(f_\theta(X))^2}\, \partial_i\phi(f_\theta(X))\, \partial_j\phi(f_\theta(X))\right\}$.

Parallel displacement $A_t^{(\phi)}$
Define $A_t^{(\phi)}(x)$ in $T_{f_t}$ as the solution of the differential equation
$\dot A_t^{(\phi)}(x) - E_{f_t}^{(\phi)}\left\{A_t^{(\phi)}\, \dot f_t\, \frac{\phi''(f_t)}{\phi'(f_t)}\right\} = 0$,
where $f_t$ is a path connecting $f$ and $g$ such that $f_0 = f$ and $f_1 = g$, and $\dot A_t^{(\phi)}(x)$ is the derivative of $A_t^{(\phi)}(x)$ with respect to $t$.
Theorem 2. The geodesic curve $\{f_t\}_{0 \le t \le 1}$ defined by the parallel displacement $A_t^{(\phi)}$ is the $\phi$-path.

U-divergence
Assume that $U(s)$ is a convex and increasing function of a scalar $s$ and let $\xi(t) = \arg\max_s\{st - U(s)\}$. Then the U-divergence is
$D_U(f, g) = \int \{U(\xi(g)) - f\xi(g)\}\, dP - \int \{U(\xi(f)) - f\xi(f)\}\, dP$.
In fact, the U-divergence is the difference between the cross entropy $C_U(f, g)$ and the diagonal entropy $C_U(f, f)$, where $C_U(f, g) = \int \{U(\xi(g)) - f\xi(g)\}\, dP$.

Connections based on U-divergence
For a manifold of finite dimension $M = \{f_\theta(x) : \theta \in \Theta\}$ and vector fields $X$ and $Y$ on $M$, the Riemannian metric is
$G^{(U)}(X, Y)(f) = \int Xf\; Y\xi(f)\, dP$ for $f \in M$,
and the linear connections $\nabla^{(U)}$ and $\nabla^{*(U)}$ are given by
$G^{(U)}(\nabla_X^{(U)} Y, Z)(f) = \int XYf\; Z\xi(f)\, dP$ and $G^{(U)}(\nabla_X^{*(U)} Y, Z)(f) = \int Zf\; XY\xi(f)\, dP$.
See Eguchi (1992) for details.

Equivalence between $\nabla^*$-geodesic and $\xi$-path
Let $\nabla^{(U)}$ and $\nabla^{*(U)}$ be the linear connections associated with the U-divergence $D_U$, and let $C^{(\phi)} = \{f_t(x, \phi) : 0 \le t \le 1\}$ be the $\phi$-path connecting $f$ and $g$ of $F_P$.
Then we have:
Theorem 3. A $\nabla^{(U)}$-geodesic curve connecting $f$ and $g$ is equal to $C^{(\mathrm{id})}$, where id denotes the identity function, while a $\nabla^{*(U)}$-geodesic curve connecting $f$ and $g$ is equal to $C^{(\xi)}$, where $\xi(t) = \arg\max_s\{st - U(s)\}$.

Summary
1 We consider the $\phi$-path based on the Kolmogorov-Nagumo average.
2 The relation between the U-divergence and the $\phi$-path was investigated ($\phi$ corresponds to $\xi$).
3 The idea of the $\phi$-path can be applied to probability density estimation as well as classification problems.
4 A divergence associated with the $\phi$-path can be considered, where a special case would be the Bhattacharyya divergence.
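The $\phi$-path at the heart of this talk is easy to compute on a finite sample space, where the normalizing factor $\kappa_t$ can be found by bisection. The sketch below rests on stated assumptions: densities are taken with respect to the counting measure on a finite set (so $f$ and $g$ are probability vectors with positive entries), `phi_inv` must be increasing on the range explored, and the function name `phi_path` is hypothetical.

```python
import math


def phi_path(f, g, t, phi, phi_inv, tol=1e-12):
    """Return (f_t, kappa_t) for probability vectors f, g and 0 <= t <= 1."""
    a = [(1.0 - t) * phi(fi) + t * phi(gi) for fi, gi in zip(f, g)]

    def total(kappa):
        # Total mass of the un-normalized path; decreasing in kappa.
        return sum(phi_inv(ai - kappa) for ai in a)

    hi = 0.0                      # total(0) <= 1 by convexity of phi_inv
    lo = -1.0
    while total(lo) < 1.0:        # widen the bracket until the mass reaches 1
        lo *= 2.0
    while hi - lo > tol:          # bisection on the decreasing function total
        mid = (lo + hi) / 2.0
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    kappa = (lo + hi) / 2.0
    return [phi_inv(ai - kappa) for ai in a], kappa


if __name__ == "__main__":
    f = [0.2, 0.3, 0.5]
    g = [0.6, 0.3, 0.1]
    path, kappa = phi_path(f, g, 0.5, math.log, math.exp)
    print(path, kappa)
```

With $\phi = \log$ the value found by bisection can be compared with the closed form $\kappa_t = \log\sum_x f(x)^{1-t} g(x)^t$ quoted for the $\phi_0$-path; for other generators, such as $\phi_\beta$, only the numerical route is available.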
Computational Information Geometry: mixture modelling
Germain Van Bever (1), R. Sabolová (1), F. Critchley (1) & P. Marriott (2)
(1) The Open University (EPSRC grant EP/L010429/1), United Kingdom
(2) University of Waterloo, Canada
GSI15, 28-30 October 2015, Paris
Germain Van Bever, CIG for mixtures

Outline
1. Computational Information Geometry...
   Information Geometry
   CIG
2. ...in mixture modelling
   Introduction
   Lindsay's convex geometry
   (C)IG for mixture distributions

Generalities

The use of geometry in statistics gave birth to many different approaches. Traditionally, information geometry refers to the application of differential geometry to statistical theory and practice.
The main ingredients of IG in exponential families (Amari, 1985) are
1. the manifold of parameters $M$,
2. the Riemannian (Fisher information) metric $g$, and
3. the set of affine connections $\{\nabla^{-1}, \nabla^{+1}\}$ (mixture and exponential connections).
These allow one to define notions of curvature, dimension reduction or information loss, and invariant higher-order expansions.
Two affine structures (maps on $M$) are used simultaneously:
−1: Mixture affine geometry on probability measures: $\lambda f(x) + (1 - \lambda) g(x)$.
+1: Exponential affine geometry on probability measures: $C(\lambda) f(x)^{\lambda} g(x)^{1-\lambda}$.

Computational Information Geometry

This talk is about Computational Information Geometry (CIG, Critchley and Marriott, 2014).
1. In CIG, the multinomial model provides, modulo discretization, a universal model. It therefore moves from manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex.
2. It provides a unifying framework for different geometries.
3. Tractability of the geometry allows for efficient algorithms in a computational framework.
It is inherently finite and discrete. The impact of discretization is studied. A working model will be a subset of the simplex.

Multinomial distributions

$X \sim \mathrm{Mult}(\pi_0, \dots, \pi_k)$, $\pi = (\pi_0, \dots, \pi_k) \in \mathrm{int}(\Delta^k)$, with
$$\Delta^k := \left\{ \pi : \pi_i \ge 0,\ \sum_{i=0}^k \pi_i = 1 \right\}.$$
In this case, $\pi_{(0)} = (\pi_1, \dots, \pi_k)$ is the mean parameter, while $\eta = \log(\pi_{(0)} / \pi_0)$ is the natural parameter.
Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005).
[Figure: mixed geodesics in −1-space (axes $\pi_1$, $\pi_2$) and mixed geodesics in +1-space (axes $\eta_1$, $\eta_2$).]

Restricting to the multinomial families

Under regular exponential families with compact support, the cost of discretization on the components of information geometry is bounded! The same holds true for the MLE and the log-likelihood function.
The log-likelihood $\ell(x, \pi) = \sum_{i=0}^k n_i \log(\pi_i)$ is
(i) strictly concave (in the −1-representation) on the observed face (counts $n_i > 0$),
(ii) strictly decreasing in the normal direction towards the unobserved face ($n_i = 0$), and, otherwise,
(iii) constant.
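The multinomial parametrisations and the two affine structures described above can be tried out numerically. The sketch below is an illustration only: the function names and the example probability vectors and counts are hypothetical. It converts between the mean and natural parameters, builds the −1 and +1 geodesics between two interior points of the simplex, and evaluates the log-likelihood while moving mass towards an unobserved cell.

```python
import numpy as np

def natural_from_mean(pi):
    """Natural parameter eta_i = log(pi_i / pi_0), i = 1..k, of an interior point pi."""
    pi = np.asarray(pi, dtype=float)
    return np.log(pi[1:] / pi[0])

def mean_from_natural(eta):
    """Inverse map: probabilities (pi_0, ..., pi_k) from the natural parameter eta."""
    w = np.exp(np.concatenate(([0.0], eta)))
    return w / w.sum()

def geodesic_m1(p, q, lam):
    """-1 (mixture) geodesic: a straight line in the simplex."""
    return lam * p + (1 - lam) * q

def geodesic_p1(p, q, lam):
    """+1 (exponential) geodesic: a straight line in the natural parameters."""
    eta = lam * natural_from_mean(p) + (1 - lam) * natural_from_mean(q)
    return mean_from_natural(eta)

def loglik(counts, pi):
    """Multinomial log-likelihood sum_i n_i log(pi_i), restricted to observed cells."""
    counts = np.asarray(counts, dtype=float)
    pi = np.asarray(pi, dtype=float)
    observed = counts > 0
    return float(np.sum(counts[observed] * np.log(pi[observed])))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.6, 0.1, 0.3])
print(geodesic_m1(p, q, 0.3), geodesic_p1(p, q, 0.3))

counts = [3, 0, 2]                      # the middle cell is unobserved

def pi_eps(eps):
    """Move mass eps onto the unobserved cell, keeping the observed proportions."""
    return [(1 - eps) * 0.6, eps, (1 - eps) * 0.4]

print([loglik(counts, pi_eps(e)) for e in (0.05, 0.1, 0.2)])   # decreasing in eps
```

The +1 geodesic computed this way coincides with the normalised product $C(\lambda) p^{\lambda} q^{1-\lambda}$, since a convex combination of natural parameters is the logarithm of that product up to a constant.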
Considering an infinite-dimensional simplex allows one to remove the compactness assumption (Critchley and Marriott, 2014).

Binomial subfamilies

A (discrete) example: binomial distributions as a subfamily of multinomial distributions.
Let $X \sim \mathrm{Bin}(k, p)$. Then $X$ can be seen as belonging to a subfamily of
$$M = \{X \mid X \sim \mathrm{Mult}(\pi_0, \dots, \pi_k)\}, \quad \text{with } \pi_i(p) = \binom{k}{i} p^i (1 - p)^{k-i}.$$
Figure: Left: Embedded binomial (k = 2) in the 2-simplex. Right: Embedded binomial (k = 3) in the 3-simplex.

Mixture distributions

The generic mixture distribution is
$$f(x; Q) = \int f(x; \theta)\,dQ(\theta),$$
that is, a mixture of (regular) parametric distributions. Regularity: same support $S$, absolutely continuous with respect to a measure $\nu$.
Mixture distributions arise naturally in many statistical problems, including
Overdispersed models
Random effects ANOVA
Random coefficient regression models and measurement error models
Graphical models
and many more.

Hard mixture problems

Inference in the class of mixture distributions generates well-known difficulties:
Identifiability issues: without imposing constraints on the mixing distribution $Q$, there may exist distinct $Q_1$ and $Q_2$ such that
$$f(x; Q_1) = \int f(x; \theta)\,dQ_1(\theta) = \int f(x; \theta)\,dQ_2(\theta) = f(x; Q_2).$$
Byproduct: parametrisation issues.
Byproduct: multimodal likelihood functions.
Boundary problems.
Byproduct: singularities in the likelihood function.

NPMLE

Finite mixtures are essential to the geometry. Lindsay argues that nonparametric estimation of $Q$ is necessary. Also,

Theorem. The log-likelihood
$$\ell(Q) = \sum_{s=1}^n \log L_s(Q) = \sum_{s=1}^n \log \int f(x_s; \theta)\,dQ(\theta)$$
has a unique maximum over the space of all distribution functions $Q$. Furthermore, the maximiser $\hat Q$ is a discrete distribution with no more than $D$ distinct points of support, where $D$ is the number of distinct points in $(x_1, \dots, x_n)$.

The likelihood on the space of mixtures is therefore defined on the convex hull of the image of $\theta \mapsto (L_1(\theta), \dots, L_D(\theta))$. Finding the NPMLE amounts to maximising a concave function over this convex set.

Limits to convex geometry

Knowing the shape of the likelihood on the whole simplex (and not only on the observed face) gives extra insight.
Convex geometry correctly captures the −1 geometry of the simplex but NOT the 0 and +1 geometries (for example, Fisher information requires knowing the full sample space).
Understanding the (C)IG of mixtures in the simplex will therefore provide extra tools (and algorithms) in mixture modelling.
In this talk, we mention results on
1. the (−1)-dimensionality of exponential families in the simplex;
2. convex polytope approximation algorithms: information geometry can give efficient approximations of high-dimensional convex hulls by polytopes.
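The identifiability issue listed under "Hard mixture problems" can be made concrete with the binomial embedding $\pi_i(p)$ introduced earlier. The following sketch uses two hypothetical discrete mixing distributions that share their first three moments of $p$ but not the fourth. Since a mixture of Bin(k, p) depends on the mixing distribution only through the moments of $p$ up to order $k$, the two mixtures coincide for k = 2 and differ for k = 4. The function names and the choice of $Q_1$ and $Q_2$ are illustrative assumptions.

```python
from math import comb

def binomial_embedding(k, p):
    """Point of the k-simplex with pi_i(p) = C(k, i) p^i (1 - p)^(k - i), i = 0..k."""
    return [comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(k + 1)]

def mixture(k, support, weights):
    """f(.; Q) for a discrete mixing distribution Q = sum_j w_j * delta_{p_j}."""
    out = [0.0] * (k + 1)
    for p, w in zip(support, weights):
        for i, v in enumerate(binomial_embedding(k, p)):
            out[i] += w * v
    return out

# Two different mixing distributions (hypothetical), both symmetric about 1/2
# and with the same second moment E[p^2] = 0.34.
Q1 = ([0.2, 0.8], [0.5, 0.5])
Q2 = ([0.1, 0.5, 0.9], [0.28125, 0.4375, 0.28125])

same_k2 = max(abs(a - b) for a, b in zip(mixture(2, *Q1), mixture(2, *Q2)))
diff_k4 = max(abs(a - b) for a, b in zip(mixture(4, *Q1), mixture(4, *Q2)))
print(same_k2)   # numerically zero: Q1 and Q2 are indistinguishable for k = 2
print(diff_k4)   # clearly positive: the two mixtures differ for k = 4
```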
Local mixture models (IG)

Parametric vs nonparametric dilemma. Geometric analysis allows low-dimensional approximation in local setups.

Theorem (Marriott, 2002). If $f(x; \theta)$ is an $n$-dimensional exponential family satisfying regularity conditions and $Q_\lambda(\theta)$ is a local mixing distribution around $\theta_0$, then $f(x; Q_\lambda) = \int f(x; \theta)\,dQ_\lambda(\theta)$ has the expansion
$$f(x; Q_\lambda) - f(x; \theta_0) - \sum_{i=1}^n \lambda_i \frac{\partial}{\partial\theta_i} f(x; \theta_0) - \sum_{i,j=1}^n \lambda_{ij} \frac{\partial^2}{\partial\theta_i \partial\theta_j} f(x; \theta_0) = O(\lambda^{-3}).$$
This is equivalent to $f(x; Q_\lambda) + O(\lambda^{-3}) \in T^2 M_{\theta_0}$.
If the density $f(x; \theta)$ and all its derivatives are bounded, then the approximation will be uniform in $x$.

Dimensionality in CIG

It is therefore possible to approximate mixture distributions with low-dimensional families.
In contrast, the (−1)-representation of any generic exponential family on the simplex will always have full dimension. The following result is even more general.

Theorem (VB et al.). The −1-convex hull of an open subset of an exponential subfamily of $M$ with tangent dimension $k - d$ has dimension at least $k - d$.

Corollary (Critchley and Marriott, 2014). The −1-convex hull of an open subset of a generic one-dimensional subfamily of $M$ is of full dimension.

The tangent dimension is the maximal number of different components of any (+1) tangent vector to the exponential family. Generic ↔ tangent dimension = $k$, i.e. the tangent vector has distinct components.

Example: Mixture of binomials

As mentioned, IG gives efficient approximation by polytopes. IG maximises a concave function on (convex) polytopes.
Example: toxicological data (Kupper and Haseman, 1978): 'simple one-parameter binomial [...] models generally provides poor fits to this type of binary data'.

Approximation in CIG

Define the norm $\|\pi\|_{\pi_0} = \sum_{i=1}^k \pi_i^2 / \pi_{i,0}$ (preferred point metric, Critchley et al., 1993).
Let $\pi(\theta)$ be an exponential family and $\cup S_i$ be a polytope surface. Define the distance function as
$$d(\pi(\theta), \pi_0) := \inf_{\pi \in \cup S_i} \|\pi(\theta) - \pi\|_{\pi_0}.$$

Theorem (Anaya-Izquierdo et al.). Let $\cup S_i$ be such that $d(\pi(\theta)) \le \epsilon$ for all $\theta$. Then
$$\ell(\hat\pi^{NPMLE}) - \ell(\hat\pi) \le N \|\hat\pi^{G} - \hat\pi^{NPMLE}\|_{\hat\pi}\,\epsilon + o(\epsilon),$$
where $(\hat\pi^{G})_i = n_i / N$ and $\hat\pi$ is the NPMLE on $\cup S_i$.

Summary

High-dimensional (extended) multinomial space is used as a proxy for the 'space of all models'.
This computational approach encompasses Amari's information geometry and Lindsay's convex geometry, while having a tractable and mostly explicit geometry, which allows for a computational theory.
Future work
Converse of the dimensionality result (−1 to +1).
Long-term aim: implementing geometric theories within an R package/software.

References
Amari, S.-I. (1985), Differential-geometrical methods in statistics, Springer-Verlag.
Anaya-Izquierdo, K., Critchley, F., Marriott, P. and Vos, P. (2012), Computational information geometry: theory and practice, arXiv report 1209.1988v1.
Critchley, F., Marriott, P. and Salmon, M. (1993), Preferred point geometry and statistical manifolds, The Annals of Statistics, 21(3), 1197-1224.
Critchley, F. and Marriott, P. (2014), Computational Information Geometry in Statistics: Theory and Practice, Entropy, 16, 2454-2471.
Csiszár, I. and Matúš, F. (2005), Closures of exponential families, The Annals of Probability, 33(2), 582-600.
Kupper, L.L. and Haseman, J.K. (1978), The Use of a Correlated Binomial Model for the Analysis of Certain Toxicological Experiments, Biometrics, 34(1), 69-76.
Marriott, P. (2002), On the local geometry of mixture models, Biometrika, 89(1), 77-93.
Bayesian and Information Geometry for Inverse Problems (chaired by Ali Mohammad-Djafari, Olivier Schwander)
Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters
GSI 2015, Oct 28, 2015, Paris
Damiano Brigo, Dept. of Mathematics, Imperial College, London, www.damianobrigo.it
Joint work with John Armstrong, Dept. of Mathematics, King's College, London
Full paper to appear in MCSS, see also arXiv.org
D. Brigo and J. Armstrong (ICL and KCL), SPDE Projection Filters, GSI 2015

Inner Products, Metrics and Projections

Spaces of probability densities

Consider a parametric family of probability densities
$$S = \{p(\cdot, \theta),\ \theta \in \Theta \subset \mathbb{R}^m\}, \qquad S^{1/2} = \{\sqrt{p(\cdot, \theta)},\ \theta \in \Theta \subset \mathbb{R}^m\}.$$
If $S$ (or $S^{1/2}$) is a subset of a function space having an $L^2$ structure (⇒ inner product, norm & metric), then we may ask whether $p(\cdot, \theta) \mapsto \theta \in \mathbb{R}^m$ ($\sqrt{p(\cdot, \theta)} \mapsto \theta$ respectively) is a chart of an $m$-dimensional manifold (?) $S$ ($S^{1/2}$).
The topology & differential structure in the chart is the $L^2$ structure, but there are two possibilities:
$S$: $d_2(p_1, p_2) = \|p_1 - p_2\|$ (L2 direct distance), $p_{1,2} \in L^2$
$S^{1/2}$: $d_H(\sqrt{p_1}, \sqrt{p_2}) = \|\sqrt{p_1} - \sqrt{p_2}\|$ (Hellinger distance), $p_{1,2} \in L^1$
where $\|\cdot\|$ is the norm of the Hilbert space $L^2$.

Tangent vectors, metrics and projection

If $\varphi : \theta \mapsto p(\cdot, \theta)$ ($\theta \mapsto \sqrt{p(\cdot, \theta)}$ resp.) is the inverse of a chart, then
$$\left\{ \frac{\partial \varphi(\cdot, \theta)}{\partial \theta_1}, \dots, \frac{\partial \varphi(\cdot, \theta)}{\partial \theta_m} \right\}$$
are linearly independent $L^2(\lambda)$ vectors that span the tangent space at $\theta$.
The inner product of two basis elements is defined by the $L^2$ structure:
$$\left\langle \frac{\partial p(\cdot, \theta)}{\partial \theta_i}, \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \right\rangle = \frac{1}{4} \int \frac{\partial p(x, \theta)}{\partial \theta_i} \frac{\partial p(x, \theta)}{\partial \theta_j}\,dx = \frac{1}{4} \gamma_{ij}(\theta),$$
$$\left\langle \frac{\partial \sqrt{p}}{\partial \theta_i}, \frac{\partial \sqrt{p}}{\partial \theta_j} \right\rangle = \frac{1}{4} \int \frac{1}{p(x, \theta)} \frac{\partial p(x, \theta)}{\partial \theta_i} \frac{\partial p(x, \theta)}{\partial \theta_j}\,dx = \frac{1}{4} g_{ij}(\theta).$$
$\gamma(\theta)$: direct L2 matrix ($d_2$); $g(\theta)$: famous Fisher-Rao matrix ($d_H$).
$d_2$ orthogonal projection, with $(\gamma^{ij})$ the inverse of $(\gamma_{ij})$:
$$\Pi^{\gamma}_{\theta}[v] = \sum_{i=1}^m \left[ \sum_{j=1}^m \gamma^{ij}(\theta) \left\langle v, \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \right\rangle \right] \frac{\partial p(\cdot, \theta)}{\partial \theta_i}$$
(the $d_H$ projection is analogous, inserting $\sqrt{\cdot}$ and replacing $\gamma$ with $g$).

Nonlinear Projection Filtering

The nonlinear filtering problem for diffusion signals

$$dX_t = f_t(X_t)\,dt + \sigma_t(X_t)\,dW_t, \quad X_0 \quad \text{(signal)}$$
$$dY_t = b_t(X_t)\,dt + dV_t, \quad Y_0 = 0 \quad \text{(noisy observation)} \qquad (1)$$
These are Itô SDEs. We use both Itô and Stratonovich (Str) SDEs. Str SDEs are necessary to deal with manifolds, since second-order Itô terms are not clear in terms of manifolds [16], although we are working on a direct projection of Itô equations with good optimality properties (John Armstrong).
The nonlinear filtering problem consists in finding the conditional probability distribution $\pi_t$ of the state $X_t$ given the observations up to time $t$, i.e. $\pi_t(dx) := P[X_t \in dx \mid \mathcal{Y}_t]$, where $\mathcal{Y}_t := \sigma(Y_s,\ 0 \le s \le t)$. Assume $\pi_t$ has a density $p_t$; then $p_t$ satisfies the Str SPDE
$$dp_t = \mathcal{L}^*_t p_t\,dt - \frac{1}{2} p_t \left[ |b_t|^2 - E_{p_t}\{|b_t|^2\} \right] dt + \sum_{k=1}^d p_t \left[ b^k_t - E_{p_t}\{b^k_t\} \right] \circ dY^k_t,$$
with the forward operator
$$\mathcal{L}^*_t \phi = -\sum_{i=1}^n \frac{\partial}{\partial x_i} [f^i_t \phi] + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2}{\partial x_i \partial x_j} [a^{ij}_t \phi],$$
where $a_t = \sigma_t \sigma_t^{\top}$.
This is an ∞-dimensional SPDE. Solutions for even toy systems like the cubic sensor, $f = 0$, $\sigma = 1$, $b = x^3$, do not belong to any finite-dimensional family $p(\cdot, \theta)$ [19].
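To make the signal/observation model (1) concrete, here is a minimal Euler-Maruyama simulation of the cubic sensor mentioned above ($f = 0$, $\sigma = 1$, $b(x) = x^3$). It only generates a sample path of the state and of the observation process; it is not a filter. The function name, step count, horizon and seed are assumptions chosen for illustration.

```python
import numpy as np

def simulate_cubic_sensor(T=1.0, n_steps=1000, x0=0.0, seed=0):
    """Euler-Maruyama paths of dX = dW, dY = X^3 dt + dV, with Y_0 = 0.

    Returns the arrays (x, y) sampled on a uniform time grid of n_steps + 1 points.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, 0.0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))      # signal noise increment
        dV = rng.normal(0.0, np.sqrt(dt))      # observation noise increment
        y[k + 1] = y[k] + x[k] ** 3 * dt + dV  # b(x) = x^3
        x[k + 1] = x[k] + dW                   # f = 0, sigma = 1
    return x, y

x, y = simulate_cubic_sensor()
print(x[-1], y[-1])
```

A filter would receive only `y` and try to reconstruct the conditional law of `x`.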
We need finite-dimensional approximations. We can project the SPDE according to either the L2 direct metric ($\gamma(\theta)$) or, by deriving the analogous equation for $\sqrt{p_t}$, according to the Hellinger metric ($g(\theta)$).
Projection transforms the SPDE into a finite-dimensional SDE for $\theta$ via the chain rule (hence Str calculus):
$$dp(\cdot, \theta_t) = \sum_{j=1}^m \frac{\partial p(\cdot, \theta)}{\partial \theta_j} \circ d\theta_j(t).$$
With Itô calculus we would have terms $\frac{\partial^2 p(\cdot, \theta)}{\partial \theta_i \partial \theta_j}\,d\langle \theta_i, \theta_j \rangle$, which are not tangent vectors.

Projection Filters

Projection filter in the metrics $\gamma$ (L2) and $g$ (Fisher)

$$d\theta^i_t = \left[ \sum_{j=1}^m \gamma^{ij}(\theta_t) \int \mathcal{L}^*_t p(x, \theta_t)\, \frac{\partial p(x, \theta_t)}{\partial \theta_j}\,dx - \sum_{j=1}^m \gamma^{ij}(\theta_t) \int \frac{1}{2} |b_t(x)|^2\, \frac{\partial p}{\partial \theta_j}\,dx \right] dt + \sum_{k=1}^d \left[ \sum_{j=1}^m \gamma^{ij}(\theta_t) \int b^k_t(x)\, \frac{\partial p(x, \theta_t)}{\partial \theta_j}\,dx \right] \circ dY^k_t, \quad \theta^i_0.$$
The above is the projected equation in the $d_2$ metric with projection $\Pi^{\gamma}$.
Instead, using the Hellinger distance & the Fisher metric with projection $\Pi^{g}$:
$$d\theta^i_t = \left[ \sum_{j=1}^m g^{ij}(\theta_t) \int \frac{\mathcal{L}^*_t p(x, \theta_t)}{p(x, \theta_t)}\, \frac{\partial p(x, \theta_t)}{\partial \theta_j}\,dx - \sum_{j=1}^m g^{ij}(\theta_t) \int \frac{1}{2} |b_t(x)|^2\, \frac{\partial p}{\partial \theta_j}\,dx \right] dt + \sum_{k=1}^d \left[ \sum_{j=1}^m g^{ij}(\theta_t) \int b^k_t(x)\, \frac{\partial p(x, \theta_t)}{\partial \theta_j}\,dx \right] \circ dY^k_t, \quad \theta^i_0.$$

Choice of the family

Choosing the family/manifold: Exponential

In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric $g$ and exponential families $p(x, \theta) := \exp[\theta^T c(x) - \psi(\theta)]$.
Good combination:
The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure.
The Fisher matrix has a simple structure: $\partial^2_{\theta_i, \theta_j} \psi(\theta) = g_{ij}(\theta)$.
The structure of the projection $\Pi^{g}$ is simple for exponential families.
A special exponential family with the Y-function $b$ among the $c(x)$ exponents makes the filter correction step (projection of the $dY$ term) exact.
One can define both a local and a global filtering error through $d_H$.
Alternative coordinates: the expectation parameters $\eta = E_\theta[c] = \partial_\theta \psi(\theta)$.
The projection filter in $\eta$ coincides with a classical approximate filter: the assumed density filter (based on generalized "moment matching").
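The two identities quoted for exponential families, $g_{ij}(\theta) = \partial^2_{\theta_i, \theta_j} \psi(\theta)$ and $\eta = E_\theta[c] = \partial_\theta \psi(\theta)$, can be checked numerically. The sketch below assumes the scalar family with $c(x) = (x, x^2)$, i.e. Gaussian densities, and approximates all integrals by a Riemann sum on a fixed grid; the grid, the function names and the chosen $\theta$ are illustrative assumptions and are not taken from the talk.

```python
import numpy as np

x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
c = np.stack([x, x**2])                 # sufficient statistics c(x) = (x, x^2)

def psi(theta):
    """Log-normaliser psi(theta) = log of the integral of exp(theta^T c(x)) (Riemann sum)."""
    return np.log(np.sum(np.exp(theta @ c)) * dx)

def density(theta):
    """p(x, theta) = exp(theta^T c(x) - psi(theta)) evaluated on the grid."""
    return np.exp(theta @ c - psi(theta))

def expectation_params(theta):
    """eta = E_theta[c]."""
    return (c * density(theta)).sum(axis=1) * dx

def fisher(theta):
    """Fisher matrix computed as the covariance of c under p(., theta)."""
    centred = c - expectation_params(theta)[:, None]
    return (centred * density(theta)) @ centred.T * dx

def hessian_psi(theta, h=1e-4):
    """Central finite-difference Hessian of psi."""
    m = len(theta)
    eye = np.eye(m) * h
    H = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            H[i, j] = (psi(theta + eye[i] + eye[j]) - psi(theta + eye[i] - eye[j])
                       - psi(theta - eye[i] + eye[j]) + psi(theta - eye[i] - eye[j])) / (4 * h * h)
    return H

theta = np.array([0.5, -0.5])           # corresponds to the Gaussian N(0.5, 1)
print(expectation_params(theta))        # close to (mean, second moment) = (0.5, 1.25)
print(fisher(theta))
print(hessian_psi(theta))               # agrees with the Fisher matrix up to discretisation error
```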
Mixture families

However, exponential families do not couple as well with the metric $\gamma(\theta)$.
Is there some important family for which the metric $\gamma(\theta)$ is preferable to the classical Fisher metric $g(\theta)$, in that the metric, the tangent space and the filter equations are simpler?
The answer is affirmative, and this is the mixture family.
We define a simple mixture family as follows. Given $m + 1$ fixed square-integrable probability densities $q = [q_1, q_2, \dots, q_{m+1}]^T$, define $\hat\theta(\theta) := [\theta_1, \theta_2, \dots, \theta_m, 1 - \theta_1 - \theta_2 - \dots - \theta_m]^T$ for all $\theta \in \mathbb{R}^m$. We write $\hat\theta$ instead of $\hat\theta(\theta)$.
Mixture family (simplex):
$$S^M(q) = \{\hat\theta(\theta)^T q,\ \theta_i \ge 0 \text{ for all } i,\ \theta_1 + \dots + \theta_m < 1\}.$$

If we consider the L2 / $\gamma(\theta)$ distance, the metric $\gamma(\theta)$ itself and the related projection become very simple. Indeed,
$$\frac{\partial p(\cdot, \theta)}{\partial \theta_i} = q_i - q_{m+1} \quad \text{and} \quad \gamma_{ij}(\theta) = \int (q_i(x) - q_{m+1}(x))(q_j(x) - q_{m+1}(x))\,dx$$
(NO inline numerical integration).
The L2 metric does not depend on the specific point $\theta$ of the manifold. The same holds for the tangent space at $p(\cdot, \theta)$, which is given by
$$\mathrm{span}\{q_1 - q_{m+1},\ q_2 - q_{m+1},\ \dots,\ q_m - q_{m+1}\}.$$
Also the L2 projection becomes particularly simple.

Mixture Projection Filter

Armstrong and B. (MCSS 2016 [3]) show that the mixture family + metric $\gamma(\theta)$ lead to a projection filter that is the same as approximate filtering via Galerkin [5] methods.
See the full paper for the details. Summing up:

| Metric ↓ / Family → | Exponential | Basic Mixture |
| --- | --- | --- |
| Hellinger $d_H$, Fisher $g(\theta)$ | Good (∼ADF ≈ local moment matching) | Nothing special |
| Direct L2 $d_2$, matrix $\gamma(\theta)$ | Nothing special | Good (∼Galerkin) |
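For the simple mixture family, the matrix $\gamma$ is constant in $\theta$, so it can be computed once and reused. The sketch below illustrates this with three hypothetical Gaussian component densities on a grid. It uses the plain $L^2$ inner product $\int u v\,dx$ (without any scaling factor) and a Riemann sum in place of exact integrals; these choices, and all names, are assumptions for illustration rather than the authors' implementation.

```python
import numpy as np

x = np.linspace(-15.0, 15.0, 6001)
dx = x[1] - x[0]

def gaussian(mu, var):
    """Gaussian density N(mu, var) evaluated on the grid."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# m + 1 = 3 fixed component densities (hypothetical choices)
q = np.stack([gaussian(-2.0, 1.0), gaussian(0.0, 0.5), gaussian(3.0, 2.0)])
basis = q[:-1] - q[-1]                  # tangent vectors q_i - q_{m+1}
gamma = basis @ basis.T * dx            # Gram matrix, independent of theta

def project(v):
    """Coefficients of the L2-orthogonal projection of v onto span{q_i - q_{m+1}}."""
    return np.linalg.solve(gamma, basis @ v * dx)

def mixture(theta):
    """Density theta_hat(theta)^T q on the grid."""
    theta_hat = np.append(theta, 1.0 - np.sum(theta))
    return theta_hat @ q

coeffs = np.array([0.3, -0.7])
recovered = project(coeffs @ basis)     # projecting a tangent vector returns its coefficients
print(recovered)

theta = np.array([0.2, 0.5])
mean_p = np.sum(x * mixture(theta)) * dx
print(mean_p)                           # convex combination of the component means -2, 0, 3
```

Because `gamma` does not depend on `theta`, a filter built on this family only has to solve a fixed linear system at every time step.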
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, ﬁlter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. Speciﬁcally, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not ﬁxed. For example for a mixture of two Gaussians we have 5 parameters. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x), param. θ, µ1, v1, µ2, v2 D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, ﬁlter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. Speciﬁcally, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not ﬁxed. For example for a mixture of two Gaussians we have 5 parameters. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x), param. θ, µ1, v1, µ2, v2 We are now going to illustrate the Gaussian mixture projection ﬁlter (GMPF) in a fundamental example. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X D. Brigo and J. 
The quadratic sensor

Consider the quadratic sensor

$dX_t = \sigma\, dW_t$
$dY_t = X_t^2\, dt + \sigma\, dV_t$.

- The measurements tell us nothing about the sign of X.
- Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical.
- We expect a bimodal distribution.

$\theta\, p_{N(\mu_1,v_1)}(x) + (1-\theta)\, p_{N(\mu_2,v_2)}(x)$ (red) vs $e^{\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta)}$ (pink) vs EKF (N) (blue) vs exact (green; finite difference method, grid 1000 state & 5000 time).

[Figures: Simulation for the Quadratic Sensor. Distribution of X at times 0, 1, ..., 8; curves: Projection, Exact, Extended Kalman, Exponential.]
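A minimal Euler-Maruyama sketch of the quadratic sensor follows. It is not the scheme used for the plots in the talk (the exact filter there is computed by finite differences); the step size, seed and function names are assumptions chosen for illustration. The second function makes the sign ambiguity concrete: a state path and its mirror image produce exactly the same observations for the same observation noise.

```python
import math
import random

def simulate_quadratic_sensor(sigma, dt, n_steps, x0, seed):
    """Euler-Maruyama for dX = sigma dW, dY = X^2 dt + sigma dV.

    Returns the state path, the observation path and the observation
    noise increments dV (kept so that observations can be recomputed).
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    xs, ys, dvs = [x0], [0.0], []
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)
        dv = rng.gauss(0.0, sqrt_dt)
        x = xs[-1]
        ys.append(ys[-1] + x * x * dt + sigma * dv)
        xs.append(x + sigma * dw)
        dvs.append(dv)
    return xs, ys, dvs

def observations_from_path(xs, dvs, sigma, dt):
    """Recompute Y from a given state path and noise increments."""
    ys = [0.0]
    for x, dv in zip(xs, dvs):  # uses x_0 .. x_{n-1}, one per increment
        ys.append(ys[-1] + x * x * dt + sigma * dv)
    return ys

xs, ys, dvs = simulate_quadratic_sensor(sigma=0.4, dt=0.01, n_steps=1000, x0=1.0, seed=1)
mirrored = [-x for x in xs]
ys_mirrored = observations_from_path(mirrored, dvs, sigma=0.4, dt=0.01)
```

Since `ys_mirrored` coincides with `ys`, no filter can tell X from −X using Y alone, which is why a bimodal conditional density is expected.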
[Figures: Simulation for the Quadratic Sensor. Distribution of X at times 9 and 10; curves: Projection, Exact, Extended Kalman, Exponential.]

Comparing local approximation errors (L2 residuals) $\varepsilon_t$

$\varepsilon_t^2 = \int \left(p_{\mathrm{exact},t}(x) - p_{\mathrm{approx},t}(x)\right)^2 dx$

$p_{\mathrm{approx},t}(x)$: three possible choices.

$\theta\, p_{N(\mu_1,v_1)}(x) + (1-\theta)\, p_{N(\mu_2,v_2)}(x)$ (red) vs $e^{\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta)}$ (blue) vs EKF (N) (green).

[Figure: L2 residuals for the quadratic sensor. Residuals against time (0 to 10); curves: Projection Residual (L2 norm), Extended Kalman Residual (L2 norm), Hellinger Projection Residual (L2 norm).]

Comparing local approximation errors (Prokhorov residuals) $\varepsilon_t$

$\varepsilon_t = \inf\{\epsilon : F_{\mathrm{exact},t}(x-\epsilon) - \epsilon \le F_{\mathrm{approx},t}(x) \le F_{\mathrm{exact},t}(x+\epsilon) + \epsilon \ \ \forall x\}$

with F the CDF of the p's. The Lévy-Prokhorov metric works well with singular densities such as particles, where the L2 metric is not ideal.

$\theta\, p_{N(\mu_1,v_1)}(x) + (1-\theta)\, p_{N(\mu_2,v_2)}(x)$ (red) vs $e^{\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta)}$ (green) vs best three particles (blue).
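Both residuals can be approximated on a grid. The sketch below is an assumption-laden illustration, not the code behind the plots: the L2 residual uses the trapezoid rule, and the CDF-based residual is found by bisection on ε, with the "for all x" condition checked only at grid points. Two Gaussian densities stand in for the exact and approximate filters, and all names are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mu, var):
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def l2_residual(p_exact, p_approx, lo, hi, steps):
    """epsilon = sqrt( integral of (p_exact - p_approx)^2 ), trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        x = lo + s * h
        w = 0.5 if s in (0, steps) else 1.0
        diff = p_exact(x) - p_approx(x)
        total += w * h * diff * diff
    return math.sqrt(total)

def levy_residual(cdf_exact, cdf_approx, grid, tol=1e-6):
    """Smallest eps with F_ex(x-eps)-eps <= F_ap(x) <= F_ex(x+eps)+eps on the grid."""
    def admissible(eps):
        return all(cdf_exact(x - eps) - eps <= cdf_approx(x) <= cdf_exact(x + eps) + eps
                   for x in grid)
    lo, hi = 0.0, 1.0  # eps = 1 is always admissible because CDFs lie in [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi

# Stand-in densities: "exact" N(0, 1) and "approximate" N(0.3, 1).
p_ex = lambda x: normal_pdf(x, 0.0, 1.0)
p_ap = lambda x: normal_pdf(x, 0.3, 1.0)
F_ex = lambda x: normal_cdf(x, 0.0, 1.0)
F_ap = lambda x: normal_cdf(x, 0.3, 1.0)
grid = [-10.0 + 0.01 * i for i in range(2001)]

eps_l2 = l2_residual(p_ex, p_ap, -12.0, 12.0, 24000)
eps_levy = levy_residual(F_ex, F_ap, grid)
```

The CDF-based residual remains meaningful when the approximation is a set of point masses, since it only needs the (step-function) CDF, which is the motivation given above for using it with particle approximations.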
[Figure: Lévy residuals for the quadratic sensor. Prokhorov residuals against time (0 to 10); curves: Prokhorov Residual (L2NM), Prokhorov Residual (HE), Best possible residual (3Deltas).]

Cubic sensors

[Figure: residuals against time (0 to 10); curves: Projection Residual (L2 norm), Extended Kalman Residual (L2 norm), Hellinger Projection Residual (L2 norm).]

- Qualitatively similar results up to a stopping time.
- As one approaches the boundary, $\gamma_{ij}$ becomes singular.
- The solution is to dynamically change the parameterization and even the dimension of the manifold.
Conclusions

- Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities.
- Projection uses the overarching L2 structure.
- Two different metrics: direct L2 and Hellinger/Fisher (L2 on $\sqrt{\cdot}$).
- Fisher works well with exponential families:
  - multimodality,
  - correction step exact,
  - simplicity of implementation,
  - equivalence with Assumed Density Filters ("moment matching").
- Direct L2 works well with mixture families:
  - even simpler filter equations, no inline numerical integration,
  - basic version equivalent to Galerkin methods,
  - suited also for multimodality (quadratic sensor tests, L2 global error),
  - comparable with particle methods.
- Further investigation: convergence, more on optimality?
- Optimality: introducing new projections (forthcoming, J. Armstrong).

Thanks

With thanks to the organizing committee. Thank you for your attention. Questions and comments welcome.
References

[1] J. Aggrawal: Sur l'information de Fisher. In: Theories de l'Information (J. Kampe de Feriet, ed.), Springer-Verlag, Berlin-New York, 1974, pp. 111-117.
[2] Amari, S. Differential-geometrical methods in statistics, Lecture Notes in Statistics, Springer-Verlag, Berlin, 1985.
[3] Armstrong, J., and Brigo, D. (2016). Nonlinear filtering via stochastic PDE projection on mixture manifolds in L2 direct metric, Mathematics of Control, Signals and Systems, 2016, accepted.
[4] Beard, R., Kenney, J., Gunther, J., Lawton, J., and Stirling, W. (1999). Nonlinear Projection Filter based on Galerkin approximation. AIAA Journal of Guidance Control and Dynamics, 22 (2): 258-266.
[5] Beard, R. and Gunther, J. (1997). Galerkin Approximations of the Kushner Equation in Nonlinear Estimation. Working Paper, Brigham Young University.
[6] Barndorff-Nielsen, O.E. (1978). Information and Exponential Families. John Wiley and Sons, New York.
[7] Brigo, D. Diffusion Processes, Manifolds of Exponential Densities, and Nonlinear Filtering. In: Ole E. Barndorff-Nielsen and Eva B. Vedel Jensen, editors, Geometry in Present Day Science, World Scientific, 1999.
[8] Brigo, D. On SDEs with marginal laws evolving in finite-dimensional exponential families, Statistics and Probability Letters, 2000, Vol. 49, pp. 127-134.
[9] Brigo, D. (2011). The direct L2 geometric structure on a manifold of probability densities with applications to Filtering. Available on arXiv.org and damianobrigo.it.
[10] Brigo, D., Hanzon, B., Le Gland, F. A differential geometric approach to nonlinear filtering: The projection filter, IEEE Transactions on Automatic Control, 1998, Vol. 43, pp. 247-252.
[11] Brigo, D., Hanzon, B., Le Gland, F. Approximate nonlinear filtering by projection on exponential manifolds of densities, Bernoulli, 1999, Vol. 5, pp. 495-534.
[12] D. Brigo, Filtering by Projection on the Manifold of Exponential Densities, PhD Thesis, Free University of Amsterdam, 1996.
[13] Brigo, D., and Pistone, G. (1996). Projecting the Fokker-Planck Equation onto a finite dimensional exponential family. Available at arXiv.org.
[14] Crisan, D., and Rozovskii, B. (Eds) (2011). The Oxford Handbook of Nonlinear Filtering, Oxford University Press.
[15] M. H. A. Davis, S. I. Marcus, An introduction to nonlinear filtering, in: M. Hazewinkel, J. C. Willems, Eds., Stochastic Systems: The Mathematics of Filtering and Identification and Applications (Reidel, Dordrecht, 1981), 53-75.
[16] Elworthy, D. (1982). Stochastic Differential Equations on Manifolds. LMS Lecture Notes.
[17] Hanzon, B. A differential-geometric approach to approximate nonlinear filtering. In C.T.J. Dodson, Geometrization of Statistical Theory, pages 219-223, ULMD Publications, University of Lancaster, 1987.
[18] B. Hanzon, Identifiability, recursive identification and spaces of linear dynamical systems, CWI Tracts 63 and 64, CWI, Amsterdam, 1989.
[19] M. Hazewinkel, S.I. Marcus, and H.J. Sussmann, Nonexistence of finite dimensional filters for conditional statistics of the cubic sensor problem, Systems and Control Letters 3 (1983) 331-340.
[20] J. Jacod, A. N. Shiryaev, Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften, vol. 288 (1987), Springer-Verlag, Berlin.
[21] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
[22] M. Fujisaki, G. Kallianpur, and H. Kunita (1972). Stochastic differential equations for the non linear filtering problem. Osaka J. Math., Volume 9, Number 1 (1972), 19-40.
[23] Kenney, J., Stirling, W. Nonlinear Filtering of Convex Sets of Probability Distributions. Presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June - 2 July 1999.
[24] R. Z. Khasminskii (1980). Stochastic Stability of Differential Equations. Alphen aan den Rijn.
[25] R.S. Liptser, A.N. Shiryayev, Statistics of Random Processes I, General Theory (Springer-Verlag, Berlin, 1978).
[26] M. Murray and J. Rice, Differential geometry and statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993.
[27] D. Ocone, E. Pardoux, A Lie algebraic criterion for non-existence of finite dimensionally computable filters, Lecture Notes in Mathematics 1390, 197-204 (Springer-Verlag, 1989).
[28] Pistone, G., and Sempi, C. (1995). An Infinite Dimensional Geometric Structure On the space of All the Probability Measures Equivalent to a Given one. The Annals of Statistics 23(5), 1995.
Keywords =
Abstract
Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model

Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-UNIV PARIS SUD
SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr
Email: djafari@lss.supelec.fr
http://djafari.free.fr
http://publicationslist.org/djafari

A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France

Contents

1. Mixture models
2. Different problems related to classification and clustering
   - Training
   - Supervised classification
   - Semi-supervised classification
   - Clustering or unsupervised classification
3. Mixture of Student-t
4. Variational Bayesian Approximation
5. VBA for Mixture of Student-t
6. Conclusion

Mixture models

- General mixture model: $p(x \mid a, \Theta, K) = \sum_{k=1}^{K} a_k\, p_k(x \mid \theta_k)$, $0 < a_k < 1$
- Same family: $p_k(x \mid \theta_k) = p(x \mid \theta_k)$, $\forall k$
- Gaussian: $p(x \mid \theta_k) = N(x \mid \mu_k, \Sigma_k)$ with $\theta_k = (\mu_k, \Sigma_k)$
- Data $X = \{x_n, n = 1, \dots, N\}$, where each element $x_n$ can be in one of the classes $c_n$.
- $a_k = p(c_n = k)$, $a = \{a_k, k = 1, \dots, K\}$, $\Theta = \{\theta_k, k = 1, \dots, K\}$
- $p(X, c \mid a, \Theta) = \prod_{n=1}^{N} p(x_n, c_n \mid a, \Theta)$

Different problems

- Training: Given a set of (training) data X and classes c, estimate the parameters a and Θ.
- Supervised classification: Given a sample $x_m$ and the parameters K, a and Θ, determine its class $k^* = \arg\max_k \{p(c_m = k \mid x_m, a, \Theta, K)\}$.
- Semi-supervised classification (proportions are not known): Given a sample $x_m$ and the parameters K and Θ, determine its class $k^* = \arg\max_k \{p(c_m = k \mid x_m, \Theta, K)\}$.
- Clustering or unsupervised classification (number of classes K is not known): Given a set of data X, determine K and c.
Training

Given a set of (training) data X and classes c, estimate the parameters a and Θ.

- Maximum Likelihood (ML): $(\hat a, \hat\Theta) = \arg\max_{(a,\Theta)} \{p(X, c \mid a, \Theta, K)\}$.
- Bayesian: Assign priors $p(a \mid K)$ and $p(\Theta \mid K) = \prod_{k=1}^{K} p(\theta_k)$ and write the expression of the joint posterior law:

  $p(a, \Theta \mid X, c, K) = \dfrac{p(X, c \mid a, \Theta, K)\, p(a \mid K)\, p(\Theta \mid K)}{p(X, c \mid K)}$

  where $p(X, c \mid K) = \iint p(X, c \mid a, \Theta, K)\, p(a \mid K)\, p(\Theta \mid K)\, da\, d\Theta$.
- Infer a and Θ either as the Maximum A Posteriori (MAP) or the Posterior Mean (PM).

Supervised classification

Given a sample $x_m$ and the parameters K, a and Θ, determine

$p(c_m = k \mid x_m, a, \Theta, K) = \dfrac{p(x_m, c_m = k \mid a, \Theta, K)}{p(x_m \mid a, \Theta, K)}$

where $p(x_m, c_m = k \mid a, \Theta, K) = a_k\, p(x_m \mid \theta_k)$ and $p(x_m \mid a, \Theta, K) = \sum_{k=1}^{K} a_k\, p(x_m \mid \theta_k)$.

Best class $k^*$: $k^* = \arg\max_k \{p(c_m = k \mid x_m, a, \Theta, K)\}$.

Semi-supervised classification

Given a sample $x_m$ and the parameters K and Θ (not the proportions a), determine the probabilities

$p(c_m = k \mid x_m, \Theta, K) = \dfrac{p(x_m, c_m = k \mid \Theta, K)}{p(x_m \mid \Theta, K)}$

where $p(x_m, c_m = k \mid \Theta, K) = \int p(x_m, c_m = k \mid a, \Theta, K)\, p(a \mid K)\, da$ and $p(x_m \mid \Theta, K) = \sum_{k=1}^{K} p(x_m, c_m = k \mid \Theta, K)$.

Best class $k^*$, for example the MAP solution: $k^* = \arg\max_k \{p(c_m = k \mid x_m, \Theta, K)\}$.

Clustering or non-supervised classification

Given a set of data X, determine K and c.

Determination of the number of classes:

$p(K = L \mid X) = \dfrac{p(X, K = L)}{p(X)} = \dfrac{p(X \mid K = L)\, p(K = L)}{p(X)}$ and $p(X) = \sum_{L=1}^{L_0} p(K = L)\, p(X \mid K = L)$,

where $L_0$ is the a priori maximum number of classes and

$p(X \mid K = L) = \iint \prod_n \sum_{k=1}^{L} a_k\, p(x_n, c_n = k \mid \theta_k)\, p(a \mid K)\, p(\Theta \mid K)\, da\, d\Theta$.

When K and c are determined, we can also determine the characteristics of those classes, a and Θ.
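The supervised classification rule is a direct application of Bayes' rule and can be sketched in a few lines. The example below assumes one-dimensional Gaussian class densities and made-up proportions and parameters; classes are indexed from 0 rather than from 1, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def class_posteriors(x, proportions, params):
    """p(c = k | x, a, Theta, K) = a_k p(x | theta_k) / sum_l a_l p(x | theta_l)."""
    joint = [a * normal_pdf(x, mu, var) for a, (mu, var) in zip(proportions, params)]
    evidence = sum(joint)  # p(x | a, Theta, K)
    return [j / evidence for j in joint]

def best_class(x, proportions, params):
    """k* = arg max_k p(c = k | x, a, Theta, K), as a 0-based index."""
    post = class_posteriors(x, proportions, params)
    return max(range(len(post)), key=post.__getitem__)

# Illustrative model: K = 2 classes with proportions a and theta_k = (mean, variance).
a = [0.3, 0.7]
theta = [(-2.0, 1.0), (2.0, 1.0)]
posterior_at_zero = class_posteriors(0.0, a, theta)
```

At x = 0 the two likelihoods are equal, so the posterior reduces to the proportions themselves; away from 0 the likelihood term dominates.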
Mixture of Student-t model

Student-t and its Infinite Gaussian Scaled Model (IGSM):

$T(x \mid \nu, \mu, \Sigma) = \int_0^\infty N(x \mid \mu, z^{-1}\Sigma)\; G(z \mid \tfrac{\nu}{2}, \tfrac{\nu}{2})\; dz$

where

$N(x \mid \mu, \Sigma) = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac12 (x-\mu)^\top \Sigma^{-1} (x-\mu)\right] = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac12 \mathrm{Tr}\left\{(x-\mu)(x-\mu)^\top \Sigma^{-1}\right\}\right]$

and

$G(z \mid \alpha, \beta) = \dfrac{\beta^\alpha}{\Gamma(\alpha)}\, z^{\alpha-1} \exp[-\beta z]$.

Mixture of Student-t:

$p(x \mid \{\nu_k, a_k, \mu_k, \Sigma_k, k = 1, \dots, K\}, K) = \sum_{k=1}^{K} a_k\, T(x \mid \nu_k, \mu_k, \Sigma_k)$.

Introducing $z_{nk}$, $z_k = \{z_{nk}, n = 1, \dots, N\}$, $Z = \{z_{nk}\}$, $c = \{c_n, n = 1, \dots, N\}$, $\theta_k = \{\nu_k, a_k, \mu_k, \Sigma_k\}$, $\Theta = \{\theta_k, k = 1, \dots, K\}$ and assigning the priors $p(\Theta) = \prod_k p(\theta_k)$, we can write:

$p(X, c, Z, \Theta \mid K) = \prod_n \prod_k a_k\, N(x_n \mid \mu_k, z_{nk}^{-1}\Sigma_k)\; G(z_{nk} \mid \tfrac{\nu_k}{2}, \tfrac{\nu_k}{2})\; p(\theta_k)$

Joint posterior law:

$p(c, Z, \Theta \mid X, K) = \dfrac{p(X, c, Z, \Theta \mid K)}{p(X \mid K)}$.

The main task now is to propose approximations to it that can be used easily in all the classification and clustering tasks mentioned above.

Variational Bayesian Approximation (VBA)

Main idea: propose an easily computable approximation $q(c, Z, \Theta)$ for $p(c, Z, \Theta \mid X, K)$.

Criterion: $\mathrm{KL}(q : p)$.

Interestingly, noting that $p(c, Z, \Theta \mid X, K) = p(X, c, Z, \Theta \mid K)/p(X \mid K)$, we have:

$\mathrm{KL}(q : p) = -F(q) + \ln p(X \mid K)$

where $F(q) = \left\langle \ln \dfrac{p(X, c, Z, \Theta \mid K)}{q(c, Z, \Theta)} \right\rangle_q$ is called the free energy of q, and we have the following properties:

- Maximizing $F(q)$ and minimizing $\mathrm{KL}(q : p)$ are equivalent, and $F(q)$ is a lower bound on the log-evidence of the model, $\ln p(X \mid K)$.
- When the optimum $q^*$ is obtained, $F(q^*)$ can be used as a criterion for model selection.
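The scale-mixture representation of the Student-t can be checked numerically in one dimension. The sketch below integrates the Gaussian-Gamma product over z with the trapezoid rule and compares it with the closed-form Student-t density. The truncation point, step count and the restriction ν > 1 (so that the integrand vanishes at z = 0) are assumptions made for this illustration, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gamma_pdf(z, alpha, beta):
    """G(z | alpha, beta) = beta^alpha / Gamma(alpha) * z^(alpha-1) * exp(-beta z)."""
    return beta ** alpha / math.gamma(alpha) * z ** (alpha - 1.0) * math.exp(-beta * z)

def student_t_pdf(x, nu, mu, sigma2):
    """Closed-form one-dimensional Student-t density with scale sigma2."""
    c = math.gamma((nu + 1.0) / 2.0) / (math.gamma(nu / 2.0) * math.sqrt(nu * math.pi * sigma2))
    return c * (1.0 + (x - mu) ** 2 / (nu * sigma2)) ** (-(nu + 1.0) / 2.0)

def scale_mixture_pdf(x, nu, mu, sigma2, z_max=60.0, steps=60000):
    """Trapezoid approximation of the integral over z of
    N(x | mu, sigma2 / z) * G(z | nu/2, nu/2), truncated at z_max.

    Assumes nu > 1, so the integrand is 0 at z = 0 and that endpoint is skipped.
    """
    h = z_max / steps
    total = 0.0
    for s in range(1, steps + 1):
        z = s * h
        w = 0.5 if s == steps else 1.0
        total += w * h * normal_pdf(x, mu, sigma2 / z) * gamma_pdf(z, nu / 2.0, nu / 2.0)
    return total

# Compare the two expressions at a few points for nu = 3, mu = 1, sigma2 = 2.
gaps = [abs(scale_mixture_pdf(x, 3.0, 1.0, 2.0) - student_t_pdf(x, 3.0, 1.0, 2.0))
        for x in (-3.0, 1.0, 4.0)]
```

The latent scale z is what the hierarchical model later treats as a hidden variable $z_{nk}$: conditionally on z everything is Gaussian, which is what makes conjugate updates possible.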
VBA: choosing the good families

- Using $\mathrm{KL}(q : p)$ has the very interesting property that the means computed with q are the same as those we would have obtained with p (conservation of the means).
- Unfortunately, this is not the case for variances or other moments.
- If p is in the exponential family, then, choosing appropriate conjugate priors, the structure of q will be the same and we can obtain appropriate fast optimization algorithms.

Hierarchical graphical model

[Figure: Graphical representation of the model. Hyperparameters $\xi_0$, $(\gamma_0, \Sigma_0)$, $(\mu_0, \eta_0)$, $k_0$; nodes $\alpha_k$, $\beta_k$, $\Sigma_k$, $\mu_k$, $a$, $z_{nk}$, $x_n$.]

VBA for mixture of Student-t

In our case, noting that

$p(X, c, Z, \Theta \mid K) = \prod_n \prod_k p(x_n, c_n, z_{nk} \mid a_k, \mu_k, \Sigma_k, \nu_k)\ \prod_k \left[p(\alpha_k)\, p(\beta_k)\, p(\mu_k \mid \Sigma_k)\, p(\Sigma_k)\right]$

with

$p(x_n, c_n, z_{nk} \mid a_k, \mu_k, \Sigma_k, \nu_k) = N(x_n \mid \mu_k, z_{nk}^{-1}\Sigma_k)\; G(z_{nk} \mid \alpha_k, \beta_k)$

is separable, on one side in $[c, Z]$ and on the other side in the components of Θ, we propose to use

$q(c, Z, \Theta) = q(c, Z)\, q(\Theta)$.

With this decomposition, the expression of the Kullback-Leibler divergence becomes:

$\mathrm{KL}(q_1(c, Z)\, q_2(\Theta) : p(c, Z, \Theta \mid X, K)) = \sum_c \iint q_1(c, Z)\, q_2(\Theta)\, \ln \dfrac{q_1(c, Z)\, q_2(\Theta)}{p(c, Z, \Theta \mid X, K)}\, d\Theta\, dZ$

The expression of the free energy becomes:

$F(q_1(c, Z)\, q_2(\Theta)) = \sum_c \iint q_1(c, Z)\, q_2(\Theta)\, \ln \dfrac{p(X, c, Z \mid \Theta, K)\, p(\Theta \mid K)}{q_1(c, Z)\, q_2(\Theta)}\, d\Theta\, dZ$
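The relation between the Kullback-Leibler divergence and the free energy, KL(q : p) = −F(q) + ln p(X | K), can be verified on a toy model in which the hidden variables take only three values, so that all integrals become finite sums. The joint probabilities below are made-up numbers used purely for illustration, and the function names are hypothetical.

```python
import math

def free_energy(q, joint):
    """F(q) = sum_h q(h) ln( p(X, h) / q(h) ) for a discrete hidden variable h."""
    return sum(qh * math.log(ph / qh) for qh, ph in zip(q, joint) if qh > 0.0)

def kl_divergence(q, p):
    """KL(q : p) = sum_h q(h) ln( q(h) / p(h) )."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0.0)

# Hypothetical joint p(X, h) for the observed X and h in {0, 1, 2}.
joint = [0.10, 0.05, 0.02]
evidence = sum(joint)                      # p(X)
posterior = [j / evidence for j in joint]  # p(h | X)

q = [0.5, 0.3, 0.2]                        # an arbitrary approximating law
lhs = kl_divergence(q, posterior)
rhs = -free_energy(q, joint) + math.log(evidence)
```

Because KL(q : p) is non-negative, F(q) never exceeds ln p(X), and it reaches that value exactly when q equals the posterior; this is why the final free energy can serve as a model-selection score.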
Proposed VBA for Mixture of Student-t priors model

Using a generalized Student-t obtained by replacing $G(z_{nk} \mid \tfrac{\nu_k}{2}, \tfrac{\nu_k}{2})$ by $G(z_{nk} \mid \alpha_k, \beta_k)$, it will be easier to propose conjugate priors for $\alpha_k, \beta_k$ than for $\nu_k$:

$p(x_n, c_n = k, z_{nk} \mid a_k, \mu_k, \Sigma_k, \alpha_k, \beta_k, K) = a_k\, N(x_n \mid \mu_k, z_{nk}^{-1}\Sigma_k)\; G(z_{nk} \mid \alpha_k, \beta_k)$.

In the following, denoting $\Theta = \{(a_k, \mu_k, \Sigma_k, \alpha_k, \beta_k), k = 1, \dots, K\}$, we propose to use the factorized prior laws:

$p(\Theta) = p(a) \prod_k \left[p(\alpha_k)\, p(\beta_k)\, p(\mu_k \mid \Sigma_k)\, p(\Sigma_k)\right]$

with the following components:

$p(a) = D(a \mid k_0)$, $k_0 = [k_0, \dots, k_0] = k_0 \mathbf{1}$
$p(\alpha_k) = E(\alpha_k \mid \zeta_0) = G(\alpha_k \mid 1, \zeta_0)$
$p(\beta_k) = E(\beta_k \mid \zeta_0) = G(\beta_k \mid 1, \zeta_0)$
$p(\mu_k \mid \Sigma_k) = N(\mu_k \mid \mu_0 \mathbf{1}, \eta_0^{-1}\Sigma_k)$
$p(\Sigma_k) = IW(\Sigma_k \mid \gamma_0, \gamma_0 \Sigma_0)$

where

$D(a \mid k) = \dfrac{\Gamma(\sum_l k_l)}{\prod_l \Gamma(k_l)} \prod_l a_l^{k_l - 1}$ is the Dirichlet pdf,

$E(t \mid \zeta_0) = \zeta_0 \exp[-\zeta_0 t]$ is the Exponential pdf,

$G(t \mid a, b) = \dfrac{b^a}{\Gamma(a)}\, t^{a-1} \exp[-b t]$ is the Gamma pdf, and

$IW(\Sigma \mid \gamma, \gamma\Delta) = \dfrac{|\tfrac12 \Delta|^{\gamma/2} \exp\left[-\tfrac12 \mathrm{Tr}\{\Delta \Sigma^{-1}\}\right]}{\Gamma_D(\gamma/2)\, |\Sigma|^{\frac{\gamma + D + 1}{2}}}$ is the inverse Wishart pdf.

With these prior laws and the likelihood, the joint posterior law is:

$p(c, Z, \Theta \mid X) = \dfrac{p(X, c, Z, \Theta)}{p(X)}$.

Expressions of q

$q(c, Z, \Theta) = q(c, Z)\, q(\Theta) = \prod_n \prod_k \left[q(c_n = k \mid z_{nk})\, q(z_{nk})\right]\ \prod_k \left[q(\alpha_k)\, q(\beta_k)\, q(\mu_k \mid \Sigma_k)\, q(\Sigma_k)\right]\ q(a)$

with:

$q(a) = D(a \mid \tilde k)$, $\tilde k = [\tilde k_1, \dots, \tilde k_K]$
$q(\alpha_k) = G(\alpha_k \mid \tilde\zeta_k, \tilde\eta_k)$
$q(\beta_k) = G(\beta_k \mid \tilde\zeta_k, \tilde\eta_k)$
$q(\mu_k \mid \Sigma_k) = N(\mu_k \mid \tilde\mu, \tilde\eta^{-1}\Sigma_k)$
$q(\Sigma_k) = IW(\Sigma_k \mid \tilde\gamma, \tilde\gamma \tilde\Sigma)$

With these choices, we have

$F(q(c, Z, \Theta)) = \langle \ln p(X, c, Z, \Theta \mid K) \rangle_{q(c, Z, \Theta)} = \sum_k \sum_n F_{1kn} + \sum_k F_{2k}$

$F_{1kn} = \langle \ln p(x_n, c_n, z_{nk}, \theta_k) \rangle_{q(c_n = k \mid z_{nk})\, q(z_{nk})}$
$F_{2k} = \langle \ln p(x_n, c_n, z_{nk}, \theta_k) \rangle_{q(\theta_k)}$
VBA Algorithm step

The updating expressions of the tilded parameters are obtained by following three steps:

- E step: Optimizing F with respect to $q(c, Z)$ while keeping $q(\Theta)$ fixed, we obtain the expressions of $q(c_n = k \mid z_{nk}) = \tilde a_k$ and $q(z_{nk}) = G(z_{nk} \mid \alpha_k, \beta_k)$.
- M step: Optimizing F with respect to $q(\Theta)$ while keeping $q(c, Z)$ fixed, we obtain the expressions of $q(a) = D(a \mid \tilde k)$, $\tilde k = [\tilde k_1, \dots, \tilde k_K]$, $q(\alpha_k) = G(\alpha_k \mid \tilde\zeta_k, \tilde\eta_k)$, $q(\beta_k) = G(\beta_k \mid \tilde\zeta_k, \tilde\eta_k)$, $q(\mu_k \mid \Sigma_k) = N(\mu_k \mid \tilde\mu, \tilde\eta^{-1}\Sigma_k)$ and $q(\Sigma_k) = IW(\Sigma_k \mid \tilde\gamma, \tilde\gamma \tilde\Sigma)$, which gives the updating algorithm for the corresponding tilded parameters.
- F evaluation: After each E step and M step, we can also evaluate $F(q)$, which can be used as a stopping rule for the iterative algorithm.
- The final value of $F(q)$ for each value of K, denoted $F_K$, can be used as a criterion for model selection, i.e. for determining the number of clusters.

Conclusions

- Clustering and classification of a set of data are among the most important tasks in statistical research for many applications, such as data mining in biology.
- Mixture models, and in particular mixtures of Gaussians, are classical models for these tasks.
- We proposed to use a mixture of generalized Student-t distributions as a model for the data, via a hierarchical graphical model.
- To obtain fast algorithms and be able to handle large data sets, we used conjugate priors wherever possible.
- The proposed algorithm has been used for clustering, classification and discriminant analysis of some biological data (related to cancer research), but in this paper we only presented the main algorithm.
Keywords =
Abstract
Geometric Properties of textile plot

Tomonari SEI and Ushio TANAKA
University of Tokyo and Osaka Prefecture University
at École Polytechnique, Oct 28, 2015

Introduction

- The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix, $\mathbb{R}^{n\times p} \ni X \mapsto Y \in \mathbb{R}^{n\times p}$, in order to draw a parallel coordinate plot.
- The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance.
- In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometrical point of view.
- It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are "generic".

1 What is textile plot?
2 Textile set
3 Main result
4 Other results
5 Summary

Textile plot

Example (Kumasaka and Shibata, 2008): textile plot for the iris data (150 cases, 5 attributes).

- Each variate is transformed by a location-scale transformation.
- Categorical data is quantified.
- Missing data is admitted.
- Order of axes can be maintained.

[Figure: textile plot of the iris data. Axes: Species (setosa, versicolor, virginica), Sepal.Length (4.3 to 7.9), Sepal.Width (2 to 4.4), Petal.Length (1 to 6.9), Petal.Width (0.1 to 2.5).]

Textile plot

Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value.

Let $X = (x_1, \dots, x_p) \in \mathbb{R}^{n\times p}$ be the data matrix. Without loss of generality, assume the sample mean and sample variance of each $x_j$ are 0 and 1, respectively.

The data is transformed into $Y = (y_1, \dots, y_p)$, where

$y_j = a_j + b_j x_j$, $a_j, b_j \in \mathbb{R}$, $j = 1, \dots, p$.

The coefficients $a_j$ and $b_j$ are determined by the following procedure.
Textile plot

Coefficients $a = (a_j)$ and $b = (b_j)$ are the solution of the following minimization problem:

$\underset{a,b}{\text{Minimize}}\ \sum_{t=1}^{n} \sum_{j=1}^{p} (y_{tj} - \bar y_{t\cdot})^2$ subject to $y_j = a_j + b_j x_j$, $\sum_{j=1}^{p} \|y_j\|^2 = 1$.

Intuition: as horizontal as possible.

Solution: $a = 0$ and $b$ is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of X.

[Figure: one observation drawn as a polyline through $y_{t1}, \dots, y_{t5}$ around its mean level $\bar y_{t\cdot}$.]

Example (n = 100, p = 4)

$X \in \mathbb{R}^{100\times 4}$. Each row $\sim N(0, \Sigma)$, with

$\Sigma = \begin{pmatrix} 1 & -0.6 & 0.5 & 0.1 \\ -0.6 & 1 & -0.6 & -0.2 \\ 0.5 & -0.6 & 1 & 0.0 \\ 0.1 & -0.2 & 0.0 & 1 \end{pmatrix}$.

[Figure: (a) raw data X, (b) textile plot Y. Axis ranges: −2.71 to 2.98, −3.93 to 3.27, −2.72 to 2.43, −2.58 to 2.23.]

Our motivation

The textile plot transforms the data matrix X into Y. Denote the map by $Y = \tau(X)$. What is the image $\tau(\mathbb{R}^{n\times p})$?

We can show that $Y \in \tau(\mathbb{R}^{n\times p})$ satisfies two conditions:

$\exists \lambda \ge 0,\ \forall i = 1, \dots, p,\ \sum_{j=1}^{p} y_i^\top y_j = \lambda \|y_i\|^2$ and $\sum_{j=1}^{p} \|y_j\|^2 = 1$.

This motivates the following definition of the textile set.
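The solution described above (a = 0, b the leading eigenvector) can be sketched without any numerical library by using power iteration. The example is an illustration under stated assumptions: columns are standardized with the 1/n variance convention, b is rescaled so that the normalization constraint holds, and the data and function names are hypothetical. The last lines check the two conditions satisfied by the image of the textile plot.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def standardize(columns):
    """Center each column and scale it to sample variance 1 (1/n convention)."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        centered = [v - mean for v in col]
        sd = math.sqrt(sum(v * v for v in centered) / n)
        out.append([v / sd for v in centered])
    return out

def top_eigenpair(matrix, iters=1000):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        v = [wi / norm for wi in w]
    lam = dot(v, [dot(row, v) for row in matrix])
    return lam, v

def textile_plot(columns):
    """Return (Y as a list of columns y_j = b_j x_j, b, lambda)."""
    x = standardize(columns)
    n, p = len(x[0]), len(x)
    cov = [[dot(x[i], x[j]) / n for j in range(p)] for i in range(p)]
    lam, v = top_eigenpair(cov)
    b = [vi / math.sqrt(n) for vi in v]  # rescaled so that sum_j ||y_j||^2 = 1
    y = [[bj * val for val in xj] for bj, xj in zip(b, x)]
    return y, b, lam

# Hypothetical data: four variates driven by one common factor plus noise.
rng = random.Random(0)
factor = [rng.gauss(0.0, 1.0) for _ in range(60)]
data = [[load * f + 0.5 * rng.gauss(0.0, 1.0) for f in factor]
        for load in (1.0, 0.8, 1.2, 0.9)]
Y, b, lam = textile_plot(data)

total_norm = sum(dot(y, y) for y in Y)
residuals = [abs(sum(dot(yi, yj) for yj in Y) - lam * dot(yi, yi)) for yi in Y]
```

Here `total_norm` equals 1 and every entry of `residuals` is numerically zero, with λ equal to the largest eigenvalue found by the power iteration.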
Textile set
Definition
The textile set is defined by
\[ T_{n,p} = \Big\{ Y \in \mathbb{R}^{n \times p} \ \Big|\ \exists \lambda \ge 0,\ \forall i,\ \sum_j y_i^\top y_j = \lambda \|y_i\|^2,\ \sum_j \|y_j\|^2 = 1 \Big\}. \]
The unnormalized textile set is defined by
\[ U_{n,p} = \Big\{ Y \in \mathbb{R}^{n \times p} \ \Big|\ \exists \lambda \ge 0,\ \forall i,\ \sum_j y_i^\top y_j = \lambda \|y_i\|^2 \Big\}. \]
We are interested in the mathematical properties of $T_{n,p}$ and $U_{n,p}$. Bad news: statistical implications are left for future work. Let us begin with the case of small $p$.
$T_{n,p}$ with small $p$
Lemma ($p = 1$)
$T_{n,1} = S^{n-1}$, the unit sphere.
Lemma ($p = 2$)
$T_{n,2} = A \cup B$, where
\[ A = \{ (y_1, y_2) \mid \|y_1\| = \|y_2\| = 1/\sqrt{2} \}, \quad B = \{ (y_1, y_2) \mid \|y_1 - y_2\| = \|y_1 + y_2\| = 1 \}, \]
each of which is diffeomorphic to $S^{n-1} \times S^{n-1}$. Their intersection $A \cap B$ is diffeomorphic to the Stiefel manifold $V_{n,2}$.

Example ($n = p = 2$)
$T_{2,2} \subset \mathbb{R}^4$ is the union of two tori, glued along $O(2)$:
\[ T_{2,2} = \left\{ \frac{1}{\sqrt{2}} \begin{pmatrix} \cos\theta & \cos\phi \\ \sin\theta & \sin\phi \end{pmatrix} \right\} \cup \left\{ \frac{1}{2} \begin{pmatrix} \cos\xi + \cos\eta & \cos\xi - \cos\eta \\ \sin\xi + \sin\eta & \sin\xi - \sin\eta \end{pmatrix} \right\}. \]
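The two-torus description of the n = p = 2 case can be tested directly against the defining conditions of the textile set. The sketch below uses only the Python standard library; the helper names (`textile_check`, `torus_a`, `torus_b`) are hypothetical, and the candidate λ is taken as the squared norm of y1 + y2, which is what summing the conditions over i gives when the normalization holds.

```python
import math
import random

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def textile_check(y1, y2):
    """Return (lambda, residual) for the p = 2 textile conditions."""
    n1, n2, c = dot(y1, y1), dot(y2, y2), dot(y1, y2)
    lam = n1 + n2 + 2.0 * c  # ||y1 + y2||^2, the only possible lambda on the normalized set
    residual = max(abs(n1 + c - lam * n1),   # condition for i = 1
                   abs(c + n2 - lam * n2),   # condition for i = 2
                   abs(n1 + n2 - 1.0))       # normalization
    return lam, residual

def torus_a(theta, phi):
    """First torus: both columns have norm 1/sqrt(2)."""
    r = 1.0 / math.sqrt(2.0)
    return ((r * math.cos(theta), r * math.sin(theta)),
            (r * math.cos(phi), r * math.sin(phi)))

def torus_b(xi, eta):
    """Second torus: y1 + y2 and y1 - y2 are unit vectors."""
    y1 = (0.5 * (math.cos(xi) + math.cos(eta)), 0.5 * (math.sin(xi) + math.sin(eta)))
    y2 = (0.5 * (math.cos(xi) - math.cos(eta)), 0.5 * (math.sin(xi) - math.sin(eta)))
    return y1, y2

random.seed(0)
worst, min_lam = 0.0, float("inf")
for _ in range(1000):
    s = random.uniform(0.0, 2.0 * math.pi)
    t = random.uniform(0.0, 2.0 * math.pi)
    for y1, y2 in (torus_a(s, t), torus_b(s, t)):
        lam, res = textile_check(y1, y2)
        worst = max(worst, res)
        min_lam = min(min_lam, lam)
```

On the second torus the columns are orthogonal, so λ = 1 there; on the first torus λ = 1 + cos(θ − φ), which ranges over [0, 2].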
For general dimension $p$
To state our main result, we define two concepts: the noncompact Stiefel manifold and the canonical form.
Definition (e.g. Absil et al. (2008))
Let $n \ge p$. Denote by $V^*$ the set of all matrices of full column rank:
\[ V^* := \{ Y \in \mathbb{R}^{n \times p} \mid \operatorname{rank}(Y) = p \}. \]
$V^*$ is called the noncompact Stiefel manifold. Note that $\dim(V^*) = np$ and $\overline{V^*} = \mathbb{R}^{n \times p}$. The orthogonal group $O(n)$ acts on $V^*$. By Gram-Schmidt orthonormalization, the quotient space $V^*/O(n)$ is identified with the upper-triangular matrices with positive diagonal entries.

Noncompact Stiefel manifold and canonical form
Definition (Canonical form)
Let us denote by $V^{**}$ the set of all matrices written as
\[ \begin{pmatrix} y_{11} & \cdots & y_{1p} \\ 0 & \ddots & \vdots \\ \vdots & \ddots & y_{pp} \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}, \quad y_{ii} > 0, \quad 1 \le i \le p. \]
We call such a matrix a canonical form. Note that $V^{**} \subset V^*$ and $V^*/O(n) \simeq V^{**}$.
Restriction of unnormalized textile set
$V^*$: noncompact Stiefel manifold, $V^{**}$: set of canonical forms.
Definition
Denote the restrictions of $U_{n,p}$ to $V^*$ and $V^{**}$ by
\[ U^*_{n,p} = U_{n,p} \cap V^*, \quad U^{**}_{n,p} = U_{n,p} \cap V^{**}, \]
respectively. The group $O(n)$ acts on $U^*_{n,p}$. The quotient space $U^*_{n,p}/O(n)$ is identified with $U^{**}_{n,p}$. So it is essential to study $U^{**}_{n,p}$.

$U^{**}_{n,p}$ for small $p$
Let us check examples.
Example ($n = p = 1$)
$U^{**}_{1,1} = \{(1)\}$.
Example ($n = p = 2$)
Let $Y = \begin{pmatrix} y_{11} & y_{12} \\ 0 & y_{22} \end{pmatrix}$ with $y_{11}, y_{22} > 0$. Then
\[ U^{**}_{2,2} = \{ y_{12} = 0 \} \cup \{ y_{11}^2 = y_{12}^2 + y_{22}^2 \}, \]
the union of a plane and a cone.
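The plane-and-cone description in the n = p = 2 example follows from a short computation. The derivation below is a worked sketch that uses only the definition of the unnormalized textile set, with columns y_1 = (y_11, 0) and y_2 = (y_12, y_22) of the canonical form; it is added for illustration and is not taken from the slides.

```latex
% Condition for i = 1:
%   y_1^T y_1 + y_1^T y_2 = \lambda \|y_1\|^2
\[ y_{11}^2 + y_{11} y_{12} = \lambda y_{11}^2
   \quad\Longrightarrow\quad \lambda = 1 + \frac{y_{12}}{y_{11}} \qquad (y_{11} > 0). \]
% Condition for i = 2:
%   y_2^T y_1 + y_2^T y_2 = \lambda \|y_2\|^2
\[ y_{11} y_{12} + y_{12}^2 + y_{22}^2 = \lambda \,(y_{12}^2 + y_{22}^2)
   \quad\Longrightarrow\quad y_{11} y_{12} = \frac{y_{12}}{y_{11}} \,(y_{12}^2 + y_{22}^2). \]
% Multiplying by y_{11} and factoring:
\[ y_{12} \left( y_{11}^2 - y_{12}^2 - y_{22}^2 \right) = 0 . \]
% Hence either y_{12} = 0 (the plane, with \lambda = 1),
% or y_{11}^2 = y_{12}^2 + y_{22}^2 (the cone). On the cone |y_{12}| < y_{11}
% because y_{22} > 0, so \lambda = 1 + y_{12}/y_{11} > 0 and the constraint
% \lambda \ge 0 is satisfied automatically.
```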
Main theorem
The differential geometric properties of $U^{**}_{n,p}$ are given as follows.
Theorem
Let $n \ge p \ge 3$. Then we have the decomposition
\[ U^{**}_{n,p} = M_1 \cup M_2, \]
where each $M_i$ is a differentiable manifold, with dimensions
\[ \dim M_1 = \frac{p(p+1)}{2} - (p-1), \quad \dim M_2 = \frac{p(p+1)}{2} - p, \]
respectively. $M_2$ is connected, while $M_1$ may not be.

Example
$U^{**}_{3,3}$ is the union of a 4-dimensional and a 3-dimensional manifold. We look at a cross section with $y_{11} = y_{22} = 1$: it is the union of a surface and a vertical line.
[Figure: cross section in the coordinates $(y_{12}, y_{13}, y_{33})$]

Corollary
Let $n \ge p \ge 3$. Then we have
\[ U^*_{n,p} = \pi^{-1}(M_1) \cup \pi^{-1}(M_2), \]
where $\pi$ denotes the map of Gram-Schmidt orthonormalization. The dimensions are
\[ \dim \pi^{-1}(M_1) = np - (p-1), \quad \dim \pi^{-1}(M_2) = np - p. \]

Other results
We state other results. First we consider the case $n = 1$.
Lemma
If $n = 1$, then the textile set $T_{1,p}$ is the union of a $(p-2)$-dimensional manifold and $2(2^p - 1)$ isolated points.
Example
$T_{1,3}$ consists of a circle and 14 points:
\[ T_{1,3} = \big( S^2 \cap \{ y_1 + y_2 + y_3 = 0 \} \big) \cup \Big\{ \pm\big(\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}}\big),\ \pm\big(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0\big),\ \pm\big(\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\big),\ \pm\big(0, \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\big),\ \pm(1, 0, 0),\ \pm(0, 1, 0),\ \pm(0, 0, 1) \Big\}. \]
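The point count in the n = 1 lemma can be reproduced by enumeration. The sketch below, in plain Python, assumes the defining conditions of the textile set with scalar entries: every nonzero entry must equal s/λ, where s is the sum of the entries, which leaves one point for each nonempty support and sign, while the λ = 0 component is the part of the unit sphere where the entries sum to zero. The function names are illustrative.

```python
import itertools
import math

def in_textile_set_n1(y, tol=1e-12):
    """Check the textile conditions for n = 1, where each y_i is a scalar."""
    s = sum(y)
    if abs(sum(v * v for v in y) - 1.0) > tol:
        return False
    # Summing y_i * s = lambda * y_i^2 over i gives lambda = s^2 on the unit sphere,
    # so lambda >= 0 holds automatically.
    lam = s * s
    return all(abs(v * s - lam * v * v) <= tol for v in y)

def isolated_points(p):
    """Points with k nonzero entries, all equal to +1/sqrt(k) or all to -1/sqrt(k)."""
    pts = []
    for mask in itertools.product([0, 1], repeat=p):
        k = sum(mask)
        if k == 0:
            continue
        for sign in (1.0, -1.0):
            pts.append(tuple(sign * m / math.sqrt(k) for m in mask))
    return pts

points = isolated_points(3)
on_circle = (1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0), 0.0)  # entries sum to zero
off_set = (2.0 / 3.0, 2.0 / 3.0, -1.0 / 3.0)                    # unit norm, entries sum to one
```

For p = 3 the enumeration yields 2(2^3 − 1) = 14 points, all of which pass the membership check, as does the sample point on the circle; the unit vector whose entries sum to one fails it.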
Differential geometrical characterization of $f_\lambda^{-1}(O)$
Fix $\lambda \ge 0$ arbitrarily. We define the map $f_\lambda : \mathbb{R}^{n \times p} \to \mathbb{R}^{p+1}$ by
\[ f_\lambda(y_1, \dots, y_p) := \begin{pmatrix} \sum_j y_1^\top y_j - \lambda \|y_1\|^2 \\ \vdots \\ \sum_j y_p^\top y_j - \lambda \|y_p\|^2 \\ \sum_j \|y_j\|^2 - 1 \end{pmatrix}. \]
Lemma
We have a classification of $T_{n,p}$, namely
\[ T_{n,p} = \bigcup_{\lambda \ge 0} f_\lambda^{-1}(O) = \bigcup_{0 \le \lambda \le n} f_\lambda^{-1}(O). \]

Lastly, we state a characterization of $f_\lambda^{-1}(O)$ from the viewpoint of differential geometry.
Theorem
Let $\lambda \ge 0$. $f_\lambda^{-1}(O)$ is a regular submanifold of $\mathbb{R}^{n \times p}$ with codimension $p + 1$ whenever
\[ \lambda > 0, \qquad y_{11} y_{jj} - y_{1j} y_{j1} \ne 0, \quad j = 2, \dots, p, \]
\[ \exists \ell \in \{2, \dots, p\}: \quad \sum_{j=2}^{p} y_{ij} + y_{i\ell}(1 - 2\lambda) \ne 0, \quad i = 1, \dots, n. \]

Present and future study
Summary: We defined the textile set $T_{n,p}$ and found its geometric properties.
Present and future study:
1. Characterize the classification $f_\lambda^{-1}(O)$, with the Riemannian metric induced from $\mathbb{R}^{np}$, by (global) Riemannian geometry: geodesics, curvature, etc.
2. Investigate differential geometrical and topological properties of $T_{n,p}$ and $f_\lambda^{-1}(O)$, including its group action.
3. Can one find statistical implications such as sample distribution theory?
Thank you very much!
References
1. Absil, P.-A., Mahony, R., and Sepulchre, R. (2008), Optimization Algorithms on Matrix Manifolds, Princeton University Press.
2. Honda, K. and Nakano, J. (2007), 3 dimensional parallel coordinate plot, Proceedings of the Institute of Statistical Mathematics, 55, 69–83.
3. Inselberg, A. (2009), Parallel Coordinates: Visual Multidimensional Geometry and Its Applications, Springer.
4. Kumasaka, N. and Shibata, R. (2008), High-dimensional data visualisation: The textile plot, Computational Statistics and Data Analysis, 52, 3616–3644.
Keywords =
Abstract
A generalization of independence and multivariate Student’s t-distributions
MATSUZOE Hiroshi, Nagoya Institute of Technology
Joint work with SAKAMOTO Monta (Efrei, Paris)

1 Deformed exponential family
2 Non-additive differentials and expectation functionals
3 Geometry of deformed exponential families
4 Generalization of independence
5 q-independence and Student’s t-distributions
6 Appendix

The notions of expectation and independence are determined by the choice of statistical model.

Introduction: Geometry and statistics
• Geometry for the sample space
• Geometry for the parameter space
• Wasserstein geometry
• Optimal transport theory
• A pdf is regarded as a distribution of mass
• Information geometry
• Convexity of entropy and free energy
• Duality of estimating function
Hessian Information Geometry (chaired by Shun-Ichi Amari, Michel Boyom)
Keywords =
Abstract
2nd Conference on Geometric Science of Information, GSI2015
October 28–30, 2015 – Ecole Polytechnique, Paris-Saclay
New Metric and Connections in Statistical Manifolds
Rui F. Vigelis (1), David C. de Souza (2), and Charles C. Cavalcante (3)
(1), (3) Federal University of Ceará – Brazil; (2) Federal Institute of Ceará – Brazil
Session “Hessian Information Geometry”, October 28

Outline
Introduction; ϕ-Functions; ϕ-Divergence; Generalized Statistical Manifold; Connections; ϕ-Families; Discussion

Introduction
In the paper
R.F. Vigelis, C.C. Cavalcante. On ϕ-families of probability distributions. J. Theor. Probab., 26(3):870–884, 2013,
the authors proposed the so-called ϕ-divergence $D_\varphi(p \,\|\, q)$, for $p, q \in P_\mu$. The ϕ-divergence is defined in terms of a ϕ-function. The metric and connections that we propose are derived from the ϕ-divergence $D_\varphi(\cdot \,\|\, \cdot)$.

The proposal of new geometric structures (metric and connections) on statistical manifolds is a recurrent research topic. To cite a few:
J. Zhang. Divergence function, duality, and convex analysis. Neural Computation, 16(1): 159–195, 2004.
J. Naudts. Estimators, escort probabilities, and φ-exponential families in statistical physics. JIPAM, 5(4): Paper No. 102, 15 p., 2004.
S.-i. Amari, A. Ohara, H. Matsuzoe. Geometry of deformed exponential families: invariant, dually-flat and conformal geometries. Physica A, 391(18): 4308–4319, 2012.
H. Matsuzoe. Hessian structures on deformed exponential families and their conformal structures. Differential Geom. Appl., 35(suppl.): 323–333, 2014.

Let $(T, \Sigma, \mu)$ be a measure space. All probability distributions will be considered in
\[ P_\mu = \Big\{ p \in L^0 : p > 0 \text{ and } \int_T p \, d\mu = 1 \Big\}, \]
where $L^0$ denotes the set of all real-valued, measurable functions on $T$, with equality $\mu$-a.e.
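ϕ-Functions
A function $\varphi : \mathbb{R} \to (0, \infty)$ is said to be a ϕ-function if the following conditions are satisfied:
(a1) $\varphi(\cdot)$ is convex;
(a2) $\lim_{u \to -\infty} \varphi(u) = 0$ and $\lim_{u \to \infty} \varphi(u) = \infty$;
(a3) there exists a measurable function $u_0 : T \to (0, \infty)$ such that
\[ \int_T \varphi(c(t) + \lambda u_0(t)) \, d\mu < \infty, \quad \text{for all } \lambda > 0, \]
for each measurable function $c : T \to \mathbb{R}$ such that $\varphi(c) \in P_\mu$.
Not all functions satisfying (a1) and (a2) admit the existence of $u_0$. Condition (a3) is imposed so that ϕ-families are parametrizations for $P_\mu$ in the same manner as exponential families.

The κ-exponential function $\exp_\kappa : \mathbb{R} \to (0, \infty)$, for $\kappa \in [-1, 1]$, which is given by
\[ \exp_\kappa(u) = \begin{cases} \big( \kappa u + \sqrt{1 + \kappa^2 u^2} \big)^{1/\kappa}, & \text{if } \kappa \ne 0, \\ \exp(u), & \text{if } \kappa = 0, \end{cases} \]
is a ϕ-function.
The q-exponential function
\[ \exp_q(u) = [1 + (1 - q) u]_+^{1/(1-q)}, \]
where $q > 0$ and $q \ne 1$, is not a ϕ-function ($\exp_q(u) = 0$ for $u < -1/(1 - q)$).
A ϕ-function $\varphi(\cdot)$ may not be a φ-exponential function $\exp_\phi(\cdot)$, which is defined as the inverse of
\[ \ln_\phi(u) = \int_1^u \frac{1}{\phi(x)} \, dx, \quad u > 0, \]
for some increasing function $\phi : [0, \infty) \to [0, \infty)$.

ϕ-Divergence
We define the ϕ-divergence as
\[ D_\varphi(p \,\|\, q) = \frac{\displaystyle \int_T \frac{\varphi^{-1}(p) - \varphi^{-1}(q)}{(\varphi^{-1})'(p)} \, d\mu}{\displaystyle \int_T \frac{u_0}{(\varphi^{-1})'(p)} \, d\mu}, \]
for any $p, q \in P_\mu$. If $\varphi(\cdot) = \exp(\cdot)$ and $u_0 = 1$ then $D_\varphi(p \,\|\, q)$ coincides with the Kullback–Leibler divergence
\[ D_{KL}(p \,\|\, q) = \int_T p \log \frac{p}{q} \, d\mu. \]

Generalized Statistical Manifold
A metric $(g_{ij})$ can be derived from the ϕ-divergence:
\[ g_{ij} = - \left[ \Big( \frac{\partial}{\partial \theta^i} \Big)_p \Big( \frac{\partial}{\partial \theta^j} \Big)_q D_\varphi(p \,\|\, q) \right]_{q = p} = - E'_\theta \left[ \frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \right], \]
where $f_\theta = \varphi^{-1}(p_\theta)$ and
\[ E'_\theta[\cdot] = \frac{\int_T (\cdot) \, \varphi'(f_\theta) \, d\mu}{\int_T u_0 \, \varphi'(f_\theta) \, d\mu}. \]
Considering the log-likelihood function $l_\theta = \log(p_\theta)$ in place of $f_\theta = \varphi^{-1}(p_\theta)$, we get the Fisher information matrix.

A family of probability distributions $\mathcal{P} = \{ p_\theta : \theta \in \Theta \} \subseteq P_\mu$ is said to be a generalized statistical manifold if the following conditions are satisfied:
(P1) $\Theta$ is a domain (an open and connected set) in $\mathbb{R}^n$.
(P2) $p(t; \theta) = p_\theta(t)$ is a differentiable function with respect to $\theta$.
(P3) The operations of integration with respect to $\mu$ and differentiation with respect to $\theta^i$ commute.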
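(P4) The matrix $g = (g_{ij})$, which is defined by
\[ g_{ij} = - E'_\theta \left[ \frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \right], \]
is positive definite at each $\theta \in \Theta$.

The matrix $(g_{ij})$ can also be expressed as
\[ g_{ij} = E''_\theta \left[ \frac{\partial f_\theta}{\partial \theta^i} \frac{\partial f_\theta}{\partial \theta^j} \right], \quad \text{where} \quad E''_\theta[\cdot] = \frac{\int_T (\cdot) \, \varphi''(f_\theta) \, d\mu}{\int_T u_0 \, \varphi'(f_\theta) \, d\mu}. \]
As a consequence, the mapping
\[ \sum_i a^i \frac{\partial}{\partial \theta^i} \mapsto \sum_i a^i \frac{\partial f_\theta}{\partial \theta^i} \]
is an isometry between the tangent space $T_\theta \mathcal{P}$ at $p_\theta$ and
\[ \widetilde{T}_\theta \mathcal{P} = \operatorname{span} \Big\{ \frac{\partial f_\theta}{\partial \theta^i} : i = 1, \dots, n \Big\}, \]
equipped with the inner product $\langle \widetilde{X}, \widetilde{Y} \rangle_\theta = E''_\theta[\widetilde{X} \widetilde{Y}]$.

Connections
We use the ϕ-divergence $D_\varphi(\cdot \,\|\, \cdot)$ to define a pair of mutually dual connections $D^{(1)}$ and $D^{(-1)}$, whose Christoffel symbols are given by
\[ \Gamma^{(1)}_{ijk} = - \left[ \Big( \frac{\partial^2}{\partial \theta^i \partial \theta^j} \Big)_p \Big( \frac{\partial}{\partial \theta^k} \Big)_q D_\varphi(p \,\|\, q) \right]_{q = p} \]
and
\[ \Gamma^{(-1)}_{ijk} = - \left[ \Big( \frac{\partial}{\partial \theta^k} \Big)_p \Big( \frac{\partial^2}{\partial \theta^i \partial \theta^j} \Big)_q D_\varphi(p \,\|\, q) \right]_{q = p}. \]
The connections $D^{(1)}$ and $D^{(-1)}$ correspond to the exponential and mixture connections.

Expressions for the Christoffel symbols $\Gamma^{(1)}_{ijk}$ and $\Gamma^{(-1)}_{ijk}$ are given by
\[ \Gamma^{(1)}_{ijk} = E''_\theta \left[ \frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k} \right] - E'_\theta \left[ \frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \right] E''_\theta \left[ u_0 \frac{\partial f_\theta}{\partial \theta^k} \right] \]
and
\[ \Gamma^{(-1)}_{ijk} = E''_\theta \left[ \frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k} \right] + E'''_\theta \left[ \frac{\partial f_\theta}{\partial \theta^i} \frac{\partial f_\theta}{\partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k} \right] - E''_\theta \left[ \frac{\partial f_\theta}{\partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k} \right] E''_\theta \left[ u_0 \frac{\partial f_\theta}{\partial \theta^i} \right] - E''_\theta \left[ \frac{\partial f_\theta}{\partial \theta^i} \frac{\partial f_\theta}{\partial \theta^k} \right] E''_\theta \left[ u_0 \frac{\partial f_\theta}{\partial \theta^j} \right], \]
where
\[ E'''_\theta[\cdot] = \frac{\int_T (\cdot) \, \varphi'''(f_\theta) \, d\mu}{\int_T u_0 \, \varphi'(f_\theta) \, d\mu}. \]
The terms containing a factor $E''_\theta[u_0 \, \partial f_\theta / \partial \theta]$ (shown in red on the slide) vanish if $\varphi(\cdot) = \exp(\cdot)$ and $u_0 = 1$.

Using the pair of mutually dual connections $D^{(1)}$ and $D^{(-1)}$, we can specify a family of α-connections $D^{(\alpha)}$ in generalized statistical manifolds, whose Christoffel symbols are
\[ \Gamma^{(\alpha)}_{ijk} = \frac{1 + \alpha}{2} \Gamma^{(1)}_{ijk} + \frac{1 - \alpha}{2} \Gamma^{(-1)}_{ijk}. \]
The connections $D^{(\alpha)}$ and $D^{(-\alpha)}$ are mutually dual. For $\alpha = 0$, the connection $D^{(0)}$, which is clearly self-dual, corresponds to the Levi-Civita connection.

ϕ-Families
A parametric ϕ-family $\mathcal{F}_p = \{ p_\theta : \theta \in \Theta \}$ centered at $p = \varphi(c)$ is defined by
\[ p_\theta(t) := \varphi \Big( c(t) + \sum_{i=1}^{n} \theta^i u_i(t) - \psi(\theta) u_0(t) \Big), \]
where $\psi : \Theta \to [0, \infty)$ is a normalizing function. The functions satisfy some conditions, which imply $\psi \ge 0$. The domain $\Theta$ can be chosen to be maximal. If $\varphi(\cdot) = \exp(\cdot)$ and $u_0 = 1$, then $\mathcal{F}_p$ corresponds to an exponential family.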
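Two statements from this talk lend themselves to a quick numerical check: the κ-exponential has an explicit inverse, and the ϕ-divergence reduces to the Kullback–Leibler divergence when ϕ = exp and u0 = 1. The sketch below assumes a finite set T with counting measure and u0 identically 1 by default; the closed form used for the inverse, ln_κ(x) = (x^κ − x^(−κ)) / (2κ), is the standard Kaniadakis logarithm and is not stated on the slides, and all function names are illustrative.

```python
import math

def exp_kappa(u, kappa):
    """kappa-exponential; reduces to exp for kappa = 0."""
    if kappa == 0:
        return math.exp(u)
    return (kappa * u + math.sqrt(1.0 + (kappa * u) ** 2)) ** (1.0 / kappa)

def ln_kappa(x, kappa):
    """Inverse of exp_kappa (assumed closed form, see lead-in)."""
    if kappa == 0:
        return math.log(x)
    return (x ** kappa - x ** (-kappa)) / (2.0 * kappa)

def dln_kappa(x, kappa):
    """Derivative of ln_kappa, playing the role of (phi^{-1})'(x)."""
    if kappa == 0:
        return 1.0 / x
    return (x ** (kappa - 1.0) + x ** (-kappa - 1.0)) / 2.0

def phi_divergence(p, q, kappa, u0=None):
    """D_phi(p || q) for phi = exp_kappa on a finite set with counting measure."""
    u0 = u0 or [1.0] * len(p)
    num = sum((ln_kappa(pi, kappa) - ln_kappa(qi, kappa)) / dln_kappa(pi, kappa)
              for pi, qi in zip(p, q))
    den = sum(ui / dln_kappa(pi, kappa) for ui, pi in zip(u0, p))
    return num / den

p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
d_exp = phi_divergence(p, q, 0)        # should agree with the KL divergence
d_kappa = phi_divergence(p, q, 0.5)    # kappa-deformed divergence
```

Because the inverse of a convex increasing ϕ is concave, each summand in the numerator is bounded below by p − q, which is why the divergence is non-negative and vanishes when the two arguments coincide.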
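ϕ-Families
The normalizing function and the ϕ-divergence are related by
\[ \psi(\theta) = D_\varphi(p \,\|\, p_\theta). \]
The matrix $(g_{ij})$ is the Hessian of the normalizing function $\psi$:
\[ g_{ij} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}. \]
As a result,
\[ \Gamma^{(0)}_{ijk} = \frac{1}{2} \frac{\partial g_{ij}}{\partial \theta^k} = \frac{1}{2} \frac{\partial^3 \psi}{\partial \theta^i \partial \theta^j \partial \theta^k}. \]

In ϕ-families, the Christoffel symbols $\Gamma^{(1)}_{ijk}$ vanish identically, i.e., $(\theta^i)$ is an affine coordinate system, and the connection $D^{(1)}$ is flat (and $D^{(-1)}$ is also flat). Thus $\mathcal{F}_p$ admits a coordinate system $(\eta_j)$ that is dual to $(\theta^i)$, and there exist potential functions $\psi$ and $\psi^*$ such that
\[ \theta^i = \frac{\partial \psi^*}{\partial \eta_i}, \quad \eta_j = \frac{\partial \psi}{\partial \theta^j}, \quad \text{and} \quad \psi(p) + \psi^*(p) = \sum_i \theta^i(p) \, \eta_i(p). \]

Discussion
Advantages of $(g_{ij})$, $\Gamma^{(1)}_{ijk}$ and $\Gamma^{(-1)}_{ijk}$ being derived from $D_\varphi(\cdot \,\|\, \cdot)$:
Duality.
Pythagorean relation.
Projection theorem.
Open questions:
An example of a generalized statistical manifold whose coordinate system is $D^{(-1)}$-flat.
Parallel transport with respect to $D^{(-1)}$.
Divergence or ϕ-function associated with α-connections.

End
Thank you!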
Keywords = Affine connection, Curvature tensor, Laplacian, Bochner’s technique, Ricci tensor, Sectional curvature
Abstract
Curvatures of statistical structures
Barbara Opozda
Paris, October 2015

Statistical structures – statistical setting
$M$ – open subset of $\mathbb{R}^n$
$\Lambda$ – probability space with a fixed σ-algebra
$p : M \times \Lambda \ni (x, \lambda) \mapsto p(x, \lambda) \in \mathbb{R}$ – smooth relative to $x$ and such that $p_x(\lambda) := p(x, \lambda)$ is a probability measure on $\Lambda$ (a probability distribution)
$\ell(x, \lambda) := \log(p(x, \lambda))$
$g_{ij}(x) := E_x[(\partial_i \ell)(\partial_j \ell)]$, where $E_x$ is the expectation relative to the probability $p_x$ for all $x \in M$, and $\partial_1, \dots, \partial_n$ is the canonical frame on $M$
$g$ – Fisher information metric tensor field on $M$
$C_{ijk}(x) = E_x[(\partial_i \ell)(\partial_j \ell)(\partial_k \ell)]$ – cubic form
$(g, C)$ – statistical structure on $M$

Statistical structures (Codazzi structures) – geometric setting; three equivalent definitions
$M$ – manifold, $\dim M = n$
I) $(g, C)$, where $C$ is a totally symmetric (0, 3)-tensor field on $M$, that is, $C(X, Y, Z) = C(Y, X, Z) = C(Y, Z, X)$ for all $X, Y, Z \in T_x M$, $x \in M$. $C$ – cubic form
II) $(g, K)$, where $K$ is a symmetric (1, 2)-tensor field (i.e., $K(X, Y) = K(Y, X)$) that is symmetric relative to $g$, that is, $g(X, K(Y, Z)) = g(Y, K(X, Z))$; then $C(X, Y, Z) = g(X, K(Y, Z))$ is symmetric in all arguments.
III) $(g, \nabla)$, where $\nabla$ is a torsion-free connection such that
\[ (\nabla_X g)(Y, Z) = (\nabla_Y g)(X, Z) \qquad (1) \]
$\nabla$ – statistical connection
$T$ – any tensor field of type $(p, q)$ on $M$; $\nabla T$ – of type $(p, q + 1)$, with $\nabla T(X, Y_1, \dots, Y_q) = (\nabla_X T)(Y_1, \dots, Y_q)$. In particular, $\nabla g(X, Y, Z) = (\nabla_X g)(Y, Z)$.
(1) $\Leftrightarrow$ $\nabla g$ is a symmetric cubic form
$\hat{\nabla}$ – Levi-Civita connection for $g$
$K(X, Y) := \nabla_X Y - \hat{\nabla}_X Y$; $K$ – difference tensor
$\nabla g(X, Y, Z) = -2 g(X, K(Y, Z)) = -2 C(X, Y, Z)$

A statistical structure is trivial if and only if $K = 0$, or equivalently $C = 0$, or equivalently $\nabla = \hat{\nabla}$.
$K_X Y := K(X, Y)$
\[ E := \operatorname{tr}_g K = K(e_1, e_1) + \dots + K(e_n, e_n) = (\operatorname{tr} K_{e_1}) e_1 + \dots + (\operatorname{tr} K_{e_n}) e_n \]
$E$ – mean difference vector field
$E = 0 \Leftrightarrow \operatorname{tr} K_X = 0 \ \forall X \in TM \Leftrightarrow \operatorname{tr}_g C(X, \cdot, \cdot) = 0 \ \forall X \in TM$
$E = 0 \Rightarrow$ trace-free statistical structure
Fact. $(g, \nabla)$ is trace-free if and only if $\nabla \nu_g = 0$, where $\nu_g$ is the volume form determined by $g$.

Examples
Riemannian geometry of the second fundamental form
$M$ – locally strongly convex hypersurface in $\mathbb{R}^{n+1}$. The second fundamental form $h$ satisfies the Codazzi equation $\nabla h(X, Y, Z) = \nabla h(Y, X, Z)$, where $\nabla$ is the induced connection (the Levi-Civita connection of the first fundamental form).
$(h, \nabla)$ – statistical structure
Similarly one gets statistical structures on hypersurfaces in space forms.

Equiaffine geometry of hypersurfaces in the standard affine space $\mathbb{R}^{n+1}$
$M$ – locally strongly convex hypersurface in $\mathbb{R}^{n+1}$
$\xi$ – a transversal vector field
$D$ – standard flat connection on $\mathbb{R}^{n+1}$, $X, Y \in \mathfrak{X}(M)$
$D_X Y = \nabla_X Y + h(X, Y) \xi$ – Gauss formula
$\nabla$ – induced connection, $h$ – second fundamental form (metric tensor field)
$D_X \xi = -SX + \tau(X) \xi$ – Weingarten formula
If $\tau = 0$, $\xi$ is called equiaffine.
In this case the Codazzi equation $\nabla h(X, Y, Z) = \nabla h(Y, X, Z)$ is satisfied.
$(h, \nabla)$ – statistical structure

Geometry of Lagrangian submanifolds in Kaehler manifolds
$N$ – Kaehler manifold of real dimension $2n$ with complex structure $J$
$M$ – Lagrangian submanifold of $N$: an $n$-dimensional submanifold such that $JTM$ is orthogonal to $TM$, i.e. $JTM$ is the normal bundle (in the metric sense) for $M \subset N$
$D$ – the Kaehler connection on $N$
$D_X Y = \nabla_X Y + JK(X, Y)$
$g$ – induced metric tensor field on $M$
$(g, K)$ – statistical structure
It is trace-free $\Leftrightarrow$ $M$ is minimal in $N$.

Most statistical structures are outside the three classes of examples. For instance, in order for a statistical structure to be locally realizable on an equiaffine hypersurface, it is necessary that $\overline{\nabla}$ be projectively flat.
Dual connections, curvature tensors
$g$ – metric tensor field on $M$, $\nabla$ – any connection
\[ X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \overline{\nabla}_X Z) \qquad (2) \]
$\overline{\nabla}$ – dual connection
$(g, \nabla)$ – statistical structure if and only if $(g, \overline{\nabla})$ – statistical structure
$R(X, Y)Z$ – (1, 3)-curvature tensor for $\nabla$
If $R = 0$ the structure is called Hessian
$\overline{R}(X, Y)Z$ – curvature tensor for $\overline{\nabla}$
\[ g(R(X, Y)Z, W) = -g(\overline{R}(X, Y)W, Z) \qquad (3) \]
In particular, $R = 0 \Leftrightarrow \overline{R} = 0$.

$\hat{\nabla}$ – Levi-Civita connection for $g$, $\nabla = \hat{\nabla} + K$, $\overline{\nabla} = \hat{\nabla} - K$
$\hat{R}$ – curvature tensor for $\hat{\nabla}$
\[ R(X, Y) = \hat{R}(X, Y) + (\hat{\nabla}_X K)_Y - (\hat{\nabla}_Y K)_X + [K_X, K_Y], \qquad (4) \]
where $[K_X, K_Y] = K_X K_Y - K_Y K_X$
\[ \overline{R}(X, Y) = \hat{R}(X, Y) - (\hat{\nabla}_X K)_Y + (\hat{\nabla}_Y K)_X + [K_X, K_Y] \qquad (5) \]
\[ R(X, Y) + \overline{R}(X, Y) = 2 \hat{R}(X, Y) + 2 [K_X, K_Y] \qquad (6) \]

Sectional curvatures
$R$ does not have to be skew-symmetric relative to $g$, i.e. $g(R(X, Y)Z, W) \ne -g(R(X, Y)W, Z)$, in general.
Lemma *
The following conditions are equivalent:
1) $g(R(X, Y)Z, W) = -g(R(X, Y)W, Z)$ $\forall X, Y, Z, W$
2) $R = \overline{R}$
3) $\hat{\nabla} K$ is symmetric, that is, $(\hat{\nabla} K)(X, Y, Z) = (\hat{\nabla}_X K)(Y, Z) = (\hat{\nabla}_Y K)(X, Z) = (\hat{\nabla} K)(Y, X, Z)$ $\forall X, Y, Z$.
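The equivalences in Lemma * can be read off from formulas (3), (4) and (5). The short derivation below is a sketch added for illustration, writing $\overline{R}$ for the curvature tensor of the dual connection and $\hat{\nabla}$ for the Levi-Civita connection; it only adds and subtracts the two displayed formulas.

```latex
% Adding (4) and (5) cancels the derivative terms and gives (6):
\[ R(X,Y) + \overline{R}(X,Y) = 2\hat{R}(X,Y) + 2[K_X, K_Y]. \]
% Subtracting (5) from (4) isolates the derivative terms:
\[ R(X,Y) - \overline{R}(X,Y)
   = 2\left( (\hat{\nabla}_X K)_Y - (\hat{\nabla}_Y K)_X \right). \]
% Hence R = \overline{R} if and only if
% (\hat{\nabla}_X K)(Y, Z) = (\hat{\nabla}_Y K)(X, Z) for all X, Y, Z,
% i.e. \hat{\nabla} K is symmetric: this is 2) <=> 3).
% By (3), g(R(X,Y)Z, W) = -g(\overline{R}(X,Y)W, Z), so R is skew-symmetric
% relative to g exactly when
\[ g(\overline{R}(X,Y)W, Z) = g(R(X,Y)W, Z) \quad \forall X, Y, Z, W, \]
% i.e. when R = \overline{R}: this is 1) <=> 2).
```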
For hypersurfaces in $\mathbb{R}^{n+1}$ each of the above conditions describes an affine sphere.

$\mathcal{R} := \dfrac{R + \overline{R}}{2}$
$[K, K](X, Y)Z := [K_X, K_Y]Z$
$\mathcal{R}(X, Y)Z$ and $[K, K](X, Y)Z$ are Riemann-curvature-like tensors: they are skew-symmetric in $X, Y$, satisfy the first Bianchi identity, and $\mathcal{R}(X, Y)$, $[K, K](X, Y)$ are skew-symmetric relative to $g$ for all $X, Y$.
$\pi$ – vector plane in $T_x M$; $X, Y$ – orthonormal basis of $\pi$
sectional curvature for $g$ – $\hat{k}(\pi) := g(\hat{R}(X, Y)Y, X)$
sectional $K$-curvature – $k(\pi) := g([K, K](X, Y)Y, X)$
sectional $\nabla$-curvature – $k^{\nabla}(\pi) := g(\mathcal{R}(X, Y)Y, X)$

In general, Schur’s lemma does not hold for $k^{\nabla}$ and $k$. We have, however,
Lemma
Assume that $M$ is connected, $\dim M > 2$ and the sectional $\nabla$-curvature (the sectional $K$-curvature) is pointwise constant. If one of the equivalent conditions in Lemma * holds then the sectional $\nabla$-curvature (the sectional $K$-curvature) is constant on $M$.

Sectional $K$-curvature
The easiest situation which should be taken into account is when the sectional $K$-curvature is constant for all vector planes in $T_x M$. In this respect we have
Theorem
If the sectional $K$-curvature is constant and equal to $A$ for all vector planes in $T_x M$ then there is an orthonormal basis $e_1, \dots, e_n$ of $T_x M$ and numbers $\lambda_1, \dots, \lambda_n$, $\mu_1, \dots, \mu_{n-1}$ such that
\[ K_{e_1} = \begin{pmatrix} \lambda_1 & & & \\ & \mu_1 & & \\ & & \ddots & \\ & & & \mu_1 \end{pmatrix}, \quad K_{e_i} = \begin{pmatrix} & & & \mu_1 & & & \\ & 0 & & \vdots & & 0 & \\ & & & \mu_{i-1} & & & \\ \mu_1 & \cdots & \mu_{i-1} & \lambda_i & & & \\ & & & & \mu_i & & \\ & 0 & & & & \ddots & \\ & & & & & & \mu_i \end{pmatrix}, \quad K_{e_n} = \begin{pmatrix} & & & \mu_1 \\ & 0 & & \vdots \\ & & & \mu_{n-1} \\ \mu_1 & \cdots & \mu_{n-1} & \lambda_n \end{pmatrix}. \]
Moreover
\[ \mu_i = \frac{\lambda_i - \sqrt{\lambda_i^2 - 4 A_{i-1}}}{2}, \quad A_i = A_{i-1} - \mu_i^2, \]
for $i = 1, \dots, n - 1$, where $A_0 = A$. The above representation of $K$ is not unique, in general. If additionally $\operatorname{tr}_g K = 0$ then $A \le 0$, $\lambda_n = 0$ and $\lambda_i, \mu_i$ for $i = 1, \dots, n - 1$ are expressed as follows:
\[ \lambda_i = (n - i) \sqrt{\frac{-A_{i-1}}{n - i + 1}}, \quad \mu_i = -\sqrt{\frac{-A_{i-1}}{n - i + 1}}. \]
In particular, in the last case the numbers $\lambda_i, \mu_i$ depend only on $A$ and the dimension of $M$.

Example 1.
\[ K_{e_1} = \begin{pmatrix} \lambda & & & \\ & \lambda/2 & & \\ & & \ddots & \\ & & & \lambda/2 \end{pmatrix}, \quad K_{e_i} = \frac{\lambda}{2} \big( E_{1i} + E_{i1} \big), \quad i = 2, \dots, n, \]
where $E_{ab}$ denotes the matrix with 1 in position $(a, b)$ and 0 elsewhere.
The sectional $K$-curvature is constant and equal to $\lambda^2 / 4$.

Example 2. The $K$-curvature vanishes, i.e. $[K, K] = 0$. There is an orthonormal frame $e_1, \dots, e_n$ such that
\[ K_{e_i} = \lambda_i E_{ii}, \quad i = 1, \dots, n, \]
that is, $K_{e_i}$ is diagonal with $\lambda_i$ in position $(i, i)$ and zeros elsewhere.

Some theorems on the sectional $K$-curvature
$(g, K)$ – trace-free if $E = \operatorname{tr}_g K = 0$
Theorem
Let $(g, K)$ be a trace-free statistical structure on $M$ with symmetric $\hat{\nabla} K$. If the sectional $K$-curvature is constant then either $K = 0$ (the statistical structure is trivial) or $\hat{R} = 0$ and $\hat{\nabla} K = 0$.
Theorem
Let $\hat{\nabla} K = 0$. Each of the following conditions implies that $\hat{R} = 0$:
1) the sectional $K$-curvature is negative,
2) $[K, K] = 0$ and $K$ is non-degenerate, i.e. $X \mapsto K_X$ is a monomorphism.

Theorem
Let $K$ be as in Example 1 at each point of $M$, let $\hat{\nabla} K$ be symmetric, and let $\operatorname{div} E$ be constant on $M$ ($E = \operatorname{tr}_g K$). Then the sectional curvature for $g$ by any plane containing $E$ is non-positive. Moreover, if $M$ is connected it is constant. If $\hat{\nabla} E = 0$ then $\hat{\nabla} K = 0$ and the sectional curvature (of $g$) by any plane containing $E$ vanishes.
Theorem
If the sectional $K$-curvature is non-positive on $M$ and $[K, K] \cdot K = 0$ then the sectional $K$-curvature vanishes on $M$.
Corollary
If $(g, K)$ is a Hessian structure on $M$ with non-negative sectional curvature of $g$ and such that $\hat{R} \cdot K = 0$ then $\hat{R} = 0$.

Theorem
Suppose the sectional $K$-curvature is negative on $M$ and $\hat{R} \cdot K = 0$. Then $\hat{R} = 0$.
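The claim in Example 1 that the sectional K-curvature equals λ²/4 for every plane can be checked numerically. The following is a minimal sketch assuming NumPy and an orthonormal frame, so that g is the identity matrix; the array layout `K[i] = matrix of K_{e_i}` and the function names are illustrative conventions, not notation from the talk.

```python
import numpy as np

def example1_difference_tensor(n, lam):
    """K[i] is the matrix of K_{e_i} in the orthonormal frame e_1, ..., e_n (0-based)."""
    K = np.zeros((n, n, n))
    K[0] = np.diag([lam] + [lam / 2] * (n - 1))
    for i in range(1, n):
        K[i, 0, i] = K[i, i, 0] = lam / 2
    return K

def sectional_K_curvature(K, X, Y):
    """k(pi) = g([K_X, K_Y] Y, X) for an orthonormal pair X, Y spanning pi."""
    KX = np.tensordot(X, K, axes=1)   # K_X = sum_i X_i K_{e_i}
    KY = np.tensordot(Y, K, axes=1)
    return X @ (KX @ KY - KY @ KX) @ Y

rng = np.random.default_rng(1)        # seed chosen only for reproducibility
n, lam = 5, 1.7
K = example1_difference_tensor(n, lam)

curvatures = []
for _ in range(20):
    # The columns of Q form a random orthonormal pair, i.e. a random plane.
    Q, _ = np.linalg.qr(rng.standard_normal((n, 2)))
    curvatures.append(sectional_K_curvature(K, Q[:, 0], Q[:, 1]))
```

Every entry of `curvatures` should agree with `lam**2 / 4` up to rounding, and the array `K` is symmetric under exchange of its first two indices, which reflects the total symmetry of the cubic form.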
Theorem
Let $M$ be a Lagrangian submanifold of $N$, where $N$ is a Kaehler manifold of constant holomorphic curvature $4c$, the sectional curvature of the first fundamental form $g$ on $M$ is smaller than $c$ on $M$ and $\hat{R} \cdot K = 0$, where $K$ is the second fundamental tensor of $M \subset N$. Then $\hat{R} = 0$.

Sectional $\nabla$-curvature
All affine spheres are statistical manifolds of constant sectional $\nabla$-curvature.
A Riemann-curvature-like tensor defines the curvature operator. For instance, for the curvature tensor $\mathcal{R} = (R + \overline{R})/2$ we have the curvature operator $\mathcal{R} : \Lambda^2 TM \to \Lambda^2 TM$ given by
\[ g(\mathcal{R}(X \wedge Y), Z \wedge W) = g(\mathcal{R}(Z, W)Y, X). \]
A curvature operator is symmetric relative to the canonical extension of $g$ to the bundle $\Lambda^2 TM$. Hence it is diagonalizable. In particular, it can be positive definite, negative definite, etc.
The assumption that $\mathcal{R}$ is positive definite is stronger than the assumption that the sectional $\nabla$-curvature is positive.

Theorem
Let $M$ be a connected compact oriented manifold and $(g, \nabla)$ be a trace-free statistical structure on $M$. If $R = \overline{R}$ and the curvature operator determined by the curvature tensor $\hat{R}$ is positive definite on $M$ then the sectional $\nabla$-curvature is constant.
Theorem
Let $M$ be a connected compact oriented manifold and $(g, \nabla)$ be a trace-free statistical structure on $M$. If the curvature operator for $\mathcal{R} = (R + \overline{R})/2$ is positive on $M$ then the Betti numbers satisfy $b_1(M) = \dots = b_{n-1}(M) = 0$.

Sectional curvature for $g$
$\hat{k}(\pi) = g(\hat{R}(X, Y)Y, X)$; $X, Y$ – an orthonormal basis for $\pi$
Theorem
Let $M$ be a compact manifold equipped with a trace-free statistical structure $(g, \nabla)$ such that $R = \overline{R}$. If the sectional curvature $\hat{k}$ for $g$ is positive on $M$ then the structure is trivial, that is, $\nabla = \hat{\nabla}$.
In the 2dimensional case we have Theorem Let M be a compact surface equipped with a tracefree statistical structure (g, ). If M is of genus 0 and R = R then the structure is trivial. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 27 / 29 B. Opozda, Bochner’s technique for statistical manifolds, Annals of Global Analysis and Geometry, DOI 10.1007/s104550159475z B. Opozda, A sectional curvature for statistical structures, arXiv:1504.01279[math.DG] Barbara Opozda () Curvatures of statistical structures Paris, October 2015 28 / 29 Hessian structures (g, ) – Hessian if R = 0. Then R = 0 and ˆR = −[K, K]. (g, ) is Hessian if and only if ˆ K is symmetric and ˆR = −[K, K]. All Hessian structure are locally realizable on aﬃne hypersurfaces in Rn+1 equipped with Calabi’s structure. If they are tracefree they are locally realizable on improper aﬃne spheres. If the diﬀerence tensor is as in Example 1. and the structure is Hessian then K = 0. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 29 / 29
Authors
Frédéric Barbaresco
Geometric Science of Information: SEE/SMAI GSI’15 Conference, LIX Colloquium 2015
Frédéric BARBARESCO* & Frank NIELSEN**, GSI’15 General Chairmen
(*) President of SEE ISIC Club (Ingénierie des Systèmes d’Information de Communications)
(**) LIX Department, Ecole Polytechnique

Flashback GSI’13, Ecole des Mines de Paris: Hirohiko Shima, Jean-Louis Koszul, Shun-Ichi Amari

SEE at a glance (Société de l'électricité, de l'électronique et des technologies de l'information et de la communication)
• Meeting place for science, industry and society
• An officially recognised non-profit organisation
• About 2000 members and 5000 individuals involved
• Large participation from industry (~50%)
• 19 «Clubs techniques» and 12 «Groupes régionaux»
• Organizes conferences and seminars
• Initiates/attracts International Conferences in France
• Institutional French member of IFAC and IFIP
• Awards (Glavieux/Brillouin Prize, Général Ferrié Prize, Néel Prize, Jerphagnon Prize, Blanc-Lapierre Prize, Thévenin Prize), grades and medals (Blondel, Ampère)
• Publishes 3 periodical publications (REE, …) & 3 monographs each year
• Web: http://www.see.asso.fr and LinkedIn SEE group
• SEE Presidents: Louis de Broglie, Paul Langevin, …

1883-2015: From SIE & SFE to SEE: 132 years of Sciences
• 1881: Exposition Internationale d’Electricité
• 1883: SIE, Société Internationale des Electriciens
• 1886: SFE, Société Française des Electriciens
• 2013: SEE, 17 rue de l'Amiral Hamelin, 75783 Paris Cedex 16

GSI’15 Sponsors

GSI Logo: Adelard of Bath
• He left England toward the end of the 11th century for Tours in France.
• Adelard taught for a time at Laon, leaving Laon for travel no later than 1109.
• After Laon, he travelled to Southern Italy and Sicily no later than 1116.
• Adelard also travelled extensively throughout the "lands of the Crusades": Greece, West Asia, Sicily, Spain, and potentially Palestine.
• Adelard of Bath was the first to translate Euclid’s Elements into Latin. The oldest surviving Latin translation of the Elements is a 12th-century translation by Adelard from an Arabic version.
• Adelard of Bath introduced the word « Algorismus » into Latin after his translation of Al Khuwarizmi.
[Figure: the frontispiece of an Adelard of Bath Latin translation of Euclid's Elements, c. 1309–1316.]

SMAI/SEE GSI’15
• More than 150 attendees from 15 different countries
• 85 scientific presentations over 3 days
• 3 keynote speakers
  • Mathilde MARCOLLI (Caltech): “From Geometry and Physics to Computational Linguistics”
  • Tudor RATIU (EPFL): “Symmetry methods in geometric mechanics”
  • Marc ARNAUDON (Bordeaux University): “Stochastic Euler-Poincaré reduction”
• 1 Short Course, chaired by Roger BALIAN
  • Dominique SPEHNER (Grenoble University): “Geometry on the set of quantum states and quantum correlations”
• 1 Guest speaker
  • Charles-Michel MARLE (UPMC): “Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. Application to Hamiltonian systems”
• Social events:
  • Welcome cocktail at Ecole Polytechnique
  • Dinner in Versailles Palace Gardens

GSI’15 Topics
GSI’15 federates skills from Geometry, Probability and Information Theory:
• Dimension reduction on Riemannian manifolds
• Optimal Transport and applications in Imagery/Statistics
• Shape Space & Diffeomorphic mappings
• Random Geometry/Homology
• Hessian Information Geometry
• Topological forms and Information
• Information Geometry Optimization
• Information Geometry in Image Analysis
• Divergence Geometry
• Optimization on Manifold
• Lie Groups and Geometric Mechanics/Thermodynamics
• Computational Information Geometry
• Lie Groups: Novel Statistical and Computational Frontiers
• Geometry of Time Series and Linear Dynamical systems
• Bayesian and Information Geometry for Inverse Problems
• Probability Density Estimation

GSI’15 Program

GSI’15 Proceedings
• Publication by SPRINGER in « Lecture Notes in Computer Science », LNCS vol. 9389 (800 pages), ISBN 9783319250397
• http://www.springer.com/us/book/9783319250397

GSI’15 Special Issue
• Authors will be solicited to submit a paper in a special Issue "Differential Geometrical Theory of Statistics" in ENTROPY Journal, an international and interdisciplinary open access journal of entropy and information studies published monthly online by MDPI
• http://www.mdpi.com/journal/entropy/special_issues/entropystatistics
• A book could be edited by MDPI: e.g.

Ecole Polytechnique
• Special thanks to « LIX » Department
A product of the French Revolution and the Age of Enlightenment, École Polytechnique has a rich history that spans over 220 years.
https://www.polytechnique.edu/en/history Henri Poincaré – X1873 ParisSaclay University in Top 8 World Innovation Hubs http://www.technologyreview.com/news/517626/ infographictheworldstechnologyhubs/ A new Grammar of Information “Mathematics is the art of giving the same name to different things” – Henri Poincaré GROUP EVERYWHERE Elie Cartan Henri Poincaré METRIC EVERYWHERE Maurice Fréchet Misha Gromov “the problems addressed by Elie Cartan are among the most important, most abstract and most general dealing with mathematics; group theory is, so to speak, the whole mathematics, stripped of its material and reduced to pure form. This extreme level of abstraction has probably made my presentation a little dry; to assess each of the results, I would have had virtually render him the material which he had been stripped; but this refund can be made in a thousand different ways; and this is the only form that can be found as well as a host of various garments, which is the common link between mathematical theories that are often surprised to find so near” H. Poincaré Elie Cartan: Group Everywhere (Henri Poincaré review of Cartan’s Works) Maurice Fréchet: Metric Everywhere • Maurice Fréchet made major contributions to the topology of point sets and introduced the entire concept of metric spaces. • His dissertation opened the entire field of functionals on metric spaces and introduced the notion of compactness. • He has extended Probability in Metric space 1948 (Annales de l’IHP) Les éléments aléatoires de nature quelconque dans un espace distancié Extension of Probability/Statistic in abstract/Metric space GSI’15 & Geometric Mechanics • The master of geometry during the last century, Elie Cartan, was the son of Joseph Cartan who was the village blacksmith. • Elie recalled that his childhood had passed under “blows of the anvil, which started every morning from dawn”. 
• We can easily imagine the child Elie Cartan watching his father Joseph “coding curvature” on metal between the hammer and the anvil, a sight that insidiously shaped Elie’s mind with a germinal intuition of fundamental geometric concepts.
• The word “Forge” dates from the late XIV century, “a smithy”, from Old French forge “forge, smithy” (XII century), earlier faverge, from Latin fabrica “workshop, smith’s shop”, from faber (genitive fabri) “workman in hard materials, smith”.

HAMMER = The Coder; ANVIL = Curvature Libraries
[Figures: Bigorne; Bicorne; Venus at the Forge of Vulcan, Le Nain Brothers, Musée Saint-Denis, Reims.]

From Homo Sapiens to Homo Faber
“Intelligence is the faculty of manufacturing artificial objects, especially tools to make tools, and of indefinitely varying the manufacture.” Henri Bergson
[Figure: Into the Flaming Forge of Vulcan, Diego Velázquez, Museo Nacional del Prado.]

Geometric Thermodynamics & Statistical Physics

Enjoy all « Geometries » (Dinner at Versailles Palace Gardens)
• Restaurant of GSI’15 Gala Dinner
• André Le Nôtre, Landscape Geometer of Versailles, the Apex of “Le Jardin à la française”
• Louis XIV, Patron of Science: the Royal Academy of Sciences was established in 1666. On 1st September 1715, 300 years ago, Louis XIV passed away at the age of 77, having reigned for 72 years.

Keynote Speakers
Prof. Mathilde MARCOLLI (CALTECH, USA)
From Geometry and Physics to Computational Linguistics
Abstract: I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework.
Biography:
• Laurea in Physics, University of Milano, 1993
• Master of Science, Mathematics, University of Chicago, 1994
• PhD, Mathematics, University of Chicago, 1997
• Moore Instructor, Massachusetts Institute of Technology, 1997-2000
• Associate Professor (C3), Max Planck Institute for Mathematics, 2000-2008
• Professor, California Institute of Technology, 2008-present
• Distinguished Visiting Research Chair, Perimeter Institute for Theoretical Physics, 2013-present
Talk chaired by Daniel Bennequin

Keynote Speakers
Prof. Marc ARNAUDON (Bordeaux University, France)
Stochastic Euler-Poincaré reduction
Abstract: We will prove an Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation.
Biography: Marc Arnaudon was born in France in 1965. He graduated from Ecole Normale Supérieure de Paris, France, in 1991. He received the PhD degree in mathematics and the Habilitation à diriger des Recherches degree from Strasbourg University, France, in January 1994 and January 1998 respectively. After postdoctoral research and teaching at Strasbourg, he took up a full professor position in September 1999 in the Department of Mathematics at Poitiers University, France, where he was the head of the Probability Research Group. In January 2013 he left Poitiers and joined the Department of Mathematics of Bordeaux University, France, where he is a full professor in mathematics.
Talk chaired by Frank Nielsen

Keynote Speakers
Prof. Tudor RATIU (EPFL, Switzerland)
Symmetry methods in geometric mechanics
Abstract: The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics.
Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena for Hamiltonian systems is based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity.
Biography:
• BA in Mathematics, University of Timisoara, Romania, 1973
• MA in Applied Mathematics, University of Timisoara, Romania, 1974
• Ph.D. in Mathematics, University of California, Berkeley, 1980
• T.H. Hildebrandt Research Assistant Professor, University of Michigan, Ann Arbor, USA, 1980-1983
• Associate Professor of Mathematics, University of Arizona, Tucson, USA, 1983-1988
• Professor of Mathematics, University of California, Santa Cruz, USA, 1988-2001
• Chaired Professor of Mathematics, Ecole Polytechnique Federale de Lausanne, Switzerland, 1998-present
• Professor of Mathematics, Skolkovo Institute of Science and Technology, Moscow, Russia, 2014-present
Talk chaired by Xavier Pennec

Short Course
Prof. Dominique SPEHNER (Grenoble University)
Geometry on the set of quantum states and quantum correlations
Abstract: I will show that the set of states of a quantum system with a finite dimensional Hilbert space can be equipped with various Riemannian distances having nice properties from a quantum information viewpoint, namely they are contractive under all physically allowed operations on the system. The corresponding metrics are quantum analogs of the Fisher metric and have been classified by D. Petz. Two distances are particularly relevant physically: the Bogoliubov-Kubo-Mori distance studied by R. Balian, Y. Alhassid and H. Reinhardt, and the Bures distance studied by A. Uhlmann and by S.L. Braunstein and C.M. Caves. The latter gives the quantum Fisher information, which plays an important role in quantum metrology. A way to measure the amount of quantum correlations (entanglement or quantum discord) in bipartite systems (that is, systems composed of two parties) with the help of these distances will also be discussed.
Biography:
• Diplôme d'Études Approfondies (DEA) in Theoretical Physics at the École Normale Supérieure de Lyon, 1994
• Civil Service (Service National de la Coopération), Technion Institute of Technology, Haifa, Israel, 1995-1996
• PhD in Theoretical Physics, Université Paul Sabatier, Toulouse, France, 1996-2000
• Postdoctoral fellow, Pontificia Universidad Católica, Santiago, Chile, 2000-2001
• Research Associate, University of Duisburg-Essen, Germany, 2001-2005
• Maître de Conférences, Université Joseph Fourier, Grenoble, France, 2005-present
• Habilitation à diriger des Recherches (HDR), Université Grenoble Alpes, 2015
• Member of the Institut Fourier (since 2005) and the Laboratoire de Physique et Modélisation des Milieux Condensés (since 2013) of the Université Grenoble Alpes, France
Talk chaired by Roger Balian

Guest Speakers
Prof. Charles-Michel MARLE (UPMC, France)
Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. Application to Hamiltonian systems
Abstract: I will present some tools in Symplectic and Poisson Geometry in view of their applications in Geometric Mechanics and Mathematical Physics. Lie group and Lie algebra actions on symplectic and Poisson manifolds, momentum maps and their equivariance properties, and first integrals associated to symmetries of Hamiltonian systems will be discussed. Reduction methods taking advantage of symmetries will be discussed.
Biography: Charles-Michel Marle was born in 1934. He studied at Ecole Polytechnique (1953-1955), Ecole Nationale Supérieure des Mines de Paris (1957-1958) and Ecole Nationale Supérieure du Pétrole et des Moteurs (1957-1958). He obtained a doctor's degree in Mathematics at the University of Paris in 1968. From 1959 to 1969 he worked as a research engineer at the Institut Français du Pétrole. He joined the Université de Besançon as Associate Professor in 1969, and the Université Pierre et Marie Curie, first as Associate Professor (1975) and then as full Professor (1981). His research work was first about fluid flows through porous media, then about Differential Geometry, Hamiltonian systems and applications in Mechanics and Mathematical Physics.
Talk chaired by Frédéric Barbaresco
Frederic Lavancier, Charles Kervrann
Keywords =
Abstract
A two-color interacting random balls model for colocalization analysis of proteins.
Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, Nantes; INRIA Rennes, Serpico team.
Joint work with C. Kervrann (INRIA Rennes, Serpico team).
GSI’15, 28-30 October 2015.

Introduction: some data
Vesicular trafficking analysis and colocalization quantification by TIRF microscopy (1 px = 100 nanometers) [SERPICO team, INRIA].
[Figure: Langerin proteins (left) and Rab11 GTPase proteins (right).]
Is there colocalization? ⇔ Are there spatial dependencies between the two types of proteins?

Image preprocessing
[Figure: superposition of the two images after segmentation, and after a Gaussian weights thresholding.]

The problem of colocalization can be described as follows. We observe two binary images in a domain $\Omega$:
• First image (green): realization of a random set $\Gamma_1 \cap \Omega$
• Second image (red): realization of a random set $\Gamma_2 \cap \Omega$
→ Is there some dependency between $\Gamma_1$ and $\Gamma_2$?
→ If so, can we quantify/model this dependency?

Outline
1. A testing procedure
2. A model for colocalization
3. Estimation problem

1. A testing procedure

Let o be a generic point of $\mathbb{R}^d$ and
$$p_1 = P(o \in \Gamma_1), \quad p_2 = P(o \in \Gamma_2), \quad p_{12} = P(o \in \Gamma_1 \cap \Gamma_2).$$
If $\Gamma_1$ and $\Gamma_2$ are independent, then $p_{12} = p_1 p_2$.

A natural measure of departure from independence is $\hat p_{12} - \hat p_1 \hat p_2$, where
$$\hat p_1 = |\Omega|^{-1} \sum_{x \in \Omega} 1_{\Gamma_1}(x), \quad \hat p_2 = |\Omega|^{-1} \sum_{x \in \Omega} 1_{\Gamma_2}(x), \quad \hat p_{12} = |\Omega|^{-1} \sum_{x \in \Omega} 1_{\Gamma_1 \cap \Gamma_2}(x).$$

Assume $\Gamma_1$ and $\Gamma_2$ are m-dependent stationary random sets. If $\Gamma_1$ is independent of $\Gamma_2$, then as $\Omega$ tends to infinity,
$$T := \frac{|\Omega| \, (\hat p_{12} - \hat p_1 \hat p_2)}{\sqrt{\sum_{x \in \Omega} \sum_{y \in \Omega} \hat C_1(x - y) \, \hat C_2(x - y)}} \longrightarrow N(0, 1),$$
where $\hat C_1$ and $\hat C_2$ are the empirical covariance functions of $\Gamma_1 \cap \Omega$ and $\Gamma_2 \cap \Omega$ respectively.

Hence, to test the null hypothesis of independence between $\Gamma_1$ and $\Gamma_2$:
$$\text{p-value} = 2(1 - \Phi(|T|)),$$
where $\Phi$ is the c.d.f. of the standard normal distribution.

Some simulations
Simulations when $\Gamma_1$ and $\Gamma_2$ are unions of random balls. Number of p-values < 0.05 over 100 realizations:
• Independent case (and each color ∼ Poisson): 4
• Dependent case (see later for the model): 100
• Independent case, larger radii: 5
• Dependent case, larger radii and "small" dependence: 97

Real data
Depending on the preprocessing: T = 9.9 (p-value = 0) and T = 17 (p-value = 0).
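The test statistic T and its p-value described above can be sketched for two binary pixel images as follows. This is an illustrative sketch, not the authors' implementation. The empirical covariance estimator, the truncation of the double sum to lags of at most `max_lag` pixels (motivated by the m-dependence assumption) and all function names are assumptions made here.

```python
import math
import numpy as np

def lagged_cov(img, dy, dx):
    """Empirical covariance of a binary image at lag (dy, dx), plus pair count.

    Assumed estimator: average of centered products over all pixel pairs
    (x, x + lag) that both lie inside the image.
    """
    h, w = img.shape
    c = img - img.mean()
    a = c[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    b = c[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
    return float((a * b).mean()), a.size

def colocalization_test(img1, img2, max_lag):
    """Return (T, p-value) for the null hypothesis of independence."""
    g1 = np.asarray(img1, dtype=float)
    g2 = np.asarray(img2, dtype=float)
    n = g1.size                                   # |Omega| in pixels
    p1, p2 = g1.mean(), g2.mean()
    p12 = (g1 * g2).mean()
    # sum_x sum_y C1(x - y) C2(x - y), grouped by lag and truncated.
    var_sum = 0.0
    for dy in range(-max_lag, max_lag + 1):
        for dx in range(-max_lag, max_lag + 1):
            c1, n_pairs = lagged_cov(g1, dy, dx)
            c2, _ = lagged_cov(g2, dy, dx)
            var_sum += n_pairs * c1 * c2
    if var_sum <= 0:
        raise ValueError("non-positive variance estimate; try a smaller max_lag")
    t = n * (p12 - p1 * p2) / math.sqrt(var_sum)
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(t) / math.sqrt(2.0))
    return t, p_value
```

A call such as `colocalization_test(green_mask, red_mask, max_lag=5)`, where both arguments are boolean arrays of the same shape, returns the statistic and the two-sided p-value. The choice of `max_lag` should reflect the range of dependence in the images, which this sketch leaves to the user.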
2. A model for colocalization

We view each set $\Gamma_1$ and $\Gamma_2$ as a union of random balls. We model the superposition of the two images, i.e. $\Gamma_1 \cup \Gamma_2$.

The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution $\mu$ on $[R_{min}, R_{max}]$.

Notation:
• $(\xi, R)_i$: ball centered at $\xi$ with radius R and color $i \in \{1, 2\}$, viewed as a marked point, marked by R and i.
• $x_i$: collection of all marked points with color i. Hence $\Gamma_i = \bigcup_{(\xi, R)_i \in x_i} (\xi, R)_i$.
• $x = x_1 \cup x_2$: collection of all marked points.

[Figure: three realizations of the reference process.]

The model
We consider a density on any bounded domain $\Omega$ with respect to the reference model
$$f(x) \propto z_1^{n_1} z_2^{n_2} e^{\theta |\Gamma_1 \cap \Gamma_2|},$$
where $n_1$ is the number of green balls and $n_2$ the number of red balls. This density depends on 3 parameters:
• $z_1$: rules the mean number of green balls
• $z_2$: rules the mean number of red balls
• $\theta$: interaction parameter.
If $\theta > 0$: attraction (colocalization) between $\Gamma_1$ and $\Gamma_2$.
If $\theta = 0$: back to the reference model, up to the intensities (independence between $\Gamma_1$ and $\Gamma_2$).

Simulation
Realizations can be generated by a standard birth-death Metropolis-Hastings algorithm.
[Figure: examples of realizations.]
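To make the density of the model concrete, the following sketch evaluates its unnormalized logarithm, $n_1 \log z_1 + n_2 \log z_2 + \theta |\Gamma_1 \cap \Gamma_2|$, for a given configuration of balls. It is an illustration under stated assumptions: the area of the intersection is approximated by a pixel count on a grid, balls are given as hypothetical `(cx, cy, r)` tuples in pixel units, and the function names are not from the talk.

```python
import numpy as np

def rasterize(balls, shape):
    """Boolean mask of the union of discs; balls is a list of (cx, cy, r)."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for cx, cy, r in balls:
        mask |= (xx - cx) ** 2 + (yy - cy) ** 2 <= r ** 2
    return mask

def log_density_unnormalized(green, red, z1, z2, theta, shape):
    """n1*log(z1) + n2*log(z2) + theta*|Gamma1 ∩ Gamma2| (pixel-count area)."""
    gamma1 = rasterize(green, shape)
    gamma2 = rasterize(red, shape)
    overlap_area = np.count_nonzero(gamma1 & gamma2)
    return len(green) * np.log(z1) + len(red) * np.log(z2) + theta * overlap_area
```

With $\theta > 0$, configurations in which green and red balls overlap receive a larger value than configurations with the same numbers of balls placed apart, which is the attraction mechanism described above. A birth-death Metropolis-Hastings sampler would only need differences of this quantity between a current and a proposed configuration.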
3. Estimation problem

Aim: Assume that the law $\mu$ of the radii is known. Given a realization of $\Gamma_1 \cup \Gamma_2$ on $\Omega$, estimate $z_1$, $z_2$ and $\theta$ in
$$f(x) = \frac{1}{c(z_1, z_2, \theta)} \, z_1^{n_1} z_2^{n_2} e^{\theta |\Gamma_1 \cap \Gamma_2|},$$
where $c(z_1, z_2, \theta)$ is the normalizing constant.

Issue: The numbers of balls $n_1$ and $n_2$ are not observed ⇒ likelihood or pseudo-likelihood based inference is not feasible.

An equilibrium equation
Consider, for any non-negative function h,
$$C(z_1, z_2, \theta; h) = S(h) - z_1 I_1(\theta; h) - z_2 I_2(\theta; h),$$
where
$$S(h) = \sum_{(\xi, R) \in x, \ \xi \in \Omega} h((\xi, R), x \setminus (\xi, R))$$
and, for i = 1, 2,
$$I_i(\theta; h) = \int_{R_{min}}^{R_{max}} \int_{\Omega} h((\xi, R)_i, x) \, \frac{\lambda((\xi, R)_i, x)}{2 z_i} \, d\xi \, \mu(dR).$$
Denoting by $z_1^*$, $z_2^*$ and $\theta^*$ the true unknown values of the parameters, we know from the Georgii-Nguyen-Zessin equation that for any h
$$E(C(z_1^*, z_2^*, \theta^*; h)) = 0.$$

Takacs-Fiksel estimator
Given K test functions $(h_k)_{1 \le k \le K}$, the Takacs-Fiksel estimator is defined by
$$(\hat z_1, \hat z_2, \hat\theta) := \arg\min_{z_1, z_2, \theta} \sum_{k=1}^{K} C(z_1, z_2, \theta; h_k)^2. \qquad (1)$$
Consistency and asymptotic normality are studied in Coeurjolly et al. 2012.

To be able to compute (1), we must find test functions $h_k$ such that $S(h_k)$ is computable. How many? At least K = 3, because there are 3 parameters to estimate.

A first possibility:
$$h_1((\xi, R)_i, x) = \mathrm{Length}\big(S(\xi, R) \cap (\Gamma_1)^c\big) \, 1_{\{i = 1\}},$$
where $S(\xi, R)$ is the sphere $\{y : \|y - \xi\| = R\}$.
[Figure: illustration of $h_1$.]
Then $S(h_1) = P(\Gamma_1)$ (the perimeter of $\Gamma_1$), and the Takacs-Fiksel contrast function $C(z_1, z_2, \theta; h_1)$ is computable.

Similarly:
• Let $h_2((\xi, R)_i, x) = \mathrm{Length}\big(S(\xi, R) \cap (\Gamma_2)^c\big) \, 1_{\{i = 2\}}$; then $S(h_2) = P(\Gamma_2)$.
• Let $h_3((\xi, R)_i, x) = \mathrm{Length}\big(S(\xi, R) \cap (\Gamma_1 \cup \Gamma_2)^c\big)$; then $S(h_3) = P(\Gamma_1 \cup \Gamma_2)$.

Simulations with test functions $h_1$, $h_2$ and $h_3$ over 100 realizations
[Figure: histograms (frequency) of the estimates of $\theta$ for $\theta = 0.2$ (and small radii) and for $\theta = 0.05$ (and large radii).]

Real data
We assume the law of the radii is uniform on $[R_{min}, R_{max}]$ (each image is embedded in [0, 250] × [0, 280]).
• $R_{min} = 0.5$, $R_{max} = 2.5$: $\hat\theta = 0.45$
• $R_{min} = 0.5$, $R_{max} = 10$: $\hat\theta = 0.03$

Conclusion
The testing procedure
• allows detecting colocalization between two binary images
• is easy and fast to implement
• does not depend too much on the image preprocessing
The model for colocalization
• relies on geometric features (area of intersection)
• can be fitted by the Takacs-Fiksel method
• allows comparing the degree of colocalization $\theta$ between two pairs of images if the laws of the radii are similar
Roman Belavkin
Keywords =
Abstract
Asymmetric Topologies on Statistical Manifolds
Roman V. Belavkin, School of Science and Technology, Middlesex University, London NW4 4BT, UK
GSI2015, October 28, 2015

Outline
• Sources and Consequences of Asymmetry
• Method: Symmetric Sandwich
• Results

Sources and Consequences of Asymmetry

Asymmetric Information Distances
Kullback-Leibler divergence:
$$D[p, q] = E_p\{\ln(p/q)\}$$
Additivity, which reflects the homomorphism $\ln : (\mathbb{R}_+, \times) \to (\mathbb{R}, +)$:
$$D[p_1 \otimes p_2, q_1 \otimes q_2] = D[p_1, q_1] + D[p_2, q_2]$$
Asymmetry of the KL-divergence:
$$D[p, q] \neq D[q, p], \qquad D[q + (p - q), q] \neq D[q - (p - q), q]$$
$$\|p - q\| = \inf\{\alpha^{-1} > 0 : D[q + \alpha(p - q), q] \le 1\}$$
$$\sup_x \{E_{p-q}\{x\} : E_q\{e^x - 1 - x\} \le 1\}$$
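The additivity and the two kinds of asymmetry of the Kullback-Leibler divergence can be illustrated on small discrete distributions. The sketch below is an illustration only; it assumes the standard definition $D[p, q] = \sum_i p_i \ln(p_i/q_i)$ for finite distributions with $q_i > 0$, and the example distributions are arbitrary choices.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D[p, q] for finite distributions, q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(p1, p2):
    """Product distribution p1 ⊗ p2 flattened to a list."""
    return [a * b for a in p1 for b in p2]

p, q = [0.8, 0.2], [0.6, 0.4]
v = [pi - qi for pi, qi in zip(p, q)]            # direction p - q
q_minus = [qi - vi for qi, vi in zip(q, v)]      # q - (p - q)

swap_gap = kl(p, q) - kl(q, p)                   # nonzero: D[p, q] != D[q, p]
reflect_gap = kl(p, q) - kl(q_minus, q)          # nonzero: D[q + v, q] != D[q - v, q]
additivity_gap = kl(product(p, p), product(q, q)) - 2 * kl(p, q)  # ~0
```

The first two gaps are nonzero while the third vanishes up to rounding. The second gap is the one that makes the balls $\{p : D[p, q] \le 1\}$ around q non-symmetric, which is the source of the asymmetric topologies discussed in the talk.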
Functional Analysis in Asymmetric Spaces

Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)). Every topological space with a countable base is quasi-pseudometrizable.

• An asymmetric seminormed space can be $T_0$, but not $T_1$ (and hence not Hausdorff $T_2$).
• Dual quasi-metrics $\rho(x, y)$ and $\rho^{-1}(x, y) = \rho(y, x)$ induce two different topologies.
• There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy.
• This gives 14 notions of completeness (with respect to $\rho$ or $\rho^{-1}$).
• Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness.
• An asymmetric seminormed space may fail to be a topological vector space, because $y \mapsto \alpha y$ can be discontinuous (Borodin, 2001).
• Practically all other results have to be reconsidered (e.g. Baire category theorem, Alaoglu-Bourbaki, etc.) (Cobzas, 2013).

Random Variables as the Source of Asymmetry

$M^\circ := \{x : \langle x, y \rangle \le 1, \ \forall y \in M\}$
Minkowski functional: $\mu_{M^\circ}(x) = \inf\{\alpha > 0 : x/\alpha \in M^\circ\}$
Support function: $s_M(x) = \sup\{\langle x, y \rangle : y \in M\}$
M = {u : D[(1 + u)z, z] 
≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} D∗[x, 0] = ex − 1 − x, z Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. 
Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. 
Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} 0 /∈ Int(dom Eq⊗p{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Method: Symmetric Sandwich Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 8 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A 
∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µM◦ ≤ µ(−M◦ ) ∨ µM◦ µ(−M)co ∧ µM ≤ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µ(−M◦ )co ∧ µM◦ ≤ µM◦ µM ≤ µ(−M) ∨ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (x) /∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(u) ∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (x) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−x) ∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(u) 
∈ ∆2 ϕ−(u) = ϕ(−u) /∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (x) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−x) ∈ ∆2 x∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(u) ∈ ∆2 ϕ−(u) = ϕ(−u) /∈ ∆2 uϕ = µ{u : ϕ(u), z ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (x) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−x) ∈ ∆2 x∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(u) ∈ ∆2 ϕ−(u) = ϕ(−u) /∈ ∆2 uϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ uϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (x) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−x) ∈ ∆2 x∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(u) ∈ ∆2 ϕ−(u) = ϕ(−u) /∈ ∆2 uϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ uϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Results Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 11 / 16 Results KL Induces Hausdorﬀ (T2) Asymmetric Topology Theorem (Y, · ϕ) (resp. (X, · ∗ ϕ)) is Hausdorﬀ. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results KL Induces Hausdorﬀ (T2) Asymmetric Topology Theorem (Y, · ϕ) (resp. (X, · ∗ ϕ)) is Hausdorﬀ. Proof. u ϕ+ ≤ uϕ (resp. x ϕ− ≤ xϕ) implies (Y, · ϕ) (resp. 
(X, · ∗ ϕ)) is ﬁner than normed space (Y, · ϕ+) (resp. (X, · ∗ ϕ−)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · ϕ) (resp. (X, · ∗ ϕ)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · ϕ) (resp. (X, · ∗ ϕ)). Proof. ϕ+(u) = (1 + u) ln(1 + u) − u ∈ ∆2 (resp. ϕ∗ −(x) = e−x − 1 + x ∈ ∆2). Note that ϕ− /∈ ∆2 and ϕ∗ + /∈ ∆2. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Completeness Theorem (Y, · ϕ) (resp. (X, · ∗ ϕ)) is 1 BiComplete: ρsCauchy yn ρs → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · ϕ) (resp. (X, · ∗ ϕ)) is 1 BiComplete: ρsCauchy yn ρs → y. 2 ρsequentially complete: ρsCauchy yn ρ → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · ϕ) (resp. (X, · ∗ ϕ)) is 1 BiComplete: ρsCauchy yn ρs → y. 2 ρsequentially complete: ρsCauchy yn ρ → y. 3 Right Ksequentially complete: right KCauchy yn ρ → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · ϕ) (resp. (X, · ∗ ϕ)) is 1 BiComplete: ρsCauchy yn ρs → y. 2 ρsequentially complete: ρsCauchy yn ρ → y. 3 Right Ksequentially complete: right KCauchy yn ρ → y. Proof. ρs(y, z) = z − yϕ ∨ y − zϕ ≤ y − z ϕ−, where (Y, · ϕ−) is Banach. Then use theorems of Reilly et al. (1982) and Chen et al. (2007). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. 
Banach spaces), and therefore many properties have to be reexamined. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Contain a separable Orlicz subspace. 
Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be reexamined. We have proved that topologies induced by the KLdivergence are: Hausdorﬀ. Bicomplete, ρsequentially complete and right Ksequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Other asymmetric information distances (e.g. Renyi divergence). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 References Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16 Results Borodin, P. A. (2001). The BanachMazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3–4), 298–305. Chen, S.A., Li, W., Zou, D., & Chen, S.B. (2007, Aug). Fixed point theorems in quasimetric spaces. In Machine learning and cybernetics, 2007 international conference on (Vol. 5, p. 24992504). IEEE. Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkh¨auser. Fletcher, P., & Lindgren, W. F. (1982). Quasiuniform spaces (Vol. 77). New York: Marcel Dekker. Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasipseudometric spaces. 
Monatshefte f¨ur Mathematik, 93, 127–140. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16
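The asymmetry and additivity properties of the KL-divergence listed in the talk can be checked numerically on small discrete distributions. The following is a minimal sketch, not the speaker's code: the function names `kl` and `product` and the numerical values of `p`, `q`, `p2`, `q2` are illustrative assumptions.

```python
import math

def kl(p, q):
    """D[p, q] = E_p{ln(p/q)} for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(p1, p2):
    """Product distribution p1 (x) p2, flattened to a list."""
    return [a * b for a in p1 for b in p2]

# Illustrative distributions (assumed values, not taken from the talk).
p, q = [0.7, 0.3], [0.4, 0.6]
p2, q2 = [0.5, 0.5], [0.9, 0.1]

d_pq = kl(p, q)  # D[p, q]
d_qp = kl(q, p)  # D[q, p], differs from D[p, q]

# Reflect p about q: q - (p - q) = 2q - p.
reflected = [2 * qi - pi for pi, qi in zip(p, q)]
d_reflected = kl(reflected, q)  # D[q - (p - q), q], differs from D[q + (p - q), q]

# Additivity on product distributions.
d_joint = kl(product(p, p2), product(q, q2))
d_sum = kl(p, q) + kl(p2, q2)

print(d_pq, d_qp, d_reflected)
print(d_joint, d_sum)
```

The two printed lines show, respectively, three distinct values (asymmetry in the arguments and under reflection about `q`) and two values that agree up to rounding (additivity).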
Pierre Calka
Keywords =
Abstract
Asymptotic properties of random polytopes

Pierre Calka

2nd conference on Geometric Science of Information, École Polytechnique, Paris-Saclay, 28 October 2015

Joint work with Joseph Yukich (Lehigh University, USA) & Tomasz Schreiber (Toruń University, Poland)

Outline

Random polytopes: an overview (uniform polytopes, Gaussian polytopes, expectation asymptotics)

Main results: variance asymptotics (uniform model with K smooth, uniform model with K a polytope, Gaussian model)

Sketch of proof: Gaussian case (calculation of the expectation of $f_k(K_\lambda)$, calculation of the variance of $f_k(K_\lambda)$, scaling transform)

Uniform polytopes

Binomial model

$K$ := convex body of $\mathbb{R}^d$

$(X_k, k \in \mathbb{N}^*)$ := independent and uniformly distributed in $K$

$K_n := \mathrm{Conv}(X_1, \cdots, X_n)$, $n \ge 1$

[Figures: $K_{50}$, $K_{100}$ and $K_{500}$, for $K$ a ball and $K$ a square]

Poissonian model

$K$ := convex body of $\mathbb{R}^d$

$P_\lambda$, $\lambda > 0$ := Poisson point process of intensity measure $\lambda\,dx$

$K_\lambda := \mathrm{Conv}(P_\lambda \cap K)$

[Figures: $K_{500}$, for $K$ a ball and $K$ a square]

Gaussian polytopes

Binomial model

$\Phi_d(x) := \frac{1}{(2\pi)^{d/2}} e^{-\|x\|^2/2}$, $x \in \mathbb{R}^d$, $d \ge 2$

$(X_k, k \in \mathbb{N}^*)$ := independent and with density $\Phi_d$

$K_n := \mathrm{Conv}(X_1, \cdots, X_n)$

Poissonian model

$P_\lambda$, $\lambda > 0$ := Poisson point process of intensity measure $\lambda \Phi_d(x)\,dx$

$K_\lambda := \mathrm{Conv}(P_\lambda)$

[Figures: Gaussian polytopes $K_{50}$, $K_{100}$, $K_{500}$, illustrating their spherical shape]

Asymptotic spherical shape of the Gaussian polytope

Geffroy (1961): $d_H(K_n, B(0, \sqrt{2\log(n)})) \to 0$ a.s. as $n \to \infty$

[Figure: $K_{50000}$]

Expectation asymptotics

Considered functionals

$f_k(\cdot)$ := number of $k$-dimensional faces, $0 \le k \le d$

$\mathrm{Vol}(\cdot)$ := volume

B. Efron's relation (1965): $E f_0(K_n) = n\left(1 - \frac{E\,\mathrm{Vol}(K_{n-1})}{\mathrm{Vol}(K)}\right)$

Uniform polytope, $K$ smooth: $E[f_k(K_\lambda)] \sim c_{d,k} \int_{\partial K} \kappa_s^{\frac{1}{d+1}}\,ds \ \lambda^{\frac{d-1}{d+1}}$ as $\lambda \to \infty$, where $\kappa_s$ := Gaussian curvature of $\partial K$

Uniform polytope, $K$ polytope: $E[f_k(K_\lambda)] \sim c'_{d,k} F(K) \log^{d-1}(\lambda)$ as $\lambda \to \infty$, where $F(K)$ := number of flags of $K$

Gaussian polytope: $E[f_k(K_\lambda)] \sim c''_{d,k} \log^{\frac{d-1}{2}}(\lambda)$ as $\lambda \to \infty$

A. Rényi & R. Sulanke (1963), H. Raynaud (1970), R. Schneider & J. Wieacker (1978), F. Affentranger & R. Schneider (1992)

Main results: variance asymptotics

Uniform model, $K$ smooth

$K$ := convex body of $\mathbb{R}^d$ with volume 1 and with a $C^3$ boundary

$\kappa$ := Gaussian curvature of $\partial K$

$\lim_{\lambda \to \infty} \lambda^{-(d-1)/(d+1)}\, \mathrm{Var}[f_k(K_\lambda)] = c_{k,d} \int_{\partial K} \kappa(z)^{1/(d+1)}\,dz$

$\lim_{\lambda \to \infty} \lambda^{(d+3)/(d+1)}\, \mathrm{Var}[\mathrm{Vol}(K_\lambda)] = c'_d \int_{\partial K} \kappa(z)^{1/(d+1)}\,dz$

($c_{k,d}$, $c'_d$ explicit positive constants)

M. Reitzner (2005): $\mathrm{Var}[f_k(K_\lambda)] = \Theta(\lambda^{(d-1)/(d+1)})$

Uniform model, $K$ polytope

$K$ := simple polytope of $\mathbb{R}^d$ with volume 1, i.e. each vertex of $K$ is included in exactly $d$ facets.

$\lim_{\lambda \to \infty} \log^{-(d-1)}(\lambda)\, \mathrm{Var}[f_k(K_\lambda)] = c_{d,k}\, f_0(K)$

$\lim_{\lambda \to \infty} \lambda^2 \log^{-(d-1)}(\lambda)\, \mathrm{Var}[\mathrm{Vol}(K_\lambda)] = c'_{d,k}\, f_0(K)$

($c_{d,k}$, $c'_{d,k}$ explicit positive constants)

I. Bárány & M. Reitzner (2010): $\mathrm{Var}[f_k(K_\lambda)] = \Theta(\log^{d-1}(\lambda))$

Gaussian model

$\lim_{\lambda \to \infty} \log^{-\frac{d-1}{2}}(\lambda)\, \mathrm{Var}[f_k(K_\lambda)] = c_{k,d}$

$\lim_{\lambda \to \infty} \log^{-k + \frac{d+3}{2}}(\lambda)\, \mathrm{Var}[\mathrm{Vol}(K_\lambda)] = c'_{k,d}$

$E\left[\frac{\mathrm{Vol}(K_\lambda)}{\mathrm{Vol}(B(0, \sqrt{2\log(\lambda)}))}\right] = 1 - \frac{d \log(\log(\lambda))}{4 \log(\lambda)} + O\left(\frac{1}{\log(\lambda)}\right)$ as $\lambda \to \infty$

($c_{k,d}$, $c'_{k,d}$ explicit positive constants)

D. Hug & M. Reitzner (2005), I. Bárány & V. Vu (2007): $\mathrm{Var}[f_k(K_\lambda)] = \Theta(\log^{(d-1)/2}(\lambda))$

Sketch of proof: Gaussian case

Calculation of the expectation of $f_k(K_\lambda)$

1. Decomposition: $E[f_k(K_\lambda)] = E\left[\sum_{x \in P_\lambda} \xi(x, P_\lambda)\right]$, where $\xi(x, P_\lambda) := \frac{1}{k+1} \#\{k\text{-faces containing } x\}$ if $x$ is extreme, and $0$ if not.

2. Mecke-Slivnyak formula: $E[f_k(K_\lambda)] = \lambda \int E[\xi(x, P_\lambda \cup \{x\})]\, \Phi_d(x)\,dx$

3. Limit of the expectation of one score.

Calculation of the variance of $f_k(K_\lambda)$

$\mathrm{Var}[f_k(K_\lambda)] = E\left[\sum_{x \in P_\lambda} \xi^2(x, P_\lambda) + \sum_{x \neq y \in P_\lambda} \xi(x, P_\lambda)\xi(y, P_\lambda)\right] - (E[f_k(K_\lambda)])^2$

$= \lambda \int E[\xi^2(x, P_\lambda \cup \{x\})]\, \Phi_d(x)\,dx + \lambda^2 \iint E[\xi(x, P_\lambda \cup \{x, y\})\, \xi(y, P_\lambda \cup \{x, y\})]\, \Phi_d(x)\Phi_d(y)\,dx\,dy - \lambda^2 \iint E[\xi(x, P_\lambda \cup \{x\})]\, E[\xi(y, P_\lambda \cup \{y\})]\, \Phi_d(x)\Phi_d(y)\,dx\,dy$

$= \lambda \int E[\xi^2(x, P_\lambda \cup \{x\})]\, \Phi_d(x)\,dx + \lambda^2 \iint \text{"Cov"}(\xi(x, P_\lambda \cup \{x\}), \xi(y, P_\lambda \cup \{y\}))\, \Phi_d(x)\Phi_d(y)\,dx\,dy$

Scaling transform

Question: limits of $E[\xi(x, P_\lambda)]$ and $\text{"Cov"}(\xi(x, P_\lambda), \xi(y, P_\lambda))$?

Answer: definition of limit scores in a new space.

◮ Critical radius: $R_\lambda := \sqrt{2 \log \lambda - \log(2 \cdot (2\pi)^d \cdot \log \lambda)}$

◮ Scaling transform: $T_\lambda : \mathbb{R}^d \setminus \{0\} \to \mathbb{R}^{d-1} \times \mathbb{R}$, $x \mapsto \left(R_\lambda \exp_{d-1}^{-1}\frac{x}{\|x\|},\ R_\lambda^2\left(1 - \frac{\|x\|}{R_\lambda}\right)\right)$, where $\exp_{d-1} : \mathbb{R}^{d-1} \simeq T_{u_0} S^{d-1} \to S^{d-1}$ is the exponential map at $u_0 \in S^{d-1}$

◮ Image of a score: $\xi^{(\lambda)}(T_\lambda(x), T_\lambda(P_\lambda)) := \xi(x, P_\lambda)$

◮ Convergence of $P_\lambda$: $T_\lambda(P_\lambda) \xrightarrow{D} P$, where $P$ is a Poisson point process in $\mathbb{R}^{d-1} \times \mathbb{R}$ of intensity measure $e^h\,dv\,dh$

Action of the scaling transform

$\Pi^\uparrow := \{(v, h) \in \mathbb{R}^{d-1} \times \mathbb{R} : h \ge \frac{\|v\|^2}{2}\}$

$\Pi^\downarrow := \{(v, h) \in \mathbb{R}^{d-1} \times \mathbb{R} : h \le -\frac{\|v\|^2}{2}\}$

Objects in the original space and their images under the scaling transform:

Half-space: translate of $\Pi^\downarrow$

Sphere containing $O$: translate of $\partial\Pi^\uparrow$

Convexity: parabolic convexity

Extreme point: $(x + \Pi^\uparrow)$ not fully covered

$k$-face of $K_\lambda$: parabolic $k$-face

$R_\lambda \mathrm{Vol}$: $\mathrm{Vol}$

Limiting picture

$\Psi := \bigcup_{x \in P} (x + \Pi^\uparrow)$

[Figure: in red, image of the balls of diameter $[0, x]$ where $x$ is extreme]

$\Phi := \bigcup_{x \in \mathbb{R}^{d-1} \times \mathbb{R} :\ (x + \Pi^\downarrow) \cap P = \emptyset} (x + \Pi^\downarrow)$

[Figure: in green, image of the boundary of the convex hull $K_\lambda$]

Thank you for your attention!
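The binomial model of uniform polytopes can be simulated directly in the plane. The sketch below is an illustration under stated assumptions, not the authors' code: it takes $K$ to be the unit disk, samples by rejection from the enclosing square, and computes the hull with Andrew's monotone chain algorithm. The helper names (`convex_hull`, `polygon_area`, `uniform_in_disk`) and the seed are hypothetical choices.

```python
import random

def cross(o, a, b):
    """Signed area of the parallelogram spanned by (a - o) and (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of Conv(points) in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def uniform_in_disk(n, rng):
    """n independent points uniform in the unit disk, by rejection sampling."""
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            pts.append((x, y))
    return pts

rng = random.Random(2015)  # arbitrary seed, for reproducibility only
for n in (50, 100, 500):
    hull = convex_hull(uniform_in_disk(n, rng))
    # f_0(K_n) and Vol(K_n) for one realization
    print(n, len(hull), polygon_area(hull))
```

Averaging `len(hull)` and `polygon_area(hull)` over many realizations would give Monte Carlo estimates of $E f_0(K_n)$ and $E\,\mathrm{Vol}(K_n)$, the quantities related by Efron's relation.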
Laurent Decreusefond, Aurélien Vasseur
Keywords = Ginibre point process, Poisson point process, Stein’s method, Stochastic geometry, βGinibre point process
Abstract
Asymptotics of some Point Processes Transformations

Aurélien Vasseur, Télécom ParisTech

2nd conference on Geometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28, 2015

Mobile network in Paris - Motivation

Figure: On the left, positions of all BS in Paris. On the right, locations of BS for one frequency band.

Table of Contents

I - Generalities on point processes: correlation function, Papangelou intensity and repulsiveness; determinantal point processes

II - Kantorovich-Rubinstein distance: convergence defined by $d_{KR}$; $d_{KR}(\mathrm{PPP}, \Phi) \le$ "nice" upper bound

III - Applications to transformations of point processes: superposition; thinning; rescaling

I - Generalities on point processes

Framework

$Y$ a locally compact metric space

$\mu$ a diffuse and locally finite measure of reference on $Y$

$N_Y$ the space of configurations on $Y$

$\widehat{N}_Y$ the space of finite configurations on $Y$

Correlation function - Papangelou intensity

Correlation function $\rho$ of a point process $\Phi$:

$E\left[\sum_{\alpha \in \widehat{N}_Y,\ \alpha \subset \Phi} f(\alpha)\right] = \sum_{k=0}^{+\infty} \frac{1}{k!} \int_{Y^k} f \cdot \rho(\{x_1, \ldots, x_k\})\, \mu(dx_1) \ldots \mu(dx_k)$

$\rho(\alpha) \approx$ probability of finding a point at (at least) each point of $\alpha$

Papangelou intensity $c$ of a point process $\Phi$:

$E\left[\sum_{x \in \Phi} f(x, \Phi \setminus \{x\})\right] = \int_Y E[c(x, \Phi) f(x, \Phi)]\, \mu(dx)$

$c(x, \xi) \approx$ conditional probability of finding a point at $x$ given $\xi$

Point process: properties

Intensity measure: $A \in \mathcal{F}_Y \mapsto \int_A \rho(\{x\})\, \mu(dx)$

$\rho(\{x\}) = E[c(x, \Phi)]$

If $\Phi$ is finite, then: $P(|\Phi| = 1) = \int_Y c(x, \emptyset)\, \mu(dx)\ P(|\Phi| = 0)$.

Poisson point process: properties

$\Phi$ PPP with intensity $M(dy) = m(y)\,dy$

Correlation function: $\rho(\alpha) = \prod_{x \in \alpha} m(x)$

Papangelou intensity: $c(x, \xi) = m(x)$

Repulsive point process

Definition: A point process is repulsive if $\varphi \subset \xi \implies c(x, \xi) \le c(x, \varphi)$. A point process is weakly repulsive if $c(x, \xi) \le c(x, \emptyset)$.

Determinantal point process

Definition: Determinantal point process $\mathrm{DPP}(K, \mu)$: $\rho(\{x_1, \cdots, x_k\}) = \det(K(x_i, x_j),\ 1 \le i, j \le k)$

Proposition: Papangelou intensity of $\mathrm{DPP}(K, \mu)$:

$c(x_0, \{x_1, \cdots, x_k\}) = \frac{\det(J(x_i, x_j),\ 0 \le i, j \le k)}{\det(J(x_i, x_j),\ 1 \le i, j \le k)}$, where $J = (I - K)^{-1} K$.

Ginibre point process

Definition: Ginibre point process on $B(0, R)$:

$K(x, y) = \frac{1}{\pi} e^{-\frac{1}{2}(|x|^2 + |y|^2)} e^{x \bar{y}}\, 1_{\{x \in B(0,R)\}} 1_{\{y \in B(0,R)\}}$

$\beta$-Ginibre point process on $B(0, R)$:

$K_\beta(x, y) = \frac{1}{\pi} e^{-\frac{1}{2\beta}(|x|^2 + |y|^2)} e^{\frac{1}{\beta} x \bar{y}}\, 1_{\{x \in B(0,R)\}} 1_{\{y \in B(0,R)\}}$

[Figure: $\beta$-Ginibre point processes]

II - Kantorovich-Rubinstein distance

Total variation distance: $d_{TV}(\nu_1, \nu_2) := \sup_{A \in \mathcal{F}_Y,\ \nu_1(A), \nu_2(A) < \infty} |\nu_1(A) - \nu_2(A)|$

$F : N_Y \to \mathbb{R}$ is 1-Lipschitz ($F \in \mathrm{Lip}_1$) if $|F(\varphi_1) - F(\varphi_2)| \le d_{TV}(\varphi_1, \varphi_2)$ for all $\varphi_1, \varphi_2 \in N_Y$

Kantorovich-Rubinstein distance: $d_{KR}(P_1, P_2) = \sup_{F \in \mathrm{Lip}_1} \left| \int_{N_Y} F(\varphi)\, P_1(d\varphi) - \int_{N_Y} F(\varphi)\, P_2(d\varphi) \right|$

Convergence in K.R. distance strictly implies convergence in law.

Upper bound theorem

Theorem (L. Decreusefond, AV): Let $\Phi$ be a finite point process on $Y$ and $\zeta_M$ a PPP with finite control measure $M(dy) = m(y)\mu(dy)$. Then, we have:

$d_{KR}(P_\Phi, P_{\zeta_M}) \le \int_Y \int_{N_Y} |m(y) - c(y, \varphi)|\, P_\Phi(d\varphi)\, \mu(dy)$.

III - Applications

Superposition of weakly repulsive point processes

$\Phi_{n,1}, \ldots, \Phi_{n,n}$: $n$ independent, finite and weakly repulsive point processes on $Y$

$\Phi_n := \sum_{i=1}^n \Phi_{n,i}$

$R_n := \int_Y \left| \sum_{i=1}^n \rho_{n,i}(x) - m(x) \right| \mu(dx)$

$\zeta_M$ a PPP with control measure $M(dx) = m(x)\mu(dx)$

Proposition (LD, AV): $d_{KR}(P_{\Phi_n}, P_{\zeta_M}) \le R_n + \max_{1 \le i \le n} \int_Y \rho_{n,i}(x)\, \mu(dx)$

Consequence

Corollary (LD, AV): Let $f$ be a pdf on $[0; 1]$ such that $f(0^+) := \lim_{x \to 0^+} f(x) \in \mathbb{R}$, $\Lambda$ a compact subset of $\mathbb{R}_+$, $X_1, \ldots, X_n$ i.i.d. with pdf $f_n = \frac{1}{n} f(\frac{1}{n} \cdot)$, and $\Phi_n = \{X_1, \ldots, X_n\} \cap \Lambda$. Then

$d_{KR}(\Phi_n, \zeta) \le \int_\Lambda \left| f\left(\frac{1}{n} x\right) - f(0^+) \right| dx + \frac{1}{n} \int_\Lambda f\left(\frac{1}{n} x\right) dx$

where $\zeta$ is the $\mathrm{PPP}(f(0^+))$ reduced to $\Lambda$.

$\beta$-Ginibre point processes

Proposition (LD, AV): Let $\Phi_n$ be the $\beta_n$-Ginibre process reduced to a compact set $\Lambda$ and $\zeta$ the PPP with intensity $1/\pi$ on $\Lambda$. Then $d_{KR}(P_{\Phi_n}, P_\zeta) \le C \beta_n$.

Kallenberg's theorem

Theorem (O. Kallenberg): Let $\Phi_n$ be a finite point process on $Y$, $p_n : Y \to [0; 1)$ with $p_n \to 0$ uniformly, $\Phi'_n$ the $p_n$-thinning of $\Phi_n$, and $\gamma_M$ a Cox process. Then

$(p_n \Phi_n) \xrightarrow{\text{law}} M \iff (\Phi'_n) \xrightarrow{\text{law}} \gamma_M$

Polish distance

$(f_n)$ a sequence in the space of real continuous functions with compact support generating $\mathcal{F}_Y$

$d^*(\nu_1, \nu_2) = \sum_{n \ge 1} \frac{1}{2^n} \Psi(|\nu_1(f_n) - \nu_2(f_n)|)$ with $\Psi(x) = \frac{x}{1 + x}$

$d^*_{KR}$ the Kantorovich-Rubinstein distance associated to the distance $d^*$

Thinned point processes

Proposition (LD, AV): Let $\Phi_n$ be a finite point process on $Y$, $p_n : Y \to [0; 1)$, $\Phi'_n$ the $p_n$-thinning of $\Phi_n$, and $\gamma_M$ a Cox process. Then, we have:

$d^*_{KR}(P_{\Phi'_n}, P_{\gamma_M}) \le 2 E\left[\sum_{x \in \Phi_n} p_n^2(x)\right] + d^*_{KR}(P_M, P_{p_n \Phi_n})$.

References

L. Decreusefond and A. Vasseur, Asymptotics of superposition of point processes, 2015.

H.O. Georgii and H.J. Yoo, Conditional intensity and Gibbsianness of determinantal point processes, J. Statist. Phys. (118), January 2004.

J.S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins, and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015.

A.F. Karr, Point Processes and their Statistical Inference, Ann. Probab. 15 (1987), no. 3, 1226–1227.

Thank you for your attention. Questions?
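The determinantal definition of the correlation function can be evaluated by hand for two points of the Ginibre process. The following sketch is illustrative only: it assumes both points lie inside $B(0, R)$, so the indicator factors equal 1 and are omitted, and the function names `ginibre_kernel`, `rho1`, `rho2` are hypothetical. For this kernel the 2x2 determinant reduces to $\frac{1}{\pi^2}(1 - e^{-|x - y|^2})$, which the code uses as a cross-check.

```python
import cmath
import math

def ginibre_kernel(x, y):
    """K(x, y) = (1/pi) exp(-(|x|^2 + |y|^2)/2) exp(x * conj(y)), x and y complex.

    Assumes x and y both lie in B(0, R), so the indicators are dropped.
    """
    return cmath.exp(-(abs(x) ** 2 + abs(y) ** 2) / 2 + x * y.conjugate()) / math.pi

def rho1(x, kernel=ginibre_kernel):
    """One-point correlation (intensity): K(x, x)."""
    return kernel(x, x).real

def rho2(x, y, kernel=ginibre_kernel):
    """Two-point correlation: determinant of the 2x2 matrix (K(x_i, x_j))."""
    det = kernel(x, x) * kernel(y, y) - kernel(x, y) * kernel(y, x)
    return det.real

# Two illustrative points in the complex plane (assumed values).
x, y = complex(0.3, 0.1), complex(-0.2, 0.5)

closed_form = (1.0 - math.exp(-abs(x - y) ** 2)) / math.pi ** 2
print(rho1(x), rho1(y))          # both equal 1/pi
print(rho2(x, y), closed_form)   # agree up to rounding
print(rho2(x, y) <= rho1(x) * rho1(y))  # pair correlation below the Poisson value
```

The last comparison reflects the repulsive behaviour discussed in the talk: the two-point correlation never exceeds the product of intensities, which is the value a Poisson point process with the same intensity would give.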
Nicolas Chenavier
Keywords = Extreme values, Poisson point process, Random tessellations
Abstract
The extremal index for a random tessellation
Nicolas Chenavier, Université Littoral Côte d'Opale
October 28, 2015

Plan
1 Random tessellations
2 Main problem
3 Extremal index

Random tessellations
Definition. A (convex) random tessellation $m$ in $\mathbb{R}^d$ is a partition of the Euclidean space into random polytopes (called cells).
We will only consider the particular cases where $m$ is a Poisson-Voronoi tessellation or a Poisson-Delaunay tessellation.

Poisson-Voronoi tessellation
$X$, Poisson point process in $\mathbb{R}^d$;
$\forall x \in X$, $C_X(x) := \{ y \in \mathbb{R}^d : |y - x| \le |y - x'|,\ \forall x' \in X \}$ (Voronoi cell with nucleus $x$);
$m_{PVT} := \{ C_X(x), x \in X \}$, Poisson-Voronoi tessellation;
$\forall C_X(x) \in m_{PVT}$, we let $z(C_X(x)) := x$.
Figure: Poisson-Voronoi tessellation.

Poisson-Delaunay tessellation
$X$, Poisson point process in $\mathbb{R}^d$;
$\forall x, x' \in X$, $x$ and $x'$ define an edge if $C_X(x) \cap C_X(x') \ne \emptyset$;
$m_{PDT}$, Poisson-Delaunay tessellation;
$\forall C \in m_{PDT}$, we let $z(C)$ be the circumcenter of $C$.
Figure: Poisson-Delaunay tessellation.

Typical cell
Definition. Let $m$ be a stationary random tessellation. The typical cell of $m$ is a random polytope $C$ in $\mathbb{R}^d$ whose distribution is given as follows: for each bounded translation-invariant function $g : \{\text{polytopes}\} \to \mathbb{R}$, we have
$\mathbb{E}[g(C)] := \frac{1}{N(B)} \mathbb{E}\left[ \sum_{C \in m,\ z(C) \in B} g(C) \right]$,
where $B \subset \mathbb{R}^d$ is any Borel subset with finite and nonzero volume, and $N(B)$ is the mean number of cells with nucleus in $B$.

Main problem
Framework: $m = m_{PVT}, m_{PDT}$; $W_\rho := [0, \rho]^d$, with $\rho > 0$; $g : \{\text{polytopes}\} \to \mathbb{R}$, geometrical characteristic.
Aim: what is the asymptotic behaviour, when $\rho \to \infty$, of
$M_{g,\rho} = \max_{C \in m,\ z(C) \in W_\rho} g(C)$?
Figure: Voronoi cell maximizing the area in the square.

Objective and applications
Objective: find $a_{g,\rho} > 0$, $b_{g,\rho} \in \mathbb{R}$ such that $\mathbb{P}\left( M_{g,\rho} \le a_{g,\rho} t + b_{g,\rho} \right)$ converges, as $\rho \to \infty$, for each $t \in \mathbb{R}$.
Applications: regularity of the tessellation; discrimination of point processes and tessellations; Poisson-Voronoi approximation.
Figure: Poisson-Voronoi approximation.

Asymptotics under a local correlation condition
Notation: let $v_\rho := a_{g,\rho} t + b_{g,\rho}$ be a threshold such that
$\rho^d \cdot \mathbb{P}(g(C) > v_\rho) \to \tau$ as $\rho \to \infty$, for some $\tau := \tau(t) \ge 0$.
Local Correlation Condition (LCC):
$\frac{\rho^d}{(\log \rho)^d} \cdot \mathbb{E}\left[ \sum_{(C_1, C_2)_{\ne} \in m^2,\ z(C_1), z(C_2) \in [0, \log \rho]^d} \mathbf{1}_{g(C_1) > v_\rho,\ g(C_2) > v_\rho} \right] \to 0$ as $\rho \to \infty$.
Theorem. Under (LCC), we have $\mathbb{P}(M_{g,\rho} \le v_\rho) \to e^{-\tau}$ as $\rho \to \infty$.
Definition of the extremal index
Proposition. Assume that for all $\tau \ge 0$, there exists a threshold $v_\rho^{(\tau)}$ depending on $\rho$ such that $\rho^d \cdot \mathbb{P}(g(C) > v_\rho^{(\tau)}) \to \tau$ as $\rho \to \infty$. Then there exists $\theta \in [0, 1]$ such that, for all $\tau \ge 0$,
$\lim_{\rho \to \infty} \mathbb{P}(M_{g,\rho} \le v_\rho^{(\tau)}) = e^{-\theta\tau}$,
provided that the limit exists.
Definition. According to Leadbetter, we say that $\theta \in [0, 1]$ is the extremal index if, for each $\tau \ge 0$, we have
$\rho^d \cdot \mathbb{P}(g(C) > v_\rho^{(\tau)}) \to \tau$ and $\lim_{\rho \to \infty} \mathbb{P}(M_{g,\rho} \le v_\rho^{(\tau)}) = e^{-\theta\tau}$.

Example 1
Framework: $m := m_{PVT}$, Poisson-Voronoi tessellation; $g(C) := r(C)$, inradius of any cell $C := C_X(x)$ with $x \in X$, i.e.
$r(C) := r(C_X(x)) := \max\{ r \in \mathbb{R}_+ : B(x, r) \subset C_X(x) \}$.
$r_{\min,PVT}(\rho) := \min_{x \in X \cap W_\rho} r(C_X(x))$.
Extremal index: $\theta = 1/2$ for each $d \ge 1$.
Figure (b): minimum of inradius for a Poisson-Voronoi tessellation; typical Poisson-Voronoi cell with a small inradius (axes x, y).

Example 2
Framework: $m := m_{PDT}$, Poisson-Delaunay tessellation; $g(C) := R(C)$, circumradius of any cell $C$, i.e.
$R(C) := \min\{ r \in \mathbb{R}_+ : B(x, r) \supset C \}$.
$R_{\max,PDT}(\rho) := \max_{C \in m_{PDT} : z(C) \in W_\rho} R(C)$.
Extremal index: $\theta = 1;\ 1/2;\ 35/128$ for $d = 1;\ 2;\ 3$.
Figure (d): maximum of circumradius for a Poisson-Delaunay tessellation; typical Poisson-Delaunay cell with a large circumradius (axes x, y).

Work in progress
Joint work with C. Robert (ISFA, Lyon 1):
new characterization of the extremal index (not based on the classical block and run estimators appearing in classical Extreme Value Theory);
simulation and estimation for the extremal index and cluster size distribution (for Poisson-Voronoi and Poisson-Delaunay tessellations).
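The minimum-inradius example above can be explored numerically. The sketch below is not from the talk: it assumes a unit-intensity Poisson process sampled in the window $[0, \rho]^d$ and uses the elementary fact that the inradius of a Voronoi cell is half the distance from its nucleus to the nearest other nucleus. The function names and parameter values are hypothetical, and nuclei outside the window are ignored, so radii near the boundary may be slightly overestimated.

```python
import numpy as np

def sample_poisson(rho, d, rng, intensity=1.0):
    """Sample a homogeneous Poisson point process in the window [0, rho]^d."""
    n = rng.poisson(intensity * rho ** d)
    return rng.uniform(0.0, rho, size=(n, d))

def voronoi_inradii(points):
    """Inradius of each Voronoi cell: half the distance to the nearest other nucleus."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a nucleus is not its own neighbour
    return 0.5 * dist.min(axis=1)

rng = np.random.default_rng(0)
points = sample_poisson(rho=20.0, d=2, rng=rng)
radii = voronoi_inradii(points)
r_min = radii.min()
# The closest pair of nuclei shares the same inradius, so the minimum is hit twice.
n_attaining = int(np.isclose(radii, r_min).sum())
```

In this sketch the smallest inradius is always attained by at least two cells (the two nuclei of the closest pair), which is consistent with exceedances arriving in clusters of size two and with the value $\theta = 1/2$ quoted in Example 1.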
Frank Nielsen, Gaëtan Hadjeres
Keywords =
Abstract
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry
Frank Nielsen (1), Gaëtan Hadjeres (2)
École Polytechnique (1); Sony Computer Science Laboratories, Inc (1,2)
Conference on Geometric Science of Information
© 2015 Frank Nielsen, Gaëtan Hadjeres

The Minimum Enclosing Ball problem
Finding the Minimum Enclosing Ball (or the 1-center) of a finite point set $P = \{p_1, \dots, p_n\}$ in the metric space $(X, d_X(\cdot, \cdot))$ consists in finding $c \in X$ such that
$c = \arg\min_{c' \in X} \max_{p \in P} d_X(c', p)$.
Figure: A finite point set $P$ and its minimum enclosing ball MEB($P$).

The approximating minimum enclosing ball problem
In a Euclidean setting, this problem is
well-defined: uniqueness of the center $c^*$ and radius $R^*$ of the MEB;
computationally intractable in high dimensions.
We fix an $\epsilon > 0$ and focus on the Approximate Minimum Enclosing Ball problem of finding an $\epsilon$-approximation $c \in X$ of MEB($P$) such that
$d_X(c, p) \le (1 + \epsilon) R^* \quad \forall p \in P$.

The approximating minimum enclosing ball problem: prior work
Approximate solutions in the Euclidean case are given by Badoiu and Clarkson's algorithm [Badoiu and Clarkson, 2008]:
Initialize the center $c_1 \in P$.
Repeat $\lceil 1/\epsilon^2 \rceil$ times the following update:
$c_{i+1} = c_i + \frac{f_i - c_i}{i + 1}$,
where $f_i \in P$ is the farthest point from $c_i$.
How to deal with point sets whose underlying geometry is not Euclidean?

This algorithm has been generalized to
dually flat manifolds [Nock and Nielsen, 2005];
Riemannian manifolds [Arnaudon and Nielsen, 2013].
Applying these results to hyperbolic geometry gives the existence and uniqueness of MEB($P$), but these results
give no explicit bounds on the number of iterations;
assume that we are able to precisely cut geodesics.
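The Badoiu and Clarkson update described above can be sketched in a few lines for the Euclidean case. This is a minimal illustration, assuming the points are stored as rows of a NumPy array; the function name `badoiu_clarkson` and the test point set are hypothetical.

```python
import math
import numpy as np

def badoiu_clarkson(points, eps):
    """Approximate Euclidean MEB centre using ceil(1/eps^2) farthest-point updates."""
    c = points[0].astype(float).copy()          # c_1 is a point of P
    n_iter = math.ceil(1.0 / eps ** 2)
    for i in range(1, n_iter + 1):
        # f_i: the point of P farthest from the current centre c_i
        far = points[np.argmax(np.linalg.norm(points - c, axis=1))]
        c += (far - c) / (i + 1)                # c_{i+1} = c_i + (f_i - c_i) / (i + 1)
    return c

# Four points on the unit circle: the exact MEB has centre (0, 0) and radius 1.
pts = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
center = badoiu_clarkson(pts, eps=0.1)
radius = np.linalg.norm(pts - center, axis=1).max()
```

Each iteration costs one pass over the points, so the total work is linear in the number of points and in the dimension for a fixed $\epsilon$.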
The approximating minimum enclosing ball problem: our contribution
We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic $\alpha$-midpoints, we obtain
an intrinsic $(1 + \epsilon)$-approximation algorithm for the approximate minimum enclosing ball problem;
an $O(1/\epsilon^2)$ convergence time guarantee;
a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric.

Model of d-dimensional hyperbolic geometry: the Poincaré ball model
The Poincaré ball model $(B^d, \rho(\cdot, \cdot))$ consists of the open unit ball $B^d = \{ x \in \mathbb{R}^d : \|x\| < 1 \}$ together with the hyperbolic distance
$\rho(p, q) = \operatorname{arcosh}\left( 1 + \frac{2 \|p - q\|^2}{(1 - \|p\|^2)(1 - \|q\|^2)} \right), \quad \forall p, q \in B^d$.
This distance induces on the metric space $(B^d, \rho)$ a Riemannian structure.

Geodesics in the Poincaré ball model
Shortest paths between two points (geodesics) are exactly
straight (Euclidean) lines passing through the origin;
circle arcs orthogonal to the unit sphere.
Figure: "Straight" lines in the Poincaré ball model.

Circles in the Poincaré ball model
Circles in the Poincaré ball model look like Euclidean circles but with a different center.
Figure: Difference between the Euclidean MEB (in blue) and the hyperbolic MEB (in red) for the set of blue points in the hyperbolic Poincaré disk (in black). The red cross is the hyperbolic center of the red circle while the pink one is its Euclidean center.
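The Poincaré ball distance defined above translates directly into code. The sketch below assumes points are given as NumPy vectors of Euclidean norm strictly less than 1; the function name `poincare_distance` is an illustrative choice.

```python
import numpy as np

def poincare_distance(p, q):
    """Hyperbolic distance rho(p, q) between two points of the open unit ball."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    num = 2.0 * np.dot(p - q, p - q)
    den = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + num / den))

origin = np.zeros(2)
x = np.array([0.5, 0.0])
d0 = poincare_distance(origin, x)   # equals log((1 + 0.5) / (1 - 0.5)) = log(3)
```

From the origin the formula reduces to $\rho(0, x) = \log\frac{1 + \|x\|}{1 - \|x\|}$, which is the quantity that reappears in the closed-form $\alpha$-midpoint.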
Translations in the Poincaré ball model
$T_p(x) = \frac{(1 - \|p\|^2)\,x + (\|x\|^2 + 2\langle x, p \rangle + 1)\,p}{\|p\|^2 \|x\|^2 + 2\langle x, p \rangle + 1}$
Figure: Tiling of the hyperbolic plane by squares.

Closed-form formula for computing α-midpoints
A point $m$ is the $\alpha$-midpoint $p \#_\alpha q$ of two points $p, q$ for $\alpha \in [0, 1]$ if
$m$ belongs to the geodesic joining the two points $p, q$;
$m$ satisfies $\rho(p, m) = \alpha \rho(p, q)$.
For the special case $p = (0, \dots, 0)$, $q = (x_q, 0, \dots, 0)$, we have
$p \#_\alpha q := (x_\alpha, 0, \dots, 0)$ with $x_\alpha = \frac{c_{\alpha,q} - 1}{c_{\alpha,q} + 1}$,
where $c_{\alpha,q} := e^{\alpha \rho(p, q)} = \left( \frac{1 + x_q}{1 - x_q} \right)^{\alpha}$.
Noting that
$p \#_\alpha q = T_p\left( T_{-p}(p) \#_\alpha T_{-p}(q) \right) \quad \forall p, q \in B^d$,
we obtain
a closed-form formula for computing $p \#_\alpha q$;
how to compute $p \#_\alpha q$ in linear time $O(d)$;
that these transformations are exact.

(1 + ε)-approximation of a hyperbolic enclosing ball of fixed radius
For a fixed radius $r > R^*$, we can find $c \in B^d$ such that $\rho(c, p) \le (1 + \epsilon) r \ \forall p \in P$ with
Algorithm 1: $(1 + \epsilon)$-approximation of EHB($P$, $r$)
  $c_0 := p_1$
  $t := 0$
  while $\exists p \in P$ such that $p \notin B(c_t, (1 + \epsilon) r)$ do
    let $p \in P$ be such a point
    $\alpha := \frac{\rho(c_t, p) - r}{\rho(c_t, p)}$
    $c_{t+1} := c_t \#_\alpha p$
    $t := t + 1$
  end while
  return $c_t$

Idea of the proof
By the hyperbolic law of cosines:
$\cosh(\rho_t) \ge \cosh(h) \cosh(\rho_{t+1})$,
$\cosh(\rho_1) \ge \cosh(h)^T \ge \cosh(\epsilon r)^T$.
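The translation $T_p$ and the closed-form $\alpha$-midpoint above can be combined as follows. This is a sketch under the assumption that the one-dimensional special case extends to any point $y$ by working along the ray from the origin through $y$ (using $\rho(0, y) = \log\frac{1 + \|y\|}{1 - \|y\|}$); the function names are hypothetical.

```python
import numpy as np

def poincare_distance(p, q):
    """Hyperbolic distance in the Poincare ball model."""
    num = 2.0 * np.dot(p - q, p - q)
    den = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + num / den))

def translate(p, x):
    """T_p(x): hyperbolic translation that sends the origin to p."""
    pp, xx, xp = np.dot(p, p), np.dot(x, x), np.dot(x, p)
    return ((1.0 - pp) * x + (xx + 2.0 * xp + 1.0) * p) / (pp * xx + 2.0 * xp + 1.0)

def alpha_midpoint(p, q, alpha):
    """p #_alpha q: point of the geodesic from p to q at distance alpha * rho(p, q) from p."""
    y = translate(-p, q)                 # move p to the origin; q becomes y
    r = np.linalg.norm(y)
    if r == 0.0:                         # p and q coincide
        return p.copy()
    c = ((1.0 + r) / (1.0 - r)) ** alpha # c = exp(alpha * rho(0, y))
    y_alpha = ((c - 1.0) / (c + 1.0)) * (y / r)
    return translate(p, y_alpha)         # move back so that the origin returns to p

p = np.array([0.3, 0.2])
q = np.array([-0.5, 0.4])
m = alpha_midpoint(p, q, 0.25)
```

Every step uses a constant number of dot products, which is where the $O(d)$ cost per midpoint comes from.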
Figure: Update of $c_t$.

(1 + ε)-approximation of a hyperbolic enclosing ball of fixed radius
The EHB($P$, $r$) algorithm is an $O(1/\epsilon^2)$-time algorithm which returns the center of a hyperbolic enclosing ball with radius $(1 + \epsilon) r$ in fewer than $4/\epsilon^2$ iterations.
Our error with respect to the true MEHB center $c^*$ satisfies
$\rho(c, c^*) \le \operatorname{arcosh}\left( \frac{\cosh((1 + \epsilon) r)}{\cosh(R^*)} \right)$.

(1 + ε + ε²/4)-approximation of MEHB(P)
In fact, as $R^*$ is unknown in general, the EHB algorithm returns for any $r$:
a $(1 + \epsilon)$-approximation of EHB($P$) if $r \ge R^*$;
the fact that $r < R^*$ if the result obtained after more than $4/\epsilon^2$ iterations is not good enough.
This suggests implementing a dichotomic search in order to compute an approximation of the minimal hyperbolic enclosing ball. We obtain a $(1 + \epsilon + \epsilon^2/4)$-approximation of MEHB($P$) in $O\left( \frac{N}{\epsilon^2} \log \frac{1}{\epsilon} \right)$ iterations.
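Algorithm 1 (EHB) discussed above can be sketched as follows. Two choices here are assumptions rather than part of the slides: the violating point is always taken to be the farthest one, and the loop stops after $\lceil 4/\epsilon^2 \rceil$ updates and returns `None`, which stands for the "r < R*" outcome. The helper names are hypothetical.

```python
import math
import numpy as np

def rho(p, q):
    """Hyperbolic distance in the Poincare ball model."""
    num = 2.0 * np.dot(p - q, p - q)
    den = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + num / den))

def translate(p, x):
    """T_p(x): hyperbolic translation that sends the origin to p."""
    pp, xx, xp = np.dot(p, p), np.dot(x, x), np.dot(x, p)
    return ((1.0 - pp) * x + (xx + 2.0 * xp + 1.0) * p) / (pp * xx + 2.0 * xp + 1.0)

def alpha_midpoint(p, q, alpha):
    """Point of the geodesic from p to q at distance alpha * rho(p, q) from p."""
    y = translate(-p, q)
    r = np.linalg.norm(y)
    if r == 0.0:
        return p.copy()
    c = ((1.0 + r) / (1.0 - r)) ** alpha
    return translate(p, ((c - 1.0) / (c + 1.0)) * (y / r))

def ehb(points, r, eps):
    """Return (centre, steps) of a ball of radius (1 + eps) * r enclosing the points,
    or (None, steps) when the iteration budget is exhausted (suggesting r < R*)."""
    max_iter = math.ceil(4.0 / eps ** 2)
    c, t = points[0].copy(), 0
    while t <= max_iter:
        dists = np.array([rho(c, p) for p in points])
        far = int(np.argmax(dists))              # farthest point (one valid choice of p)
        if dists[far] <= (1.0 + eps) * r:
            return c, t
        alpha = (dists[far] - r) / dists[far]    # leaves the far point at distance r
        c = alpha_midpoint(c, points[far], alpha)
        t += 1
    return None, t

# Symmetric configuration: the exact centre is the origin and R* = log(3).
pts = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5]])
center, n_steps = ehb(pts, r=1.2, eps=0.1)
```

The returned step count can be compared with the $4/\epsilon^2$ bound quoted above; the dichotomic search on $r$ would call `ehb` repeatedly with a shrinking interval for the radius.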
(1 + ε + ε²/4)-approximation of MEHB(P): algorithm
Algorithm 2: $(1 + \epsilon)$-approximation of MEHB($P$)
  $c := p_1$
  $r_{\max} := \rho(c, P)$; $r_{\min} := \frac{r_{\max}}{2}$; $t_{\max} := +\infty$
  $r := r_{\max}$
  repeat
    $c_{temp} := \text{Alg1}\left(P, r, \frac{\epsilon}{2}\right)$, interrupt if $t > t_{\max}$ in Alg1
    if the call of Alg1 has been interrupted then
      $r_{\min} := r$
    else
      $r_{\max} := r$; $c := c_{temp}$
    end if
    $dr := \frac{r_{\max} - r_{\min}}{2}$; $r := r_{\min} + dr$; $t_{\max} := \frac{\log(\cosh((1 + \epsilon/2) r)) - \log(\cosh(r_{\min}))}{\log(\cosh(r \epsilon / 2))}$
  until $2\,dr < r_{\min} \frac{\epsilon}{2}$
  return $c$

Experimental results
The number of iterations does not depend on $d$.
Figure: Number of α-midpoint calculations as a function of $\epsilon$ in logarithmic scale for different values of $d$.
The running time is approximately $O(dn/\epsilon^2)$ (vertical translation in logarithmic scale).
Figure: Execution time as a function of $\epsilon$ in logarithmic scale for different values of $d$.

Applications
Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the following subfamilies
$N(\mu, \sigma^2 I_n)$ of $n$-variate normal distributions with scalar covariance matrix ($I_n$ is the $n \times n$ identity matrix),
$N(\mu, \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2))$ of $n$-variate normal distributions with diagonal covariance matrix,
$N(\mu_0, \Sigma)$ of $d$-variate normal distributions with fixed mean $\mu_0$ and arbitrary positive definite covariance matrix $\Sigma$
are statistical manifolds whose Fisher information metric is hyperbolic.
In particular, our results apply to the two-dimensional location-scale subfamily.
Figure: MEHB (D) of probability density functions (left) in the $(\mu, \sigma)$ upper half-plane (right). $P = \{A, B, C\}$.
Openings
Plugging the EHB and MEHB algorithms to compute cluster centers into the approximation algorithm of [Gonzalez, 1985], we obtain approximate algorithms for
covering in hyperbolic spaces;
the k-center problem in $O\left( \frac{kNd}{\epsilon^2} \log \frac{1}{\epsilon} \right)$.

Algorithm 3: Gonzalez farthest-first traversal approximation algorithm
  $C_1 := P$, $i = 0$
  while $i \le k$ do
    $\forall j \le i$, compute $c_j := \text{MEB}(C_j)$
    $\forall j \le i$, set $f_j := \arg\max_{p \in P} \rho(p, c_j)$
    find $f \in \{f_j\}$ whose distance to its cluster center is maximal
    create cluster $C_i$ containing $f$
    add to $C_i$ all points whose distance to $f$ is smaller than the distance to their cluster center
    increment $i$
  end while
  return $\{C_i\}_i$

Openings
The computation of the minimum enclosing hyperbolic ball does not necessarily involve all points $p \in P$.
Coresets in hyperbolic geometry:
the MEHB obtained by the algorithm is an $\epsilon$-coreset;
differences with the Euclidean setting: coresets are of size at most $\lceil 1/\epsilon \rceil$ [Badoiu and Clarkson, 2008].

Thank you!

Bibliography
Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93–104.
Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Comput. Geom., 40(1):14–22.
Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306.
Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649–656. Springer.
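For comparison with Algorithm 3 above, here is a sketch of the classic farthest-first traversal of Gonzalez with the hyperbolic distance plugged in. It is deliberately the textbook variant, in which cluster centers are data points, and not the variant on the slide that recomputes each center as an enclosing-ball center. The function names and the toy data are hypothetical.

```python
import numpy as np

def rho(p, q):
    """Hyperbolic distance in the Poincare ball model."""
    num = 2.0 * np.dot(p - q, p - q)
    den = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + num / den))

def farthest_first(points, k, dist):
    """Pick k centres greedily: each new centre is the point farthest from the current ones."""
    centers = [0]                                        # start from the first point
    d = np.array([dist(points[0], p) for p in points])   # distance to the nearest centre
    while len(centers) < k:
        nxt = int(np.argmax(d))
        centers.append(nxt)
        d = np.minimum(d, np.array([dist(points[nxt], p) for p in points]))
    # assign every point to its nearest centre
    labels = [int(np.argmin([dist(points[c], p) for c in centers])) for p in points]
    return centers, labels

pts = np.array([[0.8, 0.0], [0.78, 0.05], [-0.8, 0.0], [-0.79, -0.04]])
centers, labels = farthest_first(pts, k=2, dist=rho)
```

The greedy traversal only needs a distance function, so swapping the Euclidean metric for $\rho$ requires no other change.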
Germain Van Bever, Radka Sabolova, Frank Critchley, Paul Marriott
Computational Information Geometry: mixture modelling
Germain Van Bever (1), R. Sabolová (1), F. Critchley (1) & P. Marriott (2)
(1) The Open University (EPSRC grant EP/L010429/1), United Kingdom
(2) University of Waterloo, Canada
GSI15, 28-30 October 2015, Paris

Outline
1 Computational Information Geometry...
  Information Geometry
  CIG
2 ...in mixture modelling
  Introduction
  Lindsay's convex geometry
  (C)IG for mixture distributions

Generalities
The use of geometry in statistics gave birth to many different approaches. Traditionally, Information Geometry refers to the application of differential geometry to statistical theory and practice.
The main ingredients of IG in exponential families (Amari, 1985) are
1 the manifold of parameters $M$,
2 the Riemannian (Fisher information) metric $g$, and
3 the set of affine connections $\{\nabla^{-1}, \nabla^{+1}\}$ (mixture and exponential connections).
These make it possible to define notions of curvature, dimension reduction or information loss, and invariant higher order expansions.
Two affine structures (maps on $M$) are used simultaneously:
−1: Mixture affine geometry on probability measures: $\lambda f(x) + (1 - \lambda) g(x)$.
+1: Exponential affine geometry on probability measures: $C(\lambda) f(x)^{\lambda} g(x)^{(1 - \lambda)}$.
Computational Information Geometry
This talk is about Computational Information Geometry (CIG, Critchley and Marriott, 2014).
1 In CIG, the multinomial model provides, modulo discretization, a universal model. It therefore moves from manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex.
2 It provides a unifying framework for different geometries.
3 Tractability of the geometry allows for efficient algorithms in a computational framework.
It is inherently finite and discrete. The impact of discretization is studied.
A working model will be a subset of the simplex.

Multinomial distributions
$X \sim \text{Mult}(\pi_0, \dots, \pi_k)$, $\pi = (\pi_0, \dots, \pi_k) \in \text{int}(\Delta^k)$, with
$\Delta^k := \left\{ \pi : \pi_i \ge 0,\ \sum_{i=0}^{k} \pi_i = 1 \right\}$.
In this case, $\pi_{(0)} = (\pi_1, \dots, \pi_k)$ is the mean parameter, while $\eta = \log(\pi_{(0)} / \pi_0)$ is the natural parameter.
Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005).
Figure: mixed geodesics in −1-space (axes $\pi_1$, $\pi_2$) and mixed geodesics in +1-space (axes $\eta_1$, $\eta_2$).

Restricting to the multinomial families
Under regular exponential families with compact support, the cost of discretization on the components of Information Geometry is bounded! The same holds true for the MLE and the log-likelihood function.
The log-likelihood $\ell(x, \pi) = \sum_{i=0}^{k} n_i \log(\pi_i)$ is
(i) strictly concave (in the −1-representation) on the observed face (counts $n_i > 0$),
(ii) strictly decreasing in the normal direction towards the unobserved face ($n_i = 0$), and, otherwise,
(iii) constant.
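The multinomial quantities above (natural parameter and log-likelihood) can be written down directly. The sketch below assumes probabilities and counts are stored as arrays indexed from 0 to k; the function names and the example counts are hypothetical.

```python
import numpy as np

def natural_parameter(pi):
    """eta = log(pi_(0) / pi_0), defined for pi in the interior of the simplex."""
    pi = np.asarray(pi, dtype=float)
    return np.log(pi[1:] / pi[0])

def from_natural(eta):
    """Inverse map: recover pi = (pi_0, ..., pi_k) from the natural parameter eta."""
    w = np.concatenate(([1.0], np.exp(eta)))
    return w / w.sum()

def log_likelihood(counts, pi):
    """l(x, pi) = sum_i n_i log(pi_i); cells with n_i = 0 contribute nothing."""
    counts, pi = np.asarray(counts, dtype=float), np.asarray(pi, dtype=float)
    observed = counts > 0
    return float(np.sum(counts[observed] * np.log(pi[observed])))

counts = np.array([3, 0, 7])
pi_hat = counts / counts.sum()          # lies on the observed face (pi_1 = 0)
pi_other = np.array([0.3, 0.1, 0.6])    # puts mass on the unobserved cell
ll_hat = log_likelihood(counts, pi_hat)
ll_other = log_likelihood(counts, pi_other)
```

Moving mass onto the cell with zero count lowers the log-likelihood, in line with item (ii) above.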
Considering an infinite-dimensional simplex makes it possible to remove the compactness assumption (Critchley and Marriott, 2014).

Binomial subfamilies
A (discrete) example: Binomial distributions as a subfamily of multinomial distributions.
Let $X \sim \text{Bin}(k, p)$. Then the family of such $X$ can be seen as a subfamily of
$M = \{ X \mid X \sim \text{Mult}(\pi_0, \dots, \pi_k) \}$,
with
$\pi_i(p) = \binom{k}{i} p^i (1 - p)^{k - i}$.
Figure: Left: Embedded binomial ($k = 2$) in the 2-simplex. Right: Embedded binomial ($k = 3$) in the 3-simplex.

Mixture distributions
The generic mixture distribution is
$f(x; Q) = \int f(x; \theta)\, dQ(\theta)$,
that is, a mixture of (regular) parametric distributions.
Regularity: same support $S$, absolutely continuous with respect to a measure $\nu$.
Mixture distributions arise naturally in many statistical problems, including
overdispersed models;
random effects ANOVA;
random coefficient regression models and measurement error models;
graphical models;
and many more.

Hard mixture problems
Inference in the class of mixture distributions generates well-known difficulties:
Identifiability issues: without imposing constraints on the mixing distribution $Q$, there may exist $Q_1$ and $Q_2$ such that
$f(x; Q_1) = \int f(x; \theta)\, dQ_1(\theta) = \int f(x; \theta)\, dQ_2(\theta) = f(x; Q_2)$.
Byproduct: parametrisation issues.
Byproduct: multimodal likelihood functions.
Boundary problems.
Byproduct: singularities in the likelihood function.

NPMLE
Finite mixtures are essential to the geometry. Lindsay argues that nonparametric estimation of $Q$ is necessary. Also,
Theorem. The log-likelihood
$\ell(Q) = \sum_{s=1}^{n} \log L_s(Q) = \sum_{s=1}^{n} \log\left( \int f(x_s; \theta)\, dQ(\theta) \right)$
has a unique maximum over the space of all distribution functions $Q$. Furthermore, the maximiser $\hat{Q}$ is a discrete distribution with no more than $D$ distinct points of support, where $D$ is the number of distinct points in $(x_1, \dots, x_n)$.
The likelihood on the space of mixtures is therefore defined on the convex hull of the image of $\theta \mapsto (L_1(\theta), \dots, L_D(\theta))$.
Finding the NPMLE amounts to maximizing a concave function over this convex set.

Limits to convex geometry
Knowing the shape of the likelihood on the whole simplex (and not only on the observed face) gives extra insight.
Convex geometry correctly captures the −1-geometry of the simplex but NOT the 0 and +1 geometries (for example, Fisher information requires knowing the full sample space).
Understanding the (C)IG of mixtures in the simplex will therefore provide extra tools (and algorithms) in mixture modelling.
In this talk, we mention results on
1 the (−1)-dimensionality of exponential families in the simplex;
2 convex polytope approximation algorithms: Information Geometry can give efficient approximations of high-dimensional convex hulls by polytopes.
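The concave maximisation behind the NPMLE described above can be illustrated on binomial mixtures. The sketch below is a simplification and not the method of the talk: it assumes the mixing distribution is restricted to a fixed grid of success probabilities and updates the mixing weights with the standard EM iteration. The function names, the grid and the counts are hypothetical.

```python
import numpy as np
from math import comb

def binomial_embedding(p, k):
    """pi_i(p) = C(k, i) p^i (1 - p)^(k - i), i = 0..k: a point of the k-simplex."""
    i = np.arange(k + 1)
    coeff = np.array([comb(k, j) for j in i], dtype=float)
    return coeff * p ** i * (1.0 - p) ** (k - i)

def em_mixing_weights(counts, grid, n_iter=200):
    """EM for the weights of a binomial mixture with support restricted to `grid`."""
    counts = np.asarray(counts, dtype=float)
    k = len(counts) - 1
    L = np.array([binomial_embedding(p, k) for p in grid])   # shape (len(grid), k + 1)
    w = np.full(len(grid), 1.0 / len(grid))
    history = []
    for _ in range(n_iter):
        mix = w @ L                                  # mixture probability of each outcome
        history.append(float(counts @ np.log(mix)))  # log-likelihood before the update
        w = w * (L @ (counts / mix)) / counts.sum()  # EM update, keeps sum(w) = 1
    return w, history

counts = np.array([10, 5, 3, 8])            # outcomes 0..3 of a Bin(3, p)-type experiment
grid = np.linspace(0.05, 0.95, 19)
weights, loglik = em_mixing_weights(counts, grid)
```

Each row of `L` is a vertex candidate on the embedded binomial curve, so the fitted mixture `weights @ L` is a point of the polytope spanned by the grid, which is the kind of polytope approximation mentioned in item 2.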
Local mixture models (IG)
Parametric vs nonparametric dilemma. Geometric analysis allows low-dimensional approximation in local setups.
Theorem (Marriott, 2002). If $f(x; \theta)$ is an $n$-dimensional exponential family with regularity conditions and $Q_\lambda(\theta)$ is a local mixing around $\theta_0$, then $f(x; Q_\lambda) = \int f(x; \theta)\, dQ_\lambda(\theta)$ has the expansion
$f(x; Q_\lambda) - f(x; \theta_0) - \sum_{i=1}^{n} \lambda_i \frac{\partial}{\partial \theta_i} f(x; \theta_0) - \sum_{i,j=1}^{n} \lambda_{ij} \frac{\partial^2}{\partial \theta_i \partial \theta_j} f(x; \theta_0) = O(\lambda^{-3})$.
This is equivalent to $f(x; Q_\lambda) + O(\lambda^{-3}) \in T^2 M_{\theta_0}$.
If the density $f(x; \theta)$ and all its derivatives are bounded, then the approximation will be uniform in $x$.

Dimensionality in CIG
It is therefore possible to approximate mixture distributions with low-dimensional families. In contrast, the (−1)-representation of any generic exponential family on the simplex will always have full dimension. The following result is even more general.
Theorem (VB et al.). The −1-convex hull of an open subset of an exponential subfamily of $M$ with tangent dimension $k - d$ has dimension at least $k - d$.
Corollary (Critchley and Marriott, 2014). The −1-convex hull of an open subset of a generic one-dimensional subfamily of $M$ is of full dimension.
The tangent dimension is the maximal number of different components of any (+1) tangent vector to the exponential family.
Generic ↔ tangent dimension $= k$, i.e. the tangent vector has distinct components.

Example: Mixture of binomials
As mentioned, IG gives efficient approximation by polytopes.
IG maximises a concave function on (convex) polytopes.
Example: toxicological data (Kupper and Haseman, 1978).
‘simple one-parameter binomial [...] models generally provides poor fits to this type of binary data’.

Approximation in CIG
Define the norm $\|\pi\|_{\pi_0} = \sum_{i=1}^{k} \pi_i^2 / \pi_{i,0}$ (preferred point metric, Critchley et al., 1993).
Let $\pi(\theta)$ be an exponential family and $\cup S_i$ be a polytope surface. Define the distance function as
$d(\pi(\theta), \pi_0) := \inf_{\pi \in \cup S_i} \|\pi(\theta) - \pi\|_{\pi_0}$.
Theorem (Anaya-Izquierdo et al.). Let $\cup S_i$ be such that $d(\pi(\theta)) \le \epsilon$ for all $\theta$. Then
$\ell(\hat{\pi}^{NPMLE}) - \ell(\hat{\pi}) \le \epsilon N \|\hat{\pi}^{G} - \hat{\pi}^{NPMLE}\|_{\hat{\pi}} + o(\epsilon)$,
where $(\hat{\pi}^{G})_i = n_i / N$ and $\hat{\pi}$ is the NPMLE on $\cup S_i$.

Summary
High-dimensional (extended) multinomial space is used as a proxy for the ‘space of all models’.
This computational approach encompasses Amari's information geometry and Lindsay's convex geometry...
...while having a tractable and mostly explicit geometry, which allows for a computational theory.
Future work
Converse of the dimensionality result (−1 to +1).
Long-term aim: implementing geometric theories within an R package/software.

References
Amari, S.-I. (1985), Differential-geometrical methods in statistics, Springer-Verlag.
Anaya-Izquierdo, K., Critchley, F., Marriott, P. and Vos, P. (2012), Computational information geometry: theory and practice, Arxiv report, 1209.1988v1.
Critchley, F., Marriott, P. and Salmon, M. (1993), Preferred point geometry and statistical manifolds, The Annals of Statistics, 21, 3, 1197-1224.
Critchley, F. and Marriott, P. (2014), Computational Information Geometry in Statistics: Theory and Practice, Entropy, 16, 2454-2471.
Csiszár, I. and Matúš, F. (2005), Closures of exponential families, The Annals of Probability, 33, 2, 582-600.
Kupper, L.L. and Haseman, J.K. (1978), The Use of a Correlated Binomial Model for the Analysis of Certain Toxicological Experiments, Biometrics, 34, 1, 69-76.
Marriott, P. (2002), On the local geometry of mixture models, Biometrika, 89, 1, 77-93.
Vahed Maroufy, Paul Marriott
Keywords = Computational information geometry, Computing boundaries, Embedded manifolds, Local mixture models, Polytopes, Ruled and developable surfaces
Abstract
Computing Boundaries in Local Mixture Models
Vahed Maroufy & Paul Marriott
Department of Statistics and Actuarial Science, University of Waterloo
October 28, GSI 2015, Paris

Outline
1 Influence of boundaries on parameter inference
2 Local mixture models (LMM)
3 Parameter space and boundaries: hard boundaries and soft boundaries
4 Computing the boundaries for LMMs
5 Summary and future direction

Boundary influence
When a boundary exists:
the MLE does not exist ⟹ find the Extended MLE;
the MLE exists, but does not satisfy the regular properties.
Examples: Binomial distribution, logistic regression, contingency tables, log-linear and graphical models; Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013).
Computing the boundary is a hard problem, Fukuda (2004).
Many mathematical results in the literature:
polytope approximation, Boroczky and Fodor (2008), Barvinok (2013);
smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004).

Local Mixture Models
Definition, Marriott (2002):
$g(x; \mu, \lambda) = f(x; \mu) + \sum_{j=2}^{k} \lambda_j f^{(j)}(x; \mu), \quad \lambda \in \Lambda_\mu \subset \mathbb{R}^{k-1}$
Properties, Anaya-Izquierdo and Marriott (2007):
$g$ is identifiable in all parameters and the parametrization $(\mu, \lambda)$ is orthogonal at $\lambda = 0$;
the log-likelihood function of $g$ is a concave function of $\lambda$ at a fixed $\mu_0$;
$\Lambda_\mu$ is convex;
LMMs approximate continuous mixture models $\int_M f(x, \mu)\, dQ(\mu)$ when the mixing is "small";
the family of LMMs is richer than the family of mixtures.

Example and Motivation
Example: LMM of Normal, $f(x; \mu) = \phi(x; \mu, \sigma^2)$ ($\sigma^2$ is known).
$g(x; \mu, \lambda) = \phi(x; \mu, \sigma^2) \left\{ 1 + \sum_{j=2}^{k} \lambda_j p_j(x) \right\}, \quad \lambda \in \Lambda_\mu$,
with $p_j(x)$ a polynomial of degree $j$.
Why do we care about $\lambda$ and $\Lambda_\mu$?
They are interpretable:
$\mu_g^{(2)} = \sigma^2 + 2\lambda_2$
$\mu_g^{(3)} = 6\lambda_3$
$\mu_g^{(4)} = \mu_\phi^{(4)} + 12\sigma^2 \lambda_2 + 24\lambda_4 \quad (1)$
$\lambda$ represents the mixing distribution $Q$ via its moments in $\int_M f(x, \mu)\, dQ(\mu)$.

The costs for all these good properties and flexibility are
Hard boundary ⟹ Positivity (boundary of $\Lambda_\mu$);
Soft boundary ⟹ Mixture behavior.
We compute them for two models here: Poisson and Normal. We fix $k = 4$.

Boundaries
Hard boundary:
$\Lambda_\mu = \left\{ \lambda \mid 1 + \sum_{j=2}^{k} \lambda_j q_j(x; \mu) \ge 0,\ \forall x \in S \right\}$.
$\Lambda_\mu$ is an intersection of half-spaces, hence convex.
The hard boundary is constructed from a set of (hyper)planes.
Soft boundary:
Definition. For a density function $f(x; \mu)$ with $k$ finite moments let
$M_k(f) := (E_f(X), E_f(X^2), \dots, E_f(X^k))$,
and for compact $M$ define
$C = \text{convhull}\{ M_k(f) \mid \mu \in M \}$.
Then the boundary of $C$ is called the soft boundary.
Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ  A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a ﬁnite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ  A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a ﬁnite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Normal model let y = x−µ σ2 Λµ = {λ  (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 ≥ 0, ∀y ∈ R}. We need a more geometric tools to compute this boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Ruled and developable surfaces Deﬁnition Ruled surface: Γ(x, γ) = α(x) + γ · β(x), x ∈ I ⊂ R, γ ∈ Rk Developable surface: β(x), α (x) and β (x) are coplanar for all x ∈ I. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Deﬁnition The family of planes, A = {λ ∈ R3  a(x) · λ + d(x) = 0, x ∈ R}, each determined by an x ∈ R, is called a oneparameter inﬁnite family of planes. Each element of the set {λ ∈ R3 a(x) · λ + d(x) = 0, a (x) · λ + d (x) = 0, x ∈ R} is called a characteristic line of the surface at x and the union is called the envelope of the family. A characteristic line is the intersection of two consecutive planes The envelope is a developable surface Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Hard boundary of for Normal LMM (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 = 0, ∀y ∈ R . 
λ2 λ3 λ4 λ4 λ3 λ2 Figure : Left: The hard boundary for the normal LMM (shaded) as a subset of a self intersecting ruled surface (unshaded); Right: slice through λ4 = 0.2. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Soft boundary of for Normal LMM recap : Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). For visualization purposes let k = 3, (µ ∈ M, ﬁx σ) M3(f ) = (µ, µ2 + σ2 , µ3 + 3µσ2 ), M3(g) = (µ, µ2 + σ2 + 2λ2, µ3 + 3µσ2 + 6µλ2 + 6λ3). Figure : the 3D curve ϕ(µ); Middle: the bounding ruled surface γa(µ, u); Right: the convex subspace restricted to soft boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Ruled surface parametrization Two boundary surfaces, each constructed by a curve and a set of lines attached to it. γa(µ, u) = ϕ(µ) + u La(µ) γb(µ, u) = ϕ(µ) + u Lb(µ) where for M = [a, b] and ϕ(µ) = M3(f ) La(µ): lines between ϕ(a) and ϕ(µ) Lb(µ): lines between ϕ(µ) and ϕ(b) Computing Boundaries in Local Mixture Models Summary Summary Understanding these boundaries is important if we want to exploit the nice statistical properties of LMM The boundaries described in this paper have both discrete aspects and smooth aspects The two example discussed represent the structure for almost all exponential family models It is a interesting problem to design optimization algorithms on these boundaries for ﬁnding boundary maximizers of likelihood Computing Boundaries in Local Mixture Models References AnayaIzquierdo, K., Critchley, F., and Marriott, P. (2013). when are ﬁrst order asymptotics adequate? a diagnostic. Stat, 3(1):17–22. AnayaIzquierdo, K. and Marriott, P. (2007). Local mixture models of exponential families. Bernoulli, 13:623–640. Barvinok, A. (2013). Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, rnt078. Batyrev, V. V. (1992). Toric varieties and smooth convex approximations of a polytope. 
RIMS Kokyuroku, 776:20. Boroczky, K. and Fodor, F. (2008). Approximating 3dimensional convex bodies by polytopes with a restricted number of edges. Contributions to Algebra and Geometry, 49(1):177–193. Fukuda, K. (2004). From the zonotope construction to the minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261–1272. Geyer, C. J. (2009). Likelihood inference in exponential familes and direction of recession. Electronic Journal of Statistics, 3:259–289. Ghomi, M. (2001). Strictly convex submanifolds and hypersurfaces of positive curvature. Journal of Diﬀerential Geometry, 57(2):239–271. Ghomi, M. (2004). Optimal smoothing for convex polytopes. Bulletin of the London Mathematical Society, 36(4):483–492. Marriott, P. (2002). On the local geometry of mixture models. Biometrika, 89:77–93. Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446–484. Computing Boundaries in Local Mixture Models END Thank You
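As an illustration of the hard-boundary condition for the Normal LMM with k = 4 stated in the talk above, the following is a minimal sketch, not the authors' algorithm. It expands the constraint into a quartic in y and checks non-negativity at the real critical points. The function names `constraint_poly` and `in_hard_boundary` and the tolerances are assumptions made for this example.

```python
import numpy as np


def constraint_poly(lam2, lam3, lam4):
    """Quartic in y: (y^2-1)*lam2 + (y^3-3y)*lam3 + (y^4-6y^2+3)*lam4 + 1.

    Coefficients are listed from the highest degree down; np.poly1d trims
    leading zeros, so the order reflects the true degree.
    """
    return np.poly1d([lam4, lam3, lam2 - 6.0 * lam4, -3.0 * lam3,
                      1.0 - lam2 + 3.0 * lam4])


def in_hard_boundary(lam2, lam3, lam4, tol=1e-12):
    """Return True if the constraint polynomial is >= 0 for every real y."""
    p = constraint_poly(lam2, lam3, lam4)
    if p.order == 0:
        # Constant polynomial: just check its sign.
        return bool(p(0.0) >= -tol)
    if p.order % 2 == 1 or p.coeffs[0] < 0:
        # Odd degree or negative leading coefficient: unbounded below.
        return False
    # Even degree, positive leading coefficient: the global minimum is
    # attained at a real root of the derivative.
    crit = p.deriv().roots
    real_crit = crit.real[np.abs(crit.imag) < 1e-8]
    return bool(np.all(p(real_crit) >= -tol))


if __name__ == "__main__":
    for lam in [(0.0, 0.0, 0.0), (0.0, 0.0, 0.05), (0.0, 0.0, 0.2)]:
        print(lam, in_hard_boundary(*lam))
```

With $\lambda_2 = \lambda_3 = 0$ the quartic $y^4 - 6y^2 + 3$ has minimum $-6$, so the point is feasible exactly when $0 \le \lambda_4 \le 1/6$; this gives a quick sanity check on the sketch.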
Emmanuel Kalunga, Sylvain Chevallier, Quentin Barthélemy, Karim Djouani, Yskandar Hamam, Eric Monacelli
Keywords = Brain Computer, Information geometry, Interfaces, Riemannian means, Steady State, Visually Evoked Potentials
Abstract
From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification
Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al.
F'SATI - Tshwane University of Technology (South Africa)
LISV - Université de Versailles Saint-Quentin (France)
Mensia Technologies (France)
sylvain.chevallier@uvsq.fr
28 October 2015, GSI

Cerebral interfaces
Context
• Rehabilitation and disability compensation ⇒ Out-of-the-lab solutions ⇒ Open to a wider population
Problem
• Intra-subject variabilities ⇒ Online methods, adaptive algorithms
• Inter-subject variabilities ⇒ Good generalization, fast convergence
Opportunities
New generation of BCI (Congedo & Barachant)
• Growing interest in the EEG community
• Large community, available datasets
• Challenging situations and problems

Outline
• Brain-Computer Interfaces
• Spatial covariance matrices for BCI
• Experimental assessment of distances

Interaction based on brain activity
Brain-Computer Interface (BCI) for non-muscular communication
• Medical applications
• Possible applications for a wider population
Recording at what scale?
• Neuron → LFP
• Neuronal group → ECoG → SEEG
• Brain → EEG → MEG → fMRI → PET

Interaction loop
BCI loop
1 Acquisition
2 Preprocessing
3 Translation
4 User feedback
First systems in the early '70s

Electroencephalography
Most BCIs rely on EEG ⇒ Efficient to capture brain waves
• Lightweight system
• Low cost
• Mature technologies
• High temporal resolution
• No trepanation

Origins of EEG
• Local field potentials
• Electric potential difference between dendrite and soma
• Maxwell's equations
• Quasi-static approximation
• Volume conduction effect
• Sensitive to conductivity of brain skull
• Sensitive to tissue anisotropies

Experimental paradigms
Different brain signals for BCI:
• Motor imagery: (de)synchronization in premotor cortex
• Evoked responses: low-amplitude potentials induced by a stimulus
Steady-State Visually Evoked Potentials
[Figure: 8 electrodes in the occipital region; SSVEP stimulation LEDs at 13 Hz, 17 Hz and 21 Hz.]
• Neural synchronization with visual stimulation
• No learning required, based on visual attention
• Strong induced activation

BCI Challenges
Limitations
• Data scarcity ⇒ A few sources are non-linearly mixed on all electrodes
• Individual variabilities ⇒ Effect of mental fatigue
• Inter-session variabilities ⇒ Electronic impedances, localizations of electrodes
• Inter-individual variabilities ⇒ State-of-the-art approaches fail with 20% of subjects
Desired properties:
• Online systems ⇒ Continuously adapt to the user's variations
• No calibration phase ⇒ Non-negligible cognitive load, raises fatigue
• Generic model classifiers and transfer learning ⇒ Use data from one subject to enhance the results for another

Spatial covariance matrices
Common approach: spatial filtering
• Efficient on clean datasets
• Specific to each user and session ⇒ Requires user calibration
• Two-step training with feature selection ⇒ Overfitting risk, curse of dimensionality
Working with covariance matrices
• Good generalization across subjects
• Fast convergence
• Existing online algorithms
• Efficient implementations

Covariance matrices for EEG
• An EEG trial: $X \in \mathbb{R}^{C \times N}$, $C$ electrodes, $N$ time samples
• Assuming that $X \sim \mathcal{N}(0, \Sigma)$
• Covariance matrices $\Sigma$ belong to $M_C = \{ \Sigma \in \mathbb{R}^{C \times C} : \Sigma = \Sigma^\top \text{ and } x^\top \Sigma x > 0, \ \forall x \in \mathbb{R}^C \setminus 0 \}$
• Mean of the set $\{\Sigma_i\}_{i=1,\dots,I}$ is $\bar{\Sigma} = \arg\min_{\Sigma \in M_C} \sum_{i=1}^{I} d^m(\Sigma_i, \Sigma)$
• Each EEG class is represented by its mean
• Classification based on those means
• How to obtain a robust and efficient algorithm?
Congedo, 2013

Minimum distance to Riemannian mean
Simple and robust classifier
• Compute the center $\Sigma_E^{(k)}$ of each of the $K$ classes
• Assign a given unlabelled $\hat{\Sigma}$ to the closest class: $k^* = \arg\min_k \delta(\hat{\Sigma}, \Sigma_E^{(k)})$
[Figure: Trajectories on the tangent space at the mean of all trials $\bar{\Sigma}_\mu$; resting class, 13 Hz class, 21 Hz class, 17 Hz class, delay.]

Riemannian potato
Removing outliers and artifacts
Reject any $\Sigma_i$ that lies too far from the mean of all trials $\bar{\Sigma}_\mu$:
$z(\delta_i) = \frac{\delta_i - \mu}{\sigma} > z_{th}$,
where $\delta_i$ is $d(\Sigma_i, \bar{\Sigma})$, and $\mu$ and $\sigma$ are the mean and standard deviation of the distances $\{\delta_i\}_{i=1}^{I}$.
[Figure: Raw matrices; Riemannian potato filtering.]

Covariance matrices for EEG-based BCI
Riemannian approaches in BCI:
• Achieve state-of-the-art results → performing like spatial filtering or sensor-space methods
• Rely on simpler algorithms → less error-prone, computationally efficient
What are the reasons for this success?
• Invariances embedded with Riemannian distances → invariance to rescaling, normalization, whitening → invariance to electrode permutation or positioning
• Equivalent to working in an optimal source space → spatial filters are sensitive to outliers and user-specific → no question on "sensors or sources" methods
⇒ What are the most desirable invariances for EEG?

Considered distances and divergences
Euclidean: $d_E(\Sigma_1, \Sigma_2) = \|\Sigma_1 - \Sigma_2\|_F$
Log-Euclidean: $d_{LE}(\Sigma_1, \Sigma_2) = \|\log(\Sigma_1) - \log(\Sigma_2)\|_F$ (V. Arsigny et al., 2006, 2007)
Affine-invariant: $d_{AI}(\Sigma_1, \Sigma_2) = \|\log(\Sigma_1^{-1}\Sigma_2)\|_F$ (T. Fletcher & S. Joshi, 2004; M. Moakher, 2005)
α-divergence, $-1 < \alpha < 1$: $d_D^{\alpha}(\Sigma_1, \Sigma_2) = \frac{4}{1-\alpha^2} \log \frac{\det\left(\frac{1-\alpha}{2}\Sigma_1 + \frac{1+\alpha}{2}\Sigma_2\right)}{\det(\Sigma_1)^{\frac{1-\alpha}{2}} \det(\Sigma_2)^{\frac{1+\alpha}{2}}}$ (Z. Chebbi & M. Moakher, 2012)
Bhattacharyya: $d_B(\Sigma_1, \Sigma_2) = \left( \log \frac{\det \frac{1}{2}(\Sigma_1 + \Sigma_2)}{(\det(\Sigma_1)\det(\Sigma_2))^{1/2}} \right)^{1/2}$ (Z. Chebbi & M. Moakher, 2012)

Experimental results
• Euclidean distances yield the lowest results → usually attributed to the invariance under inversion that is not guaranteed → displays swelling effect
• Riemannian approaches outperform state-of-the-art methods (CCA+SVM)
• α-divergence shows the best performance → but requires a costly optimisation to find the best α value
• Bhattacharyya has the lowest computational cost and a good accuracy
[Figure: accuracy (%) and CPU time (s) as functions of the α value, for α between −1 and 1.]

Conclusion
Working with covariance matrices in BCI
• Achieves very good results
• Simple algorithms work well: MDM, Riemannian potato
• Need for robust and online methods
Interesting applications for IG:
• Many freely available datasets
• Several competitions
• Many open source toolboxes for manipulating EEG
Several open questions:
• Handling electrode misplacements and other artifacts
• Missing data and covariance matrices of lower rank
• Inter- and intra-individual variabilities

Thank you!
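The distances listed in the talk above and the minimum-distance-to-mean rule can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the affine-invariant distance is computed in its usual symmetric form $\|\log(\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2})\|_F$, the class center is taken to be the log-Euclidean mean (a closed-form stand-in for the iterative Riemannian mean), and all function names are hypothetical.

```python
import numpy as np


def _sym_fun(S, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * fun(w)) @ V.T


def spd_log(S):
    return _sym_fun(S, np.log)


def spd_exp(S):
    return _sym_fun(S, np.exp)


def dist_euclid(S1, S2):
    return np.linalg.norm(S1 - S2, "fro")


def dist_logeuclid(S1, S2):
    return np.linalg.norm(spd_log(S1) - spd_log(S2), "fro")


def dist_affine_invariant(S1, S2):
    # Symmetric form of the affine-invariant distance (assumption, see lead-in).
    inv_sqrt = _sym_fun(S1, lambda w: w ** -0.5)
    M = inv_sqrt @ S2 @ inv_sqrt
    return np.linalg.norm(spd_log(0.5 * (M + M.T)), "fro")


def dist_bhattacharyya(S1, S2):
    # slogdet avoids overflow/underflow of the raw determinants.
    ld_mid = np.linalg.slogdet(0.5 * (S1 + S2))[1]
    ld1 = np.linalg.slogdet(S1)[1]
    ld2 = np.linalg.slogdet(S2)[1]
    return np.sqrt(max(ld_mid - 0.5 * (ld1 + ld2), 0.0))


def log_euclid_mean(covs):
    """Closed-form log-Euclidean mean, used here as the class center."""
    return spd_exp(np.mean([spd_log(S) for S in covs], axis=0))


def mdm_predict(S, centers, dist=dist_affine_invariant):
    """Index of the class whose center is closest to S."""
    return int(np.argmin([dist(S, C) for C in centers]))
```

A typical use would estimate one covariance matrix per trial, build one center per class with `log_euclid_mean`, and label a new trial with `mdm_predict`. The affine-invariance property mentioned in the talk can be checked numerically: transforming both arguments by the same invertible matrix leaves `dist_affine_invariant` unchanged.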
Paul Marriott, Radka Sabolova, Germain Van Bever, Frank Critchley
Keywords =
Abstract
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling
R. Sabolová (1), P. Marriott (2), G. Van Bever (1) & F. Critchley (1)
(1) The Open University (EPSRC grant EP/L010429/1), United Kingdom
(2) University of Waterloo, Canada
GSI 2015, October 28th 2015

Key points
In CIG, the multinomial model
$\Delta^k = \left\{ (\pi_0, \dots, \pi_k) : \pi_i \ge 0, \ \sum_i \pi_i = 1 \right\}$
provides a universal model.
1 Goodness-of-fit testing in large sparse extended multinomial contexts
2 Cressie-Read power divergence λ-family, equivalent to Amari's α-family
• asymptotic properties of two test statistics: Pearson's χ² test and the deviance
• simulation study for other statistics within the power divergence family
3 k-asymptotics instead of N-asymptotics

Outline
1 Introduction
2 Pearson's χ² versus the deviance
3 Other test statistics from the power divergence family
4 Summary

1 Introduction

Big data
Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008):
". . . the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. . . . Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science."
• continuous data: high dimensional low sample size data (HDLSS)
• discrete data: databases, image analysis
Sparsity ($N \ll k$) changes everything!

Image analysis: example
[Figure: $m_1 = 10$, $m_2 = 10$.]
Dimension of the state space: $k = 2^{m_1 m_2} - 1$

Sparsity changes everything
S. Fienberg, A. Rinaldo (2012): Maximum Likelihood Estimation in Log-Linear Models
"Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult."

2 Pearson's χ² versus the deviance

Extended multinomial distribution
Let $n = (n_i) \sim \mathrm{Mult}(N, (\pi_i))$, $i = 0, 1, \dots, k$, where each $\pi_i \ge 0$.
Goodness-of-fit test: $H_0 : \pi = \pi^*$.
Pearson's χ² test (Wald, score statistic):
$W := \sum_{i=0}^{k} \frac{(\pi_i^* - n_i/N)^2}{\pi_i^*} \equiv \frac{1}{N^2} \sum_{i=0}^{k} \frac{n_i^2}{\pi_i^*} - 1.$
Rule of thumb (for accuracy of the $\chi^2_k$ asymptotic approximation): $N\pi_i \ge 5$

Performance of Pearson's χ² test on the boundary: example
[Figure: (a) Null distribution, cell probability against rank of cell probability; (b) Sample of Wald statistic against index. $N = 50$, $k = 200$, exponentially decreasing $\pi_i$.]

Performance of Pearson's χ² test on the boundary: theory
Theorem. For $k > 1$ and $N \ge 6$, the first three moments of $W$ are:
$E(W) = \frac{k}{N}, \quad \mathrm{var}(W) = \frac{\{\pi^{(-1)} - (k+1)^2\} + 2k(N-1)}{N^3}$
and $E[\{W - E(W)\}^3]$ given by
$\frac{\{\pi^{(-2)} - (k+1)^3\} - (3k + 25 - 22N)\{\pi^{(-1)} - (k+1)^2\} + g(k, N)}{N^5}$
where $g(k, N) = 4(N-1)k(k + 2N - 5) > 0$ and $\pi^{(a)} := \sum_i \pi_i^a$.
In particular, for fixed $k$ and $N$, as $\pi_{min} \to 0$,
$\mathrm{var}(W) \to \infty$ and $\gamma(W) \to +\infty$,
where $\gamma(W) := E[\{W - E(W)\}^3] / \{\mathrm{var}(W)\}^{3/2}$.

The deviance statistic
Define the deviance $D$ via
$D/2 = \sum_{\{0 \le i \le k : n_i > 0\}} n_i \{\log(n_i/N) - \log(\pi_i)\} = \sum_{\{0 \le i \le k : n_i > 0\}} \left\{ n_i \log(n_i/N) + n_i \log\left(\frac{1}{\pi_i}\right) \right\} = \sum_{\{0 \le i \le k : n_i > 0\}} n_i \log(n_i/\mu_i),$
where $\mu_i := E(n_i) = N\pi_i$.

Distribution of deviance
• Let $\{n_i^*, i = 0, \dots, k\}$ be mutually independent, with $n_i^* \sim \mathrm{Po}(\mu_i)$
• Then $N^* := \sum_{i=0}^{k} n_i^* \sim \mathrm{Po}(N)$ and $n_i = (n_i^* \mid N^* = N) \sim \mathrm{Mult}(N, \pi_i)$
• Define $S^* := \begin{pmatrix} N^* \\ D^*/2 \end{pmatrix} = \sum_{i=0}^{k} \begin{pmatrix} n_i^* \\ n_i^* \log(n_i^*/\mu_i) \end{pmatrix}$
• Define $\nu$, $\tau$ and $\rho$ via
$\begin{pmatrix} N \\ \nu \end{pmatrix} := E(S^*) = \begin{pmatrix} N \\ \sum_{i=0}^{k} E(n_i^* \log\{n_i^*/\mu_i\}) \end{pmatrix}, \quad \begin{pmatrix} N & \rho\tau\sqrt{N} \\ \cdot & \tau^2 \end{pmatrix} := \mathrm{cov}(S^*) = \begin{pmatrix} N & \sum_{i=0}^{k} C_i \\ \cdot & \sum_{i=0}^{k} V_i \end{pmatrix},$
where $C_i := \mathrm{Cov}(n_i^*, n_i^* \log(n_i^*/\mu_i))$ and $V_i := \mathrm{Var}(n_i^* \log(n_i^*/\mu_i))$.
Then under equicontinuity
$D/2 \xrightarrow[k \to \infty]{D} N_1(\nu, \tau^2(1 - \rho^2)).$

Uniformity near the boundary
[Figure: Stability of sampling distributions, Pearson's χ² and deviance statistic, $N = 50$, $k = 200$, exponentially decreasing $\pi_i$. Panels: (a) Null distribution, cell probability against rank of cell probability; (b) Sample of Wald statistic against index; (c) Sample of deviance statistic against index.]

Asymptotic approximations
The normal approximation can be improved:
• χ² approximation, correction for skewness
• symmetrised deviance statistics
[Figure: Quality of k-asymptotics approximations near the boundary. Panels: Normal approximation (deviance quantiles against normal quantiles); Chi-squared approximation (deviance quantiles against chi-squared quantiles); Symmetrised deviance (symmetric deviance quantiles against normal quantiles).]

Uniformity and higher moments
• Does the k-asymptotic approximation hold uniformly across the simplex?
• Rewrite the deviance as
$D^*/2 = \sum_{\{0 \le i \le k : n_i^* > 0\}} n_i^* \log(n_i^*/\mu_i) = \Gamma^* + \Delta^*$
where $\Gamma^* := \sum_{i=0}^{k} \alpha_i n_i^*$ and $\Delta^* := \sum_{\{0 \le i \le k : n_i^* > 1\}} n_i^* \log n_i^* \ge 0$ and $\alpha_i := -\log \mu_i$.
• How well is the moment generating function of the (standardised) $\Gamma^*$ approximated by that of a (standard) normal?
$M_\gamma(t) = \exp\left\{ -\frac{E(\Gamma^*)\, t}{\sqrt{Var(\Gamma^*)}} \right\} \exp\left[ \sum_{i=0}^{k} \sum_{h=1}^{\infty} \frac{(-1)^h}{h!} \mu_i (\log \mu_i)^h \left\{ \frac{t}{\sqrt{Var(\Gamma^*)}} \right\}^h \right]$
• Maximise the skewness $\sum_{i=0}^{k} \mu_i (\log \mu_i)^3$ for fixed $E(\Gamma^*) = -\sum_{i=0}^{k} \mu_i \log(\mu_i)$ and $Var(\Gamma^*) = \sum_{i=0}^{k} \mu_i (\log \mu_i)^2$.
• Solution: a distribution with three distinct values for $\mu_i$
[Figure: Worst case solution for normality of $\Gamma^*$. Panels: (a) Null distribution, cell probability against rank of cell probability; (b) Sample of Wald statistic; (c) Sample of deviance statistic.]

Uniformity and discreteness
Worst case for asymptotic normality?
               Where?      Why?
Pearson χ²     boundary    'unstable'
deviance       centre      discreteness
$D^*/2 = \sum_{\{0 \le i \le k : n_i^* > 0\}} n_i^* (\log n_i^* - \log \mu_i) = \Gamma^* + \Delta^*$
For the distribution of any discrete random variable to be well approximated by a continuous one, it is necessary that it have a large number of support points, close together.

Uniformity and discreteness, continued
[Figure: Behaviour at the centre of the simplex, $N = 30$, $k = 200$. Panels: (a) Null distribution; (b) Sample of deviance statistic against index; (c) Q-Q plot of the standardised deviance statistic against theoretical quantiles.]
[Figure: Behaviour at the centre of the simplex, $N = 60$, $k = 200$. Same panels.]

3 Other test statistics from the power divergence family

Comparison of performance of different test statistics belonging to the power divergence family as we approach the boundary (exponentially decreasing values of $\pi$)
$2N I^\lambda(n_i/N, \pi^*) = \frac{2}{\lambda(\lambda + 1)} \sum_{i=1}^{k} n_i \left[ \left( \frac{n_i}{N\pi_i^*} \right)^\lambda - 1 \right],$
where $\alpha = 1 + 2\lambda$
α = 3       Pearson's χ² statistic
α = 7/3     Cressie-Read recommendation
α = 1       deviance
α = 0       Hellinger statistic
α = −1      Kullback MDI
α = −3      Neyman χ²

Numerical comparison of test statistics belonging to the power divergence family
[Figure: cell probabilities (pi.base against index) and histograms of the sampled statistics for Pearson's χ² (α = 3), Cressie-Read (α = 7/3), deviance (α = 1), Hellinger distance (α = 0), Kullback MDI (α = −1) and Neyman χ² (α = −3).]

4 Summary

Summary: key points
1 Goodness-of-fit testing in large sparse extended multinomial contexts
2 k-asymptotics instead of N-asymptotics
3 Cressie-Read power divergence λ-family
• asymptotic properties of two test statistics: Pearson's χ² statistic and the deviance
• simulation study for other statistics within the power divergence family

References
A. Agresti (2002): Categorical Data Analysis. Wiley: Hoboken NJ.
K. Anaya-Izquierdo, F. Critchley, and P. Marriott (2014): When are first order asymptotics adequate? A diagnostic. STAT, 3: 17–22.
K. Anaya-Izquierdo, F. Critchley, P. Marriott, and P. Vos (2013): Computational information geometry: foundations. Proceedings of GSI 2013, LNCS.
F. Critchley and P. Marriott (2014): Computational information geometry in statistics: theory and practice. Entropy, 16: 2454–2471.
S.E. Fienberg and A. Rinaldo (2012): Maximum likelihood estimation in log-linear models. Annals of Statistics, 40: 996–1023.
L. Holst (1972): Asymptotic normality and efficiency for certain goodness-of-fit tests. Biometrika, 59: 137–145.
C. Morris (1975): Central limit theorems for multinomial sums. Annals of Statistics, 3: 165–188.
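The test statistics defined in the talk above are straightforward to evaluate on sparse count vectors. The sketch below is an illustration only, assuming NumPy; the function names are hypothetical, and the power-divergence helper sums over cells with $n_i > 0$, which matches the formula for $\lambda > -1$ (empty cells then contribute zero).

```python
import numpy as np


def pearson_W(n, pi):
    """W = sum_i (pi_i - n_i/N)^2 / pi_i, as defined in the talk."""
    n = np.asarray(n, dtype=float)
    pi = np.asarray(pi, dtype=float)
    N = n.sum()
    return float(np.sum((pi - n / N) ** 2 / pi))


def deviance(n, pi):
    """D = 2 * sum over cells with n_i > 0 of n_i * log(n_i / mu_i), mu_i = N pi_i."""
    n = np.asarray(n, dtype=float)
    pi = np.asarray(pi, dtype=float)
    N = n.sum()
    pos = n > 0
    return float(2.0 * np.sum(n[pos] * np.log(n[pos] / (N * pi[pos]))))


def power_divergence(n, pi, lam):
    """Cressie-Read statistic 2N I^lambda for lam > -1 (assumption: empty cells dropped)."""
    if lam == 0.0:
        return deviance(n, pi)  # limiting case lambda -> 0
    n = np.asarray(n, dtype=float)
    pi = np.asarray(pi, dtype=float)
    N = n.sum()
    pos = n > 0
    ratio = n[pos] / (N * pi[pos])
    return float(2.0 / (lam * (lam + 1.0)) * np.sum(n[pos] * (ratio ** lam - 1.0)))


if __name__ == "__main__":
    counts = np.array([3, 0, 1, 0, 6])
    probs = np.array([0.4, 0.1, 0.2, 0.1, 0.2])
    print(pearson_W(counts, probs), deviance(counts, probs))
```

With $\lambda = 1$ (that is, $\alpha = 3$) the power-divergence statistic equals $N \cdot W$, the usual Pearson $X^2$, which gives a simple consistency check between the two definitions.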
Hiroto Inoue
Keywords =
Abstract
Group Theoretical Study on Geodesics for the Elliptical Models Hiroto Inoue Kyushu University, Japan October 28, 2015 GSI2015, ´Ecole Polytechnique, ParisSaclay, France Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 1 / 14 Overview 1 Eriksen’s construction of geodesics on normal model Problem 2 Reconsideration of Eriksen’s argument Embedding Nn → Sym+ n+1(R) 3 Geodesic equation on Elliptical model 4 Future work Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 2 / 14 Eriksen’s construction of geodesics on normal model Let Sym+ n (R) be the set of ndimensional positivedeﬁnite matrices. The normal model Nn = (M, ds2) is a Riemannian manifold deﬁned by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 ). The geodesic equation on Nn is ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0. (1) The solution of this geodesic equation has been obtained by Eriksen. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 3 / 14 Theorem ([Eriksen 1987]) For any x ∈ Rn, B ∈ Symn(R), deﬁne a matrix exponential Λ(t) by Λ(t) = ∆ δ Φ tδ tγ tΦ γ Γ := exp(−tA), A := B x 0 tx 0 −tx 0 −x −B ∈ Mat2n+1. (2) Then, the curve (µ(t), Σ(t)) := (−∆−1δ, ∆−1) is the geodesic on Nn satisﬁying the initial condition (µ(0), Σ(0)) = (0, In), ( ˙µ(0), ˙Σ(0)) = (x, B). (proof) We see that by the deﬁnition, (µ(t), Σ(t)) satisﬁes the geodesic equation. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 4 / 14 Problem 1 Explain Eriksen’s theorem, to clarify the relation between the normal model and symmetric spaces. 2 Extend Eriksen’s theorem to the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 5 / 14 Reconsideration of Eriksen’s argument Sym+ n+1(R) Notice that the positivedeﬁnite symmetric matrices Sym+ n+1(R) is a symmetric space by G/K Sym+ n+1(R) gK → g · tg, where G = GLn+1(R), K = O(n + 1). 
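This space $G/K$ has the $G$-invariant Riemannian metric $ds^2 = \tfrac{1}{2}\,\mathrm{tr}\big((S^{-1} dS)^2\big)$.

Embedding $N_n \to \mathrm{Sym}^+_{n+1}(\mathbb{R})$
Put an affine subgroup
$$G_A := \left\{ \begin{pmatrix} P & \mu \\ 0 & 1 \end{pmatrix} \;\middle|\; P \in GL_n(\mathbb{R}),\ \mu \in \mathbb{R}^n \right\} \subset GL_{n+1}(\mathbb{R}).$$
Define a Riemannian submanifold as the orbit
$$G_A \cdot I_{n+1} = \{ g \cdot {}^t g \mid g \in G_A \} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}).$$
Theorem (Ref. [Calvo, Oller 2001])
We have the following isometry:
$$N_n \xrightarrow{\ \sim\ } G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}), \qquad (\Sigma, \mu) \mapsto \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}. \tag{3}$$

By using the above embedding, we get a simpler expression of the metric and the geodesic equation on $N_n \cong G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$:
coordinate: $(\Sigma, \mu) \mapsto S = \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}$
metric: $ds^2 = ({}^t d\mu)\,\Sigma^{-1}(d\mu) + \tfrac{1}{2}\,\mathrm{tr}\big((\Sigma^{-1} d\Sigma)^2\big) \iff ds^2 = \tfrac{1}{2}\,\mathrm{tr}\big((S^{-1} dS)^2\big)$
geodesic eq.: $\ddot\mu - \dot\Sigma\,\Sigma^{-1}\dot\mu = 0,\ \ddot\Sigma + \dot\mu\,{}^t\dot\mu - \dot\Sigma\,\Sigma^{-1}\dot\Sigma = 0 \iff (I_n, 0)(S^{-1}\dot S) = (B, x)$

Reconsideration of Eriksen's argument
We can interpret Eriksen's argument as follows.
Differential equation $\Lambda^{-1}\dot\Lambda = -A$ $\longrightarrow$ geodesic equation $(I_n, 0)(S^{-1}\dot S) = (B, x)$.
$$A = \begin{pmatrix} B & x & 0 \\ {}^t x & 0 & -{}^t x \\ 0 & -x & -B \end{pmatrix} \ \longrightarrow\ e^{-tA} = \begin{pmatrix} \Delta & \delta & * \\ {}^t\delta & \varepsilon & * \\ * & * & * \end{pmatrix} \ \longrightarrow\ S := \begin{pmatrix} \Delta & \delta \\ {}^t\delta & \varepsilon \end{pmatrix}^{-1}$$
$A \in \{A : JAJ = -A\} \subset \mathrm{sym}_{2n+1}(\mathbb{R})$
$\xrightarrow{\ \exp\ }$ $e^{-tA} \in \{\Lambda : J\Lambda J = \Lambda^{-1}\} \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$
$\xrightarrow{\ \text{projection (essential!)}\ }$ $S \in N_n \cong G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$
Here $J = \begin{pmatrix} 0 & 0 & I_n \\ 0 & 1 & 0 \\ I_n & 0 & 0 \end{pmatrix}$.

Geodesic equation on the elliptical model
Definition
Let us define a Riemannian manifold $E_n(\alpha) = (M, ds^2)$ by
$$M = \{(\mu, \Sigma) \in \mathbb{R}^n \times \mathrm{Sym}^+_n(\mathbb{R})\}, \qquad ds^2 = ({}^t d\mu)\,\Sigma^{-1}(d\mu) + \tfrac{1}{2}\,\mathrm{tr}\big((\Sigma^{-1} d\Sigma)^2\big) + \tfrac{1}{2}\, d_\alpha \big(\mathrm{tr}(\Sigma^{-1} d\Sigma)\big)^2, \tag{4}$$
where $d_\alpha = (n+1)\alpha^2 + 2\alpha$, $\alpha \in \mathbb{C}$. Then $E_n(0) = N_n$.
The geodesic equation on $E_n(\alpha)$ is
$$\ddot\mu - \dot\Sigma\,\Sigma^{-1}\dot\mu = 0, \qquad \ddot\Sigma + \dot\mu\,{}^t\dot\mu - \dot\Sigma\,\Sigma^{-1}\dot\Sigma - \frac{d_\alpha}{n d_\alpha + 1}\,\big({}^t\dot\mu\,\Sigma^{-1}\dot\mu\big)\,\Sigma = 0. \tag{5}$$
This is equivalent to the geodesic equation on the elliptical model.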
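Since $E_n(0) = N_n$, equation (5) reduces to equation (1) at $\alpha = 0$, where Eriksen's theorem gives the geodesic in closed form. The following is a minimal numerical sanity check of that statement, not part of the talk: it builds $A$ as in (2), evaluates $\exp(-tA)$ through an eigendecomposition (valid because $A$ is symmetric), and measures the residual of (1) with central finite differences. The function names, the step size `h`, and the test values are illustrative assumptions.

```python
import numpy as np

def eriksen_geodesic(x, B, t):
    """Return (mu(t), Sigma(t)) = (-Delta^{-1} delta, Delta^{-1}) from exp(-tA)."""
    n = x.size
    A = np.zeros((2 * n + 1, 2 * n + 1))
    A[:n, :n] = B            # top-left block B
    A[:n, n] = x             # top-middle column x
    A[n, :n] = x             # middle-left row t(x)
    A[n, n + 1:] = -x        # middle-right row -t(x)
    A[n + 1:, n] = -x        # bottom-middle column -x
    A[n + 1:, n + 1:] = -B   # bottom-right block -B
    # A is symmetric, so exp(-tA) = V diag(exp(-t w)) V^T.
    w, V = np.linalg.eigh(A)
    Lam = (V * np.exp(-t * w)) @ V.T
    Delta, delta = Lam[:n, :n], Lam[:n, n]
    Sigma = np.linalg.inv(Delta)
    return -Sigma @ delta, Sigma

def geodesic_residual(x, B, t, h=1e-3):
    """Max-norm residuals of the two equations in (1), by central differences."""
    mu_m, S_m = eriksen_geodesic(x, B, t - h)
    mu_0, S_0 = eriksen_geodesic(x, B, t)
    mu_p, S_p = eriksen_geodesic(x, B, t + h)
    dmu, dS = (mu_p - mu_m) / (2 * h), (S_p - S_m) / (2 * h)
    ddmu = (mu_p - 2 * mu_0 + mu_m) / h**2
    ddS = (S_p - 2 * S_0 + S_m) / h**2
    S_inv = np.linalg.inv(S_0)
    r_mu = ddmu - dS @ S_inv @ dmu
    r_S = ddS + np.outer(dmu, dmu) - dS @ S_inv @ dS
    return np.abs(r_mu).max(), np.abs(r_S).max()

if __name__ == "__main__":
    x = np.array([0.3, -0.2])
    B = np.array([[0.1, 0.05], [0.05, -0.2]])
    print(eriksen_geodesic(x, B, 0.0))   # starting point (0, I_n)
    print(geodesic_residual(x, B, 0.5))  # both residuals should be small
```

The residuals are limited by the finite-difference error, so they indicate consistency rather than prove the theorem.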
Geodesic equation on the elliptical model
The manifold $E_n(\alpha)$ is also embedded into the positive-definite symmetric matrices $\mathrm{Sym}^+_{n+1}(\mathbb{R})$, ref. [Calvo, Oller 2001], and we have a simpler expression of the geodesic equation on $E_n(\alpha) \cong \exists\, G_A(\alpha) \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$:
coordinate: $(\Sigma, \mu) \mapsto S = |\Sigma|^{\alpha} \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}$
metric: (4) $\iff ds^2 = \tfrac{1}{2}\,\mathrm{tr}\big((S^{-1} dS)^2\big)$
geodesic eq.: (5) $\iff (I_n, 0)(S^{-1}\dot S) = (C, x) - \alpha(\log|S|)(I_n, 0)$, where $|A| = \det A$.

But, in general, we have not yet been able to construct a submanifold $N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$ whose projection is $E_n(\alpha)$:
Differential equation $\Lambda^{-1}\dot\Lambda = -A$ $\longrightarrow$ geodesic equation $(I_n, 0)(S^{-1}\dot S) = (C, x) - \alpha(\log|S|)(I_n, 0)$.
$\Lambda(t) \in N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$
$\xrightarrow{\ \text{projection}\ }$ $S(t) \in E_n(\alpha) \cong G_A(\alpha) \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$
The geodesic equation on the elliptical model has not been solved.

Future work
1 Extend Eriksen's theorem to elliptical models (ongoing)
2 Find an Eriksen-type theorem for general symmetric spaces $G/K$
Sketch of the problem: for a projection $p : G/K \to G/K$, find a geodesic submanifold $N \subset G/K$ such that $p|_N$ maps all geodesics to geodesics:
$\forall \Lambda(t)$: geodesic in $N \subset G/K$ $\xrightarrow{\ p|_N\ }$ $p(\Lambda(t))$: geodesic in $p(N) \subset G/K$.

References
Calvo, M., Oller, J.M. A distance between elliptical distributions based in an embedding into the Siegel group. J. Comput. Appl. Math. 145, 319–334 (2002).
Eriksen, P.S. Geodesics connected with the Fisher metric on the multivariate normal manifold, pp. 225–229. Proceedings of the GST Workshop, Lancaster (1987).
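The claim in this talk that the metric (4) coincides with $\tfrac{1}{2}\,\mathrm{tr}((S^{-1}dS)^2)$ under the coordinate $S = |\Sigma|^{\alpha}(\cdots)$ can be spot-checked numerically. The sketch below is an illustration, not the author's code: it assumes a real $\alpha$ (the slides allow $\alpha \in \mathbb{C}$), approximates $dS$ by a central difference along a tangent direction $(d\mu, d\Sigma)$, and compares both expressions. All function names are hypothetical.

```python
import numpy as np

def embed(mu, Sigma, alpha):
    """S = |Sigma|^alpha * [[Sigma + mu mu^T, mu], [mu^T, 1]] (real alpha assumed)."""
    S0 = np.block([[Sigma + np.outer(mu, mu), mu[:, None]],
                   [mu[None, :], np.ones((1, 1))]])
    return np.linalg.det(Sigma) ** alpha * S0

def elliptical_metric(mu, Sigma, dmu, dSigma, alpha):
    """Quadratic form (4) evaluated on the tangent vector (dmu, dSigma)."""
    n = mu.size
    d_alpha = (n + 1) * alpha**2 + 2 * alpha
    W = np.linalg.solve(Sigma, dSigma)  # Sigma^{-1} dSigma
    return (dmu @ np.linalg.solve(Sigma, dmu)
            + 0.5 * np.trace(W @ W)
            + 0.5 * d_alpha * np.trace(W) ** 2)

def embedded_metric(mu, Sigma, dmu, dSigma, alpha, h=1e-6):
    """(1/2) tr((S^{-1} dS)^2), with dS from a central difference of embed()."""
    S = embed(mu, Sigma, alpha)
    dS = (embed(mu + h * dmu, Sigma + h * dSigma, alpha)
          - embed(mu - h * dmu, Sigma - h * dSigma, alpha)) / (2 * h)
    W = np.linalg.solve(S, dS)
    return 0.5 * np.trace(W @ W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 3
    M = rng.normal(size=(n, n))
    Sigma = M @ M.T + np.eye(n)          # a positive-definite base point
    mu = rng.normal(size=n)
    D = rng.normal(size=(n, n))
    dSigma, dmu = D + D.T, rng.normal(size=n)
    for alpha in (0.0, 0.3):             # alpha = 0 is the normal model N_n
        print(alpha,
              elliptical_metric(mu, Sigma, dmu, dSigma, alpha),
              embedded_metric(mu, Sigma, dmu, dSigma, alpha))
```

With $\alpha = 0$ the same check covers the isometry (3) for the normal model.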
Shinto Eguchi, Osamu Komori
Keywords =
Abstract
Path connectedness on a space of probability density functions
Osamu Komori (1), Shinto Eguchi (2)
(1) University of Fukui, Japan; (2) The Institute of Statistical Mathematics, Japan
École Polytechnique, Paris-Saclay (France), October 28, 2015

Contents
1 Kolmogorov-Nagumo (KN) average
2 Parallel displacement $A_t^{(\varphi)}$ characterizing the $\varphi$-path
3 U-divergence and its associated geodesic

Setting
Terminology:
$\mathcal{X}$: data space
$P$: probability measure on $\mathcal{X}$
$\mathcal{F}_P$: space of probability density functions associated with $P$
We consider a path connecting $f$ and $g$, where $f, g \in \mathcal{F}_P$, and investigate its properties from the viewpoint of information geometry.

Kolmogorov-Nagumo (KN) average
Let $\varphi : (0, \infty) \to \mathbb{R}$ be a monotonically increasing, concave, continuous function. Then for $f$ and $g$ in $\mathcal{F}_P$, the Kolmogorov-Nagumo (KN) average is
$$\varphi^{-1}\big((1-t)\varphi(f(x)) + t\varphi(g(x))\big) \quad \text{for } 0 \le t \le 1.$$
Remark 1: $\varphi^{-1}$ is monotone increasing, convex and continuous on $(0, \infty)$.

$\varphi$-path
Based on the KN average, we consider the $\varphi$-path connecting $f$ and $g$ in $\mathcal{F}_P$:
$$f_t(x, \varphi) = \varphi^{-1}\big((1-t)\varphi(f(x)) + t\varphi(g(x)) - \kappa_t\big),$$
where $\kappa_t \le 0$ is a normalizing factor, with equality if $t = 0$ or $t = 1$.

Existence of $\kappa_t$
Theorem 1
There uniquely exists $\kappa_t$ such that
$$\int_{\mathcal{X}} \varphi^{-1}\big((1-t)\varphi(f(x)) + t\varphi(g(x)) - \kappa_t\big)\, dP(x) = 1.$$
Proof. From the convexity of $\varphi^{-1}$, we have
$$0 \le \int \varphi^{-1}\big((1-t)\varphi(f(x)) + t\varphi(g(x))\big)\, dP(x) \le \int \{(1-t)f(x) + tg(x)\}\, dP(x) \le 1.$$
We also observe that $\lim_{c \to \infty} \varphi^{-1}(c) = +\infty$ since $\varphi^{-1}$ is monotone increasing. Hence the continuity of $\varphi^{-1}$ leads to the existence of $\kappa_t$ satisfying the equation above.

Illustration of $\varphi$-path
[Figure not reproduced.]
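The existence argument of Theorem 1 is constructive enough to suggest a numerical procedure: the total mass is at most 1 at $\kappa = 0$ and grows as $\kappa$ decreases, so $\kappa_t$ can be bracketed and found by bisection. The sketch below is an illustration under assumptions not made in the talk: the data space is a finite set, $P$ is given by a weight vector `w`, and the names `kappa_t` and `phi_path` are hypothetical.

```python
import numpy as np

def kappa_t(phi, phi_inv, f, g, w, t, tol=1e-12):
    """Normalizing factor of the phi-path on a finite data space, by bisection."""
    a = (1 - t) * phi(f) + t * phi(g)

    def mass(k):
        return np.sum(w * phi_inv(a - k))

    hi = 0.0                 # mass(0) <= 1 by convexity of phi_inv
    lo = -1.0
    while mass(lo) < 1.0:    # decrease kappa until the mass reaches 1
        lo *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(mid) > 1.0:
            lo = mid         # too much mass: kappa must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

def phi_path(phi, phi_inv, f, g, w, t):
    """Density f_t(., phi) on the finite data space."""
    k = kappa_t(phi, phi_inv, f, g, w, t)
    return phi_inv((1 - t) * phi(f) + t * phi(g) - k)

if __name__ == "__main__":
    w = np.full(4, 0.25)                  # P: uniform on four points
    f = np.array([0.4, 0.8, 1.2, 1.6])    # densities w.r.t. P: sum(w * f) = 1
    g = f[::-1].copy()
    t = 0.3
    k = kappa_t(np.log, np.exp, f, g, w, t)
    print(k, np.log(np.sum(w * f**(1 - t) * g**t)))  # bisection vs closed form
    print(np.sum(w * phi_path(np.log, np.exp, f, g, w, t)))
```

For $\varphi = \log$ the result can be compared with the closed form $\kappa_t = \log \int f^{1-t} g^{t}\, dP$; for generators without an explicit $\kappa_t$, bisection is one simple option among several root finders.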
Examples of $\varphi$-path
Example 1
1 $\varphi_0(x) = \log(x)$. The $\varphi_0$-path is given by
$$f_t(x, \varphi_0) = \exp\big((1-t)\log f(x) + t\log g(x) - \kappa_t\big), \quad \text{where } \kappa_t = \log \int \exp\big((1-t)\log f(x) + t\log g(x)\big)\, dP(x).$$
2 $\varphi_\eta(x) = \log(x + \eta)$ with $\eta \ge 0$. The $\varphi_\eta$-path is given by
$$f_t(x, \varphi_\eta) = \exp\big[(1-t)\log\{f(x) + \eta\} + t\log\{g(x) + \eta\} - \kappa_t\big], \quad \text{where } \kappa_t = \log\Big[\int \exp\{(1-t)\log\{f(x) + \eta\} + t\log\{g(x) + \eta\}\}\, dP(x) - \eta\Big].$$
3 $\varphi_\beta(x) = (x^\beta - 1)/\beta$ with $\beta \le 1$. The $\varphi_\beta$-path is given by
$$f_t(x, \varphi_\beta) = \{(1-t)f(x)^\beta + t g(x)^\beta - \kappa_t\}^{1/\beta},$$
where $\kappa_t$ does not have an explicit form.

2 Parallel displacement $A_t^{(\varphi)}$ characterizing the $\varphi$-path

Extended expectation
For a function $a(x) : \mathcal{X} \to \mathbb{R}$, we consider the extended expectation
$$E_f^{(\varphi)}\{a(X)\} = \frac{\int_{\mathcal{X}} \frac{1}{\varphi'(f(x))}\, a(x)\, dP(x)}{\int_{\mathcal{X}} \frac{1}{\varphi'(f(x))}\, dP(x)},$$
where $\varphi : (0, \infty) \to \mathbb{R}$ is a generator function.
Remark 2: If $\varphi(t) = \log t$, then $E^{(\varphi)}$ reduces to the usual expectation.

Properties of extended expectation
We note that
1 $E_f^{(\varphi)}(c) = c$ for any constant $c$.
2 $E_f^{(\varphi)}\{c\,a(X)\} = c\,E_f^{(\varphi)}\{a(X)\}$ for any constant $c$.
3 $E_f^{(\varphi)}\{a(X) + b(X)\} = E_f^{(\varphi)}\{a(X)\} + E_f^{(\varphi)}\{b(X)\}$.
4 $E_f^{(\varphi)}\{a(X)^2\} \ge 0$, with equality if and only if $a(x) = 0$ for $P$-almost every $x$ in $\mathcal{X}$.
Remark 3: If we define $f^{(\varphi)}(x) = \dfrac{1/\varphi'(f(x))}{\int_{\mathcal{X}} 1/\varphi'(f(x))\, dP(x)}$, then $E_f^{(\varphi)}\{a(X)\} = E_{f^{(\varphi)}}\{a(X)\}$.

Tangent space of $\mathcal{F}_P$
Let $H_f$ be a Hilbert space with the inner product defined by $\langle a, b \rangle_f = E_f^{(\varphi)}\{a(X)b(X)\}$, and define the tangent space associated with the extended expectation by
$$T_f = \{a \in H_f : \langle a, 1 \rangle_f = 0\}.$$
For a statistical model $M = \{f_\theta(x)\}_{\theta \in \Theta}$ we have $E_{f_\theta}^{(\varphi)}\{\partial_i \varphi(f_\theta(X))\} = 0$ for all $\theta \in \Theta$, where $\partial_i = \partial/\partial\theta_i$ with $\theta = (\theta_i)_{i=1,\dots,p}$. Further,
$$E_{f_\theta}^{(\varphi)}\{\partial_i \partial_j \varphi(f_\theta(X))\} = E_{f_\theta}^{(\varphi)}\left\{ \frac{\varphi''(f_\theta(X))}{\varphi'(f_\theta(X))^2}\, \partial_i \varphi(f_\theta(X))\, \partial_j \varphi(f_\theta(X)) \right\}.$$

Parallel displacement $A_t^{(\varphi)}$
Define $A_t^{(\varphi)}(x)$ in $T_{f_t}$ as the solution of the differential equation
$$\dot A_t^{(\varphi)}(x) - E_{f_t}^{(\varphi)}\left\{ A_t^{(\varphi)}\, \dot f_t\, \frac{\varphi''(f_t)}{\varphi'(f_t)} \right\} = 0,$$
where $f_t$ is a path connecting $f$ and $g$ such that $f_0 = f$ and $f_1 = g$, and $\dot A_t^{(\varphi)}(x)$ is the derivative of $A_t^{(\varphi)}(x)$ with respect to $t$.
Theorem 2
The geodesic curve $\{f_t\}_{0 \le t \le 1}$ by the parallel displacement $A_t^{(\varphi)}$ is the $\varphi$-path.

3 U-divergence and its associated geodesic

U-divergence
Assume that $U(s)$ is a convex and increasing function of a scalar $s$ and let $\xi(t) = \mathrm{argmax}_s\{st - U(s)\}$. Then the U-divergence is
$$D_U(f, g) = \int \{U(\xi(g)) - f\xi(g)\}\, dP - \int \{U(\xi(f)) - f\xi(f)\}\, dP.$$
In fact, the U-divergence is the difference between the cross entropy $C_U(f, g)$ and the diagonal entropy $C_U(f, f)$, where $C_U(f, g) = \int \{U(\xi(g)) - f\xi(g)\}\, dP$.

Connections based on U-divergence
For a finite-dimensional manifold $M = \{f_\theta(x) : \theta \in \Theta\}$ and vector fields $X$ and $Y$ on $M$, the Riemannian metric is
$$G^{(U)}(X, Y)(f) = \int Xf\, Y\xi(f)\, dP$$
for $f \in M$, and the linear connections $\nabla^{(U)}$ and $\nabla^{*(U)}$ are given by
$$G^{(U)}(\nabla^{(U)}_X Y, Z)(f) = \int XYf\, Z\xi(f)\, dP \quad \text{and} \quad G^{(U)}(\nabla^{*(U)}_X Y, Z)(f) = \int Zf\, XY\xi(f)\, dP.$$
See Eguchi (1992) for details.

Equivalence between $\nabla^*$-geodesic and $\xi$-path
Let $\nabla^{(U)}$ and $\nabla^{*(U)}$ be the linear connections associated with the U-divergence $D_U$, and let $C^{(\varphi)} = \{f_t(x, \varphi) : 0 \le t \le 1\}$ be the $\varphi$-path connecting $f$ and $g$ in $\mathcal{F}_P$. Then we have:
Theorem 3
A $\nabla^{(U)}$-geodesic curve connecting $f$ and $g$ is equal to $C^{(\mathrm{id})}$, where id denotes the identity function, while a $\nabla^{*(U)}$-geodesic curve connecting $f$ and $g$ is equal to $C^{(\xi)}$, where $\xi(t) = \mathrm{argmax}_s\{st - U(s)\}$.

Summary
1 We consider the $\varphi$-path based on the Kolmogorov-Nagumo average.
2 The relation between the U-divergence and the $\varphi$-path was investigated ($\varphi$ corresponds to $\xi$).
3 The idea of the $\varphi$-path can be applied to probability density estimation as well as classification problems.
4 A divergence associated with the $\varphi$-path can be considered, where a special case would be the Bhattacharyya divergence.
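As a concrete instance of the U-divergence defined in this talk, take $U(s) = e^s$, for which $\xi(t) = \log t$; the definition then gives $D_U(f, g) = \int \{f \log(f/g) + g - f\}\, dP$, which is the Kullback-Leibler divergence when $f$ and $g$ are both normalized. The sketch below evaluates the definition on a finite data space; the weight vector `w` standing for $P$ and the function name `u_divergence` are assumptions made for illustration.

```python
import numpy as np

def u_divergence(f, g, w, U, xi):
    """D_U(f, g) = C_U(f, g) - C_U(f, f) on a finite data space with weights w."""
    cross = np.sum(w * (U(xi(g)) - f * xi(g)))     # cross entropy C_U(f, g)
    diagonal = np.sum(w * (U(xi(f)) - f * xi(f)))  # diagonal entropy C_U(f, f)
    return cross - diagonal

if __name__ == "__main__":
    w = np.full(4, 0.25)
    f = np.array([0.4, 0.8, 1.2, 1.6])
    g = f[::-1].copy()
    d = u_divergence(f, g, w, U=np.exp, xi=np.log)
    kl = np.sum(w * f * np.log(f / g))
    print(d, kl)   # the two values agree for normalized f and g
```

Other choices of $U$ only require supplying the matching $\xi = (U')^{-1}$.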
Monta Sakamoto, Hiroshi Matsuzoe
Keywords =
Abstract
A generalization of independence and multivariate Student's t-distributions
MATSUZOE Hiroshi, Nagoya Institute of Technology
Joint work with SAKAMOTO Monta (Efrei, Paris)

1 Deformed exponential family
2 Non-additive differentials and expectation functionals
3 Geometry of deformed exponential families
4 Generalization of independence
5 q-independence and Student's t-distributions
6 Appendix

Notions of expectation and independence are determined by the choice of statistical model.

Introduction: Geometry and statistics
• Geometry for the sample space
• Geometry for the parameter space
• Wasserstein geometry
• Optimal transport theory
• A pdf is regarded as a distribution of mass
• Information geometry
• Convexity of entropy and free energy
• Duality of estimating function
Tomonari Sei, Ushio Tanaka
Keywords =
Abstract
Geometric Properties of textile plot
Tomonari SEI and Ushio TANAKA
University of Tokyo and Osaka Prefecture University
at École Polytechnique, Oct 28, 2015

Introduction
The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix, $\mathbb{R}^{n \times p} \ni X \mapsto Y \in \mathbb{R}^{n \times p}$, in order to draw a parallel coordinate plot.
The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance.
In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometrical point of view.
It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are "generic".

Outline
1 What is textile plot?
2 Textile set
3 Main result
4 Other results
5 Summary

Textile plot
Example (Kumasaka and Shibata, 2008)
Textile plot for the iris data (150 cases, 5 attributes).
Each variate is transformed by a location-scale transformation.
Categorical data is quantified.
Missing data is admitted.
Order of axes can be maintained.
[Figure: textile plot of the iris data; axes Species (setosa, versicolor, virginica), Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.]

Textile plot
Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value.
Let $X = (x_1, \dots, x_p) \in \mathbb{R}^{n \times p}$ be the data matrix.
Without loss of generality, assume the sample mean and sample variance of each $x_j$ are 0 and 1, respectively.
The data is transformed into $Y = (y_1, \dots, y_p)$, where
$$y_j = a_j + b_j x_j, \quad a_j, b_j \in \mathbb{R}, \quad j = 1, \dots, p.$$
The coefficients $a_j$ and $b_j$ are determined by the following procedure.

Textile plot
The coefficients $a = (a_j)$ and $b = (b_j)$ are the solution of the following minimization problem:
$$\underset{a,\,b}{\text{Minimize}} \ \sum_{t=1}^{n} \sum_{j=1}^{p} (y_{tj} - \bar y_{t\cdot})^2 \quad \text{subject to} \quad y_j = a_j + b_j x_j, \quad \sum_{j=1}^{p} \|y_j\|^2 = 1.$$
Intuition: as horizontal as possible.
Solution: $a = 0$ and $b$ is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of $X$.
[Figure: one observation's values $y_{t1}, \dots, y_{t5}$ plotted against their mean $\bar y_{t\cdot}$.]

Example ($n = 100$, $p = 4$)
$X \in \mathbb{R}^{100 \times 4}$. Each row $\sim N(0, \Sigma)$,
$$\Sigma = \begin{pmatrix} 1 & -0.6 & 0.5 & 0.1 \\ -0.6 & 1 & -0.6 & -0.2 \\ 0.5 & -0.6 & 1 & 0.0 \\ 0.1 & -0.2 & 0.0 & 1 \end{pmatrix}.$$
[Figure: (a) raw data $X$; (b) textile plot $Y$.]

Our motivation
The textile plot transforms the data matrix $X$ into $Y$. Denote the map by $Y = \tau(X)$.
What is the image $\tau(\mathbb{R}^{n \times p})$?
We can show that $Y \in \tau(\mathbb{R}^{n \times p})$ satisfies two conditions:
$$\exists \lambda \ge 0,\ \forall i = 1, \dots, p, \quad \sum_{j=1}^{p} y_i^\top y_j = \lambda \|y_i\|^2 \qquad \text{and} \qquad \sum_{j=1}^{p} \|y_j\|^2 = 1.$$
This motivates the following definition of the textile set.

Textile set
Definition
The textile set is defined by
$$T_{n,p} = \Big\{ Y \in \mathbb{R}^{n \times p} \ \Big|\ \exists \lambda \ge 0,\ \forall i,\ \sum_j y_i^\top y_j = \lambda \|y_i\|^2,\ \sum_j \|y_j\|^2 = 1 \Big\}.$$
The unnormalized textile set is defined by
$$U_{n,p} = \Big\{ Y \in \mathbb{R}^{n \times p} \ \Big|\ \exists \lambda \ge 0,\ \forall i,\ \sum_j y_i^\top y_j = \lambda \|y_i\|^2 \Big\}.$$
We are interested in mathematical properties of $T_{n,p}$ and $U_{n,p}$.
Bad news: the statistical implications are left for future work.
Let us begin with the small-$p$ case.

$T_{n,p}$ with small $p$
Lemma ($p = 1$)
$T_{n,1} = S^{n-1}$, the unit sphere.
Lemma ($p = 2$)
$T_{n,2} = A \cup B$, where
$$A = \{(y_1, y_2) \mid \|y_1\| = \|y_2\| = 1/\sqrt{2}\}, \qquad B = \{(y_1, y_2) \mid \|y_1 - y_2\| = \|y_1 + y_2\| = 1\},$$
each of which is diffeomorphic to $S^{n-1} \times S^{n-1}$. Their intersection $A \cap B$ is diffeomorphic to the Stiefel manifold $V_{n,2}$.
The case $n = p = 2$ is shown in the next example.

Example ($n = p = 2$)
$T_{2,2} \subset \mathbb{R}^4$ is the union of two tori, glued along $O(2)$:
$$T_{2,2} = \left\{ \frac{1}{\sqrt{2}} \begin{pmatrix} \cos\theta & \cos\phi \\ \sin\theta & \sin\phi \end{pmatrix} \right\} \cup \left\{ \frac{1}{2} \begin{pmatrix} \cos\xi + \cos\eta & \cos\xi - \cos\eta \\ \sin\xi + \sin\eta & \sin\xi - \sin\eta \end{pmatrix} \right\}.$$
[Figure: the two tori, parametrized by $(\theta, \phi)$ and $(\xi, \eta)$.]

For general dimension $p$
To state our main result, we define two concepts: the noncompact Stiefel manifold and the canonical form.
Definition (e.g. Absil et al. (2008))
Let $n \ge p$. Denote by $V^*$ the set of all column full-rank matrices:
$$V^* := \{ Y \in \mathbb{R}^{n \times p} \mid \mathrm{rank}(Y) = p \}.$$
$V^*$ is called the noncompact Stiefel manifold.
Note that $\dim(V^*) = np$ and $\overline{V^*} = \mathbb{R}^{n \times p}$.
The orthogonal group $O(n)$ acts on $V^*$.
By the Gram-Schmidt orthonormalization, the quotient space $V^*/O(n)$ is identified with upper-triangular matrices with positive diagonals, as described next.

Noncompact Stiefel manifold and canonical form
Definition (Canonical form)
Let us denote by $V^{**}$ the set of all matrices written as
$$\begin{pmatrix} y_{11} & \cdots & y_{1p} \\ 0 & \ddots & \vdots \\ \vdots & \ddots & y_{pp} \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}, \qquad y_{ii} > 0,\ 1 \le i \le p.$$
We call it a canonical form.
Note that $V^{**} \subset V^*$ and $V^*/O(n) \simeq V^{**}$.

Restriction of unnormalized textile set
$V^*$: noncompact Stiefel manifold, $V^{**}$: set of canonical forms.
Definition
Denote the restrictions of $U_{n,p}$ to $V^*$ and $V^{**}$ by
$$U^*_{n,p} = U_{n,p} \cap V^*, \qquad U^{**}_{n,p} = U_{n,p} \cap V^{**},$$
respectively.
The group $O(n)$ acts on $U^*_{n,p}$. The quotient space $U^*_{n,p}/O(n)$ is identified with $U^{**}_{n,p}$.
So it is essential to study $U^{**}_{n,p}$.

$U^{**}_{n,p}$ for small $p$
Let us check examples.
Example ($n = p = 1$)
$U^{**}_{1,1} = \{(1)\}$.
Example ($n = p = 2$)
Let $Y = \begin{pmatrix} y_{11} & y_{12} \\ 0 & y_{22} \end{pmatrix}$ with $y_{11}, y_{22} > 0$. Then
$$U^{**}_{2,2} = \{y_{12} = 0\} \cup \{y_{11}^2 = y_{12}^2 + y_{22}^2\},$$
the union of a plane and a cone.

Main theorem
The differential geometrical property of $U^{**}_{n,p}$ is given as follows:
Theorem
Let $n \ge p \ge 3$. Then we have the following decomposition
$$U^{**}_{n,p} = M_1 \cup M_2,$$
where each $M_i$ is a differentiable manifold, the dimensions of which are given by
$$\dim M_1 = \frac{p(p+1)}{2} - (p-1), \qquad \dim M_2 = \frac{p(p+1)}{2} - p,$$
respectively. $M_2$ is connected while $M_1$ may not be.

Example
$U^{**}_{3,3}$ is the union of a 4-dimensional and a 3-dimensional manifold. We look at a cross section with $y_{11} = y_{22} = 1$:
[Figure: cross section in the coordinates $(y_{12}, y_{13}, y_{33})$.]
It is the union of a surface and a vertical line.

Corollary
Let $n \ge p \ge 3$. Then we have
$$U^*_{n,p} = \pi^{-1}(M_1) \cup \pi^{-1}(M_2),$$
where $\pi$ denotes the map of Gram-Schmidt orthonormalization. The dimensions are
$$\dim \pi^{-1}(M_1) = np - (p-1), \qquad \dim \pi^{-1}(M_2) = np - p.$$

Other results
We state other results. First we have the $n = 1$ case.
Lemma
If $n = 1$, then the textile set $T_{1,p}$ is the union of a $(p-2)$-dimensional manifold and $2(2^p - 1)$ isolated points.
Example
$U^{**}_{1,3}$ consists of a circle and 14 points:
$$U^{**}_{1,3} = \big(S^2 \cap \{y_1 + y_2 + y_3 = 1\}\big) \cup \Big\{ \pm\big(\tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}, \tfrac{1}{\sqrt3}\big),\ \pm\big(\tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}, 0\big),\ \pm\big(\tfrac{1}{\sqrt2}, 0, \tfrac{1}{\sqrt2}\big),\ \pm\big(0, \tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}\big),\ \pm(1, 0, 0),\ \pm(0, 1, 0),\ \pm(0, 0, 1) \Big\}.$$

Differential geometrical characterization of $f_\lambda^{-1}(O)$
Fix $\lambda \ge 0$ arbitrarily. We define the map $f_\lambda : \mathbb{R}^{n \times p} \to \mathbb{R}^{p+1}$ by
$$f_\lambda(y_1, \dots, y_p) := \begin{pmatrix} \sum_j y_1^\top y_j - \lambda \|y_1\|^2 \\ \vdots \\ \sum_j y_p^\top y_j - \lambda \|y_p\|^2 \\ \sum_j \|y_j\|^2 - 1 \end{pmatrix}.$$
Lemma
We have a classification of $T_{n,p}$, namely
$$T_{n,p} = \bigcup_{\lambda \ge 0} f_\lambda^{-1}(O) = \bigcup_{0 \le \lambda \le n} f_\lambda^{-1}(O).$$

Differential geometrical characterization of $f_\lambda^{-1}(O)$
Lastly, we state a characterization of $f_\lambda^{-1}(O)$ from the viewpoint of differential geometry.
Theorem
Let $\lambda \ge 0$. $f_\lambda^{-1}(O)$ is a regular submanifold of $\mathbb{R}^{n \times p}$ with codimension $p + 1$ whenever
$$\lambda > 0, \qquad y_{11} y_{jj} - y_{1j} y_{j1} = 0,\ j = 2, \dots, p, \qquad \exists\, \ell \in \{2, \dots, p\};\ \sum_{j=2}^{p} y_{ij} + y_{i\ell}(1 - 2\lambda) = 0,\ i = 1, \dots, n.$$

Present and future study
Summary: we defined the textile set $T_{n,p}$ and found its geometric properties.
Present and future study:
1 Characterize the classification $f_\lambda^{-1}(O)$ with the Riemannian metric induced from $\mathbb{R}^{np}$ by (global) Riemannian geometry: geodesics, curvature, etc.
2 Investigate differential geometrical and topological properties of $T_{n,p}$ and $f_\lambda^{-1}(O)$, including its group action.
3 Can one find statistical implications such as sample distribution theory?
Thank you very much!

References
1 Absil, P.-A., Mahony, R., and Sepulchre, R. (2008), Optimization Algorithms on Matrix Manifolds, Princeton University Press.
2 Honda, K. and Nakano, J. (2007), 3 dimensional parallel coordinate plot, Proceedings of the Institute of Statistical Mathematics, 55, 69–83.
3 Inselberg, A. (2009), Parallel Coordinates: Visual Multidimensional Geometry and its Applications, Springer.
4 Kumasaka, N. and Shibata, R. (2008), High-dimensional data visualisation: The textile plot, Computational Statistics and Data Analysis, 52, 3616–3644.
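The textile transformation described in this talk (standardize, set $a = 0$, take $b$ as the leading eigenvector) and the two textile-set conditions can be illustrated in a few lines. The sketch below is a simplified reading of the procedure, not the authors' implementation: it assumes no categorical or missing values, standardizes with the population variance so that $\|x_j\|^2 = n$, and rescales $b$ by $1/\sqrt{n}$ to meet the normalization $\sum_j \|y_j\|^2 = 1$. The function name `textile_transform` is hypothetical.

```python
import numpy as np

def textile_transform(X):
    """Return (Y, lam): textile coordinates and the eigenvalue lambda."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # mean 0, variance 1 per column
    R = Z.T @ Z / n                           # correlation matrix of X
    w, V = np.linalg.eigh(R)                  # eigenvalues in ascending order
    lam, b = w[-1], V[:, -1]                  # leading eigenpair
    Y = Z * b / np.sqrt(n)                    # y_j = b_j x_j, rescaled
    return Y, lam

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Sigma = np.array([[1.0, -0.6, 0.5, 0.1],
                      [-0.6, 1.0, -0.6, -0.2],
                      [0.5, -0.6, 1.0, 0.0],
                      [0.1, -0.2, 0.0, 1.0]])
    X = rng.multivariate_normal(np.zeros(4), Sigma, size=100)
    Y, lam = textile_transform(X)
    G = Y.T @ Y                               # Gram matrix of the columns y_j
    print(np.trace(G))                        # sum_j ||y_j||^2
    print(G.sum(axis=1) - lam * np.diag(G))   # sum_j y_i^T y_j - lam ||y_i||^2
```

The printed residual vector should be numerically zero, which is the membership condition for $T_{n,p}$ with this $\lambda$.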
Damiano Brigo, John Armstrong
Keywords =
Abstract
Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters
GSI 2015, Oct 28, 2015, Paris
Damiano Brigo, Dept. of Mathematics, Imperial College, London, www.damianobrigo.it
Joint work with John Armstrong, Dept. of Mathematics, King's College, London
Full paper to appear in MCSS, see also arXiv.org
D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015

Inner Products, Metrics and Projections

Spaces of probability densities

Consider a parametric family of probability densities
$$S = \{p(\cdot,\theta),\ \theta \in \Theta \subset \mathbb{R}^m\}, \qquad S^{1/2} = \{\sqrt{p(\cdot,\theta)},\ \theta \in \Theta \subset \mathbb{R}^m\}.$$
If $S$ (or $S^{1/2}$) is a subset of a function space having an $L^2$ structure (hence an inner product, a norm and a metric), then we may ask whether $p(\cdot,\theta) \mapsto \theta \in \mathbb{R}^m$ ($\sqrt{p(\cdot,\theta)} \mapsto \theta$ respectively) is a chart of an $m$-dimensional manifold $S$ ($S^{1/2}$).

The topology and differential structure in the chart is the $L^2$ structure, but there are two possibilities:
$$S: \quad d_2(p_1,p_2) = \|p_1 - p_2\| \quad (L^2 \text{ direct distance}), \quad p_{1,2} \in L^2,$$
$$S^{1/2}: \quad d_H(\sqrt{p_1},\sqrt{p_2}) = \|\sqrt{p_1} - \sqrt{p_2}\| \quad (\text{Hellinger distance}), \quad p_{1,2} \in L^1,$$
where $\|\cdot\|$ is the norm of the Hilbert space $L^2$.

Tangent vectors, metrics and projection

If $\varphi: \theta \mapsto p(\cdot,\theta)$ ($\theta \mapsto \sqrt{p(\cdot,\theta)}$ resp.) is the inverse of a chart, then
$$\left\{ \frac{\partial\varphi(\cdot,\theta)}{\partial\theta_1}, \ldots, \frac{\partial\varphi(\cdot,\theta)}{\partial\theta_m} \right\}$$
are linearly independent $L^2(\lambda)$ vectors that span the tangent space at $\theta$.

The inner product of two basis elements is defined through the $L^2$ structure:
$$\left\langle \frac{\partial p(\cdot,\theta)}{\partial\theta_i}, \frac{\partial p(\cdot,\theta)}{\partial\theta_j} \right\rangle = \int \frac{\partial p(x,\theta)}{\partial\theta_i} \frac{\partial p(x,\theta)}{\partial\theta_j}\,dx = \gamma_{ij}(\theta),$$
$$\left\langle \frac{\partial \sqrt{p}}{\partial\theta_i}, \frac{\partial \sqrt{p}}{\partial\theta_j} \right\rangle = \frac{1}{4} \int \frac{1}{p(x,\theta)} \frac{\partial p(x,\theta)}{\partial\theta_i} \frac{\partial p(x,\theta)}{\partial\theta_j}\,dx = \frac{1}{4}\, g_{ij}(\theta).$$
$\gamma(\theta)$: direct $L^2$ matrix ($d_2$); $g(\theta)$: famous Fisher-Rao matrix ($d_H$).

$d_2$ orthogonal projection:
$$\Pi^{\gamma}_{\theta}[v] = \sum_{i=1}^m \Big[ \sum_{j=1}^m \gamma^{ij}(\theta) \Big\langle v, \frac{\partial p(\cdot,\theta)}{\partial\theta_j} \Big\rangle \Big] \frac{\partial p(\cdot,\theta)}{\partial\theta_i}$$
(the $d_H$ projection is analogous, inserting $\sqrt{\cdot}$ and replacing $\gamma$ with $g/4$).
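To make the two metrics concrete, the following minimal sketch evaluates $\gamma$ and $g$ by quadrature for a one-parameter family. The family $N(\mu, 1)$ with $\theta = \mu$, the integration window and all function names are illustrative assumptions, not taken from the talk. For this family the closed forms are $g = 1$ and $\gamma = 1/(4\sqrt{\pi})$.

```python
import math

def gaussian_pdf(x, mu, var=1.0):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def dp_dmu(x, mu, var=1.0):
    """Partial derivative of the N(mu, var) density with respect to mu."""
    return gaussian_pdf(x, mu, var) * (x - mu) / var

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

def direct_l2_metric(mu, var=1.0, half_width=12.0):
    """gamma(mu): integral of (dp/dmu)^2 dx (direct L2 metric)."""
    return trapezoid(lambda x: dp_dmu(x, mu, var) ** 2,
                     mu - half_width, mu + half_width)

def fisher_metric(mu, var=1.0, half_width=12.0):
    """g(mu): integral of (dp/dmu)^2 / p dx (Fisher-Rao metric)."""
    def integrand(x):
        return dp_dmu(x, mu, var) ** 2 / gaussian_pdf(x, mu, var)
    return trapezoid(integrand, mu - half_width, mu + half_width)

gamma = direct_l2_metric(0.3)
g = fisher_metric(0.3)
print("gamma:", gamma, "closed form:", 1.0 / (4.0 * math.sqrt(math.pi)))
print("g:", g, "closed form:", 1.0)
```

Neither value depends on $\mu$ here, because the family is a pure translation family; in general both matrices vary with $\theta$.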
Nonlinear Projection Filtering

The nonlinear filtering problem for diffusion signals

$$dX_t = f_t(X_t)\,dt + \sigma_t(X_t)\,dW_t, \quad X_0 \qquad \text{(signal)}$$
$$dY_t = b_t(X_t)\,dt + dV_t, \quad Y_0 = 0 \qquad \text{(noisy observation)} \qquad (1)$$

These are Itô SDEs. We use both Itô and Stratonovich (Str) SDEs. Str SDEs are necessary to deal with manifolds, since second-order Itô terms have no clear interpretation on manifolds [16], although we are working on a direct projection of Itô equations with good optimality properties (John Armstrong).

The nonlinear filtering problem consists in finding the conditional probability distribution $\pi_t$ of the state $X_t$ given the observations up to time $t$, i.e. $\pi_t(dx) := P[X_t \in dx \mid \mathcal{Y}_t]$, where $\mathcal{Y}_t := \sigma(Y_s,\ 0 \le s \le t)$. Assume $\pi_t$ has a density $p_t$: then $p_t$ satisfies the Str SPDE
$$dp_t = L_t^* p_t\,dt - \frac{1}{2}\, p_t\, [\,|b_t|^2 - E_{p_t}\{|b_t|^2\}]\,dt + \sum_{k=1}^d p_t\, [\,b_t^k - E_{p_t}\{b_t^k\}] \circ dY_t^k,$$
with the forward operator
$$L_t^* \phi = -\sum_{i=1}^n \frac{\partial}{\partial x_i} [f_t^i \phi] + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2}{\partial x_i \partial x_j} [a_t^{ij} \phi],$$
where $a_t = \sigma_t \sigma_t^T$. This is an infinite-dimensional SPDE.
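The forward operator $L_t^*$ can be made concrete on a grid. The sketch below is a plain central-difference discretization in the scalar case ($n = 1$), checked on a Brownian signal with $f = 0$, $a = 1$ and a standard normal $\phi$, for which $L^*\phi = \frac{1}{2}(x^2 - 1)\phi$. The grid, the boundary treatment and the function names are assumptions made for illustration; this is not the finite-difference scheme used for the exact solution in the talk.

```python
import math

def forward_operator(phi, xs, f, a):
    """Central-difference approximation of L* phi = -(f phi)' + 0.5 (a phi)''.

    phi: density values on the uniform grid xs.
    f, a: callables returning drift and squared diffusion at a point.
    The two boundary values are set to zero (density assumed negligible there).
    """
    h = xs[1] - xs[0]
    flux = [f(x) * p for x, p in zip(xs, phi)]
    diff = [a(x) * p for x, p in zip(xs, phi)]
    out = [0.0] * len(xs)
    for i in range(1, len(xs) - 1):
        d_flux = (flux[i + 1] - flux[i - 1]) / (2.0 * h)
        d2_diff = (diff[i + 1] - 2.0 * diff[i] + diff[i - 1]) / (h * h)
        out[i] = -d_flux + 0.5 * d2_diff
    return out

# Illustrative check: f = 0, sigma = 1 (so a = 1), phi = standard normal density.
n = 801
xs = [-8.0 + 16.0 * i / (n - 1) for i in range(n)]
h = xs[1] - xs[0]
phi = [math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi) for x in xs]
l_star_phi = forward_operator(phi, xs, lambda x: 0.0, lambda x: 1.0)

# Analytic value for this choice: 0.5 * phi'' = 0.5 * (x^2 - 1) * phi.
exact = [0.5 * (x * x - 1.0) * p for x, p in zip(xs, phi)]
max_err = max(abs(u - v) for u, v in zip(l_star_phi, exact))

# L* conserves probability mass: the integral of L* phi should vanish.
mass_rate = sum(l_star_phi) * h
print("max error:", max_err, "mass rate:", mass_rate)
```

The vanishing integral of $L^*\phi$ reflects the fact that the prediction part of the filter conserves total probability.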
Solutions for even toy systems like the cubic sensor, $f = 0$, $\sigma = 1$, $b = x^3$, do not belong to any finite-dimensional family $p(\cdot,\theta)$ [19]. We need finite-dimensional approximations. We can project the SPDE according to either the $L^2$ direct metric ($\gamma(\theta)$) or, by deriving the analogous equation for $\sqrt{p_t}$, according to the Hellinger metric ($g(\theta)$).

Projection transforms the SPDE into a finite-dimensional SDE for $\theta$ via the chain rule (hence Str calculus):
$$dp(\cdot,\theta_t) = \sum_{j=1}^m \frac{\partial p(\cdot,\theta)}{\partial\theta_j} \circ d\theta_j(t).$$
With Itô calculus we would have terms $\frac{\partial^2 p(\cdot,\theta)}{\partial\theta_i \partial\theta_j}\, d\langle \theta_i, \theta_j \rangle$, which are not tangent vectors.

Projection filter in the metrics $\gamma$ ($L^2$) and $g$ (Fisher)

$$d\theta_t^i = \Big[ \sum_{j=1}^m \gamma^{ij}(\theta_t) \int L_t^* p(x,\theta_t)\, \frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx - \sum_{j=1}^m \gamma^{ij}(\theta_t) \int \frac{1}{2} |b_t(x)|^2 \frac{\partial p}{\partial\theta_j}\,dx \Big] dt + \sum_{k=1}^d \Big[ \sum_{j=1}^m \gamma^{ij}(\theta_t) \int b_t^k(x) \frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx \Big] \circ dY_t^k, \quad \theta_0^i.$$
The above is the projected equation in the $d_2$ metric with projection $\Pi^{\gamma}$.

Instead, using the Hellinger distance and the Fisher metric with projection $\Pi^{g}$:
$$d\theta_t^i = \Big[ \sum_{j=1}^m g^{ij}(\theta_t) \int \frac{L_t^* p(x,\theta_t)}{p(x,\theta_t)} \frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx - \sum_{j=1}^m g^{ij}(\theta_t) \int \frac{1}{2} |b_t(x)|^2 \frac{\partial p}{\partial\theta_j}\,dx \Big] dt + \sum_{k=1}^d \Big[ \sum_{j=1}^m g^{ij}(\theta_t) \int b_t^k(x) \frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx \Big] \circ dY_t^k, \quad \theta_0^i.$$

Choice of the family: Exponential Families

Choosing the family/manifold: Exponential

In past literature and in several papers in Bernoulli, IEEE Automatic Control etc., Brigo, Hanzon and Le Gland have developed a theory for the projection filter using the Fisher metric $g$ and exponential families $p(x,\theta) := \exp[\theta^T c(x) - \psi(\theta)]$. Good combination:

The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure.

The Fisher matrix has a simple structure: $\partial^2_{\theta_i,\theta_j} \psi(\theta) = g_{ij}(\theta)$.

The structure of the projection $\Pi^{g}$ is simple for exponential families.

A special exponential family with the $Y$-function $b$ among the $c(x)$ exponents makes the filter correction step (projection of the $dY$ term) exact.

One can define both a local and a global filtering error through $d_H$.

Alternative coordinates, expectation parameters: $\eta = E_\theta[c] = \partial_\theta \psi(\theta)$.

The projection filter in $\eta$ coincides with a classical approximate filter: the assumed density filter (based on generalized "moment matching").
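The identities $g_{ij}(\theta) = \partial^2_{\theta_i,\theta_j}\psi(\theta)$ and $\eta = \partial_\theta\psi(\theta) = E_\theta[c]$ can be checked numerically. The sketch below does so for the two-parameter family with $c(x) = (x, x^2)$, i.e. Gaussian densities written in exponential form. The choice of family, the point $\theta = (0.5, -0.5)$, the quadrature window and the finite-difference step are assumptions made purely for illustration.

```python
import math

def trapezoid(f, a, b, n=4000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

def c(x):
    """Sufficient statistics c(x) = (x, x^2) of the illustrative family."""
    return (x, x * x)

def psi(theta, lo=-12.0, hi=12.0):
    """Log-normalizer: psi(theta) = log of the integral of exp(theta^T c(x))."""
    return math.log(trapezoid(
        lambda x: math.exp(theta[0] * x + theta[1] * x * x), lo, hi))

def density(x, theta, psi_value):
    """p(x, theta) = exp(theta^T c(x) - psi(theta))."""
    return math.exp(theta[0] * x + theta[1] * x * x - psi_value)

def expectation_parameters(theta, lo=-12.0, hi=12.0):
    """eta = E_theta[c], computed by quadrature."""
    psi_value = psi(theta, lo, hi)
    return tuple(
        trapezoid(lambda x, i=i: c(x)[i] * density(x, theta, psi_value), lo, hi)
        for i in range(2))

def fisher_matrix(theta, lo=-12.0, hi=12.0):
    """g_ij = Cov_theta(c_i, c_j), computed by quadrature."""
    psi_value = psi(theta, lo, hi)
    eta = expectation_parameters(theta, lo, hi)
    def entry(i, j):
        return trapezoid(
            lambda x: (c(x)[i] - eta[i]) * (c(x)[j] - eta[j])
            * density(x, theta, psi_value), lo, hi)
    return [[entry(i, j) for j in range(2)] for i in range(2)]

def psi_hessian(theta, step=1e-3):
    """Central finite-difference Hessian of psi."""
    def shifted(i, j, si, sj):
        t = list(theta)
        t[i] += si * step
        t[j] += sj * step
        return psi(t)
    return [[(shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
              - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4.0 * step * step)
             for j in range(2)] for i in range(2)]

theta = (0.5, -0.5)  # corresponds to N(0.5, 1)
eta = expectation_parameters(theta)
g = fisher_matrix(theta)
hess = psi_hessian(theta)
print("eta:", eta)
print("g:", g)
print("Hessian of psi:", hess)
```

For this $\theta$ the density is $N(0.5, 1)$, so $\eta$ should be close to $(0.5, 1.25)$ and both matrices close to $[[1, 1], [1, 3]]$.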
Choice of the family: Mixture Families

Mixture families

However, exponential families do not couple as well with the metric $\gamma(\theta)$. Is there some important family for which the metric $\gamma(\theta)$ is preferable to the classical Fisher metric $g(\theta)$, in that the metric, the tangent space and the filter equations are simpler?

The answer is affirmative, and this is the mixture family.

We define a simple mixture family as follows. Given $m + 1$ fixed square-integrable probability densities $q = [q_1, q_2, \ldots, q_{m+1}]^T$, define $\hat\theta(\theta) := [\theta_1, \theta_2, \ldots, \theta_m, 1 - \theta_1 - \theta_2 - \ldots - \theta_m]^T$ for all $\theta \in \mathbb{R}^m$. We write $\hat\theta$ instead of $\hat\theta(\theta)$. Mixture family (simplex):
$$S^M(q) = \{ \hat\theta(\theta)^T q,\ \theta_i \ge 0 \text{ for all } i,\ \theta_1 + \cdots + \theta_m < 1 \}.$$

If we consider the $L^2$ / $\gamma(\theta)$ distance, the metric $\gamma(\theta)$ itself and the related projection become very simple. Indeed,
$$\frac{\partial p(\cdot,\theta)}{\partial\theta_i} = q_i - q_{m+1} \quad \text{and} \quad \gamma_{ij}(\theta) = \int (q_i(x) - q_{m+1}(x))(q_j(x) - q_{m+1}(x))\,dx$$
(NO inline numerical integration).

The $L^2$ metric does not depend on the specific point $\theta$ of the manifold. The same holds for the tangent space at $p(\cdot,\theta)$, which is given by
$$\mathrm{span}\{q_1 - q_{m+1},\ q_2 - q_{m+1},\ \ldots,\ q_m - q_{m+1}\}.$$
Also the $L^2$ projection becomes particularly simple.

Mixture Projection Filter

Armstrong and Brigo (MCSS 2016 [3]) show that the mixture family together with the metric $\gamma(\theta)$ leads to a projection filter that is the same as approximate filtering via Galerkin [5] methods. See the full paper for the details.

Summing up:

Metric \ Family | Exponential | Basic mixture
Hellinger $d_H$, Fisher $g(\theta)$ | Good (~ADF ≈ local moment matching) | Nothing special
Direct $L^2$ $d_2$, matrix $\gamma(\theta)$ | Nothing special | Good (~Galerkin)
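The remark that the basic mixture family needs no inline numerical integration can be illustrated when the fixed densities $q_i$ are Gaussian, since then $\int N(x;\mu_a,v_a)\,N(x;\mu_b,v_b)\,dx = N(\mu_a - \mu_b;\,0,\,v_a + v_b)$ in closed form. The sketch below builds the constant matrix $\gamma$ this way and cross-checks one entry by brute-force quadrature. The three components and all names are hypothetical choices for the example.

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gaussian_overlap(mu_a, var_a, mu_b, var_b):
    """Closed form of the integral of N(x; mu_a, var_a) * N(x; mu_b, var_b)."""
    return gaussian_pdf(mu_a - mu_b, 0.0, var_a + var_b)

def mixture_gamma(components):
    """gamma_ij = integral of (q_i - q_{m+1})(q_j - q_{m+1}) for Gaussian q's.

    components: list of (mean, variance); the last entry plays the role of q_{m+1}.
    The matrix is computed once, offline, and does not depend on theta.
    """
    m = len(components) - 1
    last = components[m]
    def overlap(a, b):
        return gaussian_overlap(a[0], a[1], b[0], b[1])
    return [[overlap(components[i], components[j])
             - overlap(components[i], last)
             - overlap(components[j], last)
             + overlap(last, last)
             for j in range(m)] for i in range(m)]

def numeric_entry(components, i, j, lo=-15.0, hi=15.0, n=6000):
    """Trapezoidal cross-check of a single entry of gamma."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n + 1):
        x = lo + k * h
        q_last = gaussian_pdf(x, *components[-1])
        term = ((gaussian_pdf(x, *components[i]) - q_last)
                * (gaussian_pdf(x, *components[j]) - q_last))
        total += term if 0 < k < n else 0.5 * term
    return total * h

components = [(-2.0, 1.0), (0.0, 0.5), (2.0, 1.0)]  # illustrative (mean, variance) pairs
gamma = mixture_gamma(components)
print("gamma:", gamma)
print("quadrature check of gamma[0][1]:", numeric_entry(components, 0, 1))
```

Because $\gamma$ is a Gram matrix of the linearly independent functions $q_i - q_{m+1}$, it is symmetric positive definite and can be inverted once before the filter runs.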
Mixture Projection Filter

However, despite the simplicity above, the mixture family has an important drawback: for all $\theta$, the filter mean is constrained:
$$\min_i\ (\text{mean of } q_i) \ \le\ \text{mean of } p(\cdot,\theta) \ \le\ \max_i\ (\text{mean of } q_i).$$

As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities $q$.

Specifically, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not fixed. For example, for a mixture of two Gaussians we have 5 parameters:
$$\theta\, p_{N(\mu_1,v_1)}(x) + (1 - \theta)\, p_{N(\mu_2,v_2)}(x), \quad \text{parameters } \theta, \mu_1, v_1, \mu_2, v_2.$$

We are now going to illustrate the Gaussian mixture projection filter (GMPF) in a fundamental example.
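The mean constraint of the basic mixture family follows in one line from the definition of $S^M(q)$, since the mixture weights form a convex combination. The symbol $\bar q_i$ for the mean of the fixed density $q_i$ is notation introduced here only for this derivation.

```latex
% \bar q_i denotes the mean of the fixed density q_i (notation assumed here).
\begin{align*}
\int x\, p(x,\theta)\,dx
  &= \sum_{i=1}^{m+1} \hat\theta_i \int x\, q_i(x)\,dx
   = \sum_{i=1}^{m+1} \hat\theta_i\, \bar q_i,
  \qquad \hat\theta_i \ge 0, \quad \sum_{i=1}^{m+1} \hat\theta_i = 1, \\
\min_i \bar q_i
  &\le \sum_{i=1}^{m+1} \hat\theta_i\, \bar q_i
   \le \max_i \bar q_i .
\end{align*}
```

Letting the component means and variances move, as in the Gaussian mixture family, removes this restriction because the $\bar q_i$ are then themselves parameters.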
The quadratic sensor

Consider the quadratic sensor
$$dX_t = \sigma\,dW_t, \qquad dY_t = X_t^2\,dt + \sigma\,dV_t.$$

The measurements tell us nothing about the sign of $X$.

Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical.

We expect a bimodal distribution.

Compared: $\theta\, p_{N(\mu_1,v_1)}(x) + (1 - \theta)\, p_{N(\mu_2,v_2)}(x)$ (red) vs $e^{\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta)}$ (pink) vs EKF (N) (blue) vs exact (green, finite difference method, grid 1000 state & 5000 time).

[Figures: Simulation for the Quadratic Sensor. Densities over X in [-8, 8] (vertical axis 0 to 1) at times 0, 1, ..., 10; curves: Projection, Exact, Extended Kalman, Exponential.]
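A path of the quadratic sensor model can be generated with a plain Euler-Maruyama scheme, which also makes the sign ambiguity visible: mirroring the signal ($X \to -X$) leaves the observation path unchanged. The sketch below is an illustration only; the step count, initial state, seed and the `mirror` switch are assumptions, and this is not the simulation code behind the figures in the talk.

```python
import math
import random

def simulate_quadratic_sensor(sigma=1.0, x0=1.0, horizon=10.0, steps=5000,
                              seed=0, mirror=False):
    """Euler-Maruyama paths of dX = sigma dW, dY = X^2 dt + sigma dV.

    mirror=True flips the sign of the initial state and of the signal noise,
    producing the reflected signal path -X driven by the same random numbers.
    """
    sign = -1.0 if mirror else 1.0
    rng = random.Random(seed)
    dt = horizon / steps
    sqrt_dt = math.sqrt(dt)
    x, y = sign * x0, 0.0
    xs, ys = [x], [y]
    for _ in range(steps):
        dw = sign * rng.gauss(0.0, sqrt_dt)  # signal noise increment
        dv = rng.gauss(0.0, sqrt_dt)         # observation noise increment
        y += x * x * dt + sigma * dv         # uses X at the start of the step
        x += sigma * dw
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate_quadratic_sensor()
xs_mirrored, ys_mirrored = simulate_quadratic_sensor(mirror=True)

# The mirrored signal produces the same observations: Y carries no sign information.
largest_gap = max(abs(a - b) for a, b in zip(ys, ys_mirrored))
print("largest difference between the two observation paths:", largest_gap)
```

Since both signal paths are equally consistent with the same observations, a good approximate filter has to be able to carry two modes, which motivates the comparison between the mixture and exponential families here.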
Comparing local approximation errors ($L^2$ residuals) $\varepsilon_t$

$$\varepsilon_t^2 = \int (p_{\mathrm{exact},t}(x) - p_{\mathrm{approx},t}(x))^2\,dx$$

$p_{\mathrm{approx},t}(x)$: three possible choices. $\theta\, p_{N(\mu_1,v_1)}(x) + (1 - \theta)\, p_{N(\mu_2,v_2)}(x)$ (red) vs $e^{\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta)}$ (blue) vs EKF (N) (green).

[Figure: L2 residuals for the quadratic sensor. Residuals (0 to 0.7) against Time (0 to 10); curves: Projection Residual (L2 norm), Extended Kalman Residual (L2 norm), Hellinger Projection Residual (L2 norm).]

Comparing local approximation errors (Prokhorov residuals) $\varepsilon_t$

$$\varepsilon_t = \inf\{ \epsilon : F_{\mathrm{exact},t}(x - \epsilon) - \epsilon \le F_{\mathrm{approx},t}(x) \le F_{\mathrm{exact},t}(x + \epsilon) + \epsilon \ \ \forall x \}$$
with $F$ the CDF of the $p$'s. The Lévy-Prokhorov metric works well with singular densities like particles, where the $L^2$ metric is not ideal.

$\theta\, p_{N(\mu_1,v_1)}(x) + (1 - \theta)\, p_{N(\mu_2,v_2)}(x)$ (red) vs $e^{\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta)}$ (green) vs best three particles (blue).
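Both residuals are straightforward to evaluate once densities and CDFs are available on a grid. The following sketch computes the $L^2$ residual by the trapezoidal rule and the CDF-based residual by bisection on $\epsilon$, checking the defining inequalities only at the grid points. The two unit-variance Gaussians standing in for the "exact" and "approximate" densities, the grid and the function names are assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def normal_cdf(x, mu, var):
    """CDF of N(mu, var) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def l2_residual(p_exact, p_approx, xs):
    """Square root of the trapezoidal integral of (p_exact - p_approx)^2 over xs."""
    h = xs[1] - xs[0]
    sq = [(p_exact(x) - p_approx(x)) ** 2 for x in xs]
    return math.sqrt(h * (sum(sq) - 0.5 * (sq[0] + sq[-1])))

def levy_residual(cdf_exact, cdf_approx, xs, tol=1e-6):
    """Smallest eps (up to tol) with F(x - eps) - eps <= G(x) <= F(x + eps) + eps on xs."""
    def admissible(eps):
        return all(cdf_exact(x - eps) - eps <= cdf_approx(x) <= cdf_exact(x + eps) + eps
                   for x in xs)
    lo, hi = 0.0, 1.0  # eps = 1 is always admissible because CDFs lie in [0, 1]
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return hi

xs = [-10.0 + 0.01 * k for k in range(2001)]
shift = 1.0  # stand-in "approximation": the exact density shifted by 1

eps_l2 = l2_residual(lambda x: normal_pdf(x, 0.0, 1.0),
                     lambda x: normal_pdf(x, shift, 1.0), xs)
eps_levy = levy_residual(lambda x: normal_cdf(x, 0.0, 1.0),
                         lambda x: normal_cdf(x, shift, 1.0), xs)
print("L2 residual:", eps_l2)
print("CDF-based residual:", eps_levy)
```

The CDF-based residual stays bounded by the size of the shift and remains defined for discrete measures such as particle approximations, for which the $L^2$ residual is not available.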
[Figure: Lévy residuals for the quadratic sensor. Prokhorov residuals (0 to 0.14) against Time (0 to 10); curves: Prokhorov Residual (L2NM), Prokhorov Residual (HE), Best possible residual (3 Deltas).]

Cubic sensors

[Figure: Cubic sensors. Residuals (0 to 2) against Time (0 to 10); curves: Projection Residual (L2 norm), Extended Kalman Residual (L2 norm), Hellinger Projection Residual (L2 norm).]

Qualitatively similar results up to a stopping time.

As one approaches the boundary, $\gamma_{ij}$ becomes singular.

The solution is to dynamically change the parameterization and even the dimension of the manifold.

Conclusions and References

Conclusions

Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities.

Projection uses the overarching $L^2$ structure.

Two different metrics: direct $L^2$ and Hellinger/Fisher ($L^2$ on $\sqrt{\cdot}$).

Fisher works well with exponential families: multimodality; correction step exact; simplicity of implementation; equivalence with Assumed Density Filters ("moment matching").

Direct $L^2$ works well with mixture families: even simpler filter equations, no inline numerical integration; basic version equivalent to Galerkin methods; suited also for multimodality (quadratic sensor tests, $L^2$ global error); comparable with particle methods.

Further investigation: convergence, more on optimality?

Optimality: introducing new projections (forthcoming, J. Armstrong).

Thanks

With thanks to the organizing committee. Thank you for your attention. Questions and comments welcome.
References

[1] J. Aggrawal: Sur l'information de Fisher. In: Theories de l'Information (J. Kampe de Feriet, ed.), Springer-Verlag, Berlin–New York 1974, pp. 111–117.
[2] Amari, S. Differential-geometrical methods in statistics, Lecture notes in statistics, Springer-Verlag, Berlin, 1985.
[3] Armstrong, J., and Brigo, D. (2016). Nonlinear filtering via stochastic PDE projection on mixture manifolds in L2 direct metric, Mathematics of Control, Signals and Systems, 2016, accepted.
[4] Beard, R., Kenney, J., Gunther, J., Lawton, J., and Stirling, W. (1999). Nonlinear Projection Filter based on Galerkin approximation. AIAA Journal of Guidance Control and Dynamics, 22 (2): 258–266.
[5] Beard, R. and Gunther, J. (1997). Galerkin Approximations of the Kushner Equation in Nonlinear Estimation. Working Paper, Brigham Young University.
[6] Barndorff-Nielsen, O.E. (1978). Information and Exponential Families. John Wiley and Sons, New York.
[7] Brigo, D. Diffusion Processes, Manifolds of Exponential Densities, and Nonlinear Filtering. In: Ole E. Barndorff-Nielsen and Eva B. Vedel Jensen, editors, Geometry in Present Day Science, World Scientific, 1999.
[8] Brigo, D. On SDEs with marginal laws evolving in finite-dimensional exponential families, STAT PROBABIL LETT, 2000, Vol: 49, Pages: 127–134.
[9] Brigo, D. (2011). The direct L2 geometric structure on a manifold of probability densities with applications to Filtering. Available on arXiv.org and damianobrigo.it
[10] Brigo, D., Hanzon, B., Le Gland, F. A differential geometric approach to nonlinear filtering: The projection filter, IEEE T AUTOMAT CONTR, 1998, Vol: 43, Pages: 247–252.
[11] Brigo, D., Hanzon, B., Le Gland, F. Approximate nonlinear filtering by projection on exponential manifolds of densities, BERNOULLI, 1999, Vol: 5, Pages: 495–534.
[12] D. Brigo, Filtering by Projection on the Manifold of Exponential Densities, PhD Thesis, Free University of Amsterdam, 1996.
[13] Brigo, D., and Pistone, G. (1996). Projecting the Fokker-Planck Equation onto a finite dimensional exponential family. Available at arXiv.org
[14] Crisan, D., and Rozovskii, B. (Eds) (2011). The Oxford Handbook of Nonlinear Filtering, Oxford University Press.
[15] M. H. A. Davis, S. I. Marcus, An introduction to nonlinear filtering, in: M. Hazewinkel, J. C. Willems, Eds., Stochastic Systems: The Mathematics of Filtering and Identification and Applications (Reidel, Dordrecht, 1981) 53–75.
[16] Elworthy, D. (1982). Stochastic Differential Equations on Manifolds. LMS Lecture Notes.
[17] Hanzon, B. A differential-geometric approach to approximate nonlinear filtering. In C.T.J. Dodson, Geometrization of Statistical Theory, pages 219–223, ULMD Publications, University of Lancaster, 1987.
[18] B. Hanzon, Identifiability, recursive identification and spaces of linear dynamical systems, CWI Tracts 63 and 64, CWI, Amsterdam, 1989.
[19] M. Hazewinkel, S.I. Marcus, and H.J. Sussmann, Nonexistence of finite dimensional filters for conditional statistics of the cubic sensor problem, Systems and Control Letters 3 (1983) 331–340.
[20] J. Jacod, A. N. Shiryaev, Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften, vol. 288 (1987), Springer-Verlag, Berlin.
[21] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
[22] M. Fujisaki, G. Kallianpur, and H. Kunita (1972). Stochastic differential equations for the non linear filtering problem. Osaka J. Math. Volume 9, Number 1 (1972), 19–40.
[23] Kenney, J., Stirling, W. Nonlinear Filtering of Convex Sets of Probability Distributions. Presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June - 2 July 1999.
[24] R. Z. Khasminskii (1980). Stochastic Stability of Differential Equations. Alphen aan den Rijn.
[25] R.S. Liptser, A.N. Shiryayev, Statistics of Random Processes I, General Theory (Springer Verlag, Berlin, 1978).
[26] M. Murray and J. Rice, Differential geometry and statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993.
[27] D. Ocone, E. Pardoux, A Lie algebraic criterion for non-existence of finite dimensionally computable filters, Lecture notes in mathematics 1390, 197–204 (Springer Verlag, 1989).
[28] Pistone, G., and Sempi, C. (1995). An Infinite Dimensional Geometric Structure On the space of All the Probability Measures Equivalent to a Given one. The Annals of Statistics 23(5), 1995.
Ali Mohammad-Djafari
Keywords =
Abstract
Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model
Ali Mohammad-Djafari
Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-UNIV PARIS SUD, SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr
Email: djafari@lss.supelec.fr
http://djafari.free.fr
http://publicationslist.org/djafari
A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 1/20

Contents
1. Mixture models
2. Different problems related to classification and clustering
 Training
 Supervised classification
 Semi-supervised classification
 Clustering or unsupervised classification
3. Mixture of Student-t
4. Variational Bayesian Approximation
5. VBA for Mixture of Student-t
6. Conclusion

Mixture models
General mixture model: $p(x|a,\Theta,K) = \sum_{k=1}^{K} a_k\, p_k(x_k|\theta_k)$, $0 < a_k < 1$
Same family: $p_k(x_k|\theta_k) = p(x_k|\theta_k)$, $\forall k$
Gaussian: $p(x_k|\theta_k) = N(x_k|\mu_k,\Sigma_k)$ with $\theta_k = (\mu_k,\Sigma_k)$
Data $X = \{x_n, n = 1,\dots,N\}$ where each element $x_n$ can be in one of these classes $c_n$.
$a_k = p(c_n = k)$, $a = \{a_k, k = 1,\dots,K\}$, $\Theta = \{\theta_k, k = 1,\dots,K\}$
$p(X_n, c_n = k|a,\theta) = \prod_{n=1}^{N} p(x_n, c_n = k|a,\theta)$.

Different problems
Training: Given a set of (training) data $X$ and classes $c$, estimate the parameters $a$ and $\Theta$.
Supervised classification: Given a sample $x_m$ and the parameters $K$, $a$ and $\Theta$, determine its class $k^* = \arg\max_k \{p(c_m = k|x_m, a, \Theta, K)\}$.
Semi-supervised classification (proportions are not known): Given a sample $x_m$ and the parameters $K$ and $\Theta$, determine its class $k^* = \arg\max_k \{p(c_m = k|x_m, \Theta, K)\}$.
Clustering or unsupervised classification (number of classes $K$ is not known): Given a set of data $X$, determine $K$ and $c$.
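The supervised classification rule above can be sketched in a few lines. This is a minimal illustration, assuming scalar data, Gaussian class densities and Bayes' rule $p(c_m = k|x_m, a, \Theta) \propto a_k\, p(x_m|\theta_k)$; the function names and the two-class parameter values are hypothetical and are not taken from the talk.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(x | mu, sigma2) for a scalar x."""
    return math.exp(-0.5 * (x - mu) ** 2 / sigma2) / math.sqrt(2.0 * math.pi * sigma2)

def class_posteriors(x, a, theta):
    """Return p(c = k | x, a, Theta) for every class k (Bayes' rule)."""
    joint = [a_k * gaussian_pdf(x, mu, s2) for a_k, (mu, s2) in zip(a, theta)]
    evidence = sum(joint)  # p(x | a, Theta) = sum_k a_k p(x | theta_k)
    return [j / evidence for j in joint]

def classify(x, a, theta):
    """Return k* = argmax_k p(c = k | x, a, Theta), with classes numbered from 0."""
    post = class_posteriors(x, a, theta)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical two-class model: proportions a and (mean, variance) per class.
a = [0.3, 0.7]
theta = [(-2.0, 1.0), (3.0, 1.0)]
```

For example, `classify(-1.5, a, theta)` picks the first class and `classify(2.5, a, theta)` the second, since each sample lies close to the corresponding class mean.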

Training
Given a set of (training) data $X$ and classes $c$, estimate the parameters $a$ and $\Theta$.
Maximum Likelihood (ML): $(\hat{a}, \hat{\Theta}) = \arg\max_{(a,\Theta)} \{p(X, c|a, \Theta, K)\}$.
Bayesian: Assign priors $p(a|K)$ and $p(\Theta|K) = \prod_{k=1}^{K} p(\theta_k)$ and write the expression of the joint posterior law:
$p(a, \Theta|X, c, K) = \frac{p(X, c|a, \Theta, K)\, p(a|K)\, p(\Theta|K)}{p(X, c|K)}$
where $p(X, c|K) = \iint p(X, c|a, \Theta, K)\, p(a|K)\, p(\Theta|K)\, da\, d\Theta$.
Infer on $a$ and $\Theta$ either as the Maximum A Posteriori (MAP) or Posterior Mean (PM).

Supervised classification
Given a sample $x_m$ and the parameters $K$, $a$ and $\Theta$, determine
$p(c_m = k|x_m, a, \Theta, K) = \frac{p(x_m, c_m = k|a, \Theta, K)}{p(x_m|a, \Theta, K)}$
where $p(x_m, c_m = k|a, \Theta, K) = a_k\, p(x_m|\theta_k)$ and $p(x_m|a, \Theta, K) = \sum_{k=1}^{K} a_k\, p(x_m|\theta_k)$.
Best class $k^*$: $k^* = \arg\max_k \{p(c_m = k|x_m, a, \Theta, K)\}$

Semi-supervised classification
Given a sample $x_m$ and the parameters $K$ and $\Theta$ (not the proportions $a$), determine the probabilities
$p(c_m = k|x_m, \Theta, K) = \frac{p(x_m, c_m = k|\Theta, K)}{p(x_m|\Theta, K)}$
where $p(x_m, c_m = k|\Theta, K) = \int p(x_m, c_m = k|a, \Theta, K)\, p(a|K)\, da$ and $p(x_m|\Theta, K) = \sum_{k=1}^{K} p(x_m, c_m = k|\Theta, K)$.
Best class $k^*$, for example the MAP solution: $k^* = \arg\max_k \{p(c_m = k|x_m, \Theta, K)\}$.

Clustering or non-supervised classification
Given a set of data $X$, determine $K$ and $c$.
Determination of the number of classes:
$p(K = L|X) = \frac{p(X, K = L)}{p(X)} = \frac{p(X|K = L)\, p(K = L)}{p(X)}$ and $p(X) = \sum_{L=1}^{L_0} p(K = L)\, p(X|K = L)$,
where $L_0$ is the a priori maximum number of classes and
$p(X|K = L) = \iint \prod_n \sum_{k=1}^{L} a_k\, p(x_n, c_n = k|\theta_k)\, p(a|K)\, p(\Theta|K)\, da\, d\Theta$.
When $K$ and $c$ are determined, we can also determine the characteristics of those classes, $a$ and $\Theta$.
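The posterior over the number of classes can be evaluated once the evidences $p(X|K = L)$ are available. The sketch below assumes they are supplied as log-evidences (how they are computed is outside its scope) and uses a log-sum-exp shift for numerical stability; the function name and the numbers are made up for illustration.

```python
import math

def posterior_over_K(log_evidence, prior):
    """p(K = L | X) for L = 1..L0 from ln p(X | K = L) and p(K = L)."""
    log_joint = [le + math.log(p) for le, p in zip(log_evidence, prior)]
    shift = max(log_joint)  # log-sum-exp shift avoids underflow
    weights = [math.exp(v - shift) for v in log_joint]
    total = sum(weights)    # proportional to p(X)
    return [w / total for w in weights]

# Made-up log-evidences for L = 1, 2, 3 and a uniform prior on K.
log_evidence = [-1050.0, -1000.0, -1003.0]
prior = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
post = posterior_over_K(log_evidence, prior)
best_K = 1 + max(range(len(post)), key=post.__getitem__)
```

With these illustrative values the posterior mass concentrates on $K = 2$.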

Mixture of Student-t model
Student-t and its Infinite Gaussian Scaled Model (IGSM):
$T(x|\nu, \mu, \Sigma) = \int_0^\infty N(x|\mu, z^{-1}\Sigma)\, G(z|\tfrac{\nu}{2}, \tfrac{\nu}{2})\, dz$
where
$N(x|\mu, \Sigma) = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right] = |2\pi\Sigma|^{-1/2} \exp\left[-\tfrac{1}{2}\mathrm{Tr}\left\{(x - \mu)^\top \Sigma^{-1} (x - \mu)\right\}\right]$
and
$G(z|\alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, z^{\alpha-1} \exp[-\beta z]$.
Mixture of Student-t: $p(x_n|\{\nu_k, a_k, \mu_k, \Sigma_k, k = 1,\dots,K\}, K) = \sum_{k=1}^{K} a_k\, T(x_n|\nu_k, \mu_k, \Sigma_k)$.

Mixture of Student-t model
Introducing $z_{nk}$, $z_k = \{z_{nk}, n = 1,\dots,N\}$, $Z = \{z_{nk}\}$, $c = \{c_n, n = 1,\dots,N\}$, $\theta_k = \{\nu_k, a_k, \mu_k, \Sigma_k\}$, $\Theta = \{\theta_k, k = 1,\dots,K\}$
Assigning the priors $p(\Theta) = \prod_k p(\theta_k)$, we can write:
$p(X, c, Z, \Theta|K) = \prod_n \prod_k a_k\, N(x_n|\mu_k, z_{nk}^{-1}\Sigma_k)\, G(z_{nk}|\tfrac{\nu_k}{2}, \tfrac{\nu_k}{2})\, p(\theta_k)$
Joint posterior law: $p(c, Z, \Theta|X, K) = \frac{p(X, c, Z, \Theta|K)}{p(X|K)}$.
The main task now is to propose an approximation to it that can be used easily in all the above-mentioned tasks of classification or clustering.

Variational Bayesian Approximation (VBA)
Main idea: to propose an easily computable approximation $q(c, Z, \Theta)$ for $p(c, Z, \Theta|X, K)$.
Criterion: $KL(q : p)$
Interestingly, by noting that $p(c, Z, \Theta|X, K) = p(X, c, Z, \Theta|K)/p(X|K)$ we have:
$KL(q : p) = -F(q) + \ln p(X|K)$
where $F(q) = \left\langle \ln \frac{p(X, c, Z, \Theta|K)}{q} \right\rangle_q$ is called the free energy of $q$, and we have the following properties:
– Maximizing $F(q)$ and minimizing $KL(q : p)$ are equivalent, and since $KL(q : p) \geq 0$, $F(q)$ is a lower bound on the log-evidence of the model $\ln p(X|K)$.
– When the optimum $q^*$ is obtained, $F(q^*)$ can be used as a criterion for model selection.
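The relation between the KL divergence, the free energy and the log-evidence can be checked numerically on a toy problem. The sketch below assumes a single discrete latent variable $h$ with three values in place of $(c, Z, \Theta)$, and takes the free energy as $F(q) = \langle \ln p(X, h) \rangle_q - \langle \ln q \rangle_q$; the joint probabilities are made-up numbers.

```python
import math

# Hypothetical joint p(X, h) for one fixed observation X and latent h in {0, 1, 2}.
joint = [0.10, 0.25, 0.05]
log_evidence = math.log(sum(joint))            # ln p(X)
posterior = [j / sum(joint) for j in joint]    # p(h | X)

def free_energy(q):
    """F(q) = <ln p(X, h)>_q - <ln q>_q."""
    return sum(qh * (math.log(ph) - math.log(qh)) for qh, ph in zip(q, joint) if qh > 0)

def kl(q, p):
    """KL(q : p) = sum_h q(h) ln(q(h) / p(h))."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, p) if qh > 0)

q = [0.5, 0.3, 0.2]  # an arbitrary approximating distribution
gap = kl(q, posterior) - (-free_energy(q) + log_evidence)
```

Here `gap` vanishes up to rounding, `free_energy(q)` stays below `log_evidence`, and the bound is attained when `q` equals the exact posterior.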

VBA: choosing the good families
Using $KL(q : p)$ has the very interesting property that the means computed with $q$ are the same as those computed with $p$ (conservation of the means). Unfortunately, this is not the case for variances or other moments.
If $p$ is in the exponential family, then, choosing appropriate conjugate priors, the structure of $q$ will be the same and we can obtain appropriate fast optimization algorithms.

Hierarchical graphical model
[Figure: Graphical representation of the model, with nodes ξ0, αk, βk, znk, γ0, Σ0, Σk, µ0, η0, µk, k0, a and xn.]

VBA for mixture of Student-t
In our case, noting that
$p(X, c, Z, \Theta|K) = \prod_n \prod_k p(x_n, c_n, z_{nk}|a_k, \mu_k, \Sigma_k, \nu_k) \prod_k \left[p(\alpha_k)\, p(\beta_k)\, p(\mu_k|\Sigma_k)\, p(\Sigma_k)\right]$
with
$p(x_n, c_n, z_{nk}|a_k, \mu_k, \Sigma_k, \nu_k) = a_k\, N(x_n|\mu_k, z_{nk}^{-1}\Sigma_k)\, G(z_{nk}|\alpha_k, \beta_k)$
is separable, on one side for $[c, Z]$ and on the other side in the components of $\Theta$, we propose to use
$q(c, Z, \Theta) = q(c, Z)\, q(\Theta)$.

VBA for mixture of Student-t
With this decomposition, the expression of the Kullback-Leibler divergence becomes:
$KL(q_1(c, Z)\, q_2(\Theta) : p(c, Z, \Theta|X, K)) = \sum_c \iint q_1(c, Z)\, q_2(\Theta) \ln \frac{q_1(c, Z)\, q_2(\Theta)}{p(c, Z, \Theta|X, K)}\, d\Theta\, dZ$
The expression of the free energy becomes:
$F(q_1(c, Z)\, q_2(\Theta)) = \sum_c \iint q_1(c, Z)\, q_2(\Theta) \ln \frac{p(X, c, Z|\Theta, K)\, p(\Theta|K)}{q_1(c, Z)\, q_2(\Theta)}\, d\Theta\, dZ$

Proposed VBA for Mixture of Student-t priors model
Using a generalized Student-t obtained by replacing $G(z_{nk}|\tfrac{\nu_k}{2}, \tfrac{\nu_k}{2})$ by $G(z_{nk}|\alpha_k, \beta_k)$, it will be easier to propose conjugate priors for $\alpha_k, \beta_k$ than for $\nu_k$.
$p(x_n, c_n = k, z_{nk}|a_k, \mu_k, \Sigma_k, \alpha_k, \beta_k, K) = a_k\, N(x_n|\mu_k, z_{nk}^{-1}\Sigma_k)\, G(z_{nk}|\alpha_k, \beta_k)$.
In the following, denoting $\Theta = \{(a_k, \mu_k, \Sigma_k, \alpha_k, \beta_k), k = 1,\dots,K\}$, we propose to use the factorized prior laws:
$p(\Theta) = p(a) \prod_k \left[p(\alpha_k)\, p(\beta_k)\, p(\mu_k|\Sigma_k)\, p(\Sigma_k)\right]$
with the following components:
$p(a) = D(a|k_0)$, $k_0 = [k_0, \dots, k_0] = k_0 \mathbf{1}$
$p(\alpha_k) = E(\alpha_k|\zeta_0) = G(\alpha_k|1, \zeta_0)$
$p(\beta_k) = E(\beta_k|\zeta_0) = G(\beta_k|1, \zeta_0)$
$p(\mu_k|\Sigma_k) = N(\mu_k|\mu_0 \mathbf{1}, \eta_0^{-1}\Sigma_k)$
$p(\Sigma_k) = IW(\Sigma_k|\gamma_0, \gamma_0\Sigma_0)$

Proposed VBA for Mixture of Student-t priors model
where
$D(a|k) = \frac{\Gamma(\sum_l k_l)}{\prod_l \Gamma(k_l)} \prod_l a_l^{k_l - 1}$ is the Dirichlet pdf,
$E(t|\zeta_0) = \zeta_0 \exp[-\zeta_0 t]$ is the Exponential pdf,
$G(t|a, b) = \frac{b^a}{\Gamma(a)}\, t^{a-1} \exp[-bt]$ is the Gamma pdf and
$IW(\Sigma|\gamma, \gamma\Delta) = \frac{|\tfrac{1}{2}\Delta|^{\gamma/2} \exp\left[-\tfrac{1}{2}\mathrm{Tr}\{\Delta\Sigma^{-1}\}\right]}{\Gamma_D(\gamma/2)\, |\Sigma|^{\frac{\gamma+D+1}{2}}}$ is the inverse Wishart pdf.
With these prior laws and the likelihood, the joint posterior law is: $p_k(c, Z, \Theta|X) = \frac{p(X, c, Z, \Theta)}{p(X)}$.

Expressions of q
$q(c, Z, \Theta) = q(c, Z)\, q(\Theta) = \prod_n \prod_k \left[q(c_n = k|z_{nk})\, q(z_{nk})\right] \prod_k \left[q(\alpha_k)\, q(\beta_k)\, q(\mu_k|\Sigma_k)\, q(\Sigma_k)\right] q(a)$
with:
$q(a) = D(a|\tilde{k})$, $\tilde{k} = [\tilde{k}_1, \dots, \tilde{k}_K]$
$q(\alpha_k) = G(\alpha_k|\tilde{\zeta}_k, \tilde{\eta}_k)$
$q(\beta_k) = G(\beta_k|\tilde{\zeta}_k, \tilde{\eta}_k)$
$q(\mu_k|\Sigma_k) = N(\mu_k|\tilde{\mu}, \tilde{\eta}^{-1}\Sigma_k)$
$q(\Sigma_k) = IW(\Sigma_k|\tilde{\gamma}, \tilde{\gamma}\tilde{\Sigma})$
With these choices, we have
$F(q(c, Z, \Theta)) = \langle \ln p(X, c, Z, \Theta|K) \rangle_{q(c, Z, \Theta)} = \sum_k \sum_n F_{1kn} + \sum_k F_{2k}$
$F_{1kn} = \langle \ln p(x_n, c_n, z_{nk}, \theta_k) \rangle_{q(c_n = k|z_{nk})\, q(z_{nk})}$
$F_{2k} = \langle \ln p(x_n, c_n, z_{nk}, \theta_k) \rangle_{q(\theta_k)}$

VBA Algorithm step
The updating expressions of the tilded parameters are obtained in three steps:
E step: Optimizing $F$ with respect to $q(c, Z)$ while keeping $q(\Theta)$ fixed, we obtain the expressions of $q(c_n = k|z_{nk}) = \tilde{a}_k$ and $q(z_{nk}) = G(z_{nk}|\alpha_k, \beta_k)$.
M step: Optimizing $F$ with respect to $q(\Theta)$ while keeping $q(c, Z)$ fixed, we obtain the expressions of $q(a) = D(a|\tilde{k})$, $\tilde{k} = [\tilde{k}_1, \dots, \tilde{k}_K]$, $q(\alpha_k) = G(\alpha_k|\tilde{\zeta}_k, \tilde{\eta}_k)$, $q(\beta_k) = G(\beta_k|\tilde{\zeta}_k, \tilde{\eta}_k)$, $q(\mu_k|\Sigma_k) = N(\mu_k|\tilde{\mu}, \tilde{\eta}^{-1}\Sigma_k)$ and $q(\Sigma_k) = IW(\Sigma_k|\tilde{\gamma}, \tilde{\gamma}\tilde{\Sigma})$, which gives the updating algorithm for the corresponding tilded parameters.
F evaluation: After each E step and M step, we can also evaluate $F(q)$, which can be used as a stopping rule for the iterative algorithm.
The final value of $F(q)$ for each value of $K$, denoted $F_K$, can be used as a criterion for model selection, i.e. the determination of the number of clusters.

Conclusions
Clustering and classification of a set of data are among the most important tasks in statistical research for many applications, such as data mining in biology.
Mixture models, and in particular mixtures of Gaussians, are classical models for these tasks.
We proposed to use a mixture of generalised Student-t distributions as a model for the data, via a hierarchical graphical model.
To obtain fast algorithms and be able to handle large data sets, we used conjugate priors wherever possible.
The proposed algorithm has been used for clustering, classification and discriminant analysis of some biological data (related to cancer research), but in this paper we only presented the main algorithm.
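The alternating structure of the algorithm (update one factor of $q$ with the other fixed, then evaluate $F$ and stop when it no longer increases) can be illustrated on a much simpler target than the Student-t mixture. The sketch below is not the talk's update equations: it assumes a correlated bivariate Gaussian target $p(x) = N(x|\mu, \Lambda^{-1})$ and a factorized $q(x_1)\,q(x_2)$ with $q(x_i) = N(x_i|m_i, 1/\Lambda_{ii})$, for which the coordinate updates are known in closed form. All names and numbers are hypothetical.

```python
import math

def free_energy(m, mu, L):
    """F(q) = <ln p(x)>_q + entropy of q for the factorized Gaussian q."""
    d1, d2 = m[0] - mu[0], m[1] - mu[1]
    det = L[0][0] * L[1][1] - L[0][1] * L[1][0]
    # <(x - mu)^T L (x - mu)>_q, using the variances 1 / L[i][i] of the factors.
    quad = L[0][0] * d1 ** 2 + L[1][1] * d2 ** 2 + 2.0 * L[0][1] * d1 * d2 + 2.0
    expected_log_p = -math.log(2.0 * math.pi) + 0.5 * math.log(det) - 0.5 * quad
    entropy = sum(0.5 * math.log(2.0 * math.pi * math.e / L[i][i]) for i in range(2))
    return expected_log_p + entropy

def alternate(mu, L, m_init, tol=1e-12, max_iter=1000):
    """Alternate the two factor updates; stop when F stops increasing."""
    m = list(m_init)
    history = [free_energy(m, mu, L)]
    for _ in range(max_iter):
        m[0] = mu[0] - L[0][1] / L[0][0] * (m[1] - mu[1])  # update q(x1), q(x2) fixed
        m[1] = mu[1] - L[1][0] / L[1][1] * (m[0] - mu[0])  # update q(x2), q(x1) fixed
        history.append(free_energy(m, mu, L))              # F evaluation
        if history[-1] - history[-2] < tol:                # stopping rule on F
            break
    return m, history

mu = [1.0, -1.0]                  # target mean
L = [[2.0, 1.2], [1.2, 2.0]]      # target precision matrix
m, history = alternate(mu, L, m_init=[0.0, 0.0])
```

In this toy case the factor means converge to the target mean and the recorded values of $F$ never decrease. Since the target is normalized, $F$ stays below zero, which plays the role of the log-evidence bound.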
Barbara Opozda
Keywords = Affine connection, Curvature tensor, Laplacian, Bochner's technique, Ricci tensor, Sectional curvature
Abstract
Curvatures of statistical structures
Barbara Opozda
Paris, October 2015
Barbara Opozda () Curvatures of statistical structures Paris, October 2015 1 / 29

Statistical structures – statistical setting
$M$ – open subset of $\mathbb{R}^n$
$\Lambda$ – probability space with a fixed σ-algebra
$p : M \times \Lambda \ni (x, \lambda) \to p(x, \lambda) \in \mathbb{R}$ – smooth relative to $x$ such that $p_x(\lambda) := p(x, \lambda)$ is a probability measure on $\Lambda$ – probability distribution
$\ell(x, \lambda) := \log(p(x, \lambda))$
$g_{ij}(x) := E_x[(\partial_i \ell)(\partial_j \ell)]$, where $E_x$ is the expectation relative to the probability $p_x$, $\forall x \in M$; $\partial_1, \dots, \partial_n$ – the canonical frame on $M$
$g$ – Fisher information metric tensor field on $M$
$C_{ijk}(x) = E_x[(\partial_i \ell)(\partial_j \ell)(\partial_k \ell)]$ – cubic form
$(g, C)$ – statistical structure on $M$

Statistical structures (Codazzi structures) – geometric setting; three equivalent definitions
$M$ – manifold, $\dim M = n$
I) $(g, C)$, $C$ – totally symmetric (0, 3)-tensor field on $M$, that is, $C(X, Y, Z) = C(Y, X, Z) = C(Y, Z, X)$ $\forall X, Y, Z \in T_x M$, $x \in M$
$C$ – cubic form
II) $(g, K)$, $K$ – symmetric (1, 2)-tensor field (i.e., $K(X, Y) = K(Y, X)$) and symmetric relative to $g$, that is, $g(X, K(Y, Z)) = g(Y, K(X, Z))$ is symmetric for all arguments.
$C(X, Y, Z) = g(X, K(Y, Z))$
III) $(g, \nabla)$, $\nabla$ – torsion-free connection such that
$(\nabla_X g)(Y, Z) = (\nabla_Y g)(X, Z)$ (1)
$\nabla$ – statistical connection
$T$ – any tensor field of type $(p, q)$ on $M$, $\nabla T$ – of type $(p, q + 1)$
$\nabla T(X, Y_1, \dots, Y_q) = (\nabla_X T)(Y_1, \dots, Y_q)$
In particular, $\nabla g(X, Y, Z) = (\nabla_X g)(Y, Z)$
(1) ⇔ $\nabla g$ is a symmetric cubic form
$\hat{\nabla}$ – Levi-Civita connection for $g$
$K(X, Y) := \nabla_X Y - \hat{\nabla}_X Y$
$K$ – difference tensor
$\nabla g(X, Y, Z) = -2g(X, K(Y, Z)) = -2C(X, Y, Z)$

A statistical structure is trivial if and only if $K = 0$, or equivalently $C = 0$, or equivalently $\nabla = \hat{\nabla}$.
$K_X Y := K(X, Y)$
$E := \mathrm{tr}_g K = K(e_1, e_1) + \dots + K(e_n, e_n) = (\mathrm{tr}\, K_{e_1}) e_1 + \dots + (\mathrm{tr}\, K_{e_n}) e_n$
$E$ – mean difference vector field
$E = 0$ ⇔ $\mathrm{tr}\, K_X = 0$ $\forall X \in TM$ ⇔ $\mathrm{tr}_g C(X, \cdot, \cdot) = 0$ $\forall X \in TM$
$E = 0$ ⇒ trace-free statistical structure
Fact. $(g, \nabla)$ is trace-free if and only if $\nabla \nu_g = 0$, where $\nu_g$ is the volume form determined by $g$.

Examples
Riemannian geometry of the second fundamental form
$M$ – locally strongly convex hypersurface in $\mathbb{R}^{n+1}$ – the second fundamental form $h$ satisfies the Codazzi equation $\nabla h(X, Y, Z) = \nabla h(Y, X, Z)$, where $\nabla$ is the induced connection (the Levi-Civita connection of the first fundamental form)
$(h, \nabla)$ – statistical structure
Similarly one gets statistical structures on hypersurfaces in space forms.

Equiaffine geometry of hypersurfaces in the standard affine space $\mathbb{R}^{n+1}$
$M$ – locally strongly convex hypersurface in $\mathbb{R}^{n+1}$
$\xi$ – a transversal vector field
$D$ – standard flat connection on $\mathbb{R}^{n+1}$, $X, Y \in \mathfrak{X}(M)$
$D_X Y = \nabla_X Y + h(X, Y)\xi$ – Gauss formula
$\nabla$ – induced connection, $h$ – second fundamental form (metric tensor field)
$D_X \xi = -SX + \tau(X)\xi$ – Weingarten formula
If $\tau = 0$, $\xi$ is called equiaffine. In this case the Codazzi equation is satisfied:
$\nabla h(X, Y, Z) = \nabla h(Y, X, Z)$
$(h, \nabla)$ – statistical structure

Geometry of Lagrangian submanifolds in Kaehler manifolds
$N$ – Kaehler manifold of real dimension $2n$ and with complex structure $J$
$M$ – Lagrangian submanifold of $N$ – $n$-dimensional submanifold such that $JTM$ is orthogonal to $TM$, i.e. $JTM$ is the normal bundle (in the metric sense) for $M \subset N$
$D$ – the Kaehler connection on $N$
$D_X Y = \nabla_X Y + JK(X, Y)$
$g$ – induced metric tensor field on $M$
$(g, K)$ – statistical structure
It is trace-free ⇔ $M$ is minimal in $N$.

Most statistical structures are outside the three classes of examples. For instance, in order that a statistical structure is locally realizable on an equiaffine hypersurface it is necessary that the dual connection $\overline{\nabla}$ is projectively flat.

Dual connections, curvature tensors
$g$ – metric tensor field on $M$, $\nabla$ – any connection
$X g(Y, Z) = g(\nabla_X Y, Z) + g(Y, \overline{\nabla}_X Z)$ (2)
$\overline{\nabla}$ – dual connection
$(g, \nabla)$ – statistical structure if and only if $(g, \overline{\nabla})$ – statistical structure
$R(X, Y)Z$ – (1, 3)-curvature tensor for $\nabla$
If $R = 0$ the structure is called Hessian
$\overline{R}(X, Y)Z$ – curvature tensor for $\overline{\nabla}$
$g(R(X, Y)Z, W) = -g(\overline{R}(X, Y)W, Z)$ (3)
In particular, $R = 0$ ⇔ $\overline{R} = 0$.

$\hat{\nabla}$ – Levi-Civita connection for $g$, $\nabla = \hat{\nabla} + K$, $\overline{\nabla} = \hat{\nabla} - K$
$\hat{R}$ – curvature tensor for $\hat{\nabla}$
$R(X, Y) = \hat{R}(X, Y) + (\hat{\nabla}_X K)_Y - (\hat{\nabla}_Y K)_X + [K_X, K_Y]$ (4), where $[K_X, K_Y] = K_X K_Y - K_Y K_X$
$\overline{R}(X, Y) = \hat{R}(X, Y) - (\hat{\nabla}_X K)_Y + (\hat{\nabla}_Y K)_X + [K_X, K_Y]$ (5)
$R(X, Y) + \overline{R}(X, Y) = 2\hat{R}(X, Y) + 2[K_X, K_Y]$ (6)

Sectional curvatures
$R$ does not have to be skew-symmetric relative to $g$, i.e. $g(R(X, Y)Z, W) \neq -g(R(X, Y)W, Z)$, in general.
Lemma * The following conditions are equivalent:
1) $g(R(X, Y)Z, W) = -g(R(X, Y)W, Z)$ $\forall X, Y, Z, W$
2) $R = \overline{R}$
3) $\hat{\nabla} K$ is symmetric, that is, $(\hat{\nabla} K)(X, Y, Z) = (\hat{\nabla}_X K)(Y, Z) = (\hat{\nabla}_Y K)(X, Z) = (\hat{\nabla} K)(Y, X, Z)$ $\forall X, Y, Z$.
For hypersurfaces in $\mathbb{R}^{n+1}$ each of the above conditions describes an affine sphere.

$\mathcal{R} := \frac{R + \overline{R}}{2}$
$[K, K](X, Y)Z := [K_X, K_Y]Z$
$\mathcal{R}(X, Y)Z$ and $[K, K](X, Y)Z$ are Riemann-curvature-like tensors: they are skew-symmetric in $X, Y$, satisfy the first Bianchi identity, and $\mathcal{R}(X, Y)$, $[K, K](X, Y)$ are skew-symmetric relative to $g$ $\forall X, Y$
$\pi$ – vector plane in $T_x M$, $X, Y$ – orthonormal basis of $\pi$
sectional curvature for $g$ – $\hat{k}(\pi) := g(\hat{R}(X, Y)Y, X)$
sectional $K$-curvature – $k(\pi) := g([K, K](X, Y)Y, X)$
sectional $\nabla$-curvature – $k^{\nabla}(\pi) := g(\mathcal{R}(X, Y)Y, X)$

In general, Schur's lemma does not hold for $k^{\nabla}$ and $k$. We have, however,
Lemma Assume that $M$ is connected, $\dim M > 2$ and the sectional $\nabla$-curvature (the sectional $K$-curvature) is point-wise constant. If one of the equivalent conditions in Lemma * holds then the sectional $\nabla$-curvature (the sectional $K$-curvature) is constant on $M$.
sectional $K$-curvature
The easiest situation which should be taken into account is when the sectional $K$-curvature is constant for all vector planes in $T_x M$. In this respect we have

Theorem If the sectional $K$-curvature is constant and equal to $A$ for all vector planes in $T_x M$ then there is an orthonormal basis $e_1, \dots, e_n$ of $T_x M$ and numbers $\lambda_1, \dots, \lambda_n$, $\mu_1, \dots, \mu_{n-1}$ such that
$K_{e_1} = \mathrm{diag}(\lambda_1, \mu_1, \dots, \mu_1)$,
$K_{e_i} = \begin{pmatrix} 0 & m_i & 0 \\ m_i^\top & \lambda_i & 0 \\ 0 & 0 & \mu_i I_{n-i} \end{pmatrix}$ for $1 < i < n$,
$K_{e_n} = \begin{pmatrix} 0 & m_n \\ m_n^\top & \lambda_n \end{pmatrix}$,
where $m_i = (\mu_1, \dots, \mu_{i-1})^\top$ and the upper-left block is the $(i-1) \times (i-1)$ zero matrix.

continuation of the theorem
Moreover
$\mu_i = \frac{\lambda_i - \sqrt{\lambda_i^2 - 4A_{i-1}}}{2}$, $A_i = A_{i-1} - \mu_i^2$, for $i = 1, \dots, n-1$, where $A_0 = A$.
The above representation of $K$ is not unique, in general. If additionally $\mathrm{tr}_g K = 0$ then $A \leq 0$, $\lambda_n = 0$ and $\lambda_i, \mu_i$ for $i = 1, \dots, n-1$ are expressed as follows:
$\lambda_i = (n - i)\sqrt{\frac{-A_{i-1}}{n - i + 1}}$, $\mu_i = -\sqrt{\frac{-A_{i-1}}{n - i + 1}}$.
In particular, in the last case the numbers $\lambda_i, \mu_i$ depend only on $A$ and the dimension of $M$.

Example 1.
$K_{e_1} = \mathrm{diag}(\lambda, \lambda/2, \dots, \lambda/2)$,
$K_{e_i} = \begin{pmatrix} 0 & v & 0 \\ v^\top & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ for $i = 2, \dots, n$, where $v = (\lambda/2, 0, \dots, 0)^\top \in \mathbb{R}^{i-1}$,
that is, the only non-zero entries of $K_{e_i}$ are $\lambda/2$ at positions $(1, i)$ and $(i, 1)$.
The sectional $K$-curvature is constant $= \lambda^2/4$.

Example 2. The $K$-curvature vanishes, i.e. $[K, K] = 0$. There is an orthonormal frame $e_1, \dots, e_n$ such that
$K_{e_i} = \mathrm{diag}(0, \dots, 0, \lambda_i, 0, \dots, 0)$ with $\lambda_i$ at position $i$, for $i = 1, \dots, n$.

Some theorems on the sectional $K$-curvature
$(g, K)$ – trace-free if $E = \mathrm{tr}_g K = 0$
Theorem Let $(g, K)$ be a trace-free statistical structure on $M$ with symmetric $\hat{\nabla} K$. If the sectional $K$-curvature is constant then either $K = 0$ (the statistical structure is trivial) or $\hat{R} = 0$ and $\hat{\nabla} K = 0$.
Theorem Let $\hat{\nabla} K = 0$. Each of the following conditions implies that $\hat{R} = 0$:
1) the sectional $K$-curvature is negative,
2) $[K, K] = 0$ and $K$ is non-degenerate, i.e. $X \to K_X$ is a monomorphism.

Theorem Assume that $K$ is as in Example 1 at each point of $M$, $\hat{\nabla} K$ is symmetric, and $\mathrm{div}\, E$ is constant on $M$ ($E = \mathrm{tr}_g K$). Then the sectional curvature for $g$ by any plane containing $E$ is non-positive. Moreover, if $M$ is connected it is constant. If $\hat{\nabla} E = 0$ then $\hat{\nabla} K = 0$ and the sectional curvature (of $g$) by any plane containing $E$ vanishes.
Theorem If the sectional $K$-curvature is non-positive on $M$ and $[K, K] \cdot K = 0$ then the sectional $K$-curvature vanishes on $M$.
Corollary If $(g, K)$ is a Hessian structure on $M$ with non-negative sectional curvature of $g$ and such that $\hat{R} \cdot K = 0$ then $\hat{R} = 0$.

Theorem Assume that the sectional $K$-curvature is negative on $M$ and $\hat{R} \cdot K = 0$. Then $\hat{R} = 0$.
Theorem Let $M$ be a Lagrangian submanifold of $N$, where $N$ is a Kaehler manifold of constant holomorphic curvature $4c$, the sectional curvature of the first fundamental form $g$ on $M$ is smaller than $c$ on $M$ and $\hat{R} \cdot K = 0$, where $K$ is the second fundamental tensor of $M \subset N$. Then $\hat{R} = 0$.

sectional $\nabla$-curvature
All affine spheres are statistical manifolds of constant sectional $\nabla$-curvature.
A Riemann-curvature-like tensor defines the curvature operator. For instance, for the curvature tensor $\mathcal{R} = (R + \overline{R})/2$ we have the curvature operator $\mathcal{R} : \Lambda^2 TM \to \Lambda^2 TM$ given by
$g(\mathcal{R}(X \wedge Y), Z \wedge W) = g(\mathcal{R}(Z, W)Y, X)$
A curvature operator is symmetric relative to the canonical extension of $g$ to the bundle $\Lambda^2 TM$. Hence it is diagonalizable. In particular, it can be positive definite, negative definite etc.
The assumption that $\mathcal{R}$ is positive definite is stronger than the assumption that the sectional curvature is positive.

Theorem Let $M$ be a connected compact oriented manifold and $(g, \nabla)$ be a trace-free statistical structure on $M$. If $R = \overline{R}$ and the curvature operator determined by the curvature tensor $\hat{R}$ is positive definite on $M$ then the sectional $\nabla$-curvature is constant.
Theorem Let $M$ be a connected compact oriented manifold and $(g, \nabla)$ be a trace-free statistical structure on $M$. If the curvature operator for $\mathcal{R} = \frac{R + \overline{R}}{2}$ is positive on $M$ then the Betti numbers $b_1(M) = \dots = b_{n-1}(M) = 0$.

sectional curvature for $g$
$\hat{k}(\pi) = g(\hat{R}(X, Y)Y, X)$, $X, Y$ – an orthonormal basis for $\pi$
Theorem Let $M$ be a compact manifold equipped with a trace-free statistical structure $(g, \nabla)$ such that $R = \overline{R}$. If the sectional curvature $\hat{k}$ for $g$ is positive on $M$ then the structure is trivial, that is $\nabla = \hat{\nabla}$.
In the 2-dimensional case we have
Theorem Let $M$ be a compact surface equipped with a trace-free statistical structure $(g, \nabla)$. If $M$ is of genus 0 and $R = \overline{R}$ then the structure is trivial.

B. Opozda, Bochner's technique for statistical manifolds, Annals of Global Analysis and Geometry, DOI 10.1007/s10455-015-9475-z
B. Opozda, A sectional curvature for statistical structures, arXiv:1504.01279 [math.DG]

Hessian structures
$(g, \nabla)$ – Hessian if $R = 0$. Then $\overline{R} = 0$ and $\hat{R} = -[K, K]$.
$(g, \nabla)$ is Hessian if and only if $\hat{\nabla} K$ is symmetric and $\hat{R} = -[K, K]$.
All Hessian structures are locally realizable on affine hypersurfaces in $\mathbb{R}^{n+1}$ equipped with Calabi's structure. If they are trace-free they are locally realizable on improper affine spheres.
If the difference tensor is as in Example 1 and the structure is Hessian then $K = 0$.
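The claim of Example 1, that the sectional $K$-curvature equals $\lambda^2/4$ for every plane, can be checked numerically. The sketch below assumes an orthonormal basis (so that $g$ is the Euclidean dot product on coordinates), builds the matrices $K_{e_i}$ of Example 1, extends them linearly to $K_X = \sum_i x_i K_{e_i}$, and evaluates $k(\pi) = g([K_X, K_Y]Y, X)$ on random orthonormal pairs. The helper names, the dimension and the value of $\lambda$ are illustrative choices.

```python
import math
import random

def example1_K(n, lam):
    """Matrices K_{e_1}, ..., K_{e_n} of Example 1 as nested lists."""
    K1 = [[0.0] * n for _ in range(n)]
    K1[0][0] = lam
    for j in range(1, n):
        K1[j][j] = lam / 2.0
    Ks = [K1]
    for i in range(1, n):
        Ki = [[0.0] * n for _ in range(n)]
        Ki[0][i] = Ki[i][0] = lam / 2.0  # only entries (1, i) and (i, 1) are non-zero
        Ks.append(Ki)
    return Ks

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def K_of(Ks, X):
    """K_X = sum_i x_i K_{e_i}, by tensoriality of K."""
    n = len(X)
    return [[sum(X[i] * Ks[i][r][c] for i in range(n)) for c in range(n)] for r in range(n)]

def sectional_K_curvature(Ks, X, Y):
    """k(pi) = g([K_X, K_Y] Y, X) for an orthonormal pair X, Y."""
    KX, KY = K_of(Ks, X), K_of(Ks, Y)
    a = matvec(KX, matvec(KY, Y))
    b = matvec(KY, matvec(KX, Y))
    return dot([p - q for p, q in zip(a, b)], X)

def random_orthonormal_pair(n, rng):
    """Gram-Schmidt on two Gaussian vectors."""
    X = [rng.gauss(0.0, 1.0) for _ in range(n)]
    nx = math.sqrt(dot(X, X))
    X = [x / nx for x in X]
    Y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    proj = dot(Y, X)
    Y = [y - proj * x for y, x in zip(Y, X)]
    ny = math.sqrt(dot(Y, Y))
    return X, [y / ny for y in Y]

n, lam = 4, 1.5
Ks = example1_K(n, lam)
rng = random.Random(0)
curvatures = [sectional_K_curvature(Ks, *random_orthonormal_pair(n, rng)) for _ in range(20)]
```

Every entry of `curvatures` should agree with `lam ** 2 / 4` up to rounding, independently of the plane chosen.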
Hideyuki Ishi
Keywords = Hessian metric, Homogeneous cone, Left-symmetric algebra
Abstract
Michel Boyom, Jamali Mohammed, Shahid Hasan
Keywords =
Abstract
News
Information SEE 120 participants already! 
Press release: GSI2015 registration is open!
Venue
Ecole Polytechnique, Paris-Saclay (France)
91128 Palaiseau
France
How to attend GSI'15 at Ecole Polytechnique on the Paris-Saclay Campus:
 Maps & Directions: http://www.polytechnique.edu/en/mapsanddirections
 Campus Map: https://gargantua.polytechnique.fr/siatelweb/linkto/mICYYYS(p9Y6
 Amphitheaters : https://gargantua.polytechnique.fr/siatelweb/linkto/mICYYYSJ8RY
 Paris-Saclay Campus: http://www.polytechnique.edu/en/parissaclay