GSI2015

About

LIX Colloquium 2015 conferences

As for GSI’13, the objective of this SEE conference GSI’15, hosted by Ecole Polytechnique, is to bring together pure and applied mathematicians and engineers with a common interest in geometric tools and their applications to information analysis.
It emphasizes the active participation of young researchers in discussing emerging areas of collaborative research on “Information Geometry Manifolds and Their Advanced Applications”.
Current and ongoing applications of information geometry manifolds in applied mathematics include advanced signal/image/video processing, complex data modeling and analysis, information ranking and retrieval, coding, cognitive systems, optimal control, statistics on manifolds, machine learning, speech/sound recognition and natural language processing, all of which are also highly relevant to industry.
The conference is therefore organized around priority themes and topics of mutual interest, with the aim to:
  • Provide an overview of the most recent state of the art
  • Exchange mathematical information/knowledge/expertise in the area
  • Identify research areas/applications for future collaboration
  • Identify academic and industrial lab expertise for further collaboration
This conference will be an interdisciplinary event, unifying skills from geometry, probability and information theory. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.

Authors will be invited to submit a paper to the special issue “Differential Geometrical Theory of Statistics” of the journal Entropy, an international and interdisciplinary open-access journal of entropy and information studies published monthly online by MDPI.

Provisional Topics of Special Sessions:

  • Manifold/Topology Learning
  • Riemannian Geometry in Manifold Learning
  • Optimal Transport theory and applications in Imagery/Statistics
  • Shape Space & Diffeomorphic mappings
  • Geometry of distributed optimization
  • Random Geometry/Homology
  • Hessian Information Geometry
  • Topology and Information
  • Information Geometry Optimization
  • Divergence Geometry
  • Optimization on Manifold
  • Lie Groups and Geometric Mechanics/Thermodynamics
  • Quantum Information Geometry
  • Infinite Dimensional Shape spaces
  • Geometry on Graphs
  • Bayesian and Information geometry for inverse problems
  • Geometry of Time Series and Linear Dynamical Systems
  • Geometric structure of Audio Processing  
  • Lie groups in Structural Biology
  • Computational Information Geometry

Committees

Secretary

Webmaster

Program chairs

Scientific committee

Sponsors and Organizers

Documents


Opening Session (chaired by Frédéric Barbaresco)

Creative Commons Attribution-ShareAlike 4.0 International

Geometric Science of Information SEE/SMAI GSI’15 Conference LIX Colloquium 2015 Frédéric BARBARESCO* & Frank Nielsen** GSI’15 General Chairmen (*) President of SEE ISIC Club (Ingéniérie des Systèmes d’Information de Communications) (**) LIX Department, Ecole Polytechnique Société de l'électricité, de l'électronique et des technologies de l'information et de la communication Flash-back GSI’13 Ecole des Mines de Paris Hirohiko Shima Jean-Louis Koszul Shin-Ichi Amari SEE at a glance • Meeting place for science, industry and society • An officialy recognised non-profit organisation • About 2000 members and 5000 individuals involved • Large participation from industry (~50%) • 19 «Clubs techniques» and 12 «Groupes régionaux» • Organizes conferences and seminars • Initiates/attracts International Conferences in France • Institutional French member of IFAC and IFIP • Awards (Glavieux/Brillouin Prize, Général Ferrié Prize, Néel Prize, Jerphagnon Prize, Blanc-Lapierre Prize,Thévenin Prize), grades and medals (Blondel, Ampère) • Publishes 3 periodical publications (REE, …) & 3 monographs each year • Web: http://www.see.asso.fr and LinkedIn SEE group • SEE Presidents: Louis de Broglie, Paul Langevin, … 1883-2015: From SIE & SFE to SEE: 132 years of Sciences Société de l'électricité, de l'électronique et des technologies de l'information et de la communication 1881 Exposition Internationale d’Electricité 1883: SIE Société Internationale des Electriciens 1886: SFE Société Française des Electriciens 2013: SEE 17 rue de l'Amiral Hamelin 75783 Paris Cedex 16 Louis de Broglie Paul Langevin GSI’15 Sponsors GSI Logo: Adelard of Bath • He left England toward the end of the 11th century for Tours in France • Adelard taught for a time at Laon, leaving Laon for travel no later than 1109. • After Laon, he travelled to Southern Italy and Sicily no later than 1116. • Adelard also travelled extensively throughout the "lands of the Crusades": Greece, West Asia, Sicily, Spain, and potentially Palestine. The frontispiece of an Adelard of Bath Latin translation of Euclid's Elements, c. 1309– 1316; the oldest surviving Latin translation of the Elements is a 12th-century translation by Adelard from an Arabic version Adelard of Bath was the first to translate Euclid’s Elements in Latin Adelard of Bath has introduced the word « Algorismus » in Latin after his translation of Al Khuwarizmi SMAI/SEE GSI’15 • More than 150 attendees from 15 different countries • 85 scientific presentations on 3 days • 3 keynote speakers • Mathilde MARCOLLI (CallTech): “From Geometry and Physics to Computational Linguistics” • Tudor RATIU (EPFL): “Symmetry methods in geometric mechanics” • Marc ARNAUDON (Bordeaux University): “Stochastic Euler-Poincaré reduction” • 1 Short Course • Chaired by Roger BALIAN • Dominique SPEHNER (Grenoble University): “Geometry on the set of quantum states and quantum correlations” • 1 Guest speaker • Charles-Michel MARLE (UPMC): “Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. 
Application to Hamiltonian systems” • Social events: • Welcome cocktail at Ecole Polytechnique • Diner in Versailles Palace Gardens GSI’15 Topics • GSI’15 federates skills from Geometry, Probability and Information Theory: • Dimension reduction on Riemannian manifolds • Optimal Transport and applications in Imagery/Statistics • Shape Space & Diffeomorphic mappings • Random Geometry/Homology • Hessian Information Geometry • Topological forms and Information • Information Geometry Optimization • Information Geometry in Image Analysis • Divergence Geometry • Optimization on Manifold • Lie Groups and Geometric Mechanics/Thermodynamics • Computational Information Geometry • Lie Groups: Novel Statistical and Computational Frontiers • Geometry of Time Series and Linear Dynamical systems • Bayesian and Information Geometry for Inverse Problems • Probability Density Estimation GSI’15 Program GSI’15 Proceedings • Publication by SPRINGER in « Lecture Notes in Computer Science » LNCS vol. 9389 (800 pages), ISBN 978-3-319-25039-7 • http://www.springer.com/us/book/9783319250397 GSI’15 Special Issue • Authors will be solicited to submit a paper in a special Issue "Differential Geometrical Theory of Statistics” in ENTROPY Journal, an international and interdisciplinary open access journal of entropy and information studies published monthly online by MDPI • http://www.mdpi.com/journal/entropy/special_issues/entropy-statistics • A book could be edited by MDPI: e.g. Ecole Polytechnique • Special thanks to « LIX » Department A product of the French Revolution and the Age of Enlightenment, École Polytechnique has a rich history that spans over 220 years. https://www.polytechnique.edu/en/history Henri Poincaré – X1873 Paris-Saclay University in Top 8 World Innovation Hubs http://www.technologyreview.com/news/517626/ infographic-the-worlds-technology-hubs/ A new Grammar of Information “Mathematics is the art of giving the same name to different things” – Henri Poincaré GROUP EVERYWHERE Elie Cartan Henri Poincaré METRIC EVERYWHERE Maurice Fréchet Misha Gromov “the problems addressed by Elie Cartan are among the most important, most abstract and most general dealing with mathematics; group theory is, so to speak, the whole mathematics, stripped of its material and reduced to pure form. This extreme level of abstraction has probably made my presentation a little dry; to assess each of the results, I would have had virtually render him the material which he had been stripped; but this refund can be made in a thousand different ways; and this is the only form that can be found as well as a host of various garments, which is the common link between mathematical theories that are often surprised to find so near” H. Poincaré Elie Cartan: Group Everywhere (Henri Poincaré review of Cartan’s Works) Maurice Fréchet: Metric Everywhere • Maurice Fréchet made major contributions to the topology of point sets and introduced the entire concept of metric spaces. • His dissertation opened the entire field of functionals on metric spaces and introduced the notion of compactness. • He has extended Probability in Metric space 1948 (Annales de l’IHP) Les éléments aléatoires de nature quelconque dans un espace distancié Extension of Probability/Statistic in abstract/Metric space GSI’15 & Geometric Mechanics • The master of geometry during the last century, Elie Cartan, was the son of Joseph Cartan who was the village blacksmith. 
• Elie recalled that his childhood had passed under “blows of the anvil, which started every morning from dawn”. • We can imagine easily that the child, Elie Cartan, watching his father Joseph “coding curvature” on metal between the hammer and the anvil, insidiously influencing Elie’s mind with germinal intuition of fundamental geometric concepts. • The etymology of the word “Forge”, that comes from the late XIV century, “a smithy”, from Old French forge “forge, smithy” (XII century), earlier faverge, from Latin fabrica “workshop, smith’s shop”, from faber (genitive fabri) “workman in hard materials, smith”. HAMMER = The CoderANVIL = Curvature Libraries Bigorne Bicorne Venus at the Forge of Vulcan, Le Nain Brothers, Musée Saint-Denis, Reims From Homo Sapiens to Homo Faber “Intelligence is the faculty of manufacturing artificial objects, especially tools to make tools, and of indefinitely varying the manufacture.” Henri Bergson Into the Flaming Forge of Vulcan, Diego Velázquez, Museo Nacional del Prado Geometric Thermodynamics & Statistical Physics Enjoy all « Geometries » (Dinner at Versailles Palace Gardens) Restaurant of GSI’15 Gala Dinner André Le Nôtre Landscape Geometer of Versailles the Apex of “Le Jardin à la française” Louis XIV Patron of Science The Royal Academy of Sciences was established in 1666 On 1st September 1715, 300 years ago, Louis XIV passed away at the age of 77, having reigned for 72 years Keynote Speakers Prof. Mathilde MARCOLLI (CALTECH, USA) From Geometry and Physics to Computational Linguistics Abstact: I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework. Biography: • Laurea in Physics, University of Milano, 1993 • Master of Science, Mathematics, University of Chicago, 1994 • PhD, Mathematics, University of Chicago, 1997 • Moore Instructor, Massachusetts Institute of Technology, 1997-2000 • Associate Professor (C3), Max Planck Institute for Mathematics, 2000-2008 • Professor, California Institute of Technology, 2008-present • Distinguished Visiting Research Chair, Perimeter Institute for Theoretical Physics, 2013-present . Talk chaired by Daniel Bennequin Keynote Speakers Prof. Marc ARNAUDON (Bordeaux University, France) Stochastic Euler-Poincaré reduction Abstact: We will prove a Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation. Biography: Marc Arnaudon was born in France in 1965. He graduated from Ecole Normale Supérieure de Paris, France, in 1991. He received the PhD degree in mathematics and the Habilitation à diriger des Recherches degree from Strasbourg University, France, in January 1994 and January 1998 respectively. After postdoctoral research and teaching at Strasbourg, he began in September 1999 a full professor position in the Department of Mathematics at Poitiers University, France, where he was the head of the Probability Research Group. 
In January 2013 he left Poitiers and joined the Department of Mathematics of Bordeaux University, France, where he is a full professor in mathematics. Talk chaired by Frank Nielsen Keynote Speakers Prof. Tudor RATIU (EPFL, Switzerland) Symmetry methods in geometric mechanics Abstact: The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics. Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena in for Hamiltonian systems are based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity. Biography: • BA in Mathematics, University of Timisoara, Romania, 1973 • MA in Applied Mathematics, University of Timisoara, Romania, 1974 • Ph.D. in Mathematics, University of California, Berkeley, 1980 • T.H. Hildebrandt Research Assistant Professor, University of Michigan, Ann Arbor, USA 1980-1983 • Associate Professor of Mathematics, University of Arizona, Tuscon, USA 1983- 1988 • Professor of Mathematics, University of California, Santa Cruz, USA, 1988-2001 • Chaired Professor of Mathematics, Ecole Polytechnique Federale de Lausanne, Switzerland, 1998 - present • Professor of Mathematics, Skolkovo Institute of Science and Technonology, Moscow, Russia, 2014 - present Talk chaired by Xavier Pennec Short Course Prof. Dominique SPEHNER (Grenoble University) Geometry on the set of quantum states and quantum correlations Abstact: I will show that the set of states of a quantum system with a finite- dimensional Hilbert space can be equipped with various Riemannian distances having nice properties from a quantum information viewpoint, namely they are contractive under all physically allowed operations on the system. The corresponding metrics are quantum analogs of the Fisher metric and have been classified by D. Petz. Two distances are particularly relevant physically: the Bogoliubov-Kubo-Mori distance studied by R. Balian, Y. Alhassid and H. Reinhardt, and the Bures distance studied by A. Uhlmann and by S.L. Braunstein and C.M. Caves. The latter gives the quantum Fisher information playing an important role in quantum metrology. A way to measure the amount of quantum correlations (entanglement or quantum discord) in bipartite systems (that is, systems composed of two parties) with the help of these distances will be also discussed. Biography: • Diplôme d'Études Approfondies (DEA) in Theoretical Physics at the École Normale Supérieure de Lyon, 1994 • Civil Service (Service National de la Coopération), Technion Institute of Technology, Haifa, Israel, 1995-1996 • PhD in Theoretical Physics, Université Paul Sabatier, Toulouse, France, 1996- 2000. 
• Postdoctoral fellow, Pontificia Universidad Católica, Santiago, Chile, 2000-2001 • Research Associate, University of Duisburg-Essen, Germany, 2001-2005 • Maître de Conférences, Université Joseph Fourier, Grenoble, France, 2005-present • Habilitation à diriger des Recherches (HDR), Université Grenoble Alpes, 2015 • Member of the Institut Fourier (since 2005) and the Laboratoire de Physique et Modélisation des Milieux Condensés (since 2013) of the university Grenoble Alpes, France Talk chaired by Roger Balian Guest Speakers Prof. Charles-Michel MARLE (UPMC, France) Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. Application to Hamiltonian systems Abstact: I will present some tools in Symplectic and Poisson Geometry in view of their applications in Geometric Mechanics and Mathematical Physics. Lie group and Lie algebra actions on symplectic and Poisson manifolds, momentum maps and their equivariance properties, first integrals associated to symmetries of Hamiltonian systems will be discussed. Reduction methods taking advantage of symmetries will be discussed. Biography: Charles-Michel Marle was born in 1934; He studied at Ecole Polytechnique (1953-1955), Ecole Nationale Supérieure des Mines de Paris (1957-1958) and Ecole Nationale Supérieure du Pétrole et des Moteurs (1957-1958). He obtained a doctor's degree in Mathematics at the University of Paris in 1968. From 1959 to 1969 he worked as a research engineer at the Institut Français du Pétrole. He joined the Université de Besançon as Associate Professor in 1969, and the Université Pierre et Marie Curie, first as Associate Professor (1975) and then as full Professor (1981). His resarch works were first about fluid flows through porous media, then about Differential Geometry, Hamiltonian systems and applications in Mechanics and Mathematical Physics. Talk chaired by Frédéric Barbaresco

Keynote speech by Matilde Marcolli (chaired by Daniel Bennequin)

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14257

Authors = Matilde Marcolli
Keywords =
Abstract
I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework.


Watch the video
From Geometry and Physics to Computational Linguistics

From Geometry and Physics to Computational Linguistics Matilde Marcolli Geometric Science of Information, Paris, October 2015 Matilde Marcolli Geometry, Physics, Linguistics A Mathematical Physicist’s adventures in Linguistics Based on: 1 Alexander Port, Iulia Gheorghita, Daniel Guth, John M.Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134 2 Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504 3 Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342 4 Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you Matilde Marcolli Geometry, Physics, Linguistics What is Linguistics? • Linguistics is the scientific study of language - What is Language? (langage, lenguaje, ...) - What is a Language? (lange, lengua,...) Similar to ‘What is Life?’ or ‘What is an organism?’ in biology • natural language as opposed to artificial (formal, programming, ...) languages • The point of view we will focus on: Language is a kind of Structure - It can be approached mathematically and computationally, like many other kinds of structures - The main purpose of mathematics is the understanding of structures Matilde Marcolli Geometry, Physics, Linguistics • How are di↵erent languages related? What does it mean that they come in families? • How do languages evolve in time? Phylogenetics, Historical Linguistics, Etymology • How does the process of language acquisition work? (Neuroscience) • Semiotic viewpoint (mathematical theory of communication) • Discrete versus Continuum (probabilistic methods, versus discrete structures) • Descriptive or Predictive? to be predictive, a science needs good mathematical models Matilde Marcolli Geometry, Physics, Linguistics A language exists at many di↵erent levels of structure An Analogy: Physics looks very di↵erent at di↵erent scales: General Relativity and Cosmology ( 1010 m) Classical Physics (⇠ 1 m) Quantum Physics ( 10 10 m) Quantum Gravity (10 35 m) Despite dreams of a Unified Theory, we deal with di↵erent mathematical models for di↵erent levels of structure Matilde Marcolli Geometry, Physics, Linguistics Similarly, we view language at di↵erent “scales”: units of sound (phonology) words (morphology) sentences (syntax) global meaning (semantics) We expect to be dealing with di↵erent mathematical structures and di↵erent models at these various di↵erent levels Main level I will focus on: Syntax Matilde Marcolli Geometry, Physics, Linguistics Linguistics view of syntax kind of looks like this... Alexander Calder, Mobile, 1960 Matilde Marcolli Geometry, Physics, Linguistics Modern Syntactic Theory: • grammaticality: judgement on whether a sentence is well formed (grammatical) in a given language, i-language gives people the capacity to decide on grammaticality • generative grammar: produce a set of rules that correctly predict grammaticality of sentences • universal grammar: ability to learn grammar is built in the human brain, e.g. properties like distinction between nouns and verbs are universal ... is universal grammar a falsifiable theory? 
Matilde Marcolli Geometry, Physics, Linguistics Principles and Parameters (Government and Binding) (Chomsky, 1981) • principles: general rules of grammar • parameters: binary variables (on/o↵ switches) that distinguish languages in terms of syntactic structures • Example of parameter: head-directionality (head-initial versus head-final) English is head-initial, Japanese is head-final VP= verb phrase, TP= tense phrase, DP= determiner phrase Matilde Marcolli Geometry, Physics, Linguistics ...but not always so clear-cut: German can use both structures auf seine Kinder stolze Vater (head-final) or er ist stolz auf seine Kinder (head-initial) AP= adjective phrase, PP= prepositional phrase • Corpora based statistical analysis of head-directionality (Haitao Liu, 2010): a continuum between head-initial and head-final Matilde Marcolli Geometry, Physics, Linguistics Examples of Parameters Head-directionality Subject-side Pro-drop Null-subject Problems • Interdependencies between parameters • Diachronic changes of parameters in language evolution Matilde Marcolli Geometry, Physics, Linguistics Dependent parameters • null-subject parameter: can drop subject Example: among Latin languages, Italian and Spanish have null-subject (+), French does not (-) it rains, piove, llueve, il pleut • pro-drop parameter: can drop pronouns in sentences • Pro-drop controls Null-subject How many independent parameters? Geometry of the space of syntactic parameters? Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Syntax • Alexander Port, Iulia Gheorghita, Daniel Guth, John M.Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134 Databases of Syntactic Parameters of World Languages: 1 Syntactic Structures of World Languages (SSWL) http://sswl.railsplayground.net/ 2 TerraLing http://www.terraling.com/ 3 World Atlas of Language Structures (WALS) http://wals.info/ Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Data Sets how data cluster around topological shapes at di↵erent scales Matilde Marcolli Geometry, Physics, Linguistics Vietoris–Rips complexes • set X = {x↵} of points in Euclidean space EN, distance d(x, y) = kx yk = ( PN j=1(xj yj )2)1/2 • Vietoris-Rips complex R(X, ✏) of scale ✏ over field K: Rn(X, ✏) is K-vector space spanned by all unordered (n + 1)-tuples of points {x↵0 , x↵1 , . . . , x↵n } in X where all pairs have distances d(x↵i , x↵j )  ✏ Matilde Marcolli Geometry, Physics, Linguistics • inclusion maps R(X, ✏1) ,! R(X, ✏2) for ✏1 < ✏2 induce maps in homology by functoriality Hn(X, ✏1) ! Hn(X, ✏2) barcode diagrams: births and deaths of persistent generators Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Syntactic Parameters • Data: 252 languages from SSWL with 115 parameters • if consider all world languages together too much noise in the persistent topology: subdivide by language families • Principal Component Analysis: reduce dimensionality of data • compute Vietoris–Rips complex and barcode diagrams Persistent H0: clustering of data in components – language subfamilies Persistent H1: clustering of data along closed curves (circles) – linguistic meaning? 
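The pipeline just described (binary syntactic-parameter vectors, dimension reduction, Vietoris–Rips filtration, barcode diagrams) can be illustrated with a short script. This is only a minimal sketch, not the authors' code: the 252 × 115 parameter matrix below is a random stand-in for the SSWL data, and it assumes the scikit-learn and gudhi libraries are available.

```python
import numpy as np
from sklearn.decomposition import PCA
import gudhi

# Random stand-in for the SSWL data: one 0/1 row of syntactic parameters per language.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(252, 115)).astype(float)

# Reduce dimensionality before building the filtration, as on the slide.
Y = PCA(n_components=3).fit_transform(X)

# Vietoris-Rips complex over the reduced point cloud and its persistence barcodes.
rips = gudhi.RipsComplex(points=Y, max_edge_length=5.0)
st = rips.create_simplex_tree(max_dimension=2)
st.persistence()  # list of (dimension, (birth, death)) pairs

h0 = st.persistence_intervals_in_dimension(0)  # connected components (clusters)
h1 = st.persistence_intervals_in_dimension(1)  # persistent loops ("circles" in the data)
print(len(h0), "H0 bars,", len(h1), "H1 bars")
```

On real data this would be run separately for each language family, as noted above, since pooling all world languages makes the persistent topology too noisy.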
Matilde Marcolli Geometry, Physics, Linguistics Sources of Persistent H1 • “Hopf bifurcation” type phenomenon • two di↵erent branches of a tree closing up in a loop two di↵erent types of phenomena of historical linguistic development within a language family Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Indo-European Languages • Two persistent generators of H0 (Indo-Iranian, European) • One persistent generator of H1 Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Niger–Congo Languages • Three persistent components of H0 (Mande, Atlantic-Congo, Kordofanian) • No persistent H1 Matilde Marcolli Geometry, Physics, Linguistics The origin of persistent H1 of Indo-European Languages? Naive guess: the Anglo-Norman bridge ... but lexical not syntactic Matilde Marcolli Geometry, Physics, Linguistics Answer: No, it is not the Anglo-Norman bridge! Persistent topology of the Germanic+Latin languages Matilde Marcolli Geometry, Physics, Linguistics Answer: It’s all because of Ancient Greek! Persistent topology with Hellenic (and Indo-Iranic) branch removed Matilde Marcolli Geometry, Physics, Linguistics Syntactic Parameters as Dynamical Variables • Example: Word Order: SOV, SVO, VSO, VOS, OVS, OSV Very uneven distribution across world languages Matilde Marcolli Geometry, Physics, Linguistics • Word order distribution: a neuroscience explanation? - D. Kemmerer, The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca’s area, Language and Linguistics Compass, 6 (2012) N.1, 50–66. • Internal reasons for diachronic switch? - F.Antinucci, A.Duranti, L.Gebert, Relative clause structure, relative clause perception, and the change from SOV to SVO, Cognition, Vol.7 (1979) N.2 145–176. Matilde Marcolli Geometry, Physics, Linguistics Changes over time in Word Order • Ancient Greek: switched from Homeric to Classical - A. Taylor, The change from SOV to SVO in Ancient Greek, Language Variation and Change, 6 (1994) 1–37 • Sanskrit: di↵erent word orders allowed, but prevalent one in Vedic Sanskrit is SOV (switched at least twice by influence of Dravidian languages) - F.J. Staal, Word Order in Sanskrit and Universal Grammar, Springer, 1967 • English: switched from Old English (transitional between SOV and SVO) to Middle English (SVO) - J. McLaughlin, Old English Syntax: a handbook, Walter de Gruyter, 1983. Syntactic Parameters are Dynamical in Language Evolution Matilde Marcolli Geometry, Physics, Linguistics Spin Glass Models of Syntax • Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504 – focus on linguistic change caused by language interactions – think of syntactic parameters as spin variables – spin interaction tends to align (ferromagnet) – strength of interaction proportional to bilingualism (MediaLab) – role of temperature parameter: probabilistic interpretation of parameters – not all parameters are independent: entailment relations – Metropolis–Hastings algorithm: simulate evolution Matilde Marcolli Geometry, Physics, Linguistics The Ising Model of spin systems on a graph G • configurations of spins s : V (G) ! 
{±1} • magnetic field B and correlation strength J: Hamiltonian H(s) = J X e2E(G):@(e)={v,v0} sv sv0 B X v2V (G) sv • first term measures degree of alignment of nearby spins • second term measures alignment of spins with direction of magnetic field Matilde Marcolli Geometry, Physics, Linguistics Equilibrium Probability Distribution • Partition Function ZG ( ) ZG ( ) = X s:V (G)!{±1} exp( H(s)) • Probability distribution on the configuration space: Gibbs measure PG, (s) = e H(s) ZG ( ) • low energy states weight most • at low temperature (large ): ground state dominates; at higher temperature ( small) higher energy states also contribute Matilde Marcolli Geometry, Physics, Linguistics Average Spin Magnetization MG ( ) = 1 #V (G) X s:V (G)!{±1} X v2V (G) sv P(s) • Free energy FG ( , B) = log ZG ( , B) MG ( ) = 1 #V (G) 1 ✓ @FG ( , B) @B ◆ |B=0 Ising Model on a 2-dimensional lattice • 9 critical temperature T = Tc where phase transition occurs • for T > Tc equilibrium state has m(T) = 0 (computed with respect to the equilibrium Gibbs measure PG, • demagnetization: on average as many up as down spins • for T < Tc have m(T) > 0: spontaneous magnetization Matilde Marcolli Geometry, Physics, Linguistics Syntactic Parameters and Ising/Potts Models • characterize set of n = 2N languages Li by binary strings of N syntactic parameters (Ising model) • or by ternary strings (Potts model) if take values ±1 for parameters that are set and 0 for parameters that are not defined in a certain language • a system of n interacting languages = graph G with n = #V (G) • languages Li = vertices of the graph (e.g. language that occupies a certain geographic area) • languages that have interaction with each other = edges E(G) (geographical proximity, or high volume of exchange for other reasons) Matilde Marcolli Geometry, Physics, Linguistics graph of language interaction (detail) from Global Language Network of MIT MediaLab, with interaction strengths Je on edges based on number of book translations (or Wikipedia edits) Matilde Marcolli Geometry, Physics, Linguistics • if only one syntactic parameter, would have an Ising model on the graph G: configurations s : V (G) ! {±1} set the parameter at all the locations on the graph • variable interaction energies along edges (some pairs of languages interact more than others) • magnetic field B and correlation strength J: Hamiltonian H(s) = X e2E(G):@(e)={v,v0} NX i=1 Je sv,i sv0,i • if N parameters, configurations s = (s1, . . . , sN) : V (G) ! {±1}N • if all N parameters are independent, then it would be like having N non-interacting copies of a Ising model on the same graph G (or N independent choices of an initial state in an Ising model on G) Matilde Marcolli Geometry, Physics, Linguistics Metropolis–Hastings • detailed balance condition P(s)P(s ! s0) = P(s0)P(s0 ! s) for probabilities of transitioning between states (Markov process) • transition probabilities P(s ! s0) = ⇡A(s ! s0) · ⇡(s ! s0) with ⇡(s ! s0) conditional probability of proposing state s0 given state s and ⇡A(s ! s0) conditional probability of accepting it • Metropolis–Hastings choice of acceptance distribution (Gibbs) ⇡A(s ! s0 ) = ⇢ 1 if H(s0) H(s)  0 exp( (H(s0) H(s))) if H(s0) H(s) > 0. satisfying detailed balance • selection probabilities ⇡(s ! 
s0) single-spin-flip dynamics • ergodicity of Markov process ) unique stationary distribution Matilde Marcolli Geometry, Physics, Linguistics Example: Single parameter dynamics Subject-Verb parameter Initial configuration: most languages in SSWL have +1 for Subject-Verb; use interaction energies from MediaLab data Matilde Marcolli Geometry, Physics, Linguistics Equilibrium: low temperature all aligned to +1; high temperature: Temperature: fluctuations in bilingual users between di↵erent structures (“code-switching” in Linguistics) Matilde Marcolli Geometry, Physics, Linguistics Entailment relations among parameters • Example: {p1, p2} = {Strong Deixis, Strong Anaphoricity} p1 p2 `1 +1 +1 `2 1 0 `3 +1 +1 `4 +1 1 {`1, `2, `3, `4} = {English, Welsh, Russian, Bulgarian} Matilde Marcolli Geometry, Physics, Linguistics Modeling Entailment • variables: S`,p1 = exp(⇡iX`,p1 ) 2 {±1}, S`,p2 2 {±1, 0} and Y`,p2 = |S`,p2 | 2 {0, 1} • Hamiltonian H = HE + HV HE = Hp1 + Hp2 = X `,`02languages J``0 ⇣ S`,p1 ,S`0,p1 + S`,p2 ,S`0,p2 ⌘ HV = X ` HV ,` = X ` J` X`,p1 ,Y`,p2 J` > 0 anti-ferromagnetic • two parameters: temperature as before and coupling energy of entailment • if freeze p1 and evolution for p2: Potts model with external magnetic field Matilde Marcolli Geometry, Physics, Linguistics Acceptance probabilities ⇡A(s ! s ± 1 (mod 3)) = ⇢ 1 if H  0 exp( H) if H > 0. H := min{H(s + 1 (mod 3)), H(s 1 (mod 3))} H(s) Equilibrium configuration (p1, p2) HT/HE HT/LE LT/HE LT/LE `1 (+1, 0) (+1, 1) (+1, +1) (+1, 1) `2 (+1, 1) ( 1, 1) (+1, +1) (+1, 1) `3 ( 1, 0) ( 1, +1) (+1, +1) ( 1, 0) `4 (+1, +1) ( 1, 1) (+1, +1) ( 1, 0) Matilde Marcolli Geometry, Physics, Linguistics Average value of spin p1 left and p2 right in low entailment energy case Matilde Marcolli Geometry, Physics, Linguistics Syntactic Parameters in Kanerva Networks • Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342 – Address two issues: relative prevalence of di↵erent syntactic parameters and “degree of recoverability” (as sign of underlying relations between parameters) – If corrupt information about one parameter in data of group of languages can recover it from the data of the other parameters? – Answer: di↵erent parameters have di↵erent degrees of recoverability – Used 21 parameters and 165 languages from SSWL database Matilde Marcolli Geometry, Physics, Linguistics Kanerva networks (sparse distributed memories) • P. Kanerva, Sparse Distributed Memory, MIT Press, 1988. 
• field F2 = {0, 1}, vector space FN 2 large N • uniform random sample of 2k hard locations with 2k << 2N • median Hamming distance between hard locations • Hamming spheres of radius slightly larger than median value (access sphere) • writing to network: storing datum X 2 FN 2 , each hard location in access sphere of X gets i-th coordinate (initialized at zero) incremented depending on i-th entry ot X • reading at a location: i-th entry determined by majority rule of i-th entries of all stored data in hard locations within access sphere Kanerva networks are good at reconstructing corrupted data Matilde Marcolli Geometry, Physics, Linguistics Procedure • 165 data points (languages) stored in a Kanerva Network in F21 2 (choice of 21 parameters) • corrupting one parameter at a time: analyze recoverability • language bit-string with a single corrupted bit used as read location and resulting bit string compared to original bit-string (Hamming distance) • resulting average Hamming distance used as score of recoverability (lowest = most easily recoverable parameter) Matilde Marcolli Geometry, Physics, Linguistics Parameters and frequencies 01 Subject-Verb (0.64957267) 02 Verb-Subject (0.31623933) 03 Verb-Object (0.61538464) 04 Object-Verb (0.32478634) 05 Subject-Verb-Object (0.56837606) 06 Subject-Object-Verb (0.30769232) 07 Verb-Subject-Object (0.1923077) 08 Verb-Object-Subject (0.15811966) 09 Object-Subject-Verb (0.12393162) 10 Object-Verb-Subject (0.10683761) 11 Adposition-Noun-Phrase (0.58974361) 12 Noun-Phrase-Adposition (0.2905983) 13 Adjective-Noun (0.41025642) 14 Noun-Adjective (0.52564102) 15 Numeral-Noun (0.48290598) 16 Noun-Numeral (0.38034189) 17 Demonstrative-Noun (0.47435898) 18 Noun-Demonstrative (0.38461539) 19 Possessor-Noun (0.38034189) 20 Noun-Possessor (0.49145299) A01 Attributive-Adjective-Agreement (0.46581197) Matilde Marcolli Geometry, Physics, Linguistics Matilde Marcolli Geometry, Physics, Linguistics Overall e↵ect related to relative prevalence of a parameter Matilde Marcolli Geometry, Physics, Linguistics More refined e↵ect after normalizing for prelavence (syntactic dependencies) Matilde Marcolli Geometry, Physics, Linguistics • Overall e↵ect relating recoverability in a Kanerva Network to prevalence of a certain parameter among languages (depends only on frequencies: see in random data with assigned frequencies) • Additional e↵ects (that deviate from random case) which detect possible dependencies among syntactic parameters: increased recoverability beyond what e↵ect based on frequency • Possible neuroscience implications? Kanerva Networks as models of human memory (parameter prevalence linked to neuroscience models) • More refined data if divided by language families? Matilde Marcolli Geometry, Physics, Linguistics Phylogenetic Linguistics (WORK IN PROGRESS) • Constructing family trees for languages (sometimes possibly graphs with loops) • Main information about subgrouping: shared innovation a specific change with respect to other languages in the family that only happens in a certain subset of languages - Example: among Mayan languages: Huastecan branch characterized by initial w becoming voiceless before a vowel and ts becoming t, q becoming k, ... Quichean branch by velar nasal becoming velar fricative, ´c becoming ˇc (prepalatal a↵ricate to palato-alveolar)... 
Known result by traditional Historical Linguistics methods: Matilde Marcolli Geometry, Physics, Linguistics Mayan Language Tree Matilde Marcolli Geometry, Physics, Linguistics Computational Methods for Phylogenetic Linguistics • Peter Foster, Colin Renfrew, Phylogenetic methods and the prehistory of languages, McDonald Institute Monographs, 2006 • Several computational methods for constructing phylogenetic trees available from mathematical and computational biology • Phylogeny Programs http://evolution.genetics.washington.edu/phylip/software.html • Standardized lexical databases: Swadesh list (100 words, or 207 words) Matilde Marcolli Geometry, Physics, Linguistics • Use Swadesh lists of languages in a given family to look for cognates: - without additional etymological information (keep false positives) - with additional etymological information (remove false positives) • Two further choices about loan words: - remove loan words - keep loan words • Keeping loan words produces graphs that are not trees • Without loan words it should produce trees, but small loops still appear due to ambiguities (di↵erent possible trees matching same data) ... more precisely: coding of lexical data ... Matilde Marcolli Geometry, Physics, Linguistics Coding of lexical data • After compiling lists of cognate words for pairs of languages within a given family (with/without lexical information and loan words) • Produce a binary string S(L1, L2) = (s1, . . . , sN) for each pair of languages L1, L2, with entry 0 or 1 at the i-th word of the lexical list of N words if cognates for that meaning exist in the two languages or not (important to pay attention to synonyms) • lexical Hamming distance between two languages d(L1, L2) = #{i 2 {1, . . . , N} | si = 1} counts words in the list that do not have cognates in L1 and L2 Matilde Marcolli Geometry, Physics, Linguistics Distance-matrix method of phylogenetic inference • after producing a measure of “genetic distance” Hamming metric dH(La, Lb) • hierarchical data clustering: collecting objects in clusters according to their distance • simplest method of tree construction: neighbor joining (1) - create a (leaf) vertex for each index a (ranging over languages in given family) (2) - given distance matrix D = (Dab) distances between each pair Dab = dH(La, Lb) construct a new matrix Q-test Q = (Qab) with Qab = (n 2)Dab nX k=1 Dak nX k=1 Dbk this matrix Q decides first pairs of vertices to join Matilde Marcolli Geometry, Physics, Linguistics (3) - identify entries Qab with lowest values: join each such pair (a, b) of leaf vertices to a newly created vertex vab (4) - set distances to new vertex by d(a, vab) = 1 2 Dab + 1 2(n 2) nX k=1 Dak nX k=1 Dbk ! d(b, vab) = Dab d(a, vab) d(k, vab) = 1 2 (Dak + Dbk Dab) (5) - remove a and b and keep vab and all the remaining vertices and the new distances, compute new Q matrix and repeat until tree is completed Matilde Marcolli Geometry, Physics, Linguistics Neighborhood-Joining Method for Phylogenetic Inference Matilde Marcolli Geometry, Physics, Linguistics Example of a neighbor-joining lexical linguistic phylogenetic tree from Delmestri-Cristianini’s paper Matilde Marcolli Geometry, Physics, Linguistics N. Saitou, M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol Biol Evol. Vol.4 (1987) N. 4, 406-425. R. Mihaescu, D. Levy, L. Pachter, Why neighbor-joining works, arXiv:cs/0602041v3 A. Delmestri, N. 
Cristianini, Linguistic Phylogenetic Inference by PAM-like Matrices, Journal of Quantitative Linguistics, Vol.19 (2012) N.2, 95-120. F. Petroni, M. Serva, Language distance and tree reconstruction, J. Stat. Mech. (2008) P08012 Matilde Marcolli Geometry, Physics, Linguistics Syntactic Phylogenetic Trees (instead of lexical) • instead of coding lexical data based on cognate words, use binary variables of syntactic parameters • Hamming distance between binary string of parameter values • shown recently that one gets an accurate reconstruction of the phylogenetic tree of Indo-European languages from syntactic parameters only • G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin, Towards a syntactic phylogeny of modern Indo-European languages, Journal of Historical Linguistics 3 (2013) N.1, 122–152. • G. Longobardi, C. Guardiano, Evidence for syntax as a signal of historical relatedness, Lingua 119 (2009) 1679–1706. Matilde Marcolli Geometry, Physics, Linguistics Work in Progress • Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you – Assembled a phylogenetic tree of world languages using the SSWL database of syntactic parameters – Ongoing comparison with specific historical linguistic reconstruction of phylogenetic trees – Comparison with Computational Linguistic reconstructions based on lexical data (Swadesh lists) and on phonetical analysis – not all linguistic families have syntactic parameters mapped with same level of completeness... di↵erent levels of accuracy in reconstruction Matilde Marcolli Geometry, Physics, Linguistics
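The syntactic distance-matrix construction described above (Hamming distances between binary parameter strings, followed by neighbor joining) is easy to prototype. Below is a minimal sketch using random placeholder parameter strings instead of SSWL data; only the first join of the neighbor-joining procedure is shown, using the Q-matrix formula from the slides.

```python
import numpy as np

# Placeholder binary syntactic-parameter strings, one row per language.
rng = np.random.default_rng(2)
langs = ["L%d" % i for i in range(6)]
P = rng.integers(0, 2, size=(len(langs), 21))

# Hamming distance matrix between parameter strings.
D = (P[:, None, :] != P[None, :, :]).sum(axis=2).astype(float)

def q_matrix(D):
    """Neighbor-joining Q-test matrix: Q_ab = (n - 2) D_ab - sum_k D_ak - sum_k D_bk."""
    n = D.shape[0]
    row = D.sum(axis=1)
    Q = (n - 2) * D - row[:, None] - row[None, :]
    np.fill_diagonal(Q, np.inf)  # a leaf is never joined with itself
    return Q

Q = q_matrix(D)
a, b = np.unravel_index(np.argmin(Q), Q.shape)
print("first pair joined by neighbor joining:", langs[a], langs[b])
```

Iterating this step (creating a new vertex, updating the distances as in the slides, and recomputing Q) yields the full phylogenetic tree; in practice one would rely on an existing neighbor-joining implementation.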

Random Geometry/Homology (chaired by Laurent Decreusefond/Frédéric Chazal)

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14258
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_19
Authors = Nicolas Chenavier
Keywords = Extreme values, Poisson point process, Random tessellations
Abstract
Let m be a random tessellation in R^d, d ≥ 1, observed in the window W_ρ = ρ^{1/d}[0, 1]^d, ρ > 0, and let f be a geometrical characteristic. We investigate the asymptotic behaviour of the maximum of f(C) over all cells C ∈ m with nucleus in W_ρ as ρ goes to infinity. When the normalized maximum converges, we show that its asymptotic distribution depends on the so-called extremal index. Two examples of extremal indices are provided for Poisson-Voronoi and Poisson-Delaunay tessellations.


Watch the video
The extremal index for a random tessellation

Random tessellations Main problem Extremal index The extremal index for a random tessellation Nicolas Chenavier Université Littoral Côte d’Opale October 28, 2015 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Plan 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Random tessellations Definition A (convex) random tessellation m in Rd is a partition of the Euclidean space into random polytopes (called cells). We will only consider the particular case where m is a : Poisson-Voronoi tessellation ; Poisson-Delaunay tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Poisson-Voronoi tessellation X, Poisson point process in Rd ; ∀x ∈ X, CX(x) := {y ∈ Rd , |y − x| ≤ |y − x |, x ∈ X} (Voronoi cell with nucleus x) ; mPVT := {CX(x), x ∈ X}, Poisson-Voronoi tessellation ; ∀CX(x) ∈ mPVT , we let z(CX(x)) := x. x CX(x) Mosaique de Poisson-Voronoi Figure: Poisson-Voronoi tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Poisson-Delaunay tessellation X, Poisson point process in Rd ; ∀x, x ∈ X, x and x define an edge if CX(x) ∩ CX(x ) = ∅ ; mPDT , Poisson-Delaunay tessellation ; ∀C ∈ mPDT , we let z(C) as the circumcenter of C. x x z(C) Mosaique de Poisson-Delaunay Figure: Poisson-Delaunay tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Typical cell Definition Let m be a stationary random tessellation. The typical cell of m is a random polytope C in Rd which distribution given as follows : for each bounded translation-invariant function g : {polytopes} → R, we have E [g(C)] := 1 N(B) E     C∈m, z(C)∈B g(C)     , where : B ⊂ R is any Borel subset with finite and non-empty volume ; N(B) is the mean number of cells with nucleus in B. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Main problem Framework : m = mPVT , mPDT ; Wρ := [0, ρ]d , with ρ > 0 ; g : {polytopes} → R, geometrical characteristic. Aim : asymptotic behaviour, when ρ → ∞, of Mg,ρ = max C∈m, z(C)∈Wρ g(C)? Figure: Voronoi cell maximizing the area in the square. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Objective and applications Objective : find ag,ρ > 0, bg,ρ ∈ R s.t. P Mg,ρ ≤ ag,ρt + bg,ρ converges, as ρ → ∞, for each t ∈ R. Applications : regularity of the tessellation ; discrimination of point processes and tessellations ; Poisson-Voronoi approximation. Approximation de Poisson-Voronoi Figure: Poisson-Voronoi approximation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Asymptotics under a local correlation condition Notation : let vρ := ag,ρt + bρ be a threshold such that ρd · P (g(C) > vρ) −→ ρ→∞ τ, for some τ := τ(t) ≥ 0. Local Correlation Condition (LCC) ρd (log ρ)d · E      (C1,C2)=∈m2, z(C1),z(C2)∈[0,log ρ]d 1g(C1)>vρ,g(C2)>vρ      −→ ρ→∞ 0. Theorem Under (LCC), we have : P (Mg,ρ ≤ vρ) −→ ρ→∞ e−τ . 
Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Definition of the extremal index Proposition Assume that for all τ ≥ 0, there exists a threshold v (τ) ρ depending on ρ such that ρd · P(g(C) > v (τ) ρ ) −→ ρ→∞ τ. Then there exists θ ∈ [0, 1] such that, for all τ ≥ 0, lim ρ→∞ P(Mg,ρ ≤ v(τ) ρ ) = e−θτ , provided that the limit exists. Definition According to Leadbetter, we say that θ ∈ [0, 1] is the extremal index if, for each τ ≥ 0, we have : ρd · P g(C) > v(τ) ρ −→ ρ→∞ τ and lim ρ→∞ P(Mg,ρ ≤ v(τ) ρ ) = e−θτ . Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Example 1 Framework : m := mPVT : Poisson-Voronoi tessellation ; g(C) := r(C) : inradius of any cell C := CX(x) with x ∈ X, i.e. r(C) := r (CX(x)) := max{r ∈ R+ : B(x, r) ⊂ CX(x)}. rmin,PVT (ρ) := minx∈X∩Wρ r (CX(x)). Extremal index : θ = 1/2 for each d ≥ 1. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Minimum of inradius for a Poisson-Voronoi tessellation (b) Typical Poisson−Voronoï cell with a small inradii x y −1.0 −0.5 0.0 0.5 1.0 −1.0−0.50.00.51.0 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Example 2 Framework : m := mPDT : Poisson-Delaunay tessellation ; g(C) := R(C) : circumradius of any cell C, i.e. R(C) := min{r ∈ R+ : B(x, r) ⊃ C}. Rmax,PDT (ρ) := maxC∈mPDT :z(C)∈Wρ R(C). Extremal index : θ = 1; 1/2; 35/128 for d = 1; 2; 3. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Maximum of circumradius for a Poisson-Delaunay tessellation (d) Typical Poisson−Delaunay cell with a large circumradii x y −15 −10 −5 0 5 10 15 −15−10−5051015 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Work in progress Joint work with C. Robert (ISFA, Lyon 1) : new characterization of the extremal index (not based on classical block and run estimators appearing in the classical Extreme Value Theory) ; simulation and estimation for the extremal index and cluster size distribution (for Poisson-Voronoi and Poisson-Delaunay tessellations). Nicolas Chenavier The extremal index for a random tessellation
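For Example 1 the extreme statistic is easy to simulate: the inradius of the Voronoi cell of a nucleus x equals half the distance from x to its nearest other nucleus (the cell is bounded by the perpendicular bisectors), so r_min,PVT(ρ) only needs nearest-neighbour distances. A minimal simulation sketch, assuming numpy and scipy are available; the intensity and boundary margin are arbitrary choices.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)

def min_inradius_pvt(rho, d=2, intensity=1.0, margin=5.0):
    """Minimum inradius over Poisson-Voronoi cells with nucleus in W_rho = rho^(1/d) [0, 1]^d.
    r(C_X(x)) is half the distance from x to its nearest other nucleus, so only
    nearest-neighbour distances are needed. The point process is simulated on a
    slightly larger box so that cells near the boundary see their true neighbours."""
    side = rho ** (1.0 / d)
    lo, hi = -margin, side + margin
    n = rng.poisson(intensity * (hi - lo) ** d)
    pts = rng.uniform(lo, hi, size=(n, d))
    dist, _ = cKDTree(pts).query(pts, k=2)   # dist[:, 1] = nearest-neighbour distance
    in_window = np.all((pts >= 0) & (pts <= side), axis=1)
    return 0.5 * dist[in_window, 1].min()

print(min_inradius_pvt(rho=10_000.0))
```

Repeating this over many realizations gives the empirical distribution of the minimum, whose limiting behaviour involves the extremal index θ = 1/2 stated in Example 1.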

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14259
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_20
Authors = Charles Kervrann, Frederic Lavancier
Keywords =
Abstract
A model of two-type (or two-color) interacting random balls is introduced. Each colored random set is a union of random balls, and the interaction relies on the volume of the intersection between the two random sets. This model is motivated by the detection and quantification of co-localization between two proteins. Simulation and inference are discussed. Since the individual balls cannot all be identified (e.g. a ball may be contained in another one), standard methods of inference such as likelihood or pseudo-likelihood are not available, and we apply the Takacs-Fiksel method with a specific choice of test functions.


Watch the video
A two-color interacting random balls model for co-localization analysis of proteins

A testing procedure A model for co-localization Estimation A two-color interacting random balls model for co-localization analysis of proteins. Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, Nantes INRIA Rennes, Serpico team Joint work with C. Kervrann (INRIA Rennes, Serpico team). GSI’15, 28-30 October 2015. A testing procedure A model for co-localization Estimation Introduction : some data Vesicular trafficking analysis and colocalization quantification by TIRF microscopy (1px = 100 nanometer) [SERPICO team, INRIA] ? =⇒ Langerin proteins (left) and Rab11 GTPase proteins (right). Is there colocalization ? ⇔ Is there some spatial dependencies between the two types of proteins ? A testing procedure A model for co-localization Estimation Image pre-processing After segmentation Superposition : ? ⇒ After a Gaussian weights thresholding Superposition : ? ⇒ A testing procedure A model for co-localization Estimation The problem of co-localization can be described as follows : We observe two binary images in a domain Ω : First image (green) : realization of a random set Γ1 ∩ Ω Second image (red) : realization of a random set Γ2 ∩ Ω −→ Is there some dependencies between Γ1 and Γ2 ? −→ If so, can we quantify/model this dependency ? A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation Testing procedure Let a generic point o ∈ Rd and p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2), p12 = P(o ∈ Γ1 ∩ Γ2). If Γ1 and Γ2 are independent, then p12 = p1p2. A testing procedure A model for co-localization Estimation Testing procedure Let a generic point o ∈ Rd and p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2), p12 = P(o ∈ Γ1 ∩ Γ2). If Γ1 and Γ2 are independent, then p12 = p1p2. A natural measure of departure from independency is ˆp12 − ˆp1 ˆp2 where ˆp1 = |Ω|−1 x∈Ω 1Γ1 (x), ˆp2 = |Ω|−1 x∈Ω 1Γ2 (x), ˆp12 = |Ω|−1 x∈Ω 1Γ1∩Γ2 (x). A testing procedure A model for co-localization Estimation Testing procedure Assume Γ1 and Γ2 are m-dependent stationary random sets. If Γ1 is independent of Γ2, then as |Ω| tends to infinity, T := |Ω| ˆp12 − ˆp1 ˆp2 x∈Ω y∈Ω ˆC1(x − y) ˆC2(x − y) → N(0, 1) where ˆC1 and ˆC2 are the empirical covariance functions of Γ1 ∩ Ω and Γ2 ∩ Ω respectively. Hence to test the null hypothesis of independence between Γ1 and Γ2 p-value = 2(1 − Φ(|T|)) where Φ is the c.d.f. of the standard normal distribution. A testing procedure A model for co-localization Estimation Some simulations Simulations when Γ1 and Γ2 are union of random balls A testing procedure A model for co-localization Estimation Some simulations Simulations when Γ1 and Γ2 are union of random balls Independent case (and each color ∼ Poisson) Number of p−values < 0.05 over 100 realizations : 4. A testing procedure A model for co-localization Estimation Some simulations Dependent case (see later for the model) Number of p−values < 0.05 over 100 realizations : 100. A testing procedure A model for co-localization Estimation Some simulations Independent case, larger radii Number of p−values < 0.05 over 100 realizations : 5. A testing procedure A model for co-localization Estimation Some simulations Dependent case, larger radii and "small" dependence Number of p−values < 0.05 over 100 realizations : 97. 
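The test just illustrated can be written down in a few lines for two segmented (binary) images given as boolean numpy arrays. A minimal sketch, not the authors' code: the slides do not spell out the normalization of the empirical covariance functions, so the biased estimator used here is an assumption, and edge effects are ignored.

```python
import numpy as np
from scipy.signal import fftconvolve
from math import erf, sqrt

def coloc_test(G1, G2):
    """Independence test between two binary images of identical shape,
    using the normal approximation T -> N(0, 1) described above."""
    G1, G2 = G1.astype(float), G2.astype(float)
    area = G1.size                               # |Omega| in pixels
    p1, p2 = G1.mean(), G2.mean()
    p12 = (G1 * G2).mean()

    def autocov(G, p):
        # empirical autocovariance C(h) = |Omega|^-1 sum_x (G(x) - p)(G(x + h) - p)
        Z = G - p
        return fftconvolve(Z, Z[::-1, ::-1], mode="full") / area

    C1, C2 = autocov(G1, p1), autocov(G2, p2)
    # number of ordered pixel pairs (x, y) in Omega with x - y = h, for every lag h
    ones = np.ones_like(G1)
    npairs = fftconvolve(ones, ones[::-1, ::-1], mode="full")

    T = area * (p12 - p1 * p2) / sqrt(np.sum(npairs * C1 * C2))
    pval = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(T) / sqrt(2.0))))   # 2 (1 - Phi(|T|))
    return T, pval

# toy example: two independent random binary images
rng = np.random.default_rng(4)
print(coloc_test(rng.random((100, 100)) < 0.3, rng.random((100, 100)) < 0.3))
```

Under independence the p-values should be roughly uniform, which is what the "number of p-values < 0.05 over 100 realizations" counts above are checking.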
A testing procedure A model for co-localization Estimation Real Data Depending on the pre-processing : T = 9.9 T = 17 p − value = 0 p − value = 0 A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution µ on [Rmin, Rmax]. A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution µ on [Rmin, Rmax]. Notation : (ξ, R)i : ball centered at ξ with radius R and color i ∈ {1, 2}. → viewed as a marked point, marked by R and i. xi : collection of all marked points with color i. Hence Γi = (ξ,R)i∈xi (ξ, R)i x = x1 ∪ x2 : collection of all marked points. A testing procedure A model for co-localization Estimation Example : three realizations of the reference process A testing procedure A model for co-localization Estimation The model We consider a density on any bounded domain Ω with respect to the reference model f(x) ∝ zn1 1 zn2 2 eθ |Γ1∩ Γ2| where n1 : number of green balls and n2 : number of red balls. This density depends on 3 parameters z1 : rules the mean number of green balls z2 : rules the mean number of red balls θ : interaction parameter. If θ > 0 : attraction (co-localization) between Γ1 and Γ2 If θ = 0 : back to the reference model, up to the intensities (independence between Γ1 and Γ2). A testing procedure A model for co-localization Estimation Simulation Realizations can be generated by a standard birth-death Metropolis-Hastings algorithm. Examples : A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation Estimation problem Aim : Assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in f(x) = 1 c(z1, z2, θ) zn1 1 zn2 2 eθ |Γ1∩ Γ2| , where c(z1, z2, θ) is the normalizing constant. A testing procedure A model for co-localization Estimation Estimation problem Aim : Assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in f(x) = 1 c(z1, z2, θ) zn1 1 zn2 2 eθ |Γ1∩ Γ2| , where c(z1, z2, θ) is the normalizing constant. Issue : The number of balls n1 and n2 is not observed. ⇒ likelihood or pseudo-likelihood based inference is not feasible. = A testing procedure A model for co-localization Estimation An equilibrium equation Consider, for any non-negative function h, C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) and for i = 1, 2, Ii(θ; h) = Rmax Rmin Ω h((ξ, R)i, x) λ((ξ, R)i, x) 2zi dξ µ(dR). Denoting by z∗ 1 , z∗ 2 and θ∗ the true unknown values of the parameters, we know from the Georgii-Nguyen-Zessin equation that for any h E(C(z∗ 1 , z∗ 2 , θ∗ ; h)) = 0. 
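Both the birth-death Metropolis-Hastings simulation mentioned above and the conditional intensity λ appearing in I_i(θ; h) require evaluating the geometric part of the model, chiefly the intersection area |Γ1 ∩ Γ2|. A crude but simple way to do this is on a pixel grid; the sketch below is illustrative only (resolution, window and parameter values are arbitrary) and computes the log of the un-normalized density z1^n1 z2^n2 exp(θ |Γ1 ∩ Γ2|).

```python
import numpy as np

def union_mask(centres, radii, shape, px=1.0):
    """Rasterize a union of balls; shape = (ny, nx) pixels, px = pixel side length."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for (cx, cy), r in zip(centres, radii):
        mask |= (xx * px - cx) ** 2 + (yy * px - cy) ** 2 <= r ** 2
    return mask

def log_density_unnorm(balls1, balls2, z1, z2, theta, shape, px=1.0):
    """log of z1^n1 * z2^n2 * exp(theta * |Gamma1 inter Gamma2|), up to c(z1, z2, theta)."""
    (c1, r1), (c2, r2) = balls1, balls2
    inter = union_mask(c1, r1, shape, px) & union_mask(c2, r2, shape, px)
    area12 = inter.sum() * px ** 2
    return len(r1) * np.log(z1) + len(r2) * np.log(z2) + theta * area12

# toy configuration: two green balls and one red ball on a 100 x 100 pixel window
balls1 = (np.array([[30.0, 40.0], [60.0, 60.0]]), np.array([8.0, 6.0]))
balls2 = (np.array([[33.0, 42.0]]), np.array([7.0]))
print(log_density_unnorm(balls1, balls2, z1=0.01, z2=0.01, theta=0.2, shape=(100, 100)))
```

In a birth-death Metropolis-Hastings chain the acceptance ratio of a proposed birth or death only involves the change of this log-density, so the normalizing constant c(z1, z2, θ) is never needed.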
A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. Recall that C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) To be able to compute (1), we must find test functions hk such that S(h) is computable A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. Recall that C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) To be able to compute (1), we must find test functions hk such that S(h) is computable How many ? At least K = 3 because 3 parameters to estimate. A testing procedure A model for co-localization Estimation A first possibility : h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} where S(ξ, R) is the sphere {y, ||y − ξ|| = R}. ⇓ ⇓ ⇓ ⇓ A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? = A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? = ⇒ S(h1) = P(Γ1) (the perimeter of Γ1) A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). Let h3((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1 ∪ Γ2)c then S(h3) = P(Γ1 ∪ Γ2). A testing procedure A model for co-localization Estimation Simulations with test functions h1, h2 and h3 over 100 realizations θ = 0.2 (and small radii) θ = 0.05 (and large radii) Frequency 0.15 0.20 0.25 0.30 05101520 Frequency 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 010203040 A testing procedure A model for co-localization Estimation Real Data We assume the law of the radii is uniform on [Rmin, Rmax]. 
(each image is embedded in [0, 250] × [0, 280]) With Rmin = 0.5, Rmax = 2.5: θ̂ = 0.45; with Rmin = 0.5, Rmax = 10: θ̂ = 0.03. Conclusion The testing procedure detects co-localization between two binary images, is easy and fast to implement, and does not depend much on the image pre-processing. The model for co-localization relies on geometric features (the area of intersection), can be fitted by the Takacs-Fiksel method, and makes it possible to compare the degree of co-localization θ between two pairs of images, provided the laws of the radii are similar.
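For readers who want to try the independence test on their own segmented images, here is a minimal sketch (not the authors' code). It computes the empirical fractions p1, p2, p12 and an approximate p-value; it assumes two binary NumPy arrays of equal shape, and the periodic shifts and the lag truncation max_lag are simplifications of the m-dependent normalisation used in the talk, whose exact form is partly garbled in the extracted slides.

    import numpy as np
    from scipy.stats import norm

    def colocalization_test(img1, img2, max_lag=5):
        """Approximate test of independence between two binary images.
        p1, p2, p12 are empirical volume fractions; the variance of
        p12 - p1*p2 is approximated by summing products of empirical
        autocovariances over lags up to max_lag (periodic boundaries)."""
        n = img1.size
        p1, p2 = img1.mean(), img2.mean()
        p12 = (img1 & img2).mean()
        c1 = img1.astype(float) - p1
        c2 = img2.astype(float) - p2
        s = 0.0
        for dx in range(-max_lag, max_lag + 1):
            for dy in range(-max_lag, max_lag + 1):
                s += (c1 * np.roll(np.roll(c1, dx, 0), dy, 1)).mean() * \
                     (c2 * np.roll(np.roll(c2, dx, 0), dy, 1)).mean()
        # heuristic reading of the slide's normalisation by the summed covariances
        T = n * (p12 - p1 * p2) / np.sqrt(n * s)
        return T, 2.0 * (1.0 - norm.cdf(abs(T)))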

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14260
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_21
Authors = Aurélien Vasseur, Laurent Decreusefond
Keywords = Ginibre point process, Poisson point process, Stein’s method, Stochastic geometry, β-Ginibre point process
Abstract
The characteristic independence property of Poisson point processes gives an intuitive way to explain why a sequence of point processes becoming less and less repulsive can converge to a Poisson point process. The aim of this paper is to show this convergence for sequences built by superposing, thinning or rescaling determinantal processes. We use Papangelou intensities and Stein’s method to prove this result with a topology based on total variation distance.


See the video
Asymptotics of superposition of point processes

I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications 2nd conference on Geometric Science of Information Aurélien VASSEUR Asymptotics of some Point Processes Transformations Ecole Polytechnique, Paris-Saclay, October 28, 2015 1/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Mobile network in Paris - Motivation −2000 0 2000 4000 100020003000 −2000 0 2000 4000 100020003000 Figure: On the left, positions of all BS in Paris. On the right, locations of BS for one frequency band. 2/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Table of Contents I-Generalities on point processes Correlation function, Papangelou intensity and repulsiveness Determinantal point processes II-Kantorovich-Rubinstein distance Convergence dened by dKR dKR(PPP, Φ) ≤ "nice" upper bound III-Applications to transformations of point processes Superposition Thinning Rescaling 3/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Framework Y a locally compact metric space µ a diuse and locally nite measure of reference on Y NY the space of congurations on Y NY the space of nite congurations on Y 4/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Correlation function - Papangelou intensity Correlation function ρ of a point process Φ: E[ α∈NY α⊂Φ f (α)] = +∞ k=0 1 k! ˆ Yk f · ρ({x1, . . . , xk})µ(dx1) . . . µ(dxk) ρ(α) ≈ probability of nding a point in at least each point of α Papangelou intensity c of a point process Φ: E[ x∈Φ f (x, Φ \ {x})] = ˆ Y E[c(x, Φ)f (x, Φ)]µ(dx) c(x, ξ) ≈ conditionnal probability of nding a point in x given ξ 5/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Point process Properties Intensity measure: A ∈ FY → ´ A ρ({x})µ(dx) ρ({x}) = E[c(x, Φ)] If Φ is nite, then: IP(|Φ| = 1) = ˆ Y c(x, ∅)µ(dx) IP(|Φ| = 0). 6/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Poisson point process Properties Φ PPP with intensity M(dy) = m(y)dy Correlation function: ρ(α) = x∈α m(x) Papangelou intensity: c(x, ξ) = m(x) 7/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Repulsive point process Denition Point process repulsive if φ ⊂ ξ =⇒ c(x, ξ) ≤ c(x, φ) Point process weakly repulsive if c(x, ξ) ≤ c(x, ∅) 8/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Determinantal point process Denition Determinantal point process DPP(K, µ): ρ({x1, · · · , xk}) = det(K(xi , xj ), 1 ≤ i, j ≤ k) Proposition Papangelou intensity of DPP(K, µ): c(x0, {x1, · · · , xk}) = det(J(xi , xj ), 0 ≤ i, j ≤ k) det(J(xi , xj ), 1 ≤ i, j ≤ k) where J = (I − K)−1K. 
9/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Ginibre point process Denition Ginibre point process on B(0, R): K(x, y) = 1 π e−1 2 (|x|2 +|y|2 ) exy 1{x∈B(0,R)}1{y∈B(0,R)} β-Ginibre point process on B(0, R): Kβ(x, y) = 1 π e − 1 2β (|x|2 +|y|2 ) e 1 β xy 1{x∈B(0,R)} 1{y∈B(0,R)} 10/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process β-Ginibre point processes 11/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Kantorovich-Rubinstein distance Total variation distance: dTV(ν1, ν2) := sup A∈FY ν1(A),ν2(A)<∞ |ν1(A) − ν2(A)| F : NY → IR is 1-Lipschitz (F ∈ Lip1) if |F(φ1) − F(φ2)| ≤ dTV (φ1, φ2) for all φ1, φ2 ∈ NY Kantorovich-Rubinstein distance: dKR(IP1, IP2) = sup F∈Lip1 ˆ NY F(φ) IP1(dφ) − ˆ NY F(φ) IP2(dφ) Convergence in K.-R. distance =⇒ strictly Convergence in law 12/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Upper bound theorem Theorem (L. Decreusefond, AV) Φ a nite point process on Y ζM a PPP with nite control measure M(dy) = m(y)µ(dy). Then, we have: dKR(IPΦ, IPζM ) ≤ ˆ Y ˆ NY |m(y) − c(y, φ)|IPΦ(dφ)µ(dy). 13/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Superposition of weakly repulsive point processes Φn,1, . . . , Φn,n: n independent, nite and weakly repulsive point processes on Y Φn := n i=1 Φn,i Rn := ´ Y | n i=1 ρn,i (x) − m(x)|µ(dx) ζM a PPP with control measure M(dx) = m(x)µ(dx) 14/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Superposition of weakly repulsive point processes Proposition (LD, AV) Φn = n i=1 Φn,i ζM a PPP with control measure M(dx) = m(x)µ(dx) dKR(IPΦn , IPζM ) ≤ Rn + max 1≤i≤n ˆ Y ρn,i (x)µ(dx) 15/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Consequence Corollary (LD, AV) f pdf on [0; 1] such that f (0+) := limx→0+ f (x) ∈ IR Λ compact subset of IR+ X1, . . . , Xn i.i.d. with pdf fn = 1 n f (1 n ·) Φn = {X1, . . . , Xn} ∩ Λ dKR(Φn, ζ) ≤ ˆ Λ f 1 n x − f (0+) dx + 1 n ˆ Λ f 1 n x dx where ζ is the PPP(f (0+)) reduced to Λ. 16/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning β-Ginibre point processes Proposition (LD, AV) Φn the βn-Ginibre process reduced to a compact set Λ ζ the PPP with intensity 1/π on Λ dKR(IPΦn , IPζ) ≤ Cβn 17/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Kallenberg's theorem Theorem (O. 
Kallenberg) Φn a nite point process on Y pn : Y → [0; 1) uniformly −−−−−→ 0 Φn the pn-thinning of Φn γM a Cox process (pnΦn) law −−→ M ⇐⇒ (Φn) law −−→ γM 18/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Polish distance (fn) a sequence in the space of real continuous functions with compact support generating FY d∗(ν1, ν2) = n≥1 1 2n Ψ(|ν1(fn) − ν2(fn)|) with Ψ(x) = x 1 + x d∗ KR the Kantorovich-Rubinstein distance associated to the distance d∗ 19/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Thinned point processes Proposition (LD, AV) Φn a nite point process on Y pn : Y → [0; 1) Φn the pn-thinning of Φn γM a Cox process Then, we have: d∗ KR(IPΦn , IPγM ) ≤ 2E[ x∈Φn p2 n(x)] + d∗ KR(IPM, IPpnΦn ). 20/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning References L.Decreusefond, and A.Vasseur, Asymptotics of superposition of point processes, 2015. H.O. Georgii, and H.J. Yoo, Conditional intensity and gibbsianness of determinantal point processes, J. Statist. Phys. (118), January 2004. J.S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins, and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015. A.F. Karr, Point Processes and their Statistical Inference, Ann. Probab. 15 (1987), no. 3, 12261227. 21/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Thank you ... ... for your attention. Questions? 22/22 Aurélien VASSEUR Télécom ParisTech
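As a small illustration of the determinantal machinery used in the talk (my own sketch, not code from the presentation), the β-Ginibre kernel and the correlation function ρ({x1, ..., xk}) = det(K(xi, xj)) can be evaluated directly. Points are complex numbers, and the cross term is read as x times the conjugate of y, which I assume is the intended meaning of the exponent shown on the slide.

    import numpy as np

    def beta_ginibre_kernel(x, y, beta=1.0, R=np.inf):
        # K_beta(x, y) = (1/pi) exp(-(|x|^2+|y|^2)/(2 beta)) exp(x conj(y)/beta) on B(0, R)
        if abs(x) > R or abs(y) > R:
            return 0.0
        return (np.exp(-(abs(x) ** 2 + abs(y) ** 2) / (2.0 * beta))
                * np.exp(x * np.conj(y) / beta) / np.pi)

    def correlation(points, beta=1.0, R=np.inf):
        # rho({x1,...,xk}) = det(K(xi, xj)) for a determinantal point process
        K = np.array([[beta_ginibre_kernel(xi, xj, beta, R) for xj in points]
                      for xi in points])
        return np.linalg.det(K).real

    # one-point correlation of the beta-Ginibre process: K(x, x) = 1/pi on B(0, R)
    print(correlation([0.3 + 0.1j], beta=0.5, R=10.0))   # ~ 0.3183

The constant one-point correlation 1/π is consistent with the Poisson limit of intensity 1/π stated in the proposition above.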

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14355
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_22
Authors = Pierre Calka
Keywords =
Abstract
Random polytopes have constituted some of the central objects of stochastic geometry for more than 150 years. They are in general generated as convex hulls of a random set of points in the Euclidean space. The study of such models requires the use of ingredients coming from both convex geometry and probability theory. In the last decades, the study has been focused on their asymptotic properties and in particular expectation and variance estimates. In several joint works with Tomasz Schreiber and J. E. Yukich, we have investigated the scaling limit of several models (uniform model in the unit-ball, uniform model in a smooth convex body, Gaussian model) and have deduced from it limiting variances for several geometric characteristics including the number of k-dimensional faces and the volume. In this paper, we survey the most recent advances on these questions and we emphasize the particular cases of random polytopes in the unit-ball and Gaussian polytopes.


See the video
Asymptotic properties of random polytopes

Asymptotic properties of random polytopes Pierre Calka 2nd conference on Geometric Science of Information ´Ecole Polytechnique, Paris-Saclay, 28 October 2015 default Outline Random polytopes: an overview Main results: variance asymptotics Sketch of proof: Gaussian case Joint work with Joseph Yukich (Lehigh University, USA) & Tomasz Schreiber (Toru´n University, Poland) default Outline Random polytopes: an overview Uniform polytopes Gaussian polytopes Expectation asymptotics Main results: variance asymptotics Sketch of proof: Gaussian case default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K50, K ball K50, K square default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K100, K ball K100, K square default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K500, K ball K500, K square default Uniform polytopes Poissonian model K := convex body of Rd Pλ, λ > 0:= Poisson point process of intensity measure λdx Kλ := Conv(Pλ ∩ K) K500, K ball K500, K square default Gaussian polytopes Binomial model Φd (x) := 1 (2π)d/2 e− x 2/2, x ∈ Rd, d ≥ 2 (Xk, k ∈ N∗):= independent and with density Φd Kn := Conv(X1, · · · , Xn) Poissonian model Pλ, λ > 0:= Poisson point process of intensity measure λΦd(x)dx Kλ := Conv(Pλ) default Gaussian polytopes K50 K100 K500 default Gaussian polytopes: spherical shape K50 K100 K500 default Asymptotic spherical shape of the Gaussian polytope Geffroy (1961) : dH(Kn, B(0, 2 log(n))) → n→∞ 0 a.s. K50000 default Expectation asymptotics Considered functionals fk(·) := number of k-dimensional faces, 0 ≤ k ≤ d Vol(·) := volume B. Efron’s relation (1965): Ef0(Kn) = n 1 − EVol(Kn−1) Vol(K) Uniform polytope, K smooth E[fk(Kλ)] ∼ λ→∞ cd,k ∂K κ 1 d+1 s ds λ d−1 d+1 κs := Gaussian curvature of ∂K Uniform polytope, K polytope E[fk(Kλ)] ∼ λ→∞ c′ d,kF(K) logd−1 (λ) F(K) := number of flags of K Gaussian polytope E[fk(Kλ)] ∼ λ→∞ c′′ d,k log d−1 2 (λ) A. R´enyi & R. Sulanke (1963), H. Raynaud (1970), R. Schneider & J. Wieacker (1978), F. Affentranger & R. Schneider (1992) default Outline Random polytopes: an overview Main results: variance asymptotics Uniform model, K smooth Uniform model, K polytope Gaussian model Sketch of proof: Gaussian case default Uniform model, K smooth K := convex body of Rd with volume 1 and with a C3 boundary κ := Gaussian curvature of ∂K lim λ→∞ λ−(d−1)/(d+1) Var[fk(Kλ)] = ck,d ∂K κ(z)1/(d+1) dz lim λ→∞ λ(d+3)/(d+1) Var [Vol(Kλ)] = c′ d ∂K κ(z)1/(d+1) dz (ck,d , c′ d explicit positive constants) M. Reitzner (2005): Var[fk (Kλ)] = Θ(λ(d−1)/(d+1) ) default Uniform model, K polytope K := simple polytope of Rd with volume 1 i.e. each vertex of K is included in exactly d facets. lim λ→∞ log−(d−1) (λ)Var[fk(Kλ)] = cd,kf0(K) lim λ→∞ λ2 log−(d−1) (λ)Var[Vol(Kλ)] = c′ d,k f0(K) (ck,d , c′ k,d explicit positive constants) I. B´ar´any & M. Reitzner (2010): Var[fk (Kλ)] = Θ(log(d−1) (λ)) default Gaussian model lim λ→∞ log− d−1 2 (λ)Var[fk(Kλ)] = ck,d lim λ→∞ log−k+ d+3 2 (λ)Var[Vol(Kλ)] = c′ k,d E Vol(Kλ) Vol(B(0, 2 log(n))) = λ→∞ 1 − d log(log(λ)) 4 log(λ) + O 1 log(λ) (ck,d , c′ k,d explicit positive constants) D. Hug & M. Reitzner (2005), I. B´ar´any & V. 
Vu (2007): Var[fk (Kλ)] = Θ(log(d−1)/2 (λ)) default Outline Random polytopes: an overview Main results: variance asymptotics Sketch of proof: Gaussian case Calculation of the expectation of fk(Kλ) Calculation of the variance of fk(Kλ) Scaling transform default Calculation of the expectation of fk(Kλ) 1. Decomposition: E[fk(Kλ)] = E   x∈Pλ ξ(x, Pλ)   ξ(x, Pλ) := 1 k+1 #k-face containing x if x extreme 0 if not 2. Mecke-Slivnyak formula E[fk(Kλ)] = λ E[ξ(x, Pλ ∪ {x})]Φd (x)dx 3. Limit of the expectation of one score default Calculation of the variance of fk(Kλ) Var[fk (Kλ)] = E   x∈Pλ ξ2 (x, Pλ) + x=y∈Pλ ξ(x, Pλ)ξ(y, Pλ)   − (E[fk (Kλ)]) 2 = λ E[ξ2 (x, Pλ ∪ {x})]Φd(x)dx + λ2 E[ξ(x, Pλ ∪ {x, y})ξ(y, Pλ ∪ {x, y})]Φd (x)Φd (y)dxdy − λ2 E[ξ(x, Pλ ∪ {x})]E[ξ(y, Pλ ∪ {y})]Φd (x)Φd (y)dxdy = λ E[ξ2 (x, Pλ ∪ {x})]Φd(x)dx + λ2 ”Cov”(ξ(x, Pλ ∪ {x}), ξ(y, Pλ ∪ {y}))Φd (x)Φd (y)dxdy default Scaling transform Question : Limits of E[ξ(x, Pλ)] and ”Cov”(ξ(x, Pλ), ξ(y, Pλ)) ? Answer : definition of limit scores in a new space ◮ Critical radius Rλ := 2 log λ − log(2 · (2π)d · log λ) ◮ Scaling transform : Tλ : Rd \ {0} −→ Rd−1 × R x −→ Rλ exp−1 d−1 x |x|, R2 λ(1 − |x| Rλ ) expd−1 : Rd−1 ≃ Tu0 Sd−1 → Sd−1 exponential map at u0 ∈ Sd−1 ◮ Image of a score : ξ(λ)(Tλ(x), Tλ(Pλ)) := ξ(x, Pλ) ◮ Convergence of Pλ : Tλ(Pλ) D → P o`u P : Poisson point process in Rd−1 × R of intensity measure ehdvdh default Action of the scaling transform Π↑ := {(v, h) ∈ Rd−1 × R : h ≥ v 2 2 } Π↓ := {(v, h) ∈ Rd−1 × R : h ≤ − v 2 2 } Half-space Translate of Π↓ Sphere containing O Translate of ∂Π↑ Convexity Parabolic convexity Extreme point (x + Π↑) not fully covered k-face of Kλ Parabolic k-face RλVol Vol default Limiting picture Ψ := x∈P(x + Π↑) In red : image of the balls of diameter [0, x] where x is extreme default Limiting picture Φ := x∈Rd−1×R:x+Π↓∩P=∅(x + Π↓) In green : image of the boundary of the convex hull Kλ default Thank you for your attention!
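A quick way to reproduce the qualitative behaviour described in the talk (my own illustration, not the speaker's code) is to simulate the Gaussian polytope Kn and count its vertices f0(Kn); the expected count grows only logarithmically in n, in line with the expectation asymptotics quoted above.

    import numpy as np
    from scipy.spatial import ConvexHull

    def gaussian_polytope_f0(n, d=2, rng=None):
        # K_n = Conv(X_1, ..., X_n) with X_i i.i.d. standard Gaussian in R^d;
        # returns the number of vertices f_0(K_n)
        rng = np.random.default_rng(rng)
        pts = rng.standard_normal((n, d))
        return len(ConvexHull(pts).vertices)

    for n in (50, 100, 500, 5000):
        f0 = np.mean([gaussian_polytope_f0(n) for _ in range(50)])
        print(n, round(float(f0), 1))   # slow, logarithmic growth of E f_0(K_n)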

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14261
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_23
Authors = Roman Belavkin
Keywords =
Abstract
Asymmetric information distances are used to define asymmetric norms and quasimetrics on the statistical manifold and its dual space of random variables. Quasimetric topology, generated by the Kullback-Leibler (KL) divergence, is considered as the main example, and some of its topological properties are investigated.


See the video
Asymmetric Topologies on Statistical Manifolds

Asymmetric Topologies on Statistical Manifolds Roman V. Belavkin School of Science and Technology Middlesex University, London NW4 4BT, UK GSI2015, October 28, 2015 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 1 / 16 Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 2 / 16 Sources and Consequences of Asymmetry Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 3 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q| = inf{α−1 > 0 : D[q + α(p − q), q] ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q| = inf{α−1 > 0 : D[q + α(p − q), q] ≤ 1} sup x {Ep−q{x} : Eq{ex − 1 − x} ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q = inf{α−1 > 0 : D[q + α|(p − q)|, q] ≤ 1} sup x {Ep−q{x} : Eq{e|x| − 1 − |x|} ≤ 1} Roman 
Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). 
Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. An asymmetric seminormed space may fail to be a topological vector space, because y → αy can be discontinuous (Borodin, 2001). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. An asymmetric seminormed space may fail to be a topological vector space, because y → αy can be discontinuous (Borodin, 2001). Practically all other results have to be reconsidered (e.g. Baire category theorem, Alaoglu-Bourbaki, etc). (Cobzas, 2013). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 
+ u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} D∗[x, 0] = ex − 1 − x, z Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 
1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} 0 /∈ Int(dom Eq⊗p{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Method: Symmetric Sandwich Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 8 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µM◦ ≤ µ(−M◦ ) ∨ µM◦ µ(−M)co ∧ µM ≤ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µ(−M◦ )co ∧ µM◦ ≤ µM◦ µM ≤ µ(−M) ∨ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper 
Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x|∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ u|ϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x|∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ u|ϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Results Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 11 / 16 Results KL Induces Hausdorff (T2) Asymmetric Topology Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is Hausdorff. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results KL Induces Hausdorff (T2) Asymmetric Topology Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is Hausdorff. Proof. u ϕ+ ≤ u|ϕ (resp. x ϕ− ≤ x|ϕ) implies (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is finer than normed space (Y, · ϕ+) (resp. (X, · ∗ ϕ−)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · |ϕ) (resp. (X, · |∗ ϕ)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · |ϕ) (resp. (X, · |∗ ϕ)). Proof. ϕ+(u) = (1 + |u|) ln(1 + |u|) − |u| ∈ ∆2 (resp. ϕ∗ −(x) = e−|x| − 1 + |x| ∈ ∆2). Note that ϕ− /∈ ∆2 and ϕ∗ + /∈ ∆2. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. 3 Right K-sequentially complete: right K-Cauchy yn ρ → y. 
Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. 3 Right K-sequentially complete: right K-Cauchy yn ρ → y. Proof. ρs(y, z) = z − y|ϕ ∨ y − z|ϕ ≤ y − z ϕ−, where (Y, · ϕ−) is Banach. Then use theorems of Reilly et al. (1982) and Chen et al. (2007). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Contain a separable Orlicz subspace. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. 
Contain a separable Orlicz subspace. Total boundedness, compactness? Other asymmetric information distances (e.g. Renyi divergence). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 References Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16 Results Borodin, P. A. (2001). The Banach-Mazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3–4), 298–305. Chen, S.-A., Li, W., Zou, D., & Chen, S.-B. (2007, Aug). Fixed point theorems in quasi-metric spaces. In Machine learning and cybernetics, 2007 international conference on (Vol. 5, p. 2499-2504). IEEE. Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkh¨auser. Fletcher, P., & Lindgren, W. F. (1982). Quasi-uniform spaces (Vol. 77). New York: Marcel Dekker. Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasi-pseudo-metric spaces. Monatshefte f¨ur Mathematik, 93, 127–140. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16
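To make the asymmetric gauge concrete, here is a minimal numerical sketch (mine, not the paper's construction). For finite probability vectors it evaluates the asymmetric gauge of p − q, namely inf{1/α : D[q + α(p − q), q] ≤ 1}, by bisection, with D the Kullback-Leibler divergence taken in the convention sum p_i log(p_i/q_i) and the segment restricted to the probability simplex; comparing the two directions p − q and q − p shows the asymmetry discussed above.

    import numpy as np

    def kl(p, q):
        m = p > 0
        return float(np.sum(p[m] * np.log(p[m] / q[m])))   # assumes q > 0 everywhere

    def asym_gauge(p, q, tol=1e-9):
        # inf{ 1/alpha : D[q + alpha(p - q), q] <= 1 },
        # with alpha restricted so that q + alpha(p - q) stays a probability vector
        u = p - q
        hi = float(np.min(q[u < 0] / -u[u < 0])) if np.any(u < 0) else 1e6
        lo = 0.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if kl(q + mid * u, q) <= 1.0:
                lo = mid
            else:
                hi = mid
        return 1.0 / lo if lo > 0 else np.inf

    q = np.array([0.7, 0.2, 0.1])
    p = np.array([0.2, 0.3, 0.5])
    # generally different values: the gauge is asymmetric in the direction p - q
    print(asym_gauge(p, q), asym_gauge(2 * q - p, q))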

Computational Information Geometry (chaired by Frank Nielsen, Paul Marriott)

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14262
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_61
Authors = Frank Critchley, Germain Van Bever, Paul Marriott, Radka Sabolova
Keywords =
Abstract
We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald – equivalently, the Pearson X2 and score statistics – are unworkable in this context, but that the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.


See the video
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling R. Sabolová1 , P. Marriott2 , G. Van Bever1 & F. Critchley1 . 1 The Open University (EPSRC grant EP/L010429/1), United Kingdom 2 University of Waterloo, Canada GSI 2015, October 28th 2015 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Key points In CIG, the multinomial model ∆k = (π0, . . . , πk) : πi ≥ 0, i πi = 1 provides a universal model. 1 goodness-of-fit testing in large sparse extended multinomial contexts 2 Cressie-Read power divergence λ-family - equivalent to Amari’s α-family asymptotic properties of two test statistics: Pearson’s χ2-test and deviance simulation study for other statistics within power divergence family 3 k-asymptotics instead of N-asymptotics Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Big data Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008): . . . the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. . . . Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science. continuous data - High dimensional low sample size data (HDLSS) discrete data databases image analysis Sparsity (N << k) changes everything! Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Image analysis - example Figure: m1 = 10, m2 = 10 Dimension of a state space: k = 2m1m2 − 1 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Sparsity changes everything S. Fienberg, A. Rinaldo (2012): Maximum Likelihood Estimation in Log-Linear Models Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Extended multinomial distribution Let n = (ni) ∼ Mult(N, (πi)), i = 0, 1, . . . , k, where each πi≥0. 
Goodness-of-fit test H0 : π = π∗ . Pearson’s χ2 test (Wald, score statistic) W := k i=0 (π∗ i − ni/N)2 π∗ i ≡ 1 N2 k i=0 n2 i π∗ i − 1. Rule of thumb (for accuracy of χ2 k asymptotic approximation) Nπi ≥ 5 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Performance of Pearson’s χ2 test on the boundary - example 0 50 100 150 200 0.000.010.020.030.040.05 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 02000400060008000 (b) Sample of Wald Statistic Index WaldStatistic Figure: N = 50, k = 200, exponentially decreasing πi Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Performance of Pearson’s χ2 test on the boundary - theory Theorem For k > 1 and N ≥ 6, the first three moments of W are: E(W) = k N , var(W) = π(−1) − (k + 1)2 + 2k(N − 1) N3 and E[{W − E(W)}3 ] given by π(−2) − (k + 1)3 − (3k + 25 − 22N) π(−1) − (k + 1)2 + g(k, N) N5 where g(k, N) = 4(N − 1)k(k + 2N − 5) > 0 and π(a) := i πa i . In particular, for fixed k and N, as πmin → 0 var(W) → ∞ and γ(W) → +∞ where γ(W) := E[{W − E(W)}3 ]/{var(W)}3/2 . Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary The deviance statistic Define the deviance D via D/2 = {0≤i≤k:ni>0} {ni log(ni/N) − log(πi)} = {0≤i≤k:ni>0} ni log(ni/N) + log 1 πi = {0≤i≤k:ni>0} ni log(ni/µi), where µi := E(ni) = Nπi. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Distribution of deviance let {n∗ i , i = 0, . . . , k} be mutually independent, with n∗ i ∼ Po(µi) then N∗ := k i=0 n∗ i ∼ Po(N) and ni = (n∗ i |N∗ = N) ∼ Mult(N, πi) define S∗ := N∗ D∗ /2 = k i=0 n∗ i n∗ i log(n∗ i /µi) Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Distribution of deviance let {n∗ i , i = 0, . . . , k} be mutually independent, with n∗ i ∼ Po(µi) then N∗ := k i=0 n∗ i ∼ Po(N) and ni = (n∗ i |N∗ = N) ∼ Mult(N, πi) define S∗ := N∗ D∗ /2 = k i=0 n∗ i n∗ i log(n∗ i /µi) define ν, τ and ρ via N ν := E(S∗ ) = N k i=0 E(n∗ i log {n∗ i /µi}) , N ρτ √ N · τ2 := cov(S∗ ) = N k i=0 Ci · k i=0 Vi , where Ci := Cov(n∗ i , n∗ i log(n∗ i /µi)) and Vi := V ar(n∗ i log(n∗ i /µi)). Then under equicontinuity D/2 D −−−−→ k→∞ N1(ν, τ2 (1 − ρ2 )). 
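A minimal computational sketch of the two statistics just defined, together with the Cressie-Read power-divergence statistic 2N I^λ that the λ-family slides compare later (with α = 1 + 2λ, so λ = 1 gives Pearson and λ → 0 gives the deviance in the limit). This is my own illustration; note that the talk's W is the classical Pearson X² statistic divided by N, a straightforward algebraic identity.

    import numpy as np

    def wald_W(n, pi):
        # W = (1/N^2) * sum(n_i^2 / pi_i) - 1  (classical Pearson X^2 divided by N)
        N = n.sum()
        return float(np.sum(n.astype(float) ** 2 / pi) / N ** 2 - 1.0)

    def deviance(n, pi):
        # D = 2 * sum_{n_i > 0} n_i log(n_i / (N pi_i))
        N = n.sum()
        m = n > 0
        return float(2.0 * np.sum(n[m] * np.log(n[m] / (N * pi[m]))))

    def power_divergence(n, pi, lam):
        # 2N I^lambda = (2 / (lam(lam+1))) * sum n_i [ (n_i/(N pi_i))^lam - 1 ]
        # zero cells contribute 0 by convention (appropriate for lam > 0)
        N = n.sum()
        m = n > 0
        return float(2.0 / (lam * (lam + 1.0))
                     * np.sum(n[m] * ((n[m] / (N * pi[m])) ** lam - 1.0)))

    # example: sparse sample with exponentially decreasing null probabilities
    rng = np.random.default_rng(0)
    k, N = 200, 50
    pi = np.exp(-0.05 * np.arange(k + 1)); pi /= pi.sum()
    n = rng.multinomial(N, pi)
    print(wald_W(n, pi), deviance(n, pi), power_divergence(n, pi, 2.0 / 3.0))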
Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity near the boundary 0 50 100 150 200 0.000.010.020.030.040.05 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 0500150025003500 (b) Sample of Wald Statistic Index WaldStatistic 0 200 400 600 800 1000 5060708090100110 (c) Sample of Deviance Statistic Index Deviance Figure: Stability of sampling distributions - Pearson’s χ2 and deviance statistic, N = 50, k = 200, exponentially decreasing πi Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Asymptotic approximations normal approximation can be improved χ2 approximation, correction for skewness symmetrised deviance statistics 40 60 80 100 120 5060708090 Normal Approximation Deviance quantiles Normalquantiles 60 80 100 120 5060708090100 Chi−squared Approximation Deviance quantiles Chi−squaredquantiles 40 60 80 100 120 5060708090 Symmetrised Deviance Symmetric Deviance quantiles Normalquantiles Figure: Quality of k-asymptotics approximations near the boundary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments does k-asymptotic approximation hold uniformly across the simplex? rewrite deviance as D∗ /2 = {0≤i≤k:n∗ i >0} n∗ i log(n∗ i /µi) = Γ∗ + ∆∗ where Γ∗ := k i=0 αin∗ i and ∆∗ := {0≤i≤k:n∗ i >1} n∗ i log n∗ i ≥ 0 and αi := − log µi. how well is the moment generating function of the (standardised) Γ∗ approximated by that of a (standard) normal? Mγ(t) = exp − E(Γ∗ )t V ar(Γ∗) exp   k i=0    ∞ h=1 (−1)h h! µi(log µi)h t V ar(Γ∗) h      Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments maximise skewness k i=0 µi(log µi)3 for fixed E(Γ∗ ) = − k i=0 µi log(µi) and V ar(Γ∗ ) = k i=0 µi(log µi)2 . Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments maximise skewness k i=0 µi(log µi)3 for fixed E(Γ∗ ) = − k i=0 µi log(µi) and V ar(Γ∗ ) = k i=0 µi(log µi)2 . solution: distribution with three distinct values for µi 0 50 100 150 200 0.0000.0020.0040.006 (a) Null distribution Rank of cell probability Cellprobability (b) Sample of Wald Statistic (out1) WaldStatistic 160 180 200 220 240 260 280 300 050100150200 (c) Sample of Deviance Statistic outDeviance 110 115 120 125 130 135 050100150200 Figure: Worst case solution for normality of Γ∗ Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness Worst case for asymptotic normality? Where? Why? Pearson χ2 boundary ’unstable’ deviance centre discreteness D∗ /2 = {0≤i≤k:n∗ i >0} n∗ i (log n∗ i − logµi) = Γ∗ + ∆∗ For the distribution of any discrete random variable to be well approximated by a continuous one, it is necessary that it have a large number of support points, close together. 
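The k-asymptotic normal approximation of the deviance quoted above can be evaluated numerically. The sketch below is my reading of the Poissonization argument, since the parametrisation is partly garbled in the extracted slides: with n*_i ~ Po(μ_i), μ_i = Nπ_i and g(n) = n log(n/μ), it takes ν as the sum of E[g(n*_i)], τ² as the sum of Var[g(n*_i)], and ρ as the correlation between the total count N* and D*/2, which gives the approximation D/2 ≈ N(ν, τ²(1 − ρ²)); the truncation cutoff is my own choice.

    import numpy as np
    from scipy.stats import poisson

    def poissonized_moments(mu, cutoff=500):
        # moments of g(n*) = n* log(n*/mu) for n* ~ Po(mu), by truncated summation
        n = np.arange(1, cutoff)             # g(0) = 0 by convention
        w = poisson.pmf(n, mu)
        g = n * np.log(n / mu)
        Eg = np.sum(w * g)
        Vg = np.sum(w * g ** 2) - Eg ** 2
        Cg = np.sum(w * n * g) - mu * Eg     # Cov(n*, g(n*))
        return Eg, Vg, Cg

    def deviance_normal_approx(pi, N):
        # returns (mean, variance) of the approximating normal for D/2
        mu = N * np.asarray(pi, dtype=float)
        moms = np.array([poissonized_moments(m) for m in mu])
        nu, tau2 = moms[:, 0].sum(), moms[:, 1].sum()
        rho = moms[:, 2].sum() / np.sqrt(N * tau2)   # Var(N*) = N for the Poissonized total
        return nu, tau2 * (1.0 - rho ** 2)

    k, N = 200, 50
    pi = np.exp(-0.05 * np.arange(k + 1)); pi /= pi.sum()
    print(deviance_normal_approx(pi, N))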
Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness, continued 0 50 100 150 200 0.0000.0010.0020.0030.0040.005 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 115120125130135 (b) Sample of Deviance Statistic Index Deviance −3 −2 −1 0 1 2 3 −101234 (c) QQplot Deviance Statistic Theoretical Quantiles StandardisedDeviance Figure: Behaviour at the centre of the simplex, N = 30, k = 200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness, continued 0 50 100 150 200 0.0000.0010.0020.0030.0040.005 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 150160170180190 (b) Sample of Deviance Statistic Index Deviance −3 −2 −1 0 1 2 3 −2−10123 (c) QQplot Deviance Statistic Theoretical Quantiles StandardisedDeviance Figure: Behaviour at the centre of the simplex, N = 60, k = 200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Comparison of performance of different test statistics belonging to power divergence family as we are approaching the boundary (exponentially decreasing values of π) 2NIλ (ni/N, π∗ ) = 2 λ(λ + 1) k i=1 ni ni Nπ∗ i λ − 1 , where α = 1 + 2λ α = 3 Pearson’s χ2 statistic α = 7/3 Cressie-Read recommendation α = 1 deviance α = 0 Hellinger statistic α = −1 Kullback MDI α = −3 Neyman χ2 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Numerical comparison of test statistics belonging to power divergence family 0 50 100 150 200 0.000.020.04 Index pi.base Pearson's χ2 , α= 3 Frequency 0 1000 2000 3000 4000 0200400600800 Cressie-Read, α= 7/3 Frequency 0 100 200 300 400 500 0100300500 deviance, α= 1 Frequency 40 60 80 100 050100150 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Numerical comparison of test statistics belonging to power divergence family 0 50 100 150 200 0.000.020.04 Index pi.base Hellinger distance, α= 0 Frequency 60 80 100 120 140 050100150 Kullback MDI, α= -1 Frequency 30 40 50 60 70 80 90 050100150 Neyman χ2 , α= -3 Frequency 10 15 20 25 050100200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Summary - key points 1 goodness-of-fit testing in large sparse extended multinomial contexts 2 k-asymptotics instead of N-asymptotics 3 Cressie-Read power divergence λ-family asymptotic properties of two test statistics: Pearson’s χ2 
statistic and deviance simulation study for other statistics within power divergence family Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary References A. Agresti (2002): Categorical Data Analysis. Wiley: Hoboken NJ. K. Anaya-Izquierdo, F. Critchley, and P. Marriott (2014): When are first order asymptotics adequate? a diagnostic. STAT, 3: 17 – 22. K. Anaya-Izquierdo, F. Critchley, P. Marriott, and P. Vos (2013): Computational information geometry: foundations. Proceedings of GSI 2013, LNCS. F. Critchley and Marriott P (2014): Computational information geometry in statistics: theory and practice. Entropy, 16: 2454 – 2471. S.E. Fienberg and A. Rinaldo (2012): Maximum likelihood estimation in log-linear models. Annals of Statistics, 40: 996 – 1023. L. Holst (1972): Asymptotic normality and efficiency for certain goodnes-of-fit tests, Biometrika, 59: 137 – 145. C. Morris (1975): Central limit theorems for multinomial sums, Annals of Statistics, 3: 165 – 188. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling
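The power-divergence comparison above is easy to reproduce numerically. Below is a minimal NumPy sketch (not the author's code) that evaluates the Cressie-Read statistic 2N I^λ for the α = 1 + 2λ values listed in the talk, on a single sparse multinomial sample with exponentially decreasing cell probabilities; the λ → 0 and λ → −1 limits are taken explicitly as the deviance and the Kullback minimum-discrimination-information statistic, and statistics with λ ≤ −1 are reported as infinite whenever a cell count is zero.

```python
import numpy as np

def power_divergence(n, pi, lam):
    """Cressie-Read statistic 2*N*I^lam(n/N, pi), with alpha = 1 + 2*lam.

    The limits lam -> 0 (deviance) and lam -> -1 (Kullback MDI) are taken
    explicitly; statistics with lam <= -1 are infinite when a cell is empty.
    """
    n = np.asarray(n, dtype=float)
    pi = np.asarray(pi, dtype=float)
    N = n.sum()
    mu = N * pi                                   # expected cell counts N * pi_i
    pos = n > 0                                   # empty cells contribute 0 for lam > -1
    if np.isclose(lam, 0.0):                      # deviance D
        return 2.0 * np.sum(n[pos] * np.log(n[pos] / mu[pos]))
    if lam <= -1.0 and not np.all(pos):           # empty cells make these statistics blow up
        return np.inf
    if np.isclose(lam, -1.0):                     # Kullback minimum discrimination information
        return 2.0 * np.sum(mu * np.log(mu / n))
    return 2.0 / (lam * (lam + 1.0)) * np.sum(n[pos] * ((n[pos] / mu[pos]) ** lam - 1.0))

# Example: N = 50 draws over k + 1 = 201 cells with exponentially decreasing pi_i,
# i.e. the sparse setting of the slides; many cells are empty, so the alpha = -1, -3
# statistics come out infinite, illustrating the instability near the boundary.
rng = np.random.default_rng(0)
k = 200
pi = np.exp(-0.05 * np.arange(k + 1))
pi /= pi.sum()
n = rng.multinomial(50, pi)
for alpha, lam in [(3, 1.0), (7 / 3, 2 / 3), (1, 0.0), (0, -0.5), (-1, -1.0), (-3, -2.0)]:
    print(f"alpha = {alpha}: statistic = {power_divergence(n, pi, lam):.2f}")
```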

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14263
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_62
Authors = Paul Marriott, Vahed Maroufy
Keywords = Computational information geometry, Computing boundaries, Embedded manifolds, Local mixture models, Polytopes, Ruled and developable surfaces
Abstract
Local mixture models give an inferentially tractable but still flexible alternative to general mixture models. Their parameter space naturally includes boundaries; near these, the behaviour of the likelihood is not standard. This paper shows how convex and differential geometries help in characterising these boundaries. In particular, the geometry of polytopes and of ruled and developable surfaces is exploited to develop efficient inferential algorithms.


Watch the video
Computing Boundaries in Local Mixture Models

Computing Boundaries in Local Mixture Models Computing Boundaries in Local Mixture Models Vahed Maroufy & Paul Marriott Department of Statistics and Actuarial Science University of Waterloo October 28 GSI 2015, Paris Computing Boundaries in Local Mixture Models Outline Outline 1 Influence of boundaries on parameter inference 2 Local mixture models (LMM) 3 Parameter space and boundaries Hard boundaries and Soft boundaries 4 Computing the boundaries for LMMs 5 Summary and future direction Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. 
(2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models LMMs Local Mixture Models Definition Marriott (2002) g(x; µ, λ) = f (x; µ) + k j=2 λj f (j) (x; µ), λ ∈ Λµ ⊂ Rk−1 Properties Anaya-Izquierdo and Marriott (2007) g is identifiable in all parameters and the parametrization (µ, λ) is orthogonal at λ = 0 The log likelihood function of g is a concave function of λ at a fixed µ0 Λµ is convex Approximate continuous mixture models when mixing is “small” M f (x, µ) dQ(µ) Family of LMMs is richer that Family of mixtures Computing Boundaries in Local Mixture Models LMMs Local Mixture Models Definition Marriott (2002) g(x; µ, λ) = f (x; µ) + k j=2 λj f (j) (x; µ), λ ∈ Λµ ⊂ Rk−1 Properties Anaya-Izquierdo and Marriott (2007) g is identifiable in all parameters and the parametrization (µ, λ) is orthogonal at λ = 0 The log likelihood function of g is a concave function of λ at a fixed µ0 Λµ is convex Approximate continuous mixture models when mixing is “small” M f (x, µ) dQ(µ) Family of LMMs is richer that Family of mixtures Computing Boundaries in Local Mixture Models Example and Motivation Example LMM of Normal f (x; µ) = φ(x; µ, σ2 ), (σ2 is known). g(x; µ, λ) = φ(x; µ, σ2 ) 1 + k j=2 λj pj (x) , λ ∈ Λµ pj (x) polynomial of degree j. Why we care about λ and Λµ? They are interpretable    µ (2) g = σ2 + 2λ2 µ (3) g = 6λ3 µ (4) g = µ (4) φ + 12σ2 λ2 + 24λ4 (1) λ represents the mixing distribution Q via its moments in M f (x, µ) dQ(µ) Computing Boundaries in Local Mixture Models Example and Motivation Example LMM of Normal f (x; µ) = φ(x; µ, σ2 ), (σ2 is known). g(x; µ, λ) = φ(x; µ, σ2 ) 1 + k j=2 λj pj (x) , λ ∈ Λµ pj (x) polynomial of degree j. Why we care about λ and Λµ? They are interpretable    µ (2) g = σ2 + 2λ2 µ (3) g = 6λ3 µ (4) g = µ (4) φ + 12σ2 λ2 + 24λ4 (1) λ represents the mixing distribution Q via its moments in M f (x, µ) dQ(µ) Computing Boundaries in Local Mixture Models Example and Motivation The costs for all these good properties and flexibility are Hard boundary =⇒ Positivity (boundary of Λµ) Soft boundary =⇒ Mixture behavior We compute them for two models here: Poisson and Normal We fix k = 4 Computing Boundaries in Local Mixture Models Boundaries Hard boundary Λµ = λ | 1 + k j=2 λj qj (x; µ) ≥ 0, ∀x ∈ S , Λµ is intersection of half-spaces so convex Hard boundary is constructed by a set of (hyper-)planes Soft boundary Definition For a density function f (x; µ) with k finite moments let, Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). and for compact M define C = convhull{Mr (f )|µ ∈ M} Then, the boundary of C is called the soft boundary. Computing Boundaries in Local Mixture Models Boundaries Hard boundary Λµ = λ | 1 + k j=2 λj qj (x; µ) ≥ 0, ∀x ∈ S , Λµ is intersection of half-spaces so convex Hard boundary is constructed by a set of (hyper-)planes Soft boundary Definition For a density function f (x; µ) with k finite moments let, Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). and for compact M define C = convhull{Mr (f )|µ ∈ M} Then, the boundary of C is called the soft boundary. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ | A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. 
Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ | A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Normal model let y = x−µ σ2 Λµ = {λ | (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 ≥ 0, ∀y ∈ R}. We need a more geometric tools to compute this boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Ruled and developable surfaces Definition Ruled surface: Γ(x, γ) = α(x) + γ · β(x), x ∈ I ⊂ R, γ ∈ Rk Developable surface: β(x), α (x) and β (x) are coplanar for all x ∈ I. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Definition The family of planes, A = {λ ∈ R3 | a(x) · λ + d(x) = 0, x ∈ R}, each determined by an x ∈ R, is called a one-parameter infinite family of planes. Each element of the set {λ ∈ R3 |a(x) · λ + d(x) = 0, a (x) · λ + d (x) = 0, x ∈ R} is called a characteristic line of the surface at x and the union is called the envelope of the family. A characteristic line is the intersection of two consecutive planes The envelope is a developable surface Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Hard boundary of for Normal LMM (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 = 0, ∀y ∈ R . λ2 λ3 λ4 λ4 λ3 λ2 Figure : Left: The hard boundary for the normal LMM (shaded) as a subset of a self intersecting ruled surface (unshaded); Right: slice through λ4 = 0.2. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Soft boundary of for Normal LMM recap : Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). For visualization purposes let k = 3, (µ ∈ M, fix σ) M3(f ) = (µ, µ2 + σ2 , µ3 + 3µσ2 ), M3(g) = (µ, µ2 + σ2 + 2λ2, µ3 + 3µσ2 + 6µλ2 + 6λ3). Figure : the 3-D curve ϕ(µ); Middle: the bounding ruled surface γa(µ, u); Right: the convex subspace restricted to soft boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Ruled surface parametrization Two boundary surfaces, each constructed by a curve and a set of lines attached to it. γa(µ, u) = ϕ(µ) + u La(µ) γb(µ, u) = ϕ(µ) + u Lb(µ) where for M = [a, b] and ϕ(µ) = M3(f ) La(µ): lines between ϕ(a) and ϕ(µ) Lb(µ): lines between ϕ(µ) and ϕ(b) Computing Boundaries in Local Mixture Models Summary Summary Understanding these boundaries is important if we want to exploit the nice statistical properties of LMM The boundaries described in this paper have both discrete aspects and smooth aspects The two example discussed represent the structure for almost all exponential family models It is a interesting problem to design optimization algorithms on these boundaries for finding boundary maximizers of likelihood Computing Boundaries in Local Mixture Models References Anaya-Izquierdo, K., Critchley, F., and Marriott, P. (2013). when are first order asymptotics adequate? a diagnostic. Stat, 3(1):17–22. Anaya-Izquierdo, K. and Marriott, P. (2007). Local mixture models of exponential families. Bernoulli, 13:623–640. Barvinok, A. (2013). 
Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, rnt078. Batyrev, V. V. (1992). Toric varieties and smooth convex approximations of a polytope. RIMS Kokyuroku, 776:20. Boroczky, K. and Fodor, F. (2008). Approximating 3-dimensional convex bodies by polytopes with a restricted number of edges. Contributions to Algebra and Geometry, 49(1):177–193. Fukuda, K. (2004). From the zonotope construction to the minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261–1272. Geyer, C. J. (2009). Likelihood inference in exponential familes and direction of recession. Electronic Journal of Statistics, 3:259–289. Ghomi, M. (2001). Strictly convex submanifolds and hypersurfaces of positive curvature. Journal of Differential Geometry, 57(2):239–271. Ghomi, M. (2004). Optimal smoothing for convex polytopes. Bulletin of the London Mathematical Society, 36(4):483–492. Marriott, P. (2002). On the local geometry of mixture models. Biometrika, 89:77–93. Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446–484. Computing Boundaries in Local Mixture Models END Thank You
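The hard boundary of the normal LMM given above is an intersection of half-spaces indexed by y ∈ R, which suggests a simple membership test: intersect only finitely many of them. The following is a minimal NumPy sketch, not the authors' algorithm; the grid of y values and its truncation range are illustrative assumptions, and refining the grid gives an increasingly tight outer polytope approximation of Λμ (for k = 4 and a fixed μ, σ).

```python
import numpy as np

def in_hard_boundary(lam2, lam3, lam4, y_grid=None):
    """Membership test for (lam2, lam3, lam4) in a polytope approximation of
    the set Lambda_mu for the normal LMM with k = 4.

    Each grid point y contributes the half-space
        (y^2 - 1)*lam2 + (y^3 - 3y)*lam3 + (y^4 - 6y^2 + 3)*lam4 + 1 >= 0,
    so intersecting over a finite grid gives an outer polytope approximation
    of the convex set Lambda_mu; the grid and its range are assumptions.
    """
    if y_grid is None:
        y_grid = np.linspace(-8.0, 8.0, 2001)   # truncation of y in R (illustrative)
    h2 = y_grid ** 2 - 1.0                      # Hermite-type polynomials from the slides
    h3 = y_grid ** 3 - 3.0 * y_grid
    h4 = y_grid ** 4 - 6.0 * y_grid ** 2 + 3.0
    vals = h2 * lam2 + h3 * lam3 + h4 * lam4 + 1.0
    return bool(np.all(vals >= 0.0))

# lambda = 0 is the unmixed normal and lies inside; a large lambda_3 violates positivity.
print(in_hard_boundary(0.0, 0.0, 0.0))    # True
print(in_hard_boundary(0.0, 0.5, 0.0))    # False (the density would go negative for some x)
```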

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14264
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_63
Authors = Frank Nielsen, Gaëtan Hadjeres
Keywords =
Abstract
We generalize the O(dn/ε²)-time (1 + ε)-approximation algorithm for the smallest enclosing Euclidean ball [2,10] to point sets in hyperbolic geometry of arbitrary dimension. We guarantee an O(1/ε²) convergence time by using a closed-form formula to compute the geodesic α-midpoint between any two points. Those results allow us to apply hyperbolic k-center clustering for statistical location-scale families or for multivariate spherical normal distributions by using their Fisher information matrix as the underlying Riemannian hyperbolic metric.


Watch the video
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry Frank Nielsen1 Ga¨etan Hadjeres2 ´Ecole Polytechnique 1 Sony Computer Science Laboratories, Inc 1,2 Conference on Geometric Science of Information c 2015 Frank Nielsen - Ga¨etan Hadjeres 1 The Minimum Enclosing Ball problem Finding the Minimum Enclosing Ball (or the 1-center) of a finite point set P = {p1, . . . , pn} in the metric space (X, dX (., .)) consists in finding c ∈ X such that c = argminc ∈X max p∈P dX (c , p) Figure : A finite point set P and its minimum enclosing ball MEB(P) c 2015 Frank Nielsen - Ga¨etan Hadjeres 2 The approximating minimum enclosing ball problem In a euclidean setting, this problem is well-defined: uniqueness of the center c∗ and radius R∗ of the MEB computationally intractable in high dimensions. We fix an > 0 and focus on the Approximate Minimum Enclosing Ball problem of finding an -approximation c ∈ X of MEB(P) such that dX (c, p) ≤ (1 + )R∗ ∀p ∈ P. c 2015 Frank Nielsen - Ga¨etan Hadjeres 3 The approximating minimum enclosing ball problem: prior work Approximate solution in the euclidean case are given by Badoiu and Clarkson’s algorithm [Badoiu and Clarkson, 2008]: Initialize center c1 ∈ P Repeat 1/ 2 times the following update: ci+1 = ci + fi − ci i + 1 where fi ∈ P is the farthest point from ci . How to deal with point sets whose underlying geometry is not euclidean ? c 2015 Frank Nielsen - Ga¨etan Hadjeres 4 The approximating minimum enclosing ball problem: prior work This algorithm has been generalized to dually flat manifolds [Nock and Nielsen, 2005] Riemannian manifolds [Arnaudon and Nielsen, 2013] Applying these results to hyperbolic geometry give the existence and uniqueness of MEB(P), but give no explicit bounds on the number of iterations assume that we are able to precisely cut geodesics. c 2015 Frank Nielsen - Ga¨etan Hadjeres 5 The approximating minimum enclosing ball problem: our contribution We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic α-midpoints, we obtain a intrinsic (1 + )-approximation algorithm to the approximate minimum enclosing ball problem a O(1/ 2) convergence time guarantee a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric c 2015 Frank Nielsen - Ga¨etan Hadjeres 6 Model of d-dimensional hyperbolic geometry: The Poincar´e ball model The Poincar´e ball model (Bd , ρ(., .)) consists in the open unit ball Bd = {x ∈ Rd : x < 1} together with the hyperbolic distance ρ (p, q) = arcosh 1 + 2 p − q 2 (1 − p 2) (1 − q 2) , ∀p, q ∈ Bd . This distance induces on the metric space (Bd , ρ) a Riemannian structure. c 2015 Frank Nielsen - Ga¨etan Hadjeres 7 Geodesics in the Poincar´e ball model Shorter paths between two points (geodesics) are exactly straight (euclidean) lines passing through the origin circle arcs orthogonal to the unit sphere Figure : “Straight” lines in the Poincar´e ball model c 2015 Frank Nielsen - Ga¨etan Hadjeres 8 Circles in the Poincar´e ball model Circles in the Poincar´e ball model look like euclidean circles but with different center Figure : Difference between euclidean MEB (in blue) and hyperbolic MEB (in red) for the set of blue points in hyperbolic Poincar´e disk (in black). The red cross is the hyperbolic center of the red circle while the pink one is its euclidean center. 
c 2015 Frank Nielsen - Ga¨etan Hadjeres 9 Translations in the Poincar´e ball model Tp (x) = 1 − p 2 x + x 2 + 2 x, p + 1 p p 2 x 2 + 2 x, p + 1 Figure : Tiling of the hyperbolic plane by squares c 2015 Frank Nielsen - Ga¨etan Hadjeres 10 Closed-form formula for computing α-midpoints A point m is the α-midpoint p#αq of two points p, q for α ∈ [0, 1] if m belongs to the geodesic joining the two points p, q m verifies ρ (p, mα) = αρ (p, q) . c 2015 Frank Nielsen - Ga¨etan Hadjeres 11 Closed-form formula for computing α-midpoints A point m is the α-midpoint p#αq of two points p, q for α ∈ [0, 1] if m belongs to the geodesic joining the two points p, q m verifies ρ (p, mα) = αρ (p, q) . For the special case p = (0, . . . , 0), q = (xq, 0, . . . , 0), we have p#αq := (xα, 0, . . . , 0) with xα = cα,q − 1 cα,q + 1 , where cα,q := eαρ(p,q) = 1 + xq 1 − xq α . c 2015 Frank Nielsen - Ga¨etan Hadjeres 11 Closed-form formula for computing α-midpoints Noting that p#αq = Tp (T−p (p) #αT−p (q)) ∀p, q ∈ Bd we obtain a closed-form formula for computing p#αq how to compute p#αq in linear time O(d) that these transformations are exact. c 2015 Frank Nielsen - Ga¨etan Hadjeres 12 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius For a fixed radius r > R∗, we can find c ∈ Bd such that ρ (c, P) ≤ (1 + )r ∀p ∈ P with Algorithm 1: (1 + )-approximation of EHB(P, r) 1: c0 := p1 2: t := 0 3: while ∃p ∈ P such that p /∈ B (ct, (1 + ) r) do 4: let p ∈ P be such a point 5: α := ρ(ct ,p)−r ρ(ct ,p) 6: ct+1 := ct#αp 7: t := t+1 8: end while 9: return ct c 2015 Frank Nielsen - Ga¨etan Hadjeres 13 Idea of the proof By the hyperbolic law of cosines : ch (ρt) ≥ ch (h) ch (ρt+1) ch (ρ1) ≥ ch (h)T ≥ ch ( r)T . ct+1 ct c∗ pt h > r ρt+1 ρt r ≤ rr θ θ Figure : Update of ct c 2015 Frank Nielsen - Ga¨etan Hadjeres 14 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius The EHB(P, r) algorithm is a O(1/ 2)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + )r in less than 4/ 2 iterations. c 2015 Frank Nielsen - Ga¨etan Hadjeres 15 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius The EHB(P, r) algorithm is a O(1/ 2)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + )r in less than 4/ 2 iterations. Our error with the true MEHB center c∗ verifies ρ (c, c∗ ) ≤ arcosh ch ((1 + ) r) ch (R∗) c 2015 Frank Nielsen - Ga¨etan Hadjeres 15 (1 + + 2 /4)-approximation of MEHB(P) In fact, as R∗ is unknown in general, the EHB algorithm returns for any r: an (1 + )-approximation of EHB(P) if r ≥ R∗ the fact that r < R∗ if the result obtained after more than 4/ 2 iterations is not good enough. c 2015 Frank Nielsen - Ga¨etan Hadjeres 16 (1 + + 2 /4)-approximation of MEHB(P) In fact, as R∗ is unknown in general, the EHB algorithm returns for any r: an (1 + )-approximation of EHB(P) if r ≥ R∗ the fact that r < R∗ if the result obtained after more than 4/ 2 iterations is not good enough. This suggests to implement a dichotomic search in order to compute an approximation of the minimal hyperbolic enclosing ball. We obtain a O(1 + + 2/4)-approximation of MEHB(P) in O N 2 log 1 iterations. 
c 2015 Frank Nielsen - Ga¨etan Hadjeres 16 (1 + + 2 /4)-approximation of MEHB(P) algorithm Algorithm 2: (1 + )-approximation of MEHB(P) 1: c := p1 2: rmax := ρ (c, P); rmin = rmax 2 ; tmax := +∞ 3: r := rmax; 4: repeat 5: ctemp := Alg1 P, r, 2 , interrupt if t > tmax in Alg1 6: if call of Alg1 has been interrupted then 7: rmin := r 8: else 9: rmax := r ; c := ctemp 10: end if 11: dr := rmax−rmin 2 ; r := rmin + dr ; tmax := log(ch(1+ /2)r)−log(ch(rmin)) log(ch(r /2)) 12: until 2dr < rmin 2 13: return c c 2015 Frank Nielsen - Ga¨etan Hadjeres 17 Experimental results The number of iterations does not depend on d. Figure : Number of α-midpoint calculations as a function of in logarithmic scale for different values of d. c 2015 Frank Nielsen - Ga¨etan Hadjeres 18 Experimental results The running time is approximately O(dn 2 ) (vertical translation in logarithmic scale). Figure : execution time as a function of in logarithmic scale for different values of d. c 2015 Frank Nielsen - Ga¨etan Hadjeres 19 Applications Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the following subfamilies N µ, σ2In of n-variate normal distributions with scalar covariance matrix (In is the n × n identity matrix), N µ, diag σ2 1, . . . , σ2 n of n-variate normal distributions with diagonal covariance matrix N(µ0, Σ) of d-variate normal distributions with fixed mean µ0 and arbitrary positive definite covariance matrix Σ are statistical manifolds whose Fisher information metric is hyperbolic. c 2015 Frank Nielsen - Ga¨etan Hadjeres 20 Applications In particular, our results apply to the two-dimensional location-scale subfamily: Figure : MEHB (D) of probability density functions (left) in the (µ, σ) superior half-plane (right). P = {A, B, C}. c 2015 Frank Nielsen - Ga¨etan Hadjeres 21 Openings Plugging the EHB and MEHB algorithms to compute clusters centers in the approximation algorithm by [Gonzalez, 1985], we obtain approximate algorithms for covering in hyperbolic spaces the k-center problem in O kNd 2 log 1 c 2015 Frank Nielsen - Ga¨etan Hadjeres 22 Algorithm 3: Gonzalez farthest-first traversal approximation algo- rithm 1: C1 := P, i = 0 2: while i ≤ k do 3: ∀j ≤ i, compute cj := MEB(Cj ) 4: ∀j ≤ i, set fj := argmaxp∈P ρ(p, cj ) 5: find f ∈ {fj } whose distance to its cluster center is maximal 6: create cluster Ci containing f 7: add to Ci all points whose distance to f is inferior to the distance to their cluster center 8: increment i 9: end while 10: return {Ci }i c 2015 Frank Nielsen - Ga¨etan Hadjeres 23 Openings The computation of the minimum enclosing hyperbolic ball does not necessarily involve all points p ∈ P. Core-sets in hyperbolic geometry the MEHB obtained by the algorithm is an -core-set differences with the euclidean setting: core-sets are of size at most 1/ [Badoiu and Clarkson, 2008] c 2015 Frank Nielsen - Ga¨etan Hadjeres 24 Thank you! c 2015 Frank Nielsen - Ga¨etan Hadjeres 25 Bibliography I Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93–104. Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Comput. Geom., 40(1):14–22. Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306. Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649–656. Springer. c 2015 Frank Nielsen - Ga¨etan Hadjeres 26
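The fixed-radius routine (Algorithm 1) is short enough to sketch directly from the closed-form α-midpoint and the Möbius translation given in the slides. The Python/NumPy code below is a minimal illustration, not the authors' implementation; the three sample points, the radius r = 1.3 (chosen to exceed the optimal radius for this toy set) and the tolerance are assumptions for the example only, and Algorithm 2's dichotomic search over r is omitted.

```python
import numpy as np

def hyp_dist(p, q):
    """Hyperbolic distance in the Poincare ball model."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    num = 2.0 * np.dot(p - q, p - q)
    den = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return np.arccosh(1.0 + num / den)

def translate(p, x):
    """Moebius translation T_p(x) of the Poincare ball (formula from the slides)."""
    p, x = np.asarray(p, float), np.asarray(x, float)
    px, xx, pp = np.dot(p, x), np.dot(x, x), np.dot(p, p)
    num = (1.0 - pp) * x + (xx + 2.0 * px + 1.0) * p
    den = pp * xx + 2.0 * px + 1.0
    return num / den

def alpha_midpoint(p, q, alpha):
    """Point m on the geodesic from p to q with rho(p, m) = alpha * rho(p, q)."""
    q0 = translate(-p, q)                       # move p to the origin (an isometry)
    r = np.linalg.norm(q0)
    if r == 0.0:
        return np.asarray(p, float)
    c = ((1.0 + r) / (1.0 - r)) ** alpha        # closed form on a ray through the origin
    m0 = (c - 1.0) / (c + 1.0) * q0 / r
    return translate(p, m0)                     # move back

def enclosing_ball_center(P, r, eps, max_iter=10000):
    """Algorithm 1 sketch: center of a hyperbolic ball of radius (1 + eps)*r
    covering P, assuming r is at least the optimal radius R*."""
    c = np.asarray(P[0], float)
    for _ in range(max_iter):
        d = np.array([hyp_dist(c, p) for p in P])
        i = int(np.argmax(d))
        if d[i] <= (1.0 + eps) * r:
            return c
        c = alpha_midpoint(c, P[i], (d[i] - r) / d[i])
    return c

# Toy usage: three points in the Poincare disk; r = 1.3 exceeds R* (about 1.14 here).
P = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.6]])
c = enclosing_ball_center(P, r=1.3, eps=0.05)
print("center:", c, "covering radius:", max(hyp_dist(c, p) for p in P))
```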

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14265
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_64
Authors = Emmanuel Kalunga, Eric Monacelli, Karim Djouani, Quentin Barthélemy, Sylvain Chevallier, Yskandar Hamam
Keywords = Brain-Computer Interfaces, Information geometry, Riemannian means, Steady State Visually Evoked Potentials
Abstract
Brain-Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain signal processing. Most of the state-of-the-art approaches deal with covariance matrices, and indeed Riemannian geometry has provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applicative contribution aims at assessing the impact of several distances on a real EEG dataset, as the invariances embedded in those distances influence the classification accuracy. Euclidean and Riemannian distances and means are compared both in terms of quality of results and of computational load.


Watch the video
From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification

From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al. F’SATI - Tshawne University of Technology (South Africa) LISV - Université de Versailles Saint-Quentin (France) Mensia Technologies (France) sylvain.chevallier@uvsq.fr 28 October 2015 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Cerebral interfaces Context Rehabilitation and disability compensation ) Out-of-the-lab solutions ) Open to a wider population Problem Intra-subject variabilities ) Online methods, adaptative algorithms Inter-subject variabilities ) Good generalization, fast convergence Opportunities New generation of BCI (Congedo & Barachant) • Growing interest in EEG community • Large community, available datasets • Challenging situations and problems S. Chevallier 28/10/2015 GSI 2 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Outline Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances S. Chevallier 28/10/2015 GSI 3 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction based on brain activity Brain-Computer Interface (BCI) for non-muscular communication • Medical applications • Possible applications for wider population Recording at what scale ? • Neuron !LFP • Neuronal group !ECoG !SEEG • Brain !EEG !MEG !IRMf !TEP S. Chevallier 28/10/2015 GSI 4 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction loop BCI loop 1 Acquisition 2 Preprocessing 3 Translation 4 User feedback S. Chevallier 28/10/2015 GSI 5 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Electroencephalography Most BCI rely on EEG ) Efficient to capture brain waves • Lightweight system • Low cost • Mature technologies • High temporal resolution • No trepanation S. Chevallier 28/10/2015 GSI 6 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Origins of EEG • Local field potentials • Electric potential difference between dendrite and soma • Maxwell’s equation • Quasi-static approximation • Volume conduction effect • Sensitive to conductivity of brain skull • Sensitive to tissue anisotropies S. Chevallier 28/10/2015 GSI 7 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Experimental paradigms Different brain signals for BCI : • Motor imagery : (de)synchronization in premotor cortex • Evoked responses : low amplitude potentials induced by stimulus Steady-State Visually Evoked Potentials 8 electrodes in occipital region SSVEP stimulation LEDs 13 Hz 17 Hz 21 Hz • Neural synchronization with visual stimulation • No learning required, based on visual attention • Strong induced activation S. 
Chevallier 28/10/2015 GSI 8 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances BCI Challenges Limitations • Data scarsity ) A few sources are non-linearly mixed on all electrodes • Individual variabilities ) Effect of mental fatigue • Inter-session variabilities ) Electronic impedances, localizations of electrodes • Inter-individual variabilities ) State of the art approaches fail with 20% of subjects Desired properties : • Online systems ) Continously adapt to the user’s variations • No calibration phase ) Non negligible cognitive load, raises fatigue • Generic model classifiers and transfert learning ) Use data from one subject to enhance the results for another S. Chevallier 28/10/2015 GSI 9 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Spatial covariance matrices Common approach : spatial filtering • Efficient on clean datasets • Specific to each user and session ) Require user calibration • Two step training with feature selection ) Overfitting risk, curse of dimensionality Working with covariance matrices • Good generalization across subjects • Fast convergence • Existing online algorithms • Efficient implementations S. Chevallier 28/10/2015 GSI 10 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Covariance matrices for EEG • An EEG trial : X 2 RC⇥N , C electrodes, N time samples • Assuming that X ⇠ N(0, ⌃) • Covariance matrices ⌃ belong to MC = ⌃ 2 RC⇥C : ⌃ = ⌃| and x| ⌃x > 0, 8x 2 RC \0 • Mean of the set {⌃i }i=1,...,I is ¯⌃ = argmin⌃2MC PI i=1 dm (⌃i , ⌃) • Each EEG class is represented by its mean • Classification based on those means • How to obtain a robust and efficient algorithm ? Congedo, 2013 S. Chevallier 28/10/2015 GSI 11 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Minimum distance to Riemannian mean Simple and robust classifier • Compute the center ⌃ (k) E of each of the K classes • Assign a given unlabelled ˆ⌃ to the closest class k⇤ = argmin k (ˆ⌃, ⌃ (k) E ) Trajectories on tangent space at mean of all trials ¯⌃µ −4 −2 0 2 4 −4 −2 0 2 4 6 Resting class 13Hz class 21Hz class 17Hz class Delay S. Chevallier 28/10/2015 GSI 12 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Riemannian potato Removing outliers and artifacts Reject any ⌃i that lies too far from the mean of all trials ¯⌃µ z( i ) = i µ > zth , i is d(⌃i , ¯⌃), µ and are the mean and standard deviation of distances { i } I i=1 Raw matrices Riemannian potato filtering S. Chevallier 28/10/2015 GSI 13 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Covariance matrices for EEG-based BCI Riemannian approaches in BCI : • Achieve state of the art results ! performing like spatial filtering or sensor-space methods • Rely on simpler algorithms ! less error-prone, computationally efficient What are the reason of this success ? • Invariances embedded with Riemannian distances ! invariance to rescaling, normalization, whitening ! invariance to electrode permutation or positionning • Equivalent to working in an optimal source space ! spatial filtering are sensitive to outliers and user-specific ! no question on "sensors or sources" methods ) What are the most desirable invariances for EEG ? S. 
Chevallier 28/10/2015 GSI 14 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Considered distances and divergences Euclidean dE(⌃1, ⌃2) = k⌃1 ⌃2kF Log-Euclidean dLE(⌃1, ⌃2) = klog(⌃1) log(⌃2)kF V. Arsigny et al., 2006, 2007 Affine-invariant dAI(⌃1, ⌃2) = klog(⌃ 1 1 ⌃2)kF T. Fletcher & S. Joshi, 2004 , M. Moakher, 2005 ↵-divergence d↵ D(⌃1, ⌃2) 1<↵<1 = 4 1 ↵2 log det( 1 ↵ 2 ⌃1+ 1+↵ 2 ⌃2) det(⌃1) 1 ↵ 2 det(⌃2) 1+↵ 2 Z. Chebbi & M. Moakher, 2012 Bhattacharyya dB(⌃1, ⌃2) = ⇣ log det 1 2 (⌃1+⌃2) (det(⌃1) det(⌃2))1/2 ⌘1/2 Z. Chebbi & M. Moakher, 2012 S. Chevallier 28/10/2015 GSI 15 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Experimental results • Euclidean distances yield the lowest results ! Usually attributed to the invariance under inversion that is not guaranteed ! Displays swelling effect • Riemannian approaches outperform state-of-the-art methods (CCA+SVM) • ↵-divergence shows the best performances ! but requires a costly optimisation to find the best ↵ value • Bhattacharyya has the lowest computational cost and a good accuracy −1 −0.5 0 0.5 1 20 30 40 50 60 70 80 90 Accuracy(%) Alpha values (α) −1 −0.5 0 0.5 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 CPUtime(s) S. Chevallier 28/10/2015 GSI 16 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Conclusion Working with covariance matrices in BCI • Achieves very good results • Simple algorithms work well : MDM, Riemannian potato • Need for robust and online methods Interesting applications for IG : • Many freely available datasets • Several competitions • Many open source toolboxes for manipulating EEG Several open questions : • Handling electrodes misplacements and others artifacts • Missing data and covariance matrices of lower rank • Inter- and intra-individual variabilities S. Chevallier 28/10/2015 GSI 17 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Thank you ! S. Chevallier 28/10/2015 GSI 18 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction loop BCI loop 1 Acquisition 2 Preprocessing 3 Translation 4 User feedback First systems in early ’70 S. Chevallier 28/10/2015 GSI 19 / 19
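As a concrete illustration of the Minimum Distance to Mean idea on covariance matrices, here is a self-contained sketch using the log-Euclidean distance from the list above, chosen because its mean has a closed form (the matrix exponential of the averaged matrix logarithms). This is not the authors' pipeline: the simulated 8-channel trials and the two-class structure are purely illustrative assumptions standing in for SSVEP covariance matrices.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_dist(S1, S2):
    """d_LE(S1, S2) = || log(S1) - log(S2) ||_F."""
    return np.linalg.norm(spd_log(S1) - spd_log(S2), "fro")

def log_euclidean_mean(covs):
    """Log-Euclidean mean: exponential of the arithmetic mean of the matrix logs."""
    L = np.mean([spd_log(S) for S in covs], axis=0)
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def mdm_fit(covs, labels):
    """Minimum Distance to Mean training: one class mean per label."""
    return {k: log_euclidean_mean([S for S, y in zip(covs, labels) if y == k])
            for k in set(labels)}

def mdm_predict(means, S):
    """Assign S to the class whose mean is closest in log-Euclidean distance."""
    return min(means, key=lambda k: log_euclidean_dist(S, means[k]))

# Toy usage: trial covariances simulated from two SSVEP-like classes (assumed data).
rng = np.random.default_rng(2)
def sample_cov(base, n=64, c=8):
    X = rng.standard_normal((n, c)) @ np.linalg.cholesky(base).T
    return X.T @ X / n                        # sample spatial covariance of one trial
base0 = np.eye(8)
base1 = np.diag([3.0, 1, 1, 1, 1, 1, 1, 1])
covs = [sample_cov(base0) for _ in range(20)] + [sample_cov(base1) for _ in range(20)]
labels = [0] * 20 + [1] * 20
means = mdm_fit(covs, labels)
print(mdm_predict(means, sample_cov(base1)))  # expected: 1 (with high probability)
```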

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14266
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_65
Authors = Hiroto Inoue
Keywords =
Abstract
We consider the geodesic equation on the elliptical model, which is a generalization of the normal model. More precisely, we characterize this manifold from the group-theoretical viewpoint, formulate Eriksen's procedure for obtaining geodesics on the normal model, and give an alternative proof of it.


Watch the video
Group Theoretical Study on Geodesics for the Elliptical Models

Group Theoretical Study on Geodesics for the Elliptical Models Hiroto Inoue Kyushu University, Japan October 28, 2015 GSI2015, ´Ecole Polytechnique, Paris-Saclay, France Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 1 / 14 Overview 1 Eriksen’s construction of geodesics on normal model Problem 2 Reconsideration of Eriksen’s argument Embedding Nn → Sym+ n+1(R) 3 Geodesic equation on Elliptical model 4 Future work Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 2 / 14 Eriksen’s construction of geodesics on normal model Let Sym+ n (R) be the set of n-dimensional positive-definite matrices. The normal model Nn = (M, ds2) is a Riemannian manifold defined by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 ). The geodesic equation on Nn is ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0. (1) The solution of this geodesic equation has been obtained by Eriksen. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 3 / 14 Theorem ([Eriksen 1987]) For any x ∈ Rn, B ∈ Symn(R), define a matrix exponential Λ(t) by Λ(t) =   ∆ δ Φ tδ tγ tΦ γ Γ   := exp(−tA), A :=   B x 0 tx 0 −tx 0 −x −B   ∈ Mat2n+1. (2) Then, the curve (µ(t), Σ(t)) := (−∆−1δ, ∆−1) is the geodesic on Nn satisfiying the initial condition (µ(0), Σ(0)) = (0, In), ( ˙µ(0), ˙Σ(0)) = (x, B). (proof) We see that by the definition, (µ(t), Σ(t)) satisfies the geodesic equation. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 4 / 14 Problem 1 Explain Eriksen’s theorem, to clarify the relation between the normal model and symmetric spaces. 2 Extend Eriksen’s theorem to the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 5 / 14 Reconsideration of Eriksen’s argument Sym+ n+1(R) Notice that the positive-definite symmetric matrices Sym+ n+1(R) is a symmetric space by G/K Sym+ n+1(R) gK → g · tg, where G = GLn+1(R), K = O(n + 1). This space G/K has the G-invariant Riemannian metric ds2 = 1 2 tr (S−1 dS)2 . Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 6 / 14 Embedding Nn → Sym+ n+1(R) Put an affine subgroup GA := P µ 0 1 P ∈ GLn(R), µ ∈ Rn ⊂ GLn+1(R). Define a Riemannian submanifold as the orbit GA · In+1 = {g · t g| g ∈ GA} ⊂ Sym+ n+1(R). Theorem (Ref. [Calvo, Oller 2001]) We have the following isometry Nn ∼ −→ GA · In+1 ⊂ Sym+ n+1(R), (Σ, µ) → Σ + µtµ µ tµ 1 . (3) Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 7 / 14 Embedding Nn → Sym+ n+1(R) By using the above embedding, we get a simpler expression of the metric and the geodesic equation. Nn ∼= GA · In+1 ⊂ Sym+ n+1(R) coordinate (Σ, µ) → S = Σ + µtµ µ tµ 1 metric ds2 = (tdµ)Σ−1(dµ) +1 2tr((Σ−1dΣ)2) ⇔ ds2 = 1 2 tr (S−1dS)2 geodesic eq. ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0 ⇔ (In, 0)(S−1 ˙S) = (B, x) Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 8 / 14 Reconsideration of Eriksen’s argument We can interpret the Eriksen’s argument as follows. Differential equation Geodesic equation Λ−1 ˙Λ = −A −→ (In, 0)(S−1 ˙S) = (B, x) A =   B x 0 t x 0 −t x 0 −x −B   −→ e−tA =   ∆ δ ∗ t δ ∗ ∗ ∗ ∗   −→ S := ∆ δ t δ −1 ∈ ∈ ∈ {A : JAJ = −A} −→ {Λ : JΛJ = Λ−1 } −→ Essential! Nn ∼= GA · In+1 ∩ ∩ ∩ sym2n+1(R) −→ exp Sym+ 2n+1(R) −→ projection Sym+ n+1(R) Here J =   In 1 In  . Hiroto Inoue (Kyushu Uni.) 
Group Theoretical Study on Geodesics October 28, 2015 9 / 14 Geodesic equation on Elliptical model Definition Let us define a Riemannian manifold En(α) = (M, ds2) by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 )+ 1 2 dα tr(Σ−1 dΣ) 2 . (4) where dα = (n + 1)α2 + 2α, α ∈ C. Then En(0) = Nn. The geodesic equation on En(α) is    ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ− dα ndα + 1 t ˙µΣ−1 ˙µΣ = 0. (5) This is equivalent to the geodesic equation on the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 10 / 14 Geodesic equation on Elliptical model The manifold En(α) is also embedded into positive-definite symmetric matrices Sym+ n+1(R), ref. [Calvo, Oller 2001], and we have simpler expression of the geodesic equation. En(α) ∼= ∃GA(α) · In+1 ⊂ Sym+ n+1(R) coordinate (Σ, µ) → S = |Σ|α Σ + µtµ µ tµ 1 metric (4) ⇔ ds2 = 1 2 tr (S−1dS)2 geodesic eq. (5) ⇔ (In, 0)(S−1 ˙S) = (C, x) − α(log |S|) (In, 0) |A| = det A Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 11 / 14 Geodesic equation on Elliptical model But, in general, we do not ever construct any submanifold N ⊂ Sym+ 2n+1(R) such that its projection is En(α): Differential equation Geodesic equation Λ−1 ˙Λ = −A −→ (In, 0)(S−1 ˙S) = (C, x) − α(log |S|) (In, 0) Λ(t) −→ S(t) ∈ ∈ N −→ En(α) ∼= GA(α) · In+1 ∩ ∩ Sym+ 2n+1(R) −→ projection Sym+ n+1(R) The geodesic equation on elliptical model has not been solved. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 12 / 14 Future work 1 Extend Eriksen’s theorem for elliptical models (ongoing) 2 Find Eriksen type theorem for general symmetric spaces G/K Sketch of the problem: For a projection p : G/K → G/K, find a geodesic submanifold N ⊂ G/K, such that p|N maps all the geodesics to the geodesics: ∀Λ(t): Geodesic −→ p(Λ(t)): Geodesic ∈ ∈ N −→ p|N p(N) ∩ ∩ G/K −→ p:projection G/K Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 13 / 14 References Calvo, M., Oller, J.M. A distance between elliptical distributions based in an embedding into the Siegel group, J. Comput. Appl. Math. 145, 319–334 (2002). Eriksen, P.S. Geodesics connected with the Fisher metric on the multivariate normal manifold, pp. 225–229. Proceedings of the GST Workshop, Lancaster (1987). Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 14 / 14
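Eriksen's construction stated in the theorem above translates directly into a few lines of code: build the (2n+1) × (2n+1) matrix A from the initial velocity (x, B), exponentiate, and read off (μ(t), Σ(t)) from the Δ and δ blocks. The sketch below uses SciPy's matrix exponential; the particular x and B in the example are arbitrary (B must be symmetric).

```python
import numpy as np
from scipy.linalg import expm

def eriksen_geodesic(x, B, t):
    """Eriksen's construction of the geodesic on the normal model N_n through
    (mu, Sigma) = (0, I_n) with initial velocity (x, B), B symmetric.

    Builds A as in the slides, computes Lambda(t) = exp(-t A), reads off the
    n x n block Delta and the n-vector delta, and returns
    (mu(t), Sigma(t)) = (-Delta^{-1} delta, Delta^{-1}).
    """
    x = np.asarray(x, float).reshape(-1)
    B = np.asarray(B, float)
    n = x.size
    A = np.zeros((2 * n + 1, 2 * n + 1))
    A[:n, :n] = B                      # top-left block B
    A[:n, n] = x                       # middle column x
    A[n, :n] = x                       # middle row x^T
    A[n, n + 1:] = -x                  # middle row -x^T
    A[n + 1:, n] = -x                  # middle column -x
    A[n + 1:, n + 1:] = -B             # bottom-right block -B
    Lam = expm(-t * A)
    Delta = Lam[:n, :n]
    delta = Lam[:n, n]
    Sigma = np.linalg.inv(Delta)
    mu = -Sigma @ delta
    return mu, Sigma

# Example in n = 2: the curve starts at (0, I) with velocity (x, B).
x = np.array([0.3, -0.1])
B = np.array([[0.2, 0.1], [0.1, -0.3]])
print(eriksen_geodesic(x, B, 0.0))     # approximately (0, I)
print(eriksen_geodesic(x, B, 1.0))     # a point further along the geodesic
```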

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14267
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_66
Authors = Osamu Komori, Shinto Eguchi
Keywords =
Abstract
We introduce a class of paths, or one-parameter models, connecting two arbitrary probability density functions (pdfs). The class is derived by employing the Kolmogorov-Nagumo average between the two pdfs. There is a variety of such path connectedness on the space of pdfs, since the Kolmogorov-Nagumo average is applicable for any convex and strictly increasing function. Information-geometric insight is provided for understanding the probabilistic properties of statistical methods associated with this path connectedness. The one-parameter model is extended to a multidimensional model, on which statistical inference is characterized by sufficient statistics.


Watch the video
Path connectedness on a space of probability density functions

Path connectedness on a space of probability density functions Osamu Komori1 , Shinto Eguchi2 University of Fukui1 , Japan The Institute of Statistical Mathematics2 , Japan Ecole Polytechnique, Paris-Saclay (France) October 28, 2015 Komori, O. (University of Fukui) GSI2015 October 28, 2015 1 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 2 / 18 Setting Terminology . . X : data space P : probability measure on X FP: space of probability density functions associated with P We consider a path connecting f and g, where f, g ∈ FP, and investigate the property from a viewpoint of information geometry. Komori, O. (University of Fukui) GSI2015 October 28, 2015 3 / 18 Kolmogorov-Nagumo (K-N) average Let ϕ : (0, ∞) → R be an monotonic increasing and concave continuous function. Then for f and g in Fp The Kolmogorov-Nagumo (K-N) average . . ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) ) for 0 ≤ t ≤ 1. Remark 1 . . ϕ−1 is monotone increasing, convex and continuous on (0, ∞) Komori, O. (University of Fukui) GSI2015 October 28, 2015 4 / 18 ϕ-path Based on K-N average, we consider ϕ-path connecting f and g in FP: ϕ-path . . ft(x, ϕ) = ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) − κt ) , where κt ≤ 0 is a normalizing factor, where the equality holds if t = 0 or t = 1. Komori, O. (University of Fukui) GSI2015 October 28, 2015 5 / 18 Existence of κt Theorem 1 . . There uniquely exists κt such that ∫ X ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) − κt ) dP(x) = 1 Proof From the convexity of ϕ−1 , we have 0 ≤ ∫ ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) ) dP(x) ≤ ∫ {(1 − t)f(x) + tg(x)}dP(x) ≤ 1 And we observe that limc→∞ ϕ−1 (c) = +∞ since ϕ−1 is monotone increasing. Hence the continuity of ϕ−1 leads to the existence of κt satisfying the equation above. Komori, O. (University of Fukui) GSI2015 October 28, 2015 6 / 18 Illustration of ϕ-path Komori, O. (University of Fukui) GSI2015 October 28, 2015 7 / 18 Examples of ϕ-path Example 1 . 1 ϕ0(x) = log(x). The ϕ0-path is given by ft(x, ϕ0) = exp((1 − t) log f(x) + t log g(x) − κt), where κt = log ∫ exp((1 − t) log f(x) + t log g(x))dP(x). 2 ϕη(x) = log(x + η) with η ≥ 0. The ϕη-path is given by ft(x, ϕη) = exp [ (1 − t) log{ f(x) + η} + t log{g(x) + η} − κt ] , where κt = log [ ∫ exp{(1 − t) log{f(x) + η} + t log{g(x) + η}}dP(x) − η ] . 3 ϕβ(x) = (xβ − 1)/β with β ≤ 1. The ϕβ-path is given by ft(x, ϕβ) = {(1 − t)f(x)β + tg(x)β − κt} 1 β , where κt does not have an explicit form. Komori, O. (University of Fukui) GSI2015 October 28, 2015 8 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 9 / 18 Extended expectation For a function a(x): X → R, we consider Extended expectation . . E(ϕ) f {a(X)} = ∫ X 1 ϕ′(f(x)) a(x)dP(x) ∫ X 1 ϕ′(f(x)) dP(x) , where ϕ: (0, ∞) → R is a generator function. Remark 2 If ϕ(t) = log t, then E(ϕ) reduces to the usual expectation. Komori, O. (University of Fukui) GSI2015 October 28, 2015 10 / 18 Properties of extended expectation We note that 1 E(ϕ) f (c) = c for any constant c. 2 E(ϕ) f {ca(X)} = cE(ϕ) f {a(X)} for any constant c. 3 E(ϕ) f {a(X) + b(X)} = E(ϕ) f {a(X)} + E(ϕ) f {b(X)}. 4 E(ϕ) f {a(X)2 } ≥ 0 with equality if and only if a(x) = 0 for P-almost everywhere x in X. Remark 3 If we define f(ϕ) (x) = 1/ϕ′ ( f(x))/ ∫ X 1/ϕ′ (f(x))dP(x), then E(ϕ) f {a(X)} = Ef(ϕ) {a(X)}. 
Komori, O. (University of Fukui) GSI2015 October 28, 2015 11 / 18 Tangent space of FP Let Hf be a Hilbert space with the inner product defined by ⟨a, b⟩f = E(ϕ) f {a(X)b(X)}, and the tangent space Tangent space associated with extended expectation . . Tf = {a ∈ Hf : ⟨a, 1⟩f = 0}. For a statistical model M = { fθ(x)}θ∈Θ we have E(ϕ) fθ {∂iϕ(fθ(X))} = 0 for all θ of Θ, where ∂i = ∂/∂θi with θ = (θi)i=1,··· ,p. Further, E(ϕ) fθ {∂i∂jϕ(fθ(X))} = E(ϕ) fθ { ϕ′′ ( fθ(X)) ϕ′(fθ(X))2 ∂iϕ(fθ(X))∂iϕ(fθ(X)) } . Komori, O. (University of Fukui) GSI2015 October 28, 2015 12 / 18 Parallel displacement A(ϕ) t Define A(ϕ) t (x) in Tft by the solution for a differential equation ˙A(ϕ) t (x) − E(ϕ) ft { A(ϕ) t ˙ft ϕ′′ ( ft) ϕ′(ft) } = 0, where ft is a path connecting f and g such that f0 = f and f1 = g. ˙A(ϕ) t (x) is the derivative of A(ϕ) t (x) with respect to t. Theorem 2 The geodesic curve {ft}0≤t≤1 by the parallel displacement A(ϕ) t is the ϕ-path. Komori, O. (University of Fukui) GSI2015 October 28, 2015 13 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 14 / 18 U-divergence Assume that U(s) is a convex and increasing function of a scalar s and let ξ(t) = argmaxs{st − U(s)} . Then we have U-divergence . . DU(f, g) = ∫ {U(ξ(g)) − fξ(g)}dP − ∫ {U(ξ(f)) − fξ( f)}dP. In fact, U-divergence is the difference of the cross entropy CU( f, g) with the diagonal entropy CU( f, f), where CU(f, g) = ∫ {U(ξ(g)) − fξ(g)}dP. Komori, O. (University of Fukui) GSI2015 October 28, 2015 15 / 18 Connections based on U-divergence For a manifold of finite dimension M = { fθ(x) : θ ∈ Θ} and vector fields X and Y on M, the Riemannian metric is G(U) (X, Y)(f) = ∫ X f Yξ( f)dP for f ∈ M and linear connections ∇(U) and ∇∗(U) are G(U) (∇(U) X Y, Z)(f) = ∫ XY f Zξ(f)dP and G(U) (∇∗ X (U) Y, Z)(f) = ∫ Z f XYξ(f)dP. See Eguchi (1992) for details. Komori, O. (University of Fukui) GSI2015 October 28, 2015 16 / 18 Equivalence between ∇∗ -geodesic and ξ-path Let ∇(U) and ∇∗(U) be linear connections associated with U-divergence DU, and let C(ϕ) = {ft(x, ϕ) : 0 ≤ t ≤ 1} be the ϕ path connecting f and g of FP. Then, we have Theorem 3 A ∇(U) -geodesic curve connecting f and g is equal to C(id) , where id denotes the identity function; while a ∇∗(U) -geodesic curve connecting f and g is equal to C(ξ) , where ξ(t) = argmaxs{st − U(s)}. Komori, O. (University of Fukui) GSI2015 October 28, 2015 17 / 18 Summary 1 We consider ϕ-path based on Kolmogorov-Nagumo average. 2 The relation between U-divergence and ϕ-path was investigated (ϕ corresponds to ξ). 3 The idea of ϕ-path can be applied to probability density estimation as well as classification problems. 4 Divergence associated with ϕ-path can be considered, where a special case would be Bhattacharyya divergence. Komori, O. (University of Fukui) GSI2015 October 28, 2015 18 / 18
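A point on a ϕ-path is straightforward to evaluate numerically once κ_t is found. The sketch below, a minimal illustration rather than the authors' code, does this for the ϕ_β-path of Example 1(3) on a five-point sample space with uniform base measure P; since κ_t has no explicit form there, it is obtained by root-finding, using the fact that κ_t ≤ 0 and that the normalizing integral is monotone in κ_t. The five-point space and β = 1/2 are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq

def phi_path(f, g, t, phi, phi_inv, w=None):
    """One point f_t on the phi-path connecting discrete pdfs f and g.

    f, g : strictly positive densities on a finite sample space w.r.t. weights w (dP).
    phi, phi_inv : the generator and its inverse.
    The normalizing constant kappa_t <= 0 is found by bisection so that the
    resulting f_t integrates to one.
    """
    f, g = np.asarray(f, float), np.asarray(g, float)
    w = np.ones_like(f) / f.size if w is None else np.asarray(w, float)
    mix = (1.0 - t) * phi(f) + t * phi(g)

    def total(kappa):
        return np.sum(phi_inv(mix - kappa) * w) - 1.0

    kappa = brentq(total, -50.0, 0.0)       # total is decreasing in kappa, kappa_t <= 0
    return phi_inv(mix - kappa)

# Example: phi_beta(x) = (x^beta - 1)/beta with beta = 1/2 on a 5-point space.
beta = 0.5
phi = lambda x: (x ** beta - 1.0) / beta
phi_inv = lambda y: (beta * y + 1.0) ** (1.0 / beta)
f = np.array([0.4, 0.3, 0.15, 0.1, 0.05]) * 5     # densities w.r.t. uniform weights 1/5
g = np.array([0.05, 0.1, 0.15, 0.3, 0.4]) * 5
ft = phi_path(f, g, t=0.3, phi=phi, phi_inv=phi_inv)
print(ft / 5, np.sum(ft) / 5)                     # path point as probabilities; sums to 1
```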

Creative Commons Attribution-ShareAlike 4.0 International

Computational Information Geometry... ...in mixture modelling Computational Information Geometry: mixture modelling Germain Van Bever1 , R. Sabolová1 , F. Critchley1 & P. Marriott2 . 1 The Open University (EPSRC grant EP/L010429/1), United Kingdom 2 University of Waterloo, USA GSI15, 28-30 October 2015, Paris Germain Van Bever CIG for mixtures 1/19 Computational Information Geometry... ...in mixture modelling Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 2/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 3/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Generalities The use of geometry in statistics gave birth to many different approaches. Traditionally, Information geometry refers to the application of differential geometry to statistical theory and practice. The main ingredients of IG in exponential families (Amari, 1985) are 1 the manifold of parameters M, 2 the Riemannian (Fisher information) metric g, and 3 the set of affine connections { −1 , +1 } (mixture and exponential connections). These allow to define notions of curvature, dimension reduction or information loss and invariant higher order expansions. Two affine structures (maps on M) are used simultaneously: -1: Mixture affine geometry on probability measures: λf(x) + (1 − λ)g(x). +1: Exponential affine geometry on probability measures: C(λ)f(x)λ g(x)(1−λ) Germain Van Bever CIG for mixtures 4/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Computational Information Geometry This talk is about Computational Information Geometry (CIG, Critchley and Marriott, 2014). 1 In CIG, the multinomial model provides, modulo, discretization, a universal model. It therefore moves from the manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex. 2 It provides a unifying framework for different geometries. 3 Tractability of the geometry allows for efficient algorithms in a computational framework. It is inherently finite and discrete. The impact of discretization is studied. A working model will be a subset of the simplex. Germain Van Bever CIG for mixtures 5/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Multinomial distributions X ∼ Mult(π0, . . . , πk), π = (π0, . . . , πk) ∈ int(∆k ), with ∆k := π : πi ≥ 0, k i=0 πi = 1 . In this case, π(0) = (π1 , . . . , πk ) is the mean parameter, while η = log(π(0) /π0) is the natural parameter. Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005). 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 mixed geodesics in -1-space π1 π2 -6 -4 -2 0 2 4 6 -6-4-20246 mixed geodesics in +1-space η1 η2 Germain Van Bever CIG for mixtures 6/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Restricting to the multinomials families Under regular exponential families with compact support, the cost of discretization on the components of Information Geometry is bounded! The same holds true for the MLE and the log-likelihood function. 
The log-likelihood (x, π) = k i=0 ni log(πi) is (i) strictly concave (in the −1-representation) on the observed face (counts ni > 0), (ii) strictly decreasing in the normal direction towards the unobserved face (ni = 0), and, otherwise, (iii) constant. Considering an infinite-dimensional simplex allows to remove the compactness assumption (Critchley and Marriott, 2014). Germain Van Bever CIG for mixtures 7/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Binomial subfamilies A (discrete) example: Binomial distributions as a subfamily of multinomial distributions. Let X ∼ Bin(k, p). Then, X can be seen as a subfamily of M = {X|X ∼ Mult(π0, . . . , πk)} , with πi(p) = k i pi (1 − p)k−i . Figure: Left: Embedded binomial (k = 2) in the 2-simplex. Right: Embedded binomial (k = 3) in the 3-simplex. Germain Van Bever CIG for mixtures 8/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 9/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Mixture distributions The generic mixture distribution is f(x; Q) = f(x; θ)dQ(θ), that is, a mixture of (regular) parametric distributions. Regularity: same support S, abs. cont. with respect to measure ν. Mixture distributions arise naturally in many statistical problems, including Overdispersed models Random effects ANOVA Random coefficient regression models and measurement error models Graphical models and many more Germain Van Bever CIG for mixtures 10/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Hard mixture problems Inference in the class of mixture distributions generates well-known difficulties: Identifiability issues: Without imposing constraints on the mixing distribution Q, there may exist Q1 and Q2 such that f(x; Q1) = f(x; θ)dQ1(θ) = f(x; θ)dQ2(θ) = f(x; Q2). Byproduct: parametrisation issues. Byproduct: multimodal likelihood functions. Boundary problems. Byproduct: singularities in the likelihood function. Germain Van Bever CIG for mixtures 11/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions NPMLE Finite mixtures are essential to the geometry. Lindsay argues that nonparametric estimation of Q is necessary. Also, Theorem The loglikelihood (Q) = n s=1 log Ls(Q) = n s=1 log f(xs; θ)dQ(θ) , has a unique maximum over the space of all distribution functions Q. Furthermore, the maximiser ˆQ is a discrete distribution with no more than D distinct points of support, where D is the number of distinct points in (x1, . . . , xn). The likelihood on the space of mixtures is therefore defined on the convex hull of the image of θ → (L1(θ), . . . , LD(θ)). Finding the NPMLE amounts to maximize a concave function over this convex set. Germain Van Bever CIG for mixtures 12/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Limits to convex geometry Knowing the shape of the likelihood on the whole simplex (and not only on the observed face) give extra insight. 
Convex geometry correctly captures the −1-geometry of the simplex but NOT the 0 and +1 geometries (for example, Fisher information requires to know the full sample space). Understanding the (C)IG of mixtures in the simplex will therefore provide extra tools (and algorithms) in mixture modelling. In this talk, we mention results on 1 (−1)-dimensionality of exponential families in the simplex. 2 convex polytopes approximation algorithms: Information geometry can give efficient approximation of high dimensional convex hulls by polytopes Germain Van Bever CIG for mixtures 13/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Local mixture models (IG) Parametric vs nonparametric dilemma. Geometric analysis allows low-dimensional approximation in local setups. Theorem (Marriott, 2002) If f(x; θ) is a n-dim exponential family with regularity conditions, Qλ(θ) is a local mixing around θ0, then f(x; Qλ) = f(x; θ)dQλ(θ) has the expansion f(x; Qλ) − f(x; θ0) − n i=1 λi ∂ ∂θi f(x; θ0) − n i,j=1 λij ∂2 ∂θi∂θj f(x; θ0) = O(λ−3 ). This is equivalent to f(x; Qλ) + O(λ−3 ) ∈ T2 Mθ0 . If the density f(x; θ) and all its derivatives are bounded, then the approximation will be uniform in x. Germain Van Bever CIG for mixtures 14/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Dimensionality in CIG It is therefore possible to approximate mixture distributions with low-dimensional families. In contrast, the (−1)−representation of any generic exponential family on the simplex will always have full dimension. The following result is even more general. Theorem (VB et al.) The −1-convex hull of an open subset of a exponential subfamily of M with tangent dimension k − d has dimension at least k − d. Corollary (Critchley and Marriott, 2014) The −1-convex hull of an open subset of a generic one dimensional subfamily of M is of full dimension. The tangent dimension is the maximal number of different components of any (+1) tangent vector to the exponential family. Generic ↔ tangent dimension= k, i.e. the tangent vector has distinct components. Germain Van Bever CIG for mixtures 15/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Example: Mixture of binomials As mentioned, IG gives efficient approximation by polytopes. IG maximises concave function on (convex) polytopes. Example: toxicological data (Kupper and Haseman, 1978). ‘simple one-parameter binomial [...] models generally provides poor fits to this type of binary data’. Germain Van Bever CIG for mixtures 16/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Approximation in CIG Define the norm ||π||π0 = k i=1 π2 i /πi,0 (preferred point metric, Critchley et al., 1993). Let π(θ) be an exponential family and ∪Si be a polytope surface. Define the distance function as d(π(θ), π0) := inf π∈∪Si ||π(θ) − π||π0 . Theorem (Anaya-Izquierdo et al.) Let ∪Si be such that d(π(θ)) ≤ for all θ. Then (ˆπNP MLE ) − (ˆπ) ≤ N||(ˆπG − ˆπNP MLE )||ˆπ + o( ), where (ˆπG )i = ni/N and ˆπ is the NPMLE on ∪Si. Germain Van Bever CIG for mixtures 17/19 Computational Information Geometry... 
...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Summary High-dimensional (extended) multinomial space is used as a proxy for the ‘space of all models’. This computational approach encompasses Amari’s information geometry and Lindsay’s convex geometry... ...while having a tractable and mostly explicit geometry, which allows for a computational theory. Future work Converse of the dimensionality result (−1 to +1) Long term aim: implementing geometric theories within a R package/software. Germain Van Bever CIG for mixtures 18/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions References: Amari, S-I (1985), Differential-geometrical methods in statistics, Springer-Verlag. Anaya-Izquierdo, K., Critchley, F., Marriott, P. and Vos, P. (2012), Computational information geometry: theory and practice, Arxiv report, 1209.1988v1. Critchley, F., Marriott, P. and Salmon, M. (1993), Preferred point geometry and statistical manifolds, The Annals of Statistics, 21, 3, 1197-1224. Critchley, F. and Marriott, P. (2014), Computational Information Geometry in Statistics: Theory and Practice, Entropy, 16, 2454-2471. Csiszár, I. and Matúš, F. (2005), Closures of exponential families, The Annals of Probabilities, 33, 2, 582-600. Kupper L.L., and Haseman J.K., (1978), The Use of a Correlated Binomial Model for the Analysis of Certain Toxicological Experiments, Biometrics, 34, 1, 69-76. Marriott, P. (2002), On the local geometry of mixture models, Biometrika, 89, 1, 77-93. Germain Van Bever CIG for mixtures 19/19
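Lindsay's result above reduces the NPMLE of the mixing distribution to maximising a concave function over the convex hull of the likelihood vectors. The sketch below is a minimal numerical illustration for a binomial mixture: candidate support points are placed on a grid, the likelihood vectors L_s(θ) are tabulated, and the weights are fitted with a plain EM fixed-point iteration. The counts, grid resolution and threshold are illustrative assumptions, not taken from the talk or from the toxicological data set it cites.

```python
import numpy as np
from scipy.stats import binom

# Illustrative counts x_s out of n_trials (stand-in data, not the Kupper-Haseman set).
n_trials = 10
x = np.array([0, 0, 1, 2, 2, 3, 7, 8, 8, 9])

# Candidate support points for the mixing distribution Q (grid approximation).
theta_grid = np.linspace(0.01, 0.99, 199)

# Likelihood vectors: row s holds L_s(theta) over the grid.
L = binom.pmf(x[:, None], n_trials, theta_grid[None, :])   # shape (n, m)

# NPMLE of the weights: maximise the concave log-likelihood over the simplex
# with a plain EM fixed-point iteration (fixed components, free weights).
w = np.full(theta_grid.size, 1.0 / theta_grid.size)
for _ in range(2000):
    resp = L * w
    resp /= resp.sum(axis=1, keepdims=True)   # responsibilities P(component | x_s)
    w = resp.mean(axis=0)                     # weight update

support = theta_grid[w > 1e-4]                # crude threshold for the estimated support
print("estimated support points:", np.round(support, 3))
print("log-likelihood:", np.log(L @ w).sum())
```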

Bayesian and Information Geometry for Inverse Problems (chaired by Ali Mohammad-Djafari, Olivier Schwander)

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14269
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_76
Authors = Damiano Brigo, John Armstrong
Keywords =
Abstract
We review the manifold projection method for stochastic nonlinear filtering in a more general setting than in our previous paper in Geometric Science of Information 2013. We still use a Hilbert space structure on a space of probability densities to project the infinite dimensional stochastic partial differential equation for the optimal filter onto a finite dimensional exponential or mixture family, respectively, with two different metrics, the Hellinger distance and the L2 direct metric. This reduces the problem to finite dimensional stochastic differential equations. In this paper we summarize a previous equivalence result between Assumed Density Filters (ADF) and Hellinger/Exponential projection filters, and introduce a new equivalence between Galerkin method based filters and Direct metric/Mixture projection filters. This result allows us to give a rigorous geometric interpretation to ADF and Galerkin filters. We also discuss the different finite-dimensional filters obtained when projecting the stochastic partial differential equation for either the normalized (Kushner-Stratonovich) or a specific unnormalized (Zakai) density of the optimal filter.


Watch the video
Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters

Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters GSI 2015, Oct 28, 2015, Paris Damiano Brigo Dept. of Mathematics, Imperial College, London www.damianobrigo.it — Joint work with John Armstrong Dept. of Mathematics, King’s College, London — Full paper to appear in MCSS, see also arXiv.org D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 1 / 37 Inner Products, Metrics and Projections Spaces of densities Spaces of probability densities Consider a parametric family of probability densities S = {p(·, θ), θ ∈ Θ ⊂ Rm }, S1/2 = { p(·, θ), θ ∈ Θ ⊂ Rm }. If S (or S1/2) is a subset of a function space having an L2 structure (⇒ inner product, norm & metric), then we may ask whether p(·, θ) → θ Rm , ( p(·, θ) → θ respectively) is a Chart of a m-dim manifold (?) S (S1/2). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 2 / 37 Inner Products, Metrics and Projections Spaces of densities Spaces of probability densities Consider a parametric family of probability densities S = {p(·, θ), θ ∈ Θ ⊂ Rm }, S1/2 = { p(·, θ), θ ∈ Θ ⊂ Rm }. If S (or S1/2) is a subset of a function space having an L2 structure (⇒ inner product, norm & metric), then we may ask whether p(·, θ) → θ Rm , ( p(·, θ) → θ respectively) is a Chart of a m-dim manifold (?) S (S1/2). The topology & differential structure in the chart is the L2 structure, but two possibilities: S : d2(p1, p2) = p1 − p2 (L2 direct distance), p1,2 ∈ L2 S1/2 : dH( √ p1, √ p2) = √ p1 − √ p2 (Hellinger distance), p1,2 ∈ L1 where · is the norm of Hilbert space L2. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 2 / 37 Inner Products, Metrics and Projections Manifolds, Charts and Tangent Vectors Tangent vectors, metrics and projection If ϕ : θ → p(·, θ) (θ → p(·, θ) resp.) is the inverse of a chart then { ∂ϕ(·, θ) ∂θ1 , · · · , ∂ϕ(·, θ) ∂θm } are linearly independent L2(λ) vector that span Tangent Space at θ. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 3 / 37 Inner Products, Metrics and Projections Manifolds, Charts and Tangent Vectors Tangent vectors, metrics and projection If ϕ : θ → p(·, θ) (θ → p(·, θ) resp.) is the inverse of a chart then { ∂ϕ(·, θ) ∂θ1 , · · · , ∂ϕ(·, θ) ∂θm } are linearly independent L2(λ) vector that span Tangent Space at θ. The inner product of 2 basis elements is defined (L2 structure) ∂p(·, θ) ∂θi ∂p(·, θ) ∂θj = 1 4 ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 γij(θ) . ∂ √ p ∂θi ∂ √ p ∂θj = 1 4 1 p(x, θ) ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 gij(θ) . γ(θ): direct L2 matrix (d2); g(θ): famous Fisher-Rao matrix (dH) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 3 / 37 Inner Products, Metrics and Projections Manifolds, Charts and Tangent Vectors Tangent vectors, metrics and projection If ϕ : θ → p(·, θ) (θ → p(·, θ) resp.) is the inverse of a chart then { ∂ϕ(·, θ) ∂θ1 , · · · , ∂ϕ(·, θ) ∂θm } are linearly independent L2(λ) vector that span Tangent Space at θ. The inner product of 2 basis elements is defined (L2 structure) ∂p(·, θ) ∂θi ∂p(·, θ) ∂θj = 1 4 ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 γij(θ) . ∂ √ p ∂θi ∂ √ p ∂θj = 1 4 1 p(x, θ) ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 gij(θ) . γ(θ): direct L2 matrix (d2); g(θ): famous Fisher-Rao matrix (dH) d2 ort. projection: Πγ θ [v] = m i=1 [ m j=1 γij (θ) v, ∂p(·, θ) ∂θj ] ∂p(·, θ) ∂θi (dH proj. analogous inserting √ · and replacing γ with g) D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 3 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dXt = ft (Xt ) dt + σt (Xt ) dWt , X0, (signal) dYt = bt (Xt ) dt + dVt , Y0 = 0 (noisy observation) (1) These are Itˆo SDE’s. We use both Itˆo and Stratonovich (Str) SDE’s. Str SDE’s are necessary to deal with manifolds, since second order Itˆo terms not clear in terms of manifolds [16], although we are working on a direct projection of Ito equations with good optimality properties (John Armstrong) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 4 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dXt = ft (Xt ) dt + σt (Xt ) dWt , X0, (signal) dYt = bt (Xt ) dt + dVt , Y0 = 0 (noisy observation) (1) These are Itˆo SDE’s. We use both Itˆo and Stratonovich (Str) SDE’s. Str SDE’s are necessary to deal with manifolds, since second order Itˆo terms not clear in terms of manifolds [16], although we are working on a direct projection of Ito equations with good optimality properties (John Armstrong) The nonlinear filtering problem consists in finding the conditional probability distribution πt of the state Xt given the observations up to time t, i.e. πt (dx) := P[Xt ∈ dx | Yt ], where Yt := σ(Ys , 0 ≤ s ≤ t). Assume πt has a density pt : then pt satisfies the Str SPDE: D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 4 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. We need finite dimensional approximations. We can project SPDE according to either the L2 direct metric (γ(θ)) or, by deriving the analogous equation for √ pt , according to the Hellinger metric (g(θ)). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. 
Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. We need finite dimensional approximations. We can project SPDE according to either the L2 direct metric (γ(θ)) or, by deriving the analogous equation for √ pt , according to the Hellinger metric (g(θ)). Projection transforms the SPDE to a finite dimensional SDE for θ via the chain rule (hence Str calculus): dp(·, θt ) = m j=1 ∂p(·,θ) ∂θj ◦ dθj(t). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. We need finite dimensional approximations. We can project SPDE according to either the L2 direct metric (γ(θ)) or, by deriving the analogous equation for √ pt , according to the Hellinger metric (g(θ)). Projection transforms the SPDE to a finite dimensional SDE for θ via the chain rule (hence Str calculus): dp(·, θt ) = m j=1 ∂p(·,θ) ∂θj ◦ dθj(t). With Ito calculus we would have terms ∂2p(·,θ) ∂θi ∂θj d θi, θj (not tang vec) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Projection Filters Projection filter in the metrics h (L2) and g (Fisher) dθi t =   m j=1 γij (θt ) L∗ t p(x, θt ) ∂p(x, θt ) ∂θj dx − m j=1 γij (θt ) 1 2 |bt (x)|2 ∂p ∂θj dx   dt + d k=1 [ m j=1 γij (θt ) bk t (x) ∂p(x, θt ) ∂θj dx] ◦ dYk t , θi 0 . The above is the projected equation in d2 metric and Πγ . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 6 / 37 Nonlinear Projection Filtering Projection Filters Projection filter in the metrics h (L2) and g (Fisher) dθi t =   m j=1 γij (θt ) L∗ t p(x, θt ) ∂p(x, θt ) ∂θj dx − m j=1 γij (θt ) 1 2 |bt (x)|2 ∂p ∂θj dx   dt + d k=1 [ m j=1 γij (θt ) bk t (x) ∂p(x, θt ) ∂θj dx] ◦ dYk t , θi 0 . The above is the projected equation in d2 metric and Πγ . Instead, using the Hellinger distance & the Fisher metric with projection Πg dθi t =   m j=1 gij (θt ) L∗ t p(x, θt ) p(x, θt ) ∂p(x, θt ) ∂θj dx − m j=1 gij (θt ) 1 2 |bt (x)|2 ∂p ∂θj dx   dt + d k=1 [ m j=1 gij (θt ) bk t (x) ∂p(x, θt ) ∂θj dx] ◦ dYk t , θi 0 . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 6 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact One can define both a local and global filtering error through dH D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. 
The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact One can define both a local and global filtering error through dH Alternative coordinates, expectation param., η = Eθ[c] = ∂θψ(θ). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact One can define both a local and global filtering error through dH Alternative coordinates, expectation param., η = Eθ[c] = ∂θψ(θ). Projection filter in η coincides with classical approx filter: assumed density filter (based on generalized “moment matching”) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Mixture Families Mixture families However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 8 / 37 Choice of the family Mixture Families Mixture families However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? The answer is affirmative, and this is the mixture family. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 8 / 37 Choice of the family Mixture Families Mixture families However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? The answer is affirmative, and this is the mixture family. We define a simple mixture family as follows. Given m + 1 fixed squared integrable probability densities q = [q1, q2, . . . , qm+1]T , define ˆθ(θ) := [θ1, θ2, . . . , θm, 1 − θ1 − θ2 − . . . − θm]T for all θ ∈ Rm. We write ˆθ instead of ˆθ(θ). Mixture family (simplex): SM (q) = {ˆθ(θ)T q, θi ≥ 0 for all i, θ1 + · · · + θm < 1} D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 8 / 37 Choice of the family Mixture Families Mixture families If we consider the L2 / γ(θ) distance, the metric γ(θ) itself and the related projection become very simple. Indeed, ∂p(·, θ) ∂θi = qi −qm+1 and γij(θ) = (qi(x)−qm(x))(qj(x)−qm(x))dx (NO inline numeric integr). D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 9 / 37 Choice of the family Mixture Families Mixture families If we consider the L2 / γ(θ) distance, the metric γ(θ) itself and the related projection become very simple. Indeed, ∂p(·, θ) ∂θi = qi −qm+1 and γij(θ) = (qi(x)−qm(x))(qj(x)−qm(x))dx (NO inline numeric integr). The L2 metric does not depend on the specific point θ of the manifold. The same holds for the tangent space at p(·, θ), which is given by span{q1 − qm+1, q2 − qm+1, · · · , qm − qm+1} Also the L2 projection becomes particularly simple. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 9 / 37 Mixture Projection Filter Mixture Projection Filter Armstrong and B. (MCSS 2016 [3]) show that the mixture family + metric γ(θ) lead to a Projection filter that is the same as approximate filtering via Galerkin [5] methods. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 10 / 37 Mixture Projection Filter Mixture Projection Filter Armstrong and B. (MCSS 2016 [3]) show that the mixture family + metric γ(θ) lead to a Projection filter that is the same as approximate filtering via Galerkin [5] methods. See the full paper for the details. Summing up: Family → Exponential Basic Mixture Metric ↓ Hellinger dH Good Nothing special Fisher g(θ) ∼ADF ≈ local moment matching Direct L2 d2 Nothing special Good matrix γ(θ) (∼Galerkin) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 10 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. Specifically, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not fixed. For example for a mixture of two Gaussians we have 5 parameters. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x), param. θ, µ1, v1, µ2, v2 D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. Specifically, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not fixed. For example for a mixture of two Gaussians we have 5 parameters. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x), param. 
θ, µ1, v1, µ2, v2 We are now going to illustrate the Gaussian mixture projection filter (GMPF) in a fundamental example. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical We expect a bimodal distribution D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical We expect a bimodal distribution θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x) (red) vs eθ1x+θ2x2+θ3x3+θ4x4−ψ(θ) (pink) vs EKF (N) (blue) vs exact (green, finite diff. method, grid 1000 state & 5000 time) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 0 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 13 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 1 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 14 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 2 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 15 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 3 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 16 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 4 Projection Exact Extended Kalman Exponential D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 17 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 5 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 18 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 6 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 19 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 7 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 20 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 8 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 21 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 9 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 22 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 10 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 23 / 37 Mixture Projection Filter The quadratic sensor Comparing local approximation errors (L2 residuals) εt ε2 t = (pexact,t (x) − papprox,t (x))2 dx papprox,t (x): three possible choices. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x) (red) vs eθ1x+θ2x2+θ3x3+θ4x4−ψ(θ) (blue) vs EKF (N) (green) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 24 / 37 Mixture Projection Filter The quadratic sensor L2 residuals for the quadratic sensor 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 25 / 37 Mixture Projection Filter The quadratic sensor Comparing local approx errors (Prokhorov residuals) εt εt = inf{ : Fexact,t (x − ) − ≤ Fapprox,t (x) ≤ Fexact,t (x + ) + ∀x} with F the CDF of p’s. Levy-Prokhorov metric works well with singular densities like particles where L2 metric not ideal. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x) (red) vs eθ1x+θ2x2+θ3x3+θ4x4−ψ(θ) (green) vs best three particles (blue) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 26 / 37 Mixture Projection Filter The quadratic sensor L´evy residuals for the quadratic sensor 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0 1 2 3 4 5 6 7 8 9 10 Time ProkhorovResiduals Prokhorov Residual (L2NM) Prokhorov Residual (HE) Best possible residual (3Deltas) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 27 / 37 Mixture Projection Filter Cubic sensors Cubic sensors 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) Qualitatively similar results up to a stopping time D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 28 / 37 Mixture Projection Filter Cubic sensors Cubic sensors 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) Qualitatively similar results up to a stopping time As one approaches the boundary γij becomes singular D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 28 / 37 Mixture Projection Filter Cubic sensors Cubic sensors 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) Qualitatively similar results up to a stopping time As one approaches the boundary γij becomes singular The solution is to dynamically change the parameterization and even the dimension of the manifold. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 28 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” Direct L2 works well with mixture families even simpler filter equations, no inline numerical integration basic version equivalent to Galerkin methods suited also for multimodality (quadratic sensor tests, L2 global error) comparable with particle methods D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) 
Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” Direct L2 works well with mixture families even simpler filter equations, no inline numerical integration basic version equivalent to Galerkin methods suited also for multimodality (quadratic sensor tests, L2 global error) comparable with particle methods Further investigation: convergence, more on optimality? D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” Direct L2 works well with mixture families even simpler filter equations, no inline numerical integration basic version equivalent to Galerkin methods suited also for multimodality (quadratic sensor tests, L2 global error) comparable with particle methods Further investigation: convergence, more on optimality? Optimality: introducing new projections (forthcoming J. Armstrong) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Thanks With thanks to the organizing committee. Thank you for your attention. Questions and comments welcome D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 30 / 37 Conclusions and References References I [1] J. Aggrawal: Sur l’information de Fisher. In: Theories de l’Information (J. Kampe de Feriet, ed.), Springer-Verlag, Berlin–New York 1974, pp. 111-117. [2] Amari, S. Differential-geometrical methods in statistics, Lecture notes in statistics, Springer-Verlag, Berlin, 1985 [3] Armstrong, J., and Brigo, D. (2016). Nonlinear filtering via stochastic PDE projection on mixture manifolds in L2 direct metric, Mathematics of Control, Signals and Systems, 2016, accepted. [4] Beard, R., Kenney, J., Gunther, J., Lawton, J., and Stirling, W. (1999). Nonlinear Projection Filter based on Galerkin approximation. AIAA Journal of Guidance Control and Dynamics, 22 (2): 258-266. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 31 / 37 Conclusions and References References II [5] Beard, R. and Gunther, J. (1997). Galerkin Approximations of the Kushner Equation in Nonlinear Estimation. Working Paper, Brigham Young University. [6] Barndorff-Nielsen, O.E. (1978). Information and Exponential Families. John Wiley and Sons, New York. [7] Brigo, D. Diffusion Processes, Manifolds of Exponential Densities, and Nonlinear Filtering, In: Ole E. Barndorff-Nielsen and Eva B. Vedel Jensen, editor, Geometry in Present Day Science, World Scientific, 1999 [8] Brigo, D, On SDEs with marginal laws evolving in finite-dimensional exponential families, STAT PROBABIL LETT, 2000, Vol: 49, Pages: 127 – 134 D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 32 / 37 Conclusions and References References III [9] Brigo, D. (2011). The direct L2 geometric structure on a manifold of probability densities with applications to Filtering. 
Available on arXiv.org and damianobrigo.it [10] Brigo, D, Hanzon, B, LeGland, F, A differential geometric approach to nonlinear filtering: The projection filter, IEEE T AUTOMAT CONTR, 1998, Vol: 43, Pages: 247 – 252 [11] Brigo, D, Hanzon, B, Le Gland, F, Approximate nonlinear filtering by projection on exponential manifolds of densities, BERNOULLI, 1999, Vol: 5, Pages: 495 – 534 [12] D. Brigo, Filtering by Projection on the Manifold of Exponential Densities, PhD Thesis, Free University of Amsterdam, 1996. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 33 / 37 Conclusions and References References IV [13] Brigo, D., and Pistone, G. (1996). Projecting the Fokker-Planck Equation onto a finite dimensional exponential family. Available at arXiv.org [14] Crisan, D., and Rozovskii, B. (Eds) (2011). The Oxford Handbook of Nonlinear Filtering, Oxford University Press. [15] M. H. A. Davis, S. I. Marcus, An introduction to nonlinear filtering, in: M. Hazewinkel, J. C. Willems, Eds., Stochastic Systems: The Mathematics of Filtering and Identification and Applications (Reidel, Dordrecht, 1981) 53–75. [16] Elworthy, D. (1982). Stochastic Differential Equations on Manifolds. LMS Lecture Notes. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 34 / 37 Conclusions and References References V [17] Hanzon, B. A differential-geometric approach to approximate nonlinear filtering. In C.T.J. Dodson, Geometrization of Statistical Theory, pages 219 – 223,ULMD Publications, University of Lancaster, 1987. [18] B. Hanzon, Identifiability, recursive identification and spaces of linear dynamical systems, CWI Tracts 63 and 64, CWI, Amsterdam, 1989 [19] M. Hazewinkel, S.I.Marcus, and H.J. Sussmann, Nonexistence of finite dimensional filters for conditional statistics of the cubic sensor problem, Systems and Control Letters 3 (1983) 331–340. [20] J. Jacod, A. N. Shiryaev, Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften, vol. 288 (1987), Springer-Verlag, Berlin, D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 35 / 37 Conclusions and References References VI [21] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, 1970. [22] M. Fujisaki, G. Kallianpur, and H. Kunita (1972). Stochastic differential equations for the non linear filtering problem. Osaka J. Math. Volume 9, Number 1 (1972), 19-40. [23] Kenney, J., Stirling, W. Nonlinear Filtering of Convex Sets of Probability Distributions. Presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June - 2 July 1999 [24] R. Z. Khasminskii (1980). Stochastic Stability of Differential Equations. Alphen aan den Reijn [25] R.S. Liptser, A.N. Shiryayev, Statistics of Random Processes I, General Theory (Springer Verlag, Berlin, 1978). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 36 / 37 Conclusions and References References VII [26] M. Murray and J. Rice - Differential geometry and statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993. [27] D. Ocone, E. Pardoux, A Lie algebraic criterion for non-existence of finite dimensionally computable filters, Lecture notes in mathematics 1390, 197–204 (Springer Verlag, 1989) [28] Pistone, G., and Sempi, C. (1995). An Infinite Dimensional Geometric Structure On the space of All the Probability Measures Equivalent to a Given one. The Annals of Statistics 23(5), 1995 D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 37 / 37
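One point of the talk is that the direct L2 metric γ(θ) is particularly convenient on a basic mixture family: the tangent vectors are the fixed differences q_i − q_{m+1}, so γ is constant in θ and needs no inline numerical integration during filtering. The short sketch below checks this numerically on a grid for a hypothetical three-component Gaussian mixture and contrasts it with the Fisher-Rao matrix, which does depend on the mixture point (normalisation conventions aside). The component densities and the grid are arbitrary choices made only for the demonstration.

```python
import numpy as np

# Fixed component densities q1, q2, q3 on a grid (hypothetical choices);
# the family is p(x, theta) = theta1*q1 + theta2*q2 + (1 - theta1 - theta2)*q3.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def gauss(mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

q = [gauss(-2.0, 1.0), gauss(0.0, 0.7), gauss(2.0, 1.2)]

# Tangent vectors dp/dtheta_i = q_i - q_3 do not depend on theta, hence neither
# does the direct-L2 matrix gamma_ij = \int (q_i - q_3)(q_j - q_3) dx.
basis = np.stack([q[0] - q[2], q[1] - q[2]])
gamma = basis @ basis.T * dx
print("gamma =\n", np.round(gamma, 4))

# The Fisher-Rao matrix g_ij(theta) = \int (dp/dtheta_i)(dp/dtheta_j) / p dx,
# by contrast, changes with the point theta of the manifold.
def fisher(theta1, theta2):
    p = theta1 * q[0] + theta2 * q[1] + (1.0 - theta1 - theta2) * q[2]
    return np.array([[np.sum(basis[i] * basis[j] / p) * dx for j in range(2)]
                     for i in range(2)])

print("g(0.2, 0.3) =\n", np.round(fisher(0.2, 0.3), 4))
print("g(0.5, 0.3) =\n", np.round(fisher(0.5, 0.3), 4))
```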

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14270
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_77
Authors = Ali Mohammad-Djafari
Keywords =
Abstract
Clustering, classification and pattern recognition in a set of data are among the most important tasks in statistical research and in many applications. In this paper, we propose to model the data with a mixture of Student-t distributions via a hierarchical graphical model, and to carry out these tasks in the Bayesian framework. The main advantages of this model are that it accounts for the uncertainty in the variances and covariances, and that Variational Bayesian Approximation (VBA) methods yield fast algorithms able to handle large data sets.


Watch the video
Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model

. Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model Ali Mohammad-Djafari Laboratoire des Signaux et Syst`emes (L2S) UMR8506 CNRS-CentraleSup´elec-UNIV PARIS SUD SUPELEC, 91192 Gif-sur-Yvette, France http://lss.centralesupelec.fr Email: djafari@lss.supelec.fr http://djafari.free.fr http://publicationslist.org/djafari A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 1/20 Contents 1. Mixture models 2. Different problems related to classification and clustering Training Supervised classification Semi-supervised classification Clustering or unsupervised classification 3. Mixture of Student-t 4. Variational Bayesian Approximation 5. VBA for Mixture of Student-t 6. Conclusion A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 2/20 Mixture models General mixture model p(x|a, Θ, K) = K k=1 ak pk(xk|θk), 0 < ak < 1 Same family pk(xk|θk) = p(xk|θk), ∀k Gaussian p(xk|θk) = N(xk|µk, Σk) with θk = (µk, Σk) Data X = {xn, n = 1, · · · , N} where each element xn can be in one of these classes cn. ak = p(cn = k), a = {ak, k = 1, · · · , K}, Θ = {θk, k = 1, · · · , K} p(Xn, cn = k|a, θ) = N n=1 p(xn, cn = k|a, θ). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 3/20 Different problems Training: Given a set of (training) data X and classes c, estimate the parameters a and Θ. Supervised classification: Given a sample xm and the parameters K, a and Θ determine its class k∗ = arg max k {p(cm = k|xm, a, Θ, K)} . Semi-supervised classification (Proportions are not known): Given sample xm and the parameters K and Θ, determine its class k∗ = arg max k {p(cm = k|xm, Θ, K)} . Clustering or unsupervised classification (Number of classes K is not known): Given a set of data X, determine K and c. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 4/20 Training Given a set of (training) data X and classes c, estimate the parameters a and Θ. Maximum Likelihood (ML): (a, Θ) = arg max (a,Θ) {p(X, c|a, Θ, K)} . Bayesian: Assign priors p(a|K) and p(Θ|K) = K k=1 p(θk) and write the expression of the joint posterior laws: p(a, Θ|X, c, K) = p(X, c|a, Θ, K) p(a|K) p(Θ|K) p(X, c|K) where p(X, c|K) = p(X, c|a, Θ|K)p(a|K) p(Θ|K) da dΘ Infer on a and Θ either as the Maximum A Posteriori (MAP) or Posterior Mean (PM). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 5/20 Supervised classification Given a sample xm and the parameters K, a and Θ determine p(cm = k|xm, a, Θ, K) = p(xm, cm = k|a, Θ, K) p(xm|a, Θ, K) where p(xm, cm = k|a, Θ, K) = akp(xm|θk) and p(xm|a, Θ, K) = K k=1 ak p(xm|θk) Best class k∗: k∗ = arg max k {p(cm = k|xm, a, Θ, K)} A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 6/20 Semi-supervised classification Given sample xm and the parameters K and Θ (not the proportions a), determine the probabilities p(cm = k|xm, Θ, K) = p(xm, cm = k|Θ, K) p(xm|Θ, K) where p(xm, cm = k|Θ, K) = p(xm, cm = k|a, Θ, K)p(a|K) da and p(xm|Θ, K) = K k=1 p(xm, cm = k|Θ, K) Best class k∗, for example the MAP solution: k∗ = arg max k {p(cm = k|xm, Θ, K)} . A. 
Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 7/20 Clustering or non-supervised classification Given a set of data X, determine K and c. Determination of the number of classes: p(K = L|X) = p(X, K = L) p(X) = p(X|K = L) p(K = L) p(X) and p(X) = L0 L=1 p(K = L) p(X|K = L), where L0 is the a priori maximum number of classes and p(X|K = L) = n L k=1 akp(xn, cn = k|θk)p(a|K) p(Θ|K) da dΘ When K and c are determined, we can also determine the characteristics of those classes a and Θ. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 8/20 Mixture of Student-t model Student-t and its Infinite Gaussian Scaled Model (IGSM): T (x|ν, µ, Σ) = ∞ 0 N(x|µ, z−1 Σ) G(z| ν 2 , ν 2 ) dz where N(x|µ, Σ)= |2πΣ|−1 2 exp −1 2(x − µ) Σ−1 (x − µ) = |2πΣ|−1 2 exp −1 2Tr (x − µ)Σ−1 (x − µ) and G(z|α, β) = βα Γ(α) zα−1 exp [−βz] . Mixture of Student-t: p(x|{νk, ak, µk, Σk, k = 1, · · · , K}, K) = K k=1 ak T (xn|νk, µk, Σk). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 9/20 Mixture of Student-t model Introducing znk, zk = {znk, n = 1, · · · , N}, Z = {znk}, c = {cn, n = 1, · · · , N}, θk = {νk, ak, µk, Σk}, Θ = {θk, k = 1, · · · , K} Assigning the priors p(Θ) = k p(θk), we can write: p(X, c, Z, Θ|K) = n k akN(xn|µk, z−1 n,k Σk) G(znk|νk 2 , νk 2 ) p(θk) Joint posterior law: p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K) p(X|K) . The main task now is to propose some approximations to it in such a way that we can use it easily in all the above mentioned tasks of classification or clustering. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 10/20 Variational Bayesian Approximation (VBA) Main idea: to propose easy computational approximation q(c, Z, Θ) for p(c, Z, Θ|X, K). Criterion: KL(q : p) Interestingly, by noting that p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K)/p(X|K) we have: KL(q : p) = −F(q) + ln p(X|K) where F(q) = − ln p(X, c, Z, Θ|K) q is called free energy of q and we have the following properties: – Maximizing F(q) or minimizing KL(q : p) are equivalent and both give un upper bound to the evidence of the model ln p(X|K). – When the optimum q∗ is obtained, F(q∗) can be used as a criterion for model selection. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 11/20 VBA: choosing the good families Using KL(q : p) has the very interesting property that using q to compute the means we obtain the same values if we have used p (Conservation of the means). Unfortunately, this is not the case for variances or other moments. If p is in the exponential family, then choosing appropriate conjugate priors, the structure of q will be the same and we can obtain appropriate fast optimization algorithms. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 12/20 Hierarchical graphical model ξ0 d d‚    © αk   βk   znk   E γ0, Σ0 c Σk   µ0, η0 c µk   k0 c a   d d‚    © d d‚    © ¨ ¨¨¨ ¨¨%xn   E Figure : Graphical representation of the model. A. 
Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 13/20 VBA for mixture of Student-t In our case, noting that p(X, c, Z, Θ|K) = n k p(xn, cn, znk|ak, µk, Σk, νk) k [p(αk) p(βk) p(µk|Σk) p(Σk)] with p(xn, cn, znk|ak, µk, Σk, νk) = N(xn|µk, z−1 n,k Σk) G(znk|αk, βk) is separable, in one side for [c, Z] and in other size in components of Θ, we propose to use q(c, Z, Θ) = q(c, Z) q(Θ). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 14/20 VBA for mixture of Student-t With this decomposition, the expression of the Kullback-Leibler divergence becomes: KL(q1(c, Z)q2(Θ) : p(c, Z, Θ|X, K) = c q1(c, Z)q2(Θ) ln q1(c, Z)q2(Θ) p(c, Z, Θ|X, K) dΘ dZ The expression of the Free energy becomes: F(q1(c, Z)q2(Θ)) = c q1(c, Z)q2(Θ) ln p(X, c, Z|Θ, K)p(Θ|K) q1(c, Z)q2(Θ) dΘ dZ A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 15/20 Proposed VBA for Mixture of Student-t priors model Using a generalized Student-t obtained by replacing G(zn,k|νk 2 , νk 2 ) by G(zn,k|αk, βk) it will be easier to propose conjugate priors for αk, βk than for νk. p(xn, cn = k, znk|ak, µk, Σk, αk, βk, K) = ak N(xn|µk, z−1 n,k Σk) G(zn,k|αk, βk). In the following, noting by Θ = {(ak, µk, Σk, αk, βk), k = 1, · · · , K}, we propose to use the factorized prior laws: p(Θ) = p(a) k [p(αk) p(βk) p(µk|Σk) p(Σk)] with the following components:    p(a) = D(a|k0), k0 = [k0, · · · , k0] = k01 p(αk) = E(αk|ζ0) = G(αk|1, ζ0) p(βk) = E(βk|ζ0) = G(αk|1, ζ0) p(µk|Σk) = N(µk|µ01, η−1 0 Σk) p(Σk) = IW(Σk|γ0, γ0Σ0) A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 16/20 Proposed VBA for Mixture of Student-t priors model where D(a|k) = Γ( l kk) l Γ(kl ) l akl −1 l is the Dirichlet pdf, E(t|ζ0) = ζ0 exp [−ζ0t] is the Exponential pdf, G(t|a, b) = ba Γ(a) ta−1 exp [−bt] is the Gamma pdf and IW(Σ|γ, γ∆) = |1 2∆|γ/2 exp −1 2Tr ∆Σ−1 ΓD(γ/2)|Σ| γ+D+1 2 . is the inverse Wishart pdf. With these prior laws and the likelihood: joint posterior law: pk(c, Z, Θ|X) = p(X, c, Z, Θ) p(X) . A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 17/20 Expressions of q q(c, Z, Θ) = q(c, Z) q(Θ) = n k[q(cn = k|znk) q(znk)] k[q(αk) q(βk) q(µk|Σk) q(Σk)] q(a). with:    q(a) = D(a|˜k), ˜k = [˜k1, · · · , ˜kK ] q(αk) = G(αk|˜ζk, ˜ηk) q(βk) = G(βk|˜ζk, ˜ηk) q(µk|Σk) = N(µk|µ, ˜η−1Σk) q(Σk) = IW(Σk|˜γ, ˜γ ˜Σ) With these choices, we have F(q(c, Z, Θ)) = ln p(X, c, Z, Θ|K) q(c,Z,Θ) = k n F1kn + k F2k F1kn = ln p(xn, cn, znk, θk) q(cn=k|znk )q(znk ) F2k = ln p(xn, cn, znk, θk) q(θk )A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 18/20 VBA Algorithm step Expressions of the updating expressions of the tilded parameters are obtained by following three steps: E step: Optimizing F with respect to q(c, Z) when keeping q(Θ) fixed, we obtain the expression of q(cn = k|znk) = ˜ak, q(znk) = G(znk|αk, βk). M step: Optimizing F with respect to q(Θ) when keeping q(c, Z) fixed, we obtain the expression of q(a) = D(a|˜k), ˜k = [˜k1, · · · , ˜kK ], q(αk) = G(αk|˜ζk, ˜ηk), q(βk) = G(βk|˜ζk, ˜ηk), q(µk|Σk) = N(µk|µ, ˜η−1Σk), and q(Σk) = IW(Σk|˜γ, ˜γ ˜Σ), which gives the updating algorithm for the corresponding tilded parameters. 
F evaluation: After each E step and M step, we can also evaluate the expression of F(q) which can be used for stopping rule of the iterative algorithm. Final value of F(q) for each value of K, noted Fk, can be used as a criterion for model selection, i.e.; the determination of the number of clusters. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 19/20 Conclusions Clustering and classification of a set of data are between the most important tasks in statistical researches for many applications such as data mining in biology. Mixture models and in particular Mixture of Gaussians are classical models for these tasks. We proposed to use a mixture of generalised Student-t distribution model for the data via a hierarchical graphical model. To obtain fast algorithms and be able to handle large data sets, we used conjugate priors everywhere it was possible. The proposed algorithm has been used for clustering, classification and discriminant analysis of some biological data (Cancer research related), but in this paper, we only presented the main algorithm. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 20/20
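The Student-t components above are manipulated entirely through their infinite Gaussian scale-mixture representation. The following short Monte-Carlo check (a sanity check only, not part of the VBA algorithm) samples z ~ Gamma(ν/2, rate ν/2) and then x | z ~ N(μ, σ²/z), and compares the resulting draws with the location-scale Student-t density; the parameter values and the sample size are arbitrary.

```python
import numpy as np
from scipy import stats

# Monte-Carlo check of T(x | nu, mu, sigma^2) = int N(x | mu, sigma^2/z) Ga(z | nu/2, nu/2) dz.
rng = np.random.default_rng(0)
nu, mu, sigma = 5.0, 1.0, 2.0
n = 200_000

z = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)    # Gamma with rate nu/2
x = rng.normal(loc=mu, scale=sigma / np.sqrt(z))         # x | z ~ N(mu, sigma^2 / z)

# Compare a histogram of the draws with the location-scale Student-t density.
edges = np.linspace(-10.0, 12.0, 23)
hist, _ = np.histogram(x, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
pdf = stats.t.pdf(centers, df=nu, loc=mu, scale=sigma)
print("max abs deviation:", np.abs(hist - pdf).max())    # small: binning + MC error only
```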

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14271
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_78
Authors = Tomonari Sei, Ushio Tanaka
Keywords =
Abstract
The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix in order to draw a parallel coordinate plot. In this paper, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a geometrical viewpoint. It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are restricted to be full-rank.


Watch the video
Differential geometric properties of textile plot

Geometric properties of the textile plot. Tomonari SEI and Ushio TANAKA, University of Tokyo and Osaka Prefecture University, at École Polytechnique, Oct 28, 2015.
Introduction. The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix, X ∈ R^{n×p} → Y ∈ R^{n×p}, in order to draw a parallel coordinate plot. The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance. In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometric point of view. It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are “generic”.
Outline: 1 What is the textile plot? 2 Textile set. 3 Main result. 4 Other results. 5 Summary.
Textile plot — example (Kumasaka and Shibata, 2008): textile plot for the iris data (150 cases, 5 attributes). Each variate is transformed by a location-scale transformation. Categorical data is quantified. Missing data is admitted. The order of the axes can be maintained. [Figure: textile plot of the iris data with axes Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width.]
Textile plot. Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value. Let X = (x_1, …, x_p) ∈ R^{n×p} be the data matrix. Without loss of generality, assume the sample mean and sample variance of each x_j are 0 and 1, respectively. The data is transformed into Y = (y_1, …, y_p), where y_j = a_j + b_j x_j, with a_j, b_j ∈ R, j = 1, …, p. The coefficients a = (a_j) and b = (b_j) are the solution of the minimization problem
minimize over a, b:  ∑_{t=1}^n ∑_{j=1}^p (y_{tj} − ȳ_{t·})²  subject to  y_j = a_j + b_j x_j and ∑_{j=1}^p ‖y_j‖² = 1.
Intuition: as horizontal as possible. Solution: a = 0 and b is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of X.
Example (n = 100, p = 4): X ∈ R^{100×4}, each row ~ N(0, Σ) with Σ = ((1, −0.6, 0.5, 0.1), (−0.6, 1, −0.6, −0.2), (0.5, −0.6, 1, 0.0), (0.1, −0.2, 0.0, 1)). [Figure: (a) raw data X, (b) textile plot Y.]
Our motivation. The textile plot transforms the data matrix X into Y; denote the map by Y = τ(X). What is the image τ(R^{n×p})? We can show that any Y ∈ τ(R^{n×p}) satisfies two conditions: there exists λ ≥ 0 such that, for all i = 1, …, p, ∑_{j=1}^p ⟨y_i, y_j⟩ = λ ‖y_i‖², and ∑_{j=1}^p ‖y_j‖² = 1. This motivates the following definition of the textile set.
Textile set — definition. The textile set is defined by T_{n,p} = { Y ∈ R^{n×p} | ∃λ ≥ 0, ∀i, ∑_j ⟨y_i, y_j⟩ = λ ‖y_i‖², ∑_j ‖y_j‖² = 1 }. The unnormalized textile set is defined by U_{n,p} = { Y ∈ R^{n×p} | ∃λ ≥ 0, ∀i, ∑_j ⟨y_i, y_j⟩ = λ ‖y_i‖² }. We are interested in the mathematical properties of T_{n,p} and U_{n,p}. Bad news: statistical implications are left for future work. Let us begin with the small-p case.
T_{n,p} with small p. Lemma (p = 1): T_{n,1} = S^{n−1}, the unit sphere. Lemma (p = 2): T_{n,2} = A ∪ B, where A = {(y_1, y_2) | ‖y_1‖ = ‖y_2‖ = 1/√2} and B = {(y_1, y_2) | ‖y_1 − y_2‖ = ‖y_1 + y_2‖ = 1}, each of which is diffeomorphic to S^{n−1} × S^{n−1}. Their intersection A ∩ B is diffeomorphic to the Stiefel manifold V_{n,2}.
Example (n = p = 2): T_{2,2} ⊂ R^4 is the union of two tori, glued along O(2): T_{2,2} = { (1/√2) (cos θ, cos φ; sin θ, sin φ) } ∪ { (1/2) (cos ξ + cos η, cos ξ − cos η; sin ξ + sin η, sin ξ − sin η) }.
For general dimension p. To state our main result, we define two concepts: the noncompact Stiefel manifold and the canonical form. Definition (e.g. Absil et al. (2008)): let n ≥ p and denote by V* the set of all column full-rank matrices, V* := { Y ∈ R^{n×p} | rank(Y) = p }. V* is called the noncompact Stiefel manifold. Note that dim(V*) = np and V* ≠ R^{n×p}. The orthogonal group O(n) acts on V*. By Gram-Schmidt orthonormalization, the quotient space V*/O(n) is identified with the set of upper-triangular matrices with positive diagonals.
Definition (canonical form): denote by V** the set of all matrices whose upper p × p block is upper triangular with positive diagonal entries y_{ii} > 0, 1 ≤ i ≤ p, and whose remaining (n − p) × p block is zero. We call such a matrix a canonical form. Note that V** ⊂ V* and V*/O(n) ≅ V**.
Restriction of the unnormalized textile set. Definition: denote the restrictions of U_{n,p} to V* and V** by U*_{n,p} = U_{n,p} ∩ V* and U**_{n,p} = U_{n,p} ∩ V**, respectively. The group O(n) acts on U*_{n,p}, and the quotient space U*_{n,p}/O(n) is identified with U**_{n,p}. So it is essential to study U**_{n,p}.
U**_{n,p} for small p. Example (n = p = 1): U**_{1,1} = {(1)}. Example (n = p = 2): let Y = ((y_11, y_12), (0, y_22)) with y_11, y_22 > 0; then U**_{2,2} = {y_12 = 0} ∪ {y_11² = y_12² + y_22²}, the union of a plane and a cone.
Main theorem. The differential geometric structure of U**_{n,p} is given as follows. Theorem: let n ≥ p ≥ 3. Then we have the decomposition U**_{n,p} = M_1 ∪ M_2, where each M_i is a differentiable manifold, with dim M_1 = p(p+1)/2 − (p − 1) and dim M_2 = p(p+1)/2 − p, respectively. M_2 is connected while M_1 may not be.
Example: U**_{3,3} is the union of a 4-dimensional and a 3-dimensional manifold. [Figure: the cross section with y_11 = y_22 = 1, in the coordinates (y_12, y_13, y_33), is the union of a surface and a vertical line.]
Corollary: let n ≥ p ≥ 3. Then U*_{n,p} = π^{-1}(M_1) ∪ π^{-1}(M_2), where π denotes the map of Gram-Schmidt orthonormalization. The dimensions are dim π^{-1}(M_1) = np − (p − 1) and dim π^{-1}(M_2) = np − p.
Other results. First, the n = 1 case. Lemma: if n = 1, then the textile set T_{1,p} is the union of a (p − 2)-dimensional manifold and 2(2^p − 1) isolated points. Example: U**_{1,3} consists of a circle and 14 points, U**_{1,3} = (S² ∩ {y_1 + y_2 + y_3 = 1}) ∪ {±(1/√3, 1/√3, 1/√3), ±(1/√2, 1/√2, 0), ±(1/√2, 0, 1/√2), ±(0, 1/√2, 1/√2), ±(1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)}.
Differential geometric characterization of f_λ^{-1}(O). Fix λ ≥ 0 arbitrarily and define the map f_λ: R^{n×p} → R^{p+1} by f_λ(y_1, …, y_p) := (∑_j ⟨y_1, y_j⟩ − λ‖y_1‖², …, ∑_j ⟨y_p, y_j⟩ − λ‖y_p‖², ∑_j ‖y_j‖² − 1). Lemma: we have a classification of T_{n,p}, namely T_{n,p} = ∪_{λ≥0} f_λ^{-1}(O) = ∪_{0≤λ≤n} f_λ^{-1}(O).
Theorem: let λ ≥ 0. Then f_λ^{-1}(O) is a regular submanifold of R^{n×p} of codimension p + 1 whenever λ > 0, y_11 y_jj − y_1j y_j1 ≠ 0 for j = 2, …, p, and there exists ℓ ∈ {2, …, p} such that ∑_{j=2}^p y_ij + y_iℓ (1 − 2λ) ≠ 0 for i = 1, …, n.
Summary and future study. We defined the textile set T_{n,p} and found its geometric properties. Present and future study: 1. Characterize the classification f_λ^{-1}(O) with the Riemannian metric induced from R^{np} by (global) Riemannian geometry: geodesics, curvature, etc. 2. Investigate differential geometric and topological properties of T_{n,p} and f_λ^{-1}(O), including its group action. 3. Can one find statistical implications such as a sample distribution theory? Thank you very much!
References: 1. Absil, P.-A., Mahony, R., and Sepulchre, R. (2008), Optimization Algorithms on Matrix Manifolds, Princeton University Press. 2. Honda, K. and Nakano, J. (2007), 3-dimensional parallel coordinate plot, Proceedings of the Institute of Statistical Mathematics, 55, 69–83. 3. Inselberg, A. (2009), Parallel Coordinates: Visual Multidimensional Geometry and its Applications, Springer. 4. Kumasaka, N. and Shibata, R. (2008), High-dimensional data visualisation: the textile plot, Computational Statistics and Data Analysis, 52, 3616–3644.
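Since the solution of the minimization problem above is a = 0 with b the leading eigenvector of the covariance matrix of the standardized data, the transformation τ is straightforward to compute. The following is a minimal sketch under that description, not the authors' code; the simulated data, the sign convention for the eigenvector and the final normalization enforcing ∑_j ‖y_j‖² = 1 are illustrative choices.

```python
import numpy as np

def textile_plot_coordinates(X):
    """Compute textile-plot coordinates Y from a numeric data matrix X (n x p).

    Assumes no categorical variates and no missing values, as in the talk.
    Columns are first standardized to mean 0 and variance 1; then
    y_j = b_j * x_j with b the leading eigenvector of the covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    S = np.cov(Xs, rowvar=False)                  # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)
    b = eigvecs[:, -1]                            # eigenvector of the largest eigenvalue
    if b.sum() < 0:                               # fix an overall sign (illustrative convention)
        b = -b
    Y = Xs * b                                    # y_j = b_j x_j (a_j = 0)
    Y /= np.linalg.norm(Y)                        # enforce sum_j ||y_j||^2 = 1
    return Y

# Example with simulated data close to the talk's n = 100, p = 4 illustration.
rng = np.random.default_rng(1)
Sigma = np.array([[1.0, -0.6, 0.5, 0.1],
                  [-0.6, 1.0, -0.6, -0.2],
                  [0.5, -0.6, 1.0, 0.0],
                  [0.1, -0.2, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(4), Sigma, size=100)
Y = textile_plot_coordinates(X)
print(Y.shape, np.sum(Y**2))   # (100, 4) and ~1.0
```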

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14272
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_79
Authors = Hiroshi Matsuzoe, Monta Sakamoto
Keywords =
Abstract
In anomalous statistical physics, deformed algebraic structures are important objects. Heavy-tailed probability distributions, such as Student's t-distributions, are characterized by deformed algebras. In addition, deformed algebras cause deformations of expectations and of the independence of random variables. Hence, a generalization of independence for the multivariate Student's t-distribution is studied in this paper. Even if two random variables that each follow a univariate Student's t-distribution are independent, their joint probability distribution is not a bivariate Student's t-distribution. It is shown that a bivariate Student's t-distribution is obtained from two univariate Student's t-distributions under q-deformed independence.


Watch the video
A generalization of independence and multivariate Student's t-distributions

A generalization of independence and multivariate Student's t-distributions. MATSUZOE Hiroshi, Nagoya Institute of Technology; joint work with SAKAMOTO Monta (Efrei, Paris). Outline: 1 Deformed exponential family. 2 Non-additive differentials and expectation functionals. 3 Geometry of deformed exponential families. 4 Generalization of independence. 5 q-independence and Student's t-distributions. 6 Appendix. The notions of expectation and independence are determined by the choice of statistical model. Introduction — geometry and statistics: geometry for the sample space versus geometry for the parameter space; Wasserstein geometry and optimal transport theory, in which a pdf is regarded as a distribution of mass; information geometry, with the convexity of entropy and free energy and the duality of estimating functions.
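The q-deformed independence named in the outline is commonly built from the Tsallis q-product, under which q-exponentials factor additively in their arguments. The sketch below illustrates only that algebraic identity; it is an assumption-laden illustration and does not reproduce the paper's exact normalization of q-independent joint densities (which also involves escort expectations).

```python
import numpy as np

def exp_q(u, q):
    """Tsallis q-exponential: exp_q(u) = [1 + (1 - q) u]_+^(1/(1-q)); exp(u) for q = 1."""
    if np.isclose(q, 1.0):
        return np.exp(u)
    base = 1.0 + (1.0 - q) * np.asarray(u, dtype=float)
    return np.where(base > 0, base, 0.0) ** (1.0 / (1.0 - q))

def q_product(x, y, q):
    """q-product: x (x)_q y = [x^(1-q) + y^(1-q) - 1]_+^(1/(1-q)).

    It satisfies exp_q(u) (x)_q exp_q(v) = exp_q(u + v), the identity behind
    q-deformed independence of deformed exponential families.
    """
    if np.isclose(q, 1.0):
        return x * y
    s = x ** (1.0 - q) + y ** (1.0 - q) - 1.0
    return np.where(s > 0, s, 0.0) ** (1.0 / (1.0 - q))

# Check the key identity for an arbitrary q (illustrative value).
q = 1.5
u, v = 0.3, -0.2
lhs = q_product(exp_q(u, q), exp_q(v, q), q)
rhs = exp_q(u + v, q)
print(lhs, rhs)   # equal up to floating point whenever both sides are positive
```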

Hessian Information Geometry (chaired by Shun-Ichi Amari, Michel Boyom)

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14278
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_25
Authors = Charles Cavalcante, David de Souza, Rui Vigelis
Keywords =
Abstract
We define a metric and a family of α-connections in statistical manifolds, based on ϕ-divergence, which emerges in the framework of ϕ-families of probability distributions. This metric and α-connections generalize the Fisher information metric and Amari’s α-connections. We also investigate the parallel transport associated with the α-connection for α = 1.


Watch the video
New metric and connections in statistical manifolds

2nd Conference on Geometric Science of Information, GSI2015, October 28–30, 2015, Ecole Polytechnique, Paris-Saclay. New Metric and Connections in Statistical Manifolds. Rui F. Vigelis (Federal University of Ceará, Brazil), David C. de Souza (Federal Institute of Ceará, Brazil), Charles C. Cavalcante (Federal University of Ceará, Brazil). Session “Hessian Information Geometry”, October 28.
Outline: Introduction; ϕ-functions; ϕ-divergence; generalized statistical manifold; connections; ϕ-families; discussion.
Introduction. In the paper R. F. Vigelis, C. C. Cavalcante, On ϕ-families of probability distributions, J. Theor. Probab., 26(3):870–884, 2013, the authors proposed the so-called ϕ-divergence D_ϕ(p‖q), for p, q ∈ P_µ. The ϕ-divergence is defined in terms of a ϕ-function. The metric and connections that we propose are derived from the ϕ-divergence D_ϕ(·‖·).
The proposition of new geometric structures (metrics and connections) in statistical manifolds is a recurrent research topic. To cite a few: J. Zhang, Divergence function, duality, and convex analysis, Neural Computation, 16(1):159–195, 2004; J. Naudts, Estimators, escort probabilities, and φ-exponential families in statistical physics, JIPAM, 5(4), Paper No. 102, 2004; S.-i. Amari, A. Ohara, H. Matsuzoe, Geometry of deformed exponential families: invariant, dually-flat and conformal geometries, Physica A, 391(18):4308–4319, 2012; H. Matsuzoe, Hessian structures on deformed exponential families and their conformal structures, Differential Geom. Appl., 35(suppl.):323–333, 2014.
Let (T, Σ, µ) be a measure space. All probability distributions will be taken in P_µ = { p ∈ L⁰ : p > 0 and ∫_T p dµ = 1 }, where L⁰ denotes the set of all real-valued, measurable functions on T, with equality µ-a.e.
ϕ-functions. A function ϕ: R → (0, ∞) is said to be a ϕ-function if the following conditions are satisfied: (a1) ϕ(·) is convex; (a2) lim_{u→−∞} ϕ(u) = 0 and lim_{u→∞} ϕ(u) = ∞; (a3) there exists a measurable function u₀: T → (0, ∞) such that ∫_T ϕ(c(t) + λ u₀(t)) dµ < ∞ for all λ > 0, for each measurable function c: T → R such that ϕ(c) ∈ P_µ. Not all functions satisfying (a1) and (a2) admit the existence of u₀. Condition (a3) is imposed so that ϕ-families are parametrizations of P_µ in the same manner as exponential families.
The κ-exponential function exp_κ: R → (0, ∞), for κ ∈ [−1, 1], given by exp_κ(u) = (κu + √(1 + κ²u²))^{1/κ} if κ ≠ 0 and exp(u) if κ = 0, is a ϕ-function. The q-exponential function exp_q(u) = [1 + (1 − q)u]_+^{1/(1−q)}, where q > 0 and q ≠ 1, is not a ϕ-function (exp_q(u) = 0 whenever 1 + (1 − q)u ≤ 0). A ϕ-function ϕ(·) may not be a φ-exponential function exp_φ(·), which is defined as the inverse of ln_φ(u) = ∫_1^u dx/φ(x), u > 0, for some increasing function φ: [0, ∞) → [0, ∞).
ϕ-divergence. We define the ϕ-divergence as
D_ϕ(p‖q) = [ ∫_T (ϕ⁻¹(p) − ϕ⁻¹(q)) / (ϕ⁻¹)′(p) dµ ] / [ ∫_T u₀ / (ϕ⁻¹)′(p) dµ ],
for any p, q ∈ P_µ. If ϕ(·) = exp(·) and u₀ = 1, then D_ϕ(p‖q) coincides with the Kullback–Leibler divergence D_KL(p‖q) = ∫_T p log(p/q) dµ.
Generalized statistical manifold. A metric (g_ij) can be derived from the ϕ-divergence:
g_ij = − (∂/∂θⁱ)_p (∂/∂θʲ)_q D_ϕ(p‖q) |_{q=p} = − E_θ[ ∂²f_θ/∂θⁱ∂θʲ ],
where f_θ = ϕ⁻¹(p_θ) and E_θ[·] = ∫_T (·) ϕ′(f_θ) dµ / ∫_T u₀ ϕ′(f_θ) dµ. Considering the log-likelihood function l_θ = log(p_θ) in place of f_θ = ϕ⁻¹(p_θ), we recover the Fisher information matrix.
Generalized statistical manifold (continued). A family of probability distributions P = {p_θ : θ ∈ Θ} ⊆ P_µ is said to be a generalized statistical manifold if the following conditions are satisfied: (P1) Θ is a domain (an open and connected set) in Rⁿ. (P2) p(t; θ) = p_θ(t) is a differentiable function with respect to θ. (P3) The operations of integration with respect to µ and differentiation with respect to θⁱ commute. (P4) The matrix g = (g_ij), defined by g_ij = −E_θ[∂²f_θ/∂θⁱ∂θʲ], is positive definite at each θ ∈ Θ.
The matrix (g_ij) can also be expressed as g_ij = E_θ[(∂f_θ/∂θⁱ)(∂f_θ/∂θʲ)], where now E_θ[·] = ∫_T (·) ϕ″(f_θ) dµ / ∫_T u₀ ϕ′(f_θ) dµ. As a consequence, the mapping X = ∑_i aⁱ ∂/∂θⁱ ↦ X̃ = ∑_i aⁱ ∂f_θ/∂θⁱ is an isometry between the tangent space T_θ P at p_θ and T̃_θ P = span{ ∂f_θ/∂θⁱ : i = 1, …, n }, equipped with the inner product ⟨X̃, Ỹ⟩_θ = E_θ[X̃ Ỹ].
Connections. We use the ϕ-divergence D_ϕ(·‖·) to define a pair of mutually dual connections D^(1) and D^(−1), whose Christoffel symbols are given by
Γ^(1)_{ijk} = − (∂²/∂θⁱ∂θʲ)_p (∂/∂θᵏ)_q D_ϕ(p‖q) |_{q=p}  and  Γ^(−1)_{ijk} = − (∂/∂θᵏ)_p (∂²/∂θⁱ∂θʲ)_q D_ϕ(p‖q) |_{q=p}.
The connections D^(1) and D^(−1) correspond to the exponential and mixture connections.
Explicit expressions for the Christoffel symbols are
Γ^(1)_{ijk} = E_θ[(∂²f_θ/∂θⁱ∂θʲ)(∂f_θ/∂θᵏ)] − E_θ[∂²f_θ/∂θⁱ∂θʲ] E_θ[u₀ ∂f_θ/∂θᵏ]
and
Γ^(−1)_{ijk} = E_θ[(∂²f_θ/∂θⁱ∂θʲ)(∂f_θ/∂θᵏ)] + E_θ[(∂f_θ/∂θⁱ)(∂f_θ/∂θʲ)(∂f_θ/∂θᵏ)] − E_θ[(∂f_θ/∂θʲ)(∂f_θ/∂θᵏ)] E_θ[u₀ ∂f_θ/∂θⁱ] − E_θ[(∂f_θ/∂θⁱ)(∂f_θ/∂θᵏ)] E_θ[u₀ ∂f_θ/∂θʲ].
The correction terms involving u₀ vanish if ϕ(·) = exp(·) and u₀ = 1.
Using the pair of mutually dual connections D^(1) and D^(−1), we can specify a family of α-connections D^(α) on generalized statistical manifolds, whose Christoffel symbols are Γ^(α)_{ijk} = [(1 + α)/2] Γ^(1)_{ijk} + [(1 − α)/2] Γ^(−1)_{ijk}. The connections D^(α) and D^(−α) are mutually dual. For α = 0, the connection D^(0), which is clearly self-dual, corresponds to the Levi–Civita connection.
ϕ-families. A parametric ϕ-family F_p = {p_θ : θ ∈ Θ} centered at p = ϕ(c) is defined by p_θ(t) := ϕ( c(t) + ∑_{i=1}^n θⁱ uᵢ(t) − ψ(θ) u₀(t) ), where ψ: Θ → [0, ∞) is a normalizing function. The functions uᵢ satisfy some conditions, which imply ψ ≥ 0. The domain Θ can be chosen to be maximal. If ϕ(·) = exp(·) and u₀ = 1, then F_p corresponds to an exponential family.
The normalizing function and the ϕ-divergence are related by ψ(θ) = D_ϕ(p‖p_θ). The matrix (g_ij) is the Hessian of the normalizing function ψ: g_ij = ∂²ψ/∂θⁱ∂θʲ. As a result, Γ^(0)_{ijk} = (1/2) ∂g_ij/∂θᵏ = (1/2) ∂³ψ/∂θⁱ∂θʲ∂θᵏ.
In ϕ-families, the Christoffel symbols Γ^(1)_{ijk} vanish identically, i.e. (θⁱ) is an affine coordinate system and the connection D^(1) is flat (and D^(−1) is also flat). Thus F_p admits a coordinate system (η_j) dual to (θⁱ), and there exist potential functions ψ and ψ* such that θⁱ = ∂ψ*/∂η_i, η_j = ∂ψ/∂θʲ, and ψ(p) + ψ*(p) = ∑_i θⁱ(p) η_i(p).
Discussion. Advantages of (g_ij), Γ^(1)_{ijk} and Γ^(−1)_{ijk} being derived from D_ϕ(·‖·): duality; a Pythagorean relation; a projection theorem. Open questions: an example of a generalized statistical manifold whose coordinate system is D^(−1)-flat; parallel transport with respect to D^(−1); a divergence or ϕ-function associated with the α-connections. Thank you!
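To make the definitions concrete, the sketch below implements the κ-exponential as an example of a ϕ-function and evaluates the ϕ-divergence on a finite sample space, where the integrals become sums; with ϕ = exp and u₀ = 1 it reproduces the Kullback–Leibler divergence, as stated above. This is an illustrative computation, not code from the paper: the finite sample space, the chosen distributions and the value κ = 0.5 are assumptions made for the example.

```python
import numpy as np

def exp_kappa(u, kappa):
    """kappa-exponential: (kappa*u + sqrt(1 + kappa^2 u^2))^(1/kappa); exp(u) for kappa = 0."""
    u = np.asarray(u, dtype=float)
    if kappa == 0:
        return np.exp(u)
    return (kappa * u + np.sqrt(1.0 + kappa**2 * u**2)) ** (1.0 / kappa)

def phi_divergence(p, q, phi_inv, d_phi_inv, u0=None):
    """phi-divergence on a finite sample space T = {1, ..., m} with counting measure mu:

    D_phi(p || q) = sum((phi^{-1}(p) - phi^{-1}(q)) / (phi^{-1})'(p))
                    / sum(u0 / (phi^{-1})'(p))
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    if u0 is None:
        u0 = np.ones_like(p)
    num = np.sum((phi_inv(p) - phi_inv(q)) / d_phi_inv(p))
    den = np.sum(u0 / d_phi_inv(p))
    return num / den

# Two strictly positive distributions on a 5-point sample space (illustrative).
p = np.array([0.10, 0.20, 0.30, 0.25, 0.15])
q = np.array([0.20, 0.20, 0.20, 0.20, 0.20])

# phi = exp, u0 = 1: phi^{-1} = log, (phi^{-1})'(p) = 1/p, and D_phi equals KL.
d_exp = phi_divergence(p, q, phi_inv=np.log, d_phi_inv=lambda x: 1.0 / x)
print(d_exp, np.sum(p * np.log(p / q)))     # identical values

# phi = exp_kappa with kappa = 0.5: its inverse is ln_kappa(x) = (x^k - x^-k) / (2k),
# whose derivative is (x^(k-1) + x^(-k-1)) / 2.
k = 0.5
ln_k = lambda x: (x**k - x**(-k)) / (2 * k)
d_ln_k = lambda x: (x**(k - 1) + x**(-k - 1)) / 2
print(phi_divergence(p, q, phi_inv=ln_k, d_phi_inv=d_ln_k))   # nonnegative, 0 iff p == q
print(np.allclose(exp_kappa(ln_k(p), k), p))                  # exp_kappa inverts ln_kappa
```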

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14279
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_26
Authors = Barbara Opozda
Keywords = Affine connection, Curvature tensor, Laplacian, Bochner's technique, Ricci tensor, Sectional curvature
Abstract
Curvature properties of statistical structures are studied. The study deals with the curvature tensor of statistical connections and their duals, as well as the Ricci tensor of the connections, Laplacians and the curvature operator. Two concepts of sectional curvature are introduced. The meaning of the notions is illustrated by presenting a few exemplary theorems.


Watch the video
Curvatures of Statistical Structures

Curvatures of statistical structures. Barbara Opozda, Paris, October 2015.
Statistical structures — statistical setting. M is an open subset of Rⁿ and Λ a probability space with a fixed σ-algebra. Let p: M × Λ ∋ (x, λ) ↦ p(x, λ) ∈ R be smooth relative to x and such that p_x(λ) := p(x, λ) is a probability measure on Λ (a probability distribution). Set ℓ(x, λ) := log p(x, λ) and g_ij(x) := E_x[(∂_i ℓ)(∂_j ℓ)], where E_x is the expectation relative to the probability p_x, for all x ∈ M, and ∂_1, …, ∂_n is the canonical frame on M. Then g is the Fisher information metric tensor field on M, C_ijk(x) = E_x[(∂_i ℓ)(∂_j ℓ)(∂_k ℓ)] is the cubic form, and (g, C) is a statistical structure on M.
Statistical structures (Codazzi structures) — geometric setting; three equivalent definitions. M is a manifold, dim M = n.
I) (g, C), with C a totally symmetric (0,3)-tensor field on M, i.e. C(X,Y,Z) = C(Y,X,Z) = C(Y,Z,X) for all X, Y, Z ∈ T_xM, x ∈ M; C is the cubic form.
II) (g, K), with K a symmetric (1,2)-tensor field (K(X,Y) = K(Y,X)) which is also symmetric relative to g, that is, g(X, K(Y,Z)) = g(Y, K(X,Z)) is symmetric in all arguments; the relation to I) is C(X,Y,Z) = g(X, K(Y,Z)).
III) (g, ∇), with ∇ a torsion-free connection such that (∇_X g)(Y,Z) = (∇_Y g)(X,Z) (1); ∇ is a statistical connection. For a tensor field T of type (p,q) on M, ∇T is of type (p, q+1): ∇T(X, Y_1, …, Y_q) = (∇_X T)(Y_1, …, Y_q). In particular ∇g(X,Y,Z) = (∇_X g)(Y,Z), and (1) holds if and only if ∇g is a symmetric cubic form. Let ∇̂ be the Levi-Civita connection of g and K(X,Y) := ∇_X Y − ∇̂_X Y the difference tensor; then ∇g(X,Y,Z) = −2 g(X, K(Y,Z)) = −2 C(X,Y,Z).
A statistical structure is trivial if and only if K = 0, equivalently C = 0, equivalently ∇ = ∇̂. Write K_X Y := K(X,Y) and E := tr_g K = K(e_1, e_1) + … + K(e_n, e_n) = (tr K_{e_1}) e_1 + … + (tr K_{e_n}) e_n, the mean difference vector field. E = 0 ⇔ tr K_X = 0 for all X ∈ TM ⇔ tr_g C(X, ·, ·) = 0 for all X ∈ TM; E = 0 defines a trace-free statistical structure. Fact: (g, ∇) is trace-free if and only if ∇ν_g = 0, where ν_g is the volume form determined by g.
Examples. 1) Riemannian geometry of the second fundamental form: for M a locally strongly convex hypersurface in R^{n+1}, the second fundamental form h satisfies the Codazzi equation (∇_X h)(Y,Z) = (∇_Y h)(X,Z), where ∇ is the induced connection (the Levi-Civita connection of the first fundamental form), so (h, ∇) is a statistical structure. Similarly one gets statistical structures on hypersurfaces in space forms.
2) Equiaffine geometry of hypersurfaces in the standard affine space R^{n+1}: let M be a locally strongly convex hypersurface in R^{n+1}, ξ a transversal vector field, and D the standard flat connection on R^{n+1}. For X, Y ∈ X(M), D_X Y = ∇_X Y + h(X,Y) ξ (Gauss formula), with ∇ the induced connection and h the second fundamental form (metric tensor field), and D_X ξ = −SX + τ(X) ξ (Weingarten formula). If τ = 0, ξ is called equiaffine; in this case the Codazzi equation (∇_X h)(Y,Z) = (∇_Y h)(X,Z) is satisfied and (h, ∇) is a statistical structure.
3) Geometry of Lagrangian submanifolds in Kähler manifolds: let N be a Kähler manifold of real dimension 2n with complex structure J, and M a Lagrangian submanifold of N, i.e. an n-dimensional submanifold such that JTM is orthogonal to TM, so that JTM is the normal bundle (in the metric sense) of M ⊂ N. With D the Kähler connection on N, D_X Y = ∇_X Y + J K(X,Y); with g the induced metric tensor field on M, (g, K) is a statistical structure. It is trace-free ⇔ M is minimal in N.
Most statistical structures lie outside these three classes of examples. For instance, in order that a statistical structure be locally realizable on an equiaffine hypersurface, it is necessary that ∇ is projectively flat.
Dual connections, curvature tensors. Let g be a metric tensor field on M and ∇ any connection; the dual connection ∇̄ is defined by X g(Y,Z) = g(∇_X Y, Z) + g(Y, ∇̄_X Z) (2). (g, ∇) is a statistical structure if and only if (g, ∇̄) is a statistical structure. Let R(X,Y)Z be the (1,3) curvature tensor of ∇; if R = 0 the structure is called Hessian. Let R̄(X,Y)Z be the curvature tensor of ∇̄; then g(R(X,Y)Z, W) = −g(R̄(X,Y)W, Z) (3). In particular, R = 0 ⇔ R̄ = 0.
With ∇̂ the Levi-Civita connection of g, ∇ = ∇̂ + K and ∇̄ = ∇̂ − K, and with R̂ the curvature tensor of ∇̂:
R(X,Y) = R̂(X,Y) + (∇̂_X K)_Y − (∇̂_Y K)_X + [K_X, K_Y] (4), where [K_X, K_Y] = K_X K_Y − K_Y K_X,
R̄(X,Y) = R̂(X,Y) − (∇̂_X K)_Y + (∇̂_Y K)_X + [K_X, K_Y] (5),
R(X,Y) + R̄(X,Y) = 2 R̂(X,Y) + 2 [K_X, K_Y] (6).
Sectional curvatures. R does not have to be skew-symmetric relative to g, i.e. g(R(X,Y)Z,W) = −g(R(X,Y)W,Z) may fail in general. Lemma (*): the following conditions are equivalent: 1) g(R(X,Y)Z,W) = −g(R(X,Y)W,Z) for all X, Y, Z, W; 2) R = R̄; 3) ∇̂K is symmetric, that is, (∇̂K)(X,Y,Z) = (∇̂_X K)(Y,Z) = (∇̂_Y K)(X,Z) = (∇̂K)(Y,X,Z) for all X, Y, Z. For hypersurfaces in R^{n+1}, each of the above conditions describes an affine sphere.
Set R̃ := (R + R̄)/2 and [K,K](X,Y)Z := [K_X, K_Y]Z. Both R̃(X,Y)Z and [K,K](X,Y)Z are Riemann-curvature-like tensors: they are skew-symmetric in X, Y, satisfy the first Bianchi identity, and R̃(X,Y), [K,K](X,Y) are skew-symmetric relative to g for all X, Y. For a vector plane π in T_xM with orthonormal basis X, Y: sectional curvature of g, k̂(π) := g(R̂(X,Y)Y, X); sectional K-curvature, k(π) := g([K,K](X,Y)Y, X); sectional ∇-curvature, k̃(π) := g(R̃(X,Y)Y, X).
In general, Schur's lemma does not hold for k̃ and k. We have, however: Lemma: assume that M is connected, dim M > 2, and the sectional ∇-curvature (respectively the sectional K-curvature) is point-wise constant. If one of the equivalent conditions in Lemma (*) holds, then the sectional ∇-curvature (respectively the sectional K-curvature) is constant on M.
Sectional K-curvature. The easiest situation is when the sectional K-curvature is constant for all vector planes in T_xM. In this respect we have:
Theorem: if the sectional K-curvature is constant and equal to A for all vector planes in T_xM, then there is an orthonormal basis e_1, …, e_n of T_xM and numbers λ_1, …, λ_n, µ_1, …, µ_{n−1} such that K_{e_1} = diag(λ_1, µ_1, …, µ_1) and, for i ≥ 2, the only possibly non-zero entries of K_{e_i} are µ_k at the (k,i) and (i,k) positions for k < i, λ_i at the (i,i) position, and µ_i at the remaining diagonal positions (k,k), k > i. Moreover µ_i = [λ_i − √(λ_i² − 4A_{i−1})]/2 and A_i = A_{i−1} − µ_i² for i = 1, …, n−1, where A_0 = A. The above representation of K is not unique, in general. If additionally tr_g K = 0, then A ≤ 0, λ_n = 0 and, for i = 1, …, n−1, λ_i = (n−i) √(−A_{i−1}/(n−i+1)) and µ_i = −√(−A_{i−1}/(n−i+1)); in particular, in this case the numbers λ_i, µ_i depend only on A and the dimension of M.
Example 1: K_{e_1} = diag(λ, λ/2, …, λ/2) and, for i ≥ 2, the only non-zero entries of K_{e_i} are λ/2 at the (1,i) and (i,1) positions. The sectional K-curvature is constant, equal to λ²/4.
Example 2: the sectional K-curvature vanishes, i.e. [K,K] = 0. There is an orthonormal frame e_1, …, e_n such that the only non-zero entry of each K_{e_i} is λ_i at the (i,i) position.
Some theorems on the sectional K-curvature ((g, K) is trace-free if E = tr_g K = 0):
Theorem: let (g, K) be a trace-free statistical structure on M with symmetric ∇̂K. If the sectional K-curvature is constant, then either K = 0 (the statistical structure is trivial) or R̂ = 0 and ∇̂K = 0.
Theorem: let ∇̂K = 0. Each of the following conditions implies that R̂ = 0: 1) the sectional K-curvature is negative; 2) [K,K] = 0 and K is non-degenerate, i.e. X ↦ K_X is a monomorphism.
Theorem: assume K is as in Example 1 at each point of M, ∇̂K is symmetric, and div E is constant on M (E = tr_g K). Then the sectional curvature of g on any plane containing E is non-positive; moreover, if M is connected it is constant. If ∇̂E = 0 then ∇̂K = 0 and the sectional curvature of g on any plane containing E vanishes.
Theorem: if the sectional K-curvature is non-positive on M and [K,K]·K = 0, then the sectional K-curvature vanishes on M.
Corollary: if (g, K) is a Hessian structure on M with non-negative sectional curvature of g and such that R̂·K = 0, then R̂ = 0.
Theorem: if the sectional K-curvature is negative on M and R̂·K = 0, then R̂ = 0.
Theorem: let M be a Lagrangian submanifold of N, where N is a Kähler manifold of constant holomorphic curvature 4c, assume the sectional curvature of the first fundamental form g on M is smaller than c on M, and R̂·K = 0, where K is the second fundamental tensor of M ⊂ N. Then R̂ = 0.
∇-sectional curvature. All affine spheres are statistical manifolds of constant sectional ∇-curvature. A Riemann-curvature-like tensor defines a curvature operator: for instance, for the curvature tensor R̃ = (R + R̄)/2 the curvature operator R̃: Λ²TM → Λ²TM is given by g(R̃(X ∧ Y), Z ∧ W) = g(R̃(Z,W)Y, X). A curvature operator is symmetric relative to the canonical extension of g to the bundle Λ²TM, hence it is diagonalizable; in particular, it can be positive definite, negative definite, etc. The assumption that the curvature operator is positive definite is stronger than the assumption that the sectional ∇-curvature is positive.
Theorem: let M be a connected compact oriented manifold and (g, ∇) a trace-free statistical structure on M. If R = R̄ and the curvature operator determined by the curvature tensor R̂ is positive definite on M, then the sectional ∇-curvature is constant.
Theorem: let M be a connected compact oriented manifold and (g, ∇) a trace-free statistical structure on M. If the curvature operator of R̃ = (R + R̄)/2 is positive on M, then the Betti numbers satisfy b_1(M) = … = b_{n−1}(M) = 0.
Sectional curvature of g: k̂(π) = g(R̂(X,Y)Y, X), with X, Y an orthonormal basis of π.
Theorem: let M be a compact manifold equipped with a trace-free statistical structure (g, ∇) such that R = R̄. If the sectional curvature k̂ of g is positive on M, then the structure is trivial, that is, ∇ = ∇̂.
In the 2-dimensional case: Theorem: let M be a compact surface equipped with a trace-free statistical structure (g, ∇). If M is of genus 0 and R = R̄, then the structure is trivial.
References: B. Opozda, Bochner's technique for statistical manifolds, Annals of Global Analysis and Geometry, DOI 10.1007/s10455-015-9475-z; B. Opozda, A sectional curvature for statistical structures, arXiv:1504.01279 [math.DG].
Hessian structures. (g, ∇) is Hessian if R = 0; then R̄ = 0 and R̂ = −[K,K]. Conversely, (g, ∇) is Hessian if and only if ∇̂K is symmetric and R̂ = −[K,K]. All Hessian structures are locally realizable on affine hypersurfaces in R^{n+1} equipped with Calabi's structure; if they are trace-free, they are locally realizable on improper affine spheres. If the difference tensor is as in Example 1 and the structure is Hessian, then K = 0.
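Example 1's claim, a sectional K-curvature constantly equal to λ²/4, can be checked numerically straight from the definition k(π) = g([K_X, K_Y]Y, X). The sketch below does this for a few random orthonormal planes; it is an illustrative check rather than code from the talk, and the dimension n = 5 and the value λ = 1.3 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 5, 1.3   # illustrative dimension and lambda

# Difference tensor of Example 1, stored as K[i] = matrix of K_{e_i} in an orthonormal frame:
# K_{e_1} = diag(lam, lam/2, ..., lam/2); for i >= 2 the only non-zero entries of K_{e_i}
# are lam/2 at positions (1, i) and (i, 1).
K = np.zeros((n, n, n))
K[0] = np.diag([lam] + [lam / 2] * (n - 1))
for i in range(1, n):
    K[i][0, i] = K[i][i, 0] = lam / 2

def K_of(v):
    """Matrix of K_v = sum_i v_i K_{e_i} (K is linear in its first argument)."""
    return np.tensordot(v, K, axes=(0, 0))

def sectional_K_curvature(X, Y):
    """k(pi) = g([K_X, K_Y] Y, X) for an orthonormal pair (X, Y) spanning the plane pi."""
    KX, KY = K_of(X), K_of(Y)
    bracket = KX @ KY - KY @ KX
    return X @ (bracket @ Y)

# Evaluate on a few random planes: orthonormalize two random vectors with QR.
for _ in range(5):
    Q, _ = np.linalg.qr(rng.standard_normal((n, 2)))
    X, Y = Q[:, 0], Q[:, 1]
    print(sectional_K_curvature(X, Y))   # each value should equal lam**2 / 4 = 0.4225
```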

Authors

Frédéric Barbaresco

Creative Commons Attribution-ShareAlike 4.0 International

Application to Hamiltonian systems” • Social events: • Welcome cocktail at Ecole Polytechnique • Diner in Versailles Palace Gardens GSI’15 Topics • GSI’15 federates skills from Geometry, Probability and Information Theory: • Dimension reduction on Riemannian manifolds • Optimal Transport and applications in Imagery/Statistics • Shape Space & Diffeomorphic mappings • Random Geometry/Homology • Hessian Information Geometry • Topological forms and Information • Information Geometry Optimization • Information Geometry in Image Analysis • Divergence Geometry • Optimization on Manifold • Lie Groups and Geometric Mechanics/Thermodynamics • Computational Information Geometry • Lie Groups: Novel Statistical and Computational Frontiers • Geometry of Time Series and Linear Dynamical systems • Bayesian and Information Geometry for Inverse Problems • Probability Density Estimation GSI’15 Program GSI’15 Proceedings • Publication by SPRINGER in « Lecture Notes in Computer Science » LNCS vol. 9389 (800 pages), ISBN 978-3-319-25039-7 • http://www.springer.com/us/book/9783319250397 GSI’15 Special Issue • Authors will be solicited to submit a paper in a special Issue "Differential Geometrical Theory of Statistics” in ENTROPY Journal, an international and interdisciplinary open access journal of entropy and information studies published monthly online by MDPI • http://www.mdpi.com/journal/entropy/special_issues/entropy-statistics • A book could be edited by MDPI: e.g. Ecole Polytechnique • Special thanks to « LIX » Department A product of the French Revolution and the Age of Enlightenment, École Polytechnique has a rich history that spans over 220 years. https://www.polytechnique.edu/en/history Henri Poincaré – X1873 Paris-Saclay University in Top 8 World Innovation Hubs http://www.technologyreview.com/news/517626/ infographic-the-worlds-technology-hubs/ A new Grammar of Information “Mathematics is the art of giving the same name to different things” – Henri Poincaré GROUP EVERYWHERE Elie Cartan Henri Poincaré METRIC EVERYWHERE Maurice Fréchet Misha Gromov “the problems addressed by Elie Cartan are among the most important, most abstract and most general dealing with mathematics; group theory is, so to speak, the whole mathematics, stripped of its material and reduced to pure form. This extreme level of abstraction has probably made my presentation a little dry; to assess each of the results, I would have had virtually render him the material which he had been stripped; but this refund can be made in a thousand different ways; and this is the only form that can be found as well as a host of various garments, which is the common link between mathematical theories that are often surprised to find so near” H. Poincaré Elie Cartan: Group Everywhere (Henri Poincaré review of Cartan’s Works) Maurice Fréchet: Metric Everywhere • Maurice Fréchet made major contributions to the topology of point sets and introduced the entire concept of metric spaces. • His dissertation opened the entire field of functionals on metric spaces and introduced the notion of compactness. • He has extended Probability in Metric space 1948 (Annales de l’IHP) Les éléments aléatoires de nature quelconque dans un espace distancié Extension of Probability/Statistic in abstract/Metric space GSI’15 & Geometric Mechanics • The master of geometry during the last century, Elie Cartan, was the son of Joseph Cartan who was the village blacksmith. 
• Elie recalled that his childhood had passed under “blows of the anvil, which started every morning from dawn”. • We can imagine easily that the child, Elie Cartan, watching his father Joseph “coding curvature” on metal between the hammer and the anvil, insidiously influencing Elie’s mind with germinal intuition of fundamental geometric concepts. • The etymology of the word “Forge”, that comes from the late XIV century, “a smithy”, from Old French forge “forge, smithy” (XII century), earlier faverge, from Latin fabrica “workshop, smith’s shop”, from faber (genitive fabri) “workman in hard materials, smith”. HAMMER = The CoderANVIL = Curvature Libraries Bigorne Bicorne Venus at the Forge of Vulcan, Le Nain Brothers, Musée Saint-Denis, Reims From Homo Sapiens to Homo Faber “Intelligence is the faculty of manufacturing artificial objects, especially tools to make tools, and of indefinitely varying the manufacture.” Henri Bergson Into the Flaming Forge of Vulcan, Diego Velázquez, Museo Nacional del Prado Geometric Thermodynamics & Statistical Physics Enjoy all « Geometries » (Dinner at Versailles Palace Gardens) Restaurant of GSI’15 Gala Dinner André Le Nôtre Landscape Geometer of Versailles the Apex of “Le Jardin à la française” Louis XIV Patron of Science The Royal Academy of Sciences was established in 1666 On 1st September 1715, 300 years ago, Louis XIV passed away at the age of 77, having reigned for 72 years Keynote Speakers Prof. Mathilde MARCOLLI (CALTECH, USA) From Geometry and Physics to Computational Linguistics Abstact: I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework. Biography: • Laurea in Physics, University of Milano, 1993 • Master of Science, Mathematics, University of Chicago, 1994 • PhD, Mathematics, University of Chicago, 1997 • Moore Instructor, Massachusetts Institute of Technology, 1997-2000 • Associate Professor (C3), Max Planck Institute for Mathematics, 2000-2008 • Professor, California Institute of Technology, 2008-present • Distinguished Visiting Research Chair, Perimeter Institute for Theoretical Physics, 2013-present . Talk chaired by Daniel Bennequin Keynote Speakers Prof. Marc ARNAUDON (Bordeaux University, France) Stochastic Euler-Poincaré reduction Abstact: We will prove a Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation. Biography: Marc Arnaudon was born in France in 1965. He graduated from Ecole Normale Supérieure de Paris, France, in 1991. He received the PhD degree in mathematics and the Habilitation à diriger des Recherches degree from Strasbourg University, France, in January 1994 and January 1998 respectively. After postdoctoral research and teaching at Strasbourg, he began in September 1999 a full professor position in the Department of Mathematics at Poitiers University, France, where he was the head of the Probability Research Group. 
In January 2013 he left Poitiers and joined the Department of Mathematics of Bordeaux University, France, where he is a full professor in mathematics. Talk chaired by Frank Nielsen Keynote Speakers Prof. Tudor RATIU (EPFL, Switzerland) Symmetry methods in geometric mechanics Abstact: The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics. Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena in for Hamiltonian systems are based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity. Biography: • BA in Mathematics, University of Timisoara, Romania, 1973 • MA in Applied Mathematics, University of Timisoara, Romania, 1974 • Ph.D. in Mathematics, University of California, Berkeley, 1980 • T.H. Hildebrandt Research Assistant Professor, University of Michigan, Ann Arbor, USA 1980-1983 • Associate Professor of Mathematics, University of Arizona, Tuscon, USA 1983- 1988 • Professor of Mathematics, University of California, Santa Cruz, USA, 1988-2001 • Chaired Professor of Mathematics, Ecole Polytechnique Federale de Lausanne, Switzerland, 1998 - present • Professor of Mathematics, Skolkovo Institute of Science and Technonology, Moscow, Russia, 2014 - present Talk chaired by Xavier Pennec Short Course Prof. Dominique SPEHNER (Grenoble University) Geometry on the set of quantum states and quantum correlations Abstact: I will show that the set of states of a quantum system with a finite- dimensional Hilbert space can be equipped with various Riemannian distances having nice properties from a quantum information viewpoint, namely they are contractive under all physically allowed operations on the system. The corresponding metrics are quantum analogs of the Fisher metric and have been classified by D. Petz. Two distances are particularly relevant physically: the Bogoliubov-Kubo-Mori distance studied by R. Balian, Y. Alhassid and H. Reinhardt, and the Bures distance studied by A. Uhlmann and by S.L. Braunstein and C.M. Caves. The latter gives the quantum Fisher information playing an important role in quantum metrology. A way to measure the amount of quantum correlations (entanglement or quantum discord) in bipartite systems (that is, systems composed of two parties) with the help of these distances will be also discussed. Biography: • Diplôme d'Études Approfondies (DEA) in Theoretical Physics at the École Normale Supérieure de Lyon, 1994 • Civil Service (Service National de la Coopération), Technion Institute of Technology, Haifa, Israel, 1995-1996 • PhD in Theoretical Physics, Université Paul Sabatier, Toulouse, France, 1996- 2000. 
• Postdoctoral fellow, Pontificia Universidad Católica, Santiago, Chile, 2000-2001 • Research Associate, University of Duisburg-Essen, Germany, 2001-2005 • Maître de Conférences, Université Joseph Fourier, Grenoble, France, 2005-present • Habilitation à diriger des Recherches (HDR), Université Grenoble Alpes, 2015 • Member of the Institut Fourier (since 2005) and of the Laboratoire de Physique et Modélisation des Milieux Condensés (since 2013), Université Grenoble Alpes, France. Talk chaired by Roger Balian.
Guest Speakers
Prof. Charles-Michel MARLE (UPMC, France): Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. Application to Hamiltonian systems
Abstract: I will present some tools in Symplectic and Poisson Geometry in view of their applications in Geometric Mechanics and Mathematical Physics. Lie group and Lie algebra actions on symplectic and Poisson manifolds, momentum maps and their equivariance properties, and first integrals associated to symmetries of Hamiltonian systems will be discussed. Reduction methods that take advantage of symmetries will also be presented.
Biography: Charles-Michel Marle was born in 1934. He studied at Ecole Polytechnique (1953-1955), Ecole Nationale Supérieure des Mines de Paris (1957-1958) and Ecole Nationale Supérieure du Pétrole et des Moteurs (1957-1958). He obtained a doctor's degree in Mathematics at the University of Paris in 1968. From 1959 to 1969 he worked as a research engineer at the Institut Français du Pétrole. He joined the Université de Besançon as Associate Professor in 1969, and the Université Pierre et Marie Curie, first as Associate Professor (1975) and then as full Professor (1981). His research dealt first with fluid flows through porous media, and later with Differential Geometry, Hamiltonian systems and their applications in Mechanics and Mathematical Physics. Talk chaired by Frédéric Barbaresco.

Frederic Lavancier, Charles Kervrann

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14259
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_20
Authors = Charles Kervrann, Frederic Lavancier
Keywords =
Abstract
A model of two-type (or two-color) interacting random balls is introduced. Each colored random set is a union of random balls, and the interaction relies on the volume of the intersection between the two random sets. This model is motivated by the detection and quantification of co-localization between two proteins. Simulation and inference are discussed. Since the individual balls cannot all be identified (a ball may, for instance, be contained in another one), standard methods of inference such as likelihood or pseudo-likelihood are not available, and we apply the Takacs-Fiksel method with a specific choice of test functions.


Watch the video
A two-color interacting random balls model for co-localization analysis of proteins

A testing procedure A model for co-localization Estimation A two-color interacting random balls model for co-localization analysis of proteins. Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, Nantes INRIA Rennes, Serpico team Joint work with C. Kervrann (INRIA Rennes, Serpico team). GSI’15, 28-30 October 2015. A testing procedure A model for co-localization Estimation Introduction : some data Vesicular trafficking analysis and colocalization quantification by TIRF microscopy (1px = 100 nanometer) [SERPICO team, INRIA] ? =⇒ Langerin proteins (left) and Rab11 GTPase proteins (right). Is there colocalization ? ⇔ Is there some spatial dependencies between the two types of proteins ? A testing procedure A model for co-localization Estimation Image pre-processing After segmentation Superposition : ? ⇒ After a Gaussian weights thresholding Superposition : ? ⇒ A testing procedure A model for co-localization Estimation The problem of co-localization can be described as follows : We observe two binary images in a domain Ω : First image (green) : realization of a random set Γ1 ∩ Ω Second image (red) : realization of a random set Γ2 ∩ Ω −→ Is there some dependencies between Γ1 and Γ2 ? −→ If so, can we quantify/model this dependency ? A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation Testing procedure Let a generic point o ∈ Rd and p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2), p12 = P(o ∈ Γ1 ∩ Γ2). If Γ1 and Γ2 are independent, then p12 = p1p2. A testing procedure A model for co-localization Estimation Testing procedure Let a generic point o ∈ Rd and p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2), p12 = P(o ∈ Γ1 ∩ Γ2). If Γ1 and Γ2 are independent, then p12 = p1p2. A natural measure of departure from independency is ˆp12 − ˆp1 ˆp2 where ˆp1 = |Ω|−1 x∈Ω 1Γ1 (x), ˆp2 = |Ω|−1 x∈Ω 1Γ2 (x), ˆp12 = |Ω|−1 x∈Ω 1Γ1∩Γ2 (x). A testing procedure A model for co-localization Estimation Testing procedure Assume Γ1 and Γ2 are m-dependent stationary random sets. If Γ1 is independent of Γ2, then as |Ω| tends to infinity, T := |Ω| ˆp12 − ˆp1 ˆp2 x∈Ω y∈Ω ˆC1(x − y) ˆC2(x − y) → N(0, 1) where ˆC1 and ˆC2 are the empirical covariance functions of Γ1 ∩ Ω and Γ2 ∩ Ω respectively. Hence to test the null hypothesis of independence between Γ1 and Γ2 p-value = 2(1 − Φ(|T|)) where Φ is the c.d.f. of the standard normal distribution. A testing procedure A model for co-localization Estimation Some simulations Simulations when Γ1 and Γ2 are union of random balls A testing procedure A model for co-localization Estimation Some simulations Simulations when Γ1 and Γ2 are union of random balls Independent case (and each color ∼ Poisson) Number of p−values < 0.05 over 100 realizations : 4. A testing procedure A model for co-localization Estimation Some simulations Dependent case (see later for the model) Number of p−values < 0.05 over 100 realizations : 100. A testing procedure A model for co-localization Estimation Some simulations Independent case, larger radii Number of p−values < 0.05 over 100 realizations : 5. A testing procedure A model for co-localization Estimation Some simulations Dependent case, larger radii and "small" dependence Number of p−values < 0.05 over 100 realizations : 97. 
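The testing procedure above amounts to a few array operations on the two binary images. Below is a minimal Python sketch of it (not the authors' code): it estimates p̂1, p̂2, p̂12 and the empirical covariance functions, and forms the statistic T, reading the normalisation on the slide as the square root of the double sum over Ĉ1(x−y)Ĉ2(x−y). Boundary handling and the covariance estimator are simplified, and the function name and test images are illustrative.

```python
import numpy as np
from scipy.signal import correlate
from scipy.stats import norm

def empirical_cov(img):
    """Empirical covariance function of a binary image:
    C(t) ~ (1/|Omega|) sum_x z(x) z(x+t), with z the centred indicator (zero-padded)."""
    z = img.astype(float) - img.mean()
    return correlate(z, z, mode="full", method="fft") / img.size

def colocalization_test(img1, img2):
    """Independence test between two stationary random sets observed as binary images.
    Returns T (approximately N(0,1) under independence) and the p-value 2(1 - Phi(|T|))."""
    n = img1.size
    p1, p2, p12 = img1.mean(), img2.mean(), (img1 & img2).mean()
    C1, C2 = empirical_cov(img1), empirical_cov(img2)
    ones = np.ones(img1.shape)
    pairs = correlate(ones, ones, mode="full", method="fft")  # number of (x, y) pairs at each lag
    S = np.sum(pairs * C1 * C2)   # ~ sum_{x,y in Omega} C1(x-y) C2(x-y)
    T = n * (p12 - p1 * p2) / np.sqrt(S)
    return T, 2.0 * (1.0 - norm.cdf(abs(T)))

# two independent random sets: the p-value should look roughly uniform on [0, 1]
rng = np.random.default_rng(0)
img_a = rng.random((200, 200)) < 0.2
img_b = rng.random((200, 200)) < 0.2
print(colocalization_test(img_a, img_b))
```

On images with genuine overlap between the two sets, as in the dependent-ball simulations above, T becomes large and the p-value collapses towards zero.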
A testing procedure A model for co-localization Estimation Real Data Depending on the pre-processing : T = 9.9 T = 17 p − value = 0 p − value = 0 A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution µ on [Rmin, Rmax]. A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution µ on [Rmin, Rmax]. Notation : (ξ, R)i : ball centered at ξ with radius R and color i ∈ {1, 2}. → viewed as a marked point, marked by R and i. xi : collection of all marked points with color i. Hence Γi = (ξ,R)i∈xi (ξ, R)i x = x1 ∪ x2 : collection of all marked points. A testing procedure A model for co-localization Estimation Example : three realizations of the reference process A testing procedure A model for co-localization Estimation The model We consider a density on any bounded domain Ω with respect to the reference model f(x) ∝ zn1 1 zn2 2 eθ |Γ1∩ Γ2| where n1 : number of green balls and n2 : number of red balls. This density depends on 3 parameters z1 : rules the mean number of green balls z2 : rules the mean number of red balls θ : interaction parameter. If θ > 0 : attraction (co-localization) between Γ1 and Γ2 If θ = 0 : back to the reference model, up to the intensities (independence between Γ1 and Γ2). A testing procedure A model for co-localization Estimation Simulation Realizations can be generated by a standard birth-death Metropolis-Hastings algorithm. Examples : A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation Estimation problem Aim : Assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in f(x) = 1 c(z1, z2, θ) zn1 1 zn2 2 eθ |Γ1∩ Γ2| , where c(z1, z2, θ) is the normalizing constant. A testing procedure A model for co-localization Estimation Estimation problem Aim : Assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in f(x) = 1 c(z1, z2, θ) zn1 1 zn2 2 eθ |Γ1∩ Γ2| , where c(z1, z2, θ) is the normalizing constant. Issue : The number of balls n1 and n2 is not observed. ⇒ likelihood or pseudo-likelihood based inference is not feasible. = A testing procedure A model for co-localization Estimation An equilibrium equation Consider, for any non-negative function h, C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) and for i = 1, 2, Ii(θ; h) = Rmax Rmin Ω h((ξ, R)i, x) λ((ξ, R)i, x) 2zi dξ µ(dR). Denoting by z∗ 1 , z∗ 2 and θ∗ the true unknown values of the parameters, we know from the Georgii-Nguyen-Zessin equation that for any h E(C(z∗ 1 , z∗ 2 , θ∗ ; h)) = 0. 
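As a companion to the model and the birth-death Metropolis-Hastings simulation mentioned above, here is a small, self-contained sketch of such a sampler for the density f(x) ∝ z1^n1 z2^n2 exp(θ|Γ1 ∩ Γ2|). It takes the reference to be a unit-rate marked Poisson process on the window (uniform colour, uniform radius), approximates the intersection area on a pixel grid, and uses toy values for the window, the radii law and (z1, z2, θ); none of these choices come from the talk, and a serious implementation would update the raster incrementally rather than recomputing it at every step.

```python
import numpy as np

rng = np.random.default_rng(1)

# illustrative parameters (not the authors' values)
L, grid = 20.0, 100                    # window [0, L]^2 and raster resolution
Rmin, Rmax = 1.0, 3.0                  # radii law mu = Uniform[Rmin, Rmax]
z1, z2, theta = 0.05, 0.05, 0.2

xs = np.linspace(0.0, L, grid)
X, Y = np.meshgrid(xs, xs)

def inter_area(balls):
    """Approximate |Gamma1 ∩ Gamma2| on the pixel grid; balls = list of (cx, cy, r, color)."""
    m1 = np.zeros((grid, grid), bool)
    m2 = np.zeros((grid, grid), bool)
    for cx, cy, r, c in balls:
        disc = (X - cx) ** 2 + (Y - cy) ** 2 <= r ** 2
        if c == 1:
            m1 |= disc
        else:
            m2 |= disc
    return np.count_nonzero(m1 & m2) * (L / grid) ** 2

def bd_step(balls):
    """One birth-death Metropolis-Hastings step for
    f(x) ∝ z1^n1 z2^n2 exp(theta |Gamma1 ∩ Gamma2|) w.r.t. a unit-rate marked Poisson reference."""
    n, cur = len(balls), inter_area(balls)
    if rng.random() < 0.5:                                        # birth proposal
        new = (rng.uniform(0, L), rng.uniform(0, L),
               rng.uniform(Rmin, Rmax), int(rng.integers(1, 3)))
        prop = balls + [new]
        z = z1 if new[3] == 1 else z2
        ratio = z * np.exp(theta * (inter_area(prop) - cur)) * L * L / (n + 1)
    else:                                                          # death proposal
        if n == 0:
            return balls
        k = int(rng.integers(n))
        prop = balls[:k] + balls[k + 1:]
        z = z1 if balls[k][3] == 1 else z2
        ratio = np.exp(theta * (inter_area(prop) - cur)) / z * n / (L * L)
    return prop if rng.random() < min(1.0, ratio) else balls

balls = []
for _ in range(2000):                                              # short toy run
    balls = bd_step(balls)
print(len(balls), inter_area(balls))
```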
A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. Recall that C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) To be able to compute (1), we must find test functions hk such that S(h) is computable A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. Recall that C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) To be able to compute (1), we must find test functions hk such that S(h) is computable How many ? At least K = 3 because 3 parameters to estimate. A testing procedure A model for co-localization Estimation A first possibility : h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} where S(ξ, R) is the sphere {y, ||y − ξ|| = R}. ⇓ ⇓ ⇓ ⇓ A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? = A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? = ⇒ S(h1) = P(Γ1) (the perimeter of Γ1) A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). Let h3((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1 ∪ Γ2)c then S(h3) = P(Γ1 ∪ Γ2). A testing procedure A model for co-localization Estimation Simulations with test functions h1, h2 and h3 over 100 realizations θ = 0.2 (and small radii) θ = 0.05 (and large radii) Frequency 0.15 0.20 0.25 0.30 05101520 Frequency 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 010203040 A testing procedure A model for co-localization Estimation Real Data We assume the law of the radii is uniform on [Rmin, Rmax]. 
(each image is embedded in [0, 250] × [0, 280]) With Rmin = 0.5, Rmax = 2.5: θ̂ = 0.45; with Rmin = 0.5, Rmax = 10: θ̂ = 0.03.
Conclusion
The testing procedure: • detects co-localization between two binary images • is easy and fast to implement • does not depend too much on the image pre-processing.
The model for co-localization: • relies on geometric features (area of intersection) • can be fitted by the Takacs-Fiksel method • allows one to compare the degree of co-localization θ between two pairs of images if the laws of the radii are similar.
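Returning to the estimation step described above, the sketch below shows only the structure of the Takacs-Fiksel fit: three observed statistics S(h_k) (in the talk, the perimeters P(Γ1), P(Γ2), P(Γ1 ∪ Γ2)) are matched against the model terms z1·I1(θ; h_k) + z2·I2(θ; h_k), and the sum of squared contrasts is minimised over (z1, z2, θ). The numerical values and the function I_hat are placeholders standing in for Monte-Carlo approximations of the integrals I_i(θ; h_k); none of them is taken from the talk.

```python
import numpy as np
from scipy.optimize import minimize

# placeholder observed statistics S(h_k); in practice these are the measured
# perimeters P(Gamma1), P(Gamma2), P(Gamma1 ∪ Gamma2) of the two binary images
S_obs = np.array([410.0, 385.0, 655.0])

def I_hat(theta, k, color):
    """Stand-in for a Monte-Carlo estimate of I_i(theta; h_k), i.e. the integral of
    h_k((xi, R)_i, x) exp(theta * Delta_i(xi, R)) / 2 over uniform centres xi and radii R ~ mu.
    Replaced here by a smooth toy function so that the sketch runs end to end."""
    base = np.array([[55.0, 18.0, 50.0],    # color 1, test functions h1, h2, h3
                     [20.0, 52.0, 48.0]])   # color 2
    growth = 0.8 if color == 1 else 0.6
    return base[color - 1, k] * np.exp(growth * theta)

def contrast(params):
    """Takacs-Fiksel criterion: sum over k of C(z1, z2, theta; h_k)^2."""
    z1, z2, theta = params
    C = [S_obs[k] - z1 * I_hat(theta, k, 1) - z2 * I_hat(theta, k, 2) for k in range(3)]
    return float(np.sum(np.square(C)))

fit = minimize(contrast, x0=[1.0, 1.0, 0.0], method="Nelder-Mead")
print(fit.x)   # (z1_hat, z2_hat, theta_hat)
```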

Roman Belavkin

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14261
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_23
Authors = Roman Belavkin
Keywords =
Abstract
Asymmetric information distances are used to define asymmetric norms and quasimetrics on the statistical manifold and its dual space of random variables. The quasimetric topology generated by the Kullback-Leibler (KL) divergence is considered as the main example, and some of its topological properties are investigated.


Watch the video
Asymmetric Topologies on Statistical Manifolds

Asymmetric Topologies on Statistical Manifolds Roman V. Belavkin School of Science and Technology Middlesex University, London NW4 4BT, UK GSI2015, October 28, 2015 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 1 / 16 Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 2 / 16 Sources and Consequences of Asymmetry Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 3 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q| = inf{α−1 > 0 : D[q + α(p − q), q] ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q| = inf{α−1 > 0 : D[q + α(p − q), q] ≤ 1} sup x {Ep−q{x} : Eq{ex − 1 − x} ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q = inf{α−1 > 0 : D[q + α|(p − q)|, q] ≤ 1} sup x {Ep−q{x} : Eq{e|x| − 1 − |x|} ≤ 1} Roman 
Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). 
Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. An asymmetric seminormed space may fail to be a topological vector space, because y → αy can be discontinuous (Borodin, 2001). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. An asymmetric seminormed space may fail to be a topological vector space, because y → αy can be discontinuous (Borodin, 2001). Practically all other results have to be reconsidered (e.g. Baire category theorem, Alaoglu-Bourbaki, etc). (Cobzas, 2013). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 
+ u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} D∗[x, 0] = ex − 1 − x, z Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 
1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} 0 /∈ Int(dom Eq⊗p{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Method: Symmetric Sandwich Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 8 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µM◦ ≤ µ(−M◦ ) ∨ µM◦ µ(−M)co ∧ µM ≤ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µ(−M◦ )co ∧ µM◦ ≤ µM◦ µM ≤ µ(−M) ∨ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper 
Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x|∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ u|ϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x|∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ u|ϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Results Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 11 / 16 Results KL Induces Hausdorff (T2) Asymmetric Topology Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is Hausdorff. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results KL Induces Hausdorff (T2) Asymmetric Topology Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is Hausdorff. Proof. u ϕ+ ≤ u|ϕ (resp. x ϕ− ≤ x|ϕ) implies (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is finer than normed space (Y, · ϕ+) (resp. (X, · ∗ ϕ−)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · |ϕ) (resp. (X, · |∗ ϕ)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · |ϕ) (resp. (X, · |∗ ϕ)). Proof. ϕ+(u) = (1 + |u|) ln(1 + |u|) − |u| ∈ ∆2 (resp. ϕ∗ −(x) = e−|x| − 1 + |x| ∈ ∆2). Note that ϕ− /∈ ∆2 and ϕ∗ + /∈ ∆2. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. 3 Right K-sequentially complete: right K-Cauchy yn ρ → y. 
Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. 3 Right K-sequentially complete: right K-Cauchy yn ρ → y. Proof. ρs(y, z) = z − y|ϕ ∨ y − z|ϕ ≤ y − z ϕ−, where (Y, · ϕ−) is Banach. Then use theorems of Reilly et al. (1982) and Chen et al. (2007). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Contain a separable Orlicz subspace. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. 
Contain a separable Orlicz subspace. Total boundedness, compactness? Other asymmetric information distances (e.g. Rényi divergence).
References
Borodin, P. A. (2001). The Banach-Mazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3–4), 298–305.
Chen, S.-A., Li, W., Zou, D., & Chen, S.-B. (2007, Aug). Fixed point theorems in quasi-metric spaces. In Machine Learning and Cybernetics, 2007 International Conference on (Vol. 5, pp. 2499-2504). IEEE.
Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkhäuser.
Fletcher, P., & Lindgren, W. F. (1982). Quasi-uniform spaces (Vol. 77). New York: Marcel Dekker.
Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasi-pseudo-metric spaces. Monatshefte für Mathematik, 93, 127–140.
Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015
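As a small numerical companion to the talk, the sketch below probes the asymmetry that drives these constructions on a finite probability space. It uses the generalised KL divergence D[p, q] = Σ (p log(p/q) − p + q), which matches the slide's ϕ(u) = (1 + u) ln(1 + u) − u with p = (1 + u)q, and computes the asymmetric gauge ‖p − q| = inf{α⁻¹ > 0 : D[q + α(p − q), q] ≤ 1} by bisection. The function names and the example distributions are illustrative, not from the paper.

```python
import numpy as np

def gen_kl(p, q):
    """Generalised KL divergence D[p, q] = sum(p*log(p/q) - p + q), with 0 log 0 := 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m])) - p.sum() + q.sum()

def asym_gauge(p, q, tol=1e-10):
    """Asymmetric gauge ||p - q| = inf{1/alpha > 0 : D[q + alpha*(p - q), q] <= 1},
    computed by bisection over alpha (q is assumed strictly positive)."""
    u = np.asarray(p, float) - np.asarray(q, float)
    qf = np.asarray(q, float)
    neg = u < 0
    alpha_max = np.min(qf[neg] / -u[neg]) if neg.any() else 1e9   # stay in the nonnegative cone
    if gen_kl(qf + alpha_max * u, qf) <= 1.0:
        return 1.0 / alpha_max
    lo, hi = 0.0, alpha_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gen_kl(qf + mid * u, qf) <= 1.0:
            lo = mid
        else:
            hi = mid
    return 1.0 / lo

q = np.array([0.25, 0.25, 0.25, 0.25])
p = np.array([0.40, 0.20, 0.20, 0.20])

print(gen_kl(p, q), gen_kl(q, p))                     # D[p, q] != D[q, p]
print(asym_gauge(p, q), asym_gauge(q + (q - p), q))   # gauges of +(p - q) and -(p - q) also differ
```

The two gauge values differ markedly, illustrating why the induced topology is only a quasimetric one rather than a metric one.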

Pierre Calka

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14355
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_22
Authors = Pierre Calka
Keywords =
Abstract
Random polytopes have constituted some of the central objects of stochastic geometry for more than 150 years. They are in general generated as convex hulls of a random set of points in the Euclidean space. The study of such models requires the use of ingredients coming from both convex geometry and probability theory. In the last decades, the study has been focused on their asymptotic properties and in particular expectation and variance estimates. In several joint works with Tomasz Schreiber and J. E. Yukich, we have investigated the scaling limit of several models (uniform model in the unit-ball, uniform model in a smooth convex body, Gaussian model) and have deduced from it limiting variances for several geometric characteristics including the number of k-dimensional faces and the volume. In this paper, we survey the most recent advances on these questions and we emphasize the particular cases of random polytopes in the unit-ball and Gaussian polytopes.


Watch the video
Asymptotic properties of random polytopes

Asymptotic properties of random polytopes Pierre Calka 2nd conference on Geometric Science of Information ´Ecole Polytechnique, Paris-Saclay, 28 October 2015 default Outline Random polytopes: an overview Main results: variance asymptotics Sketch of proof: Gaussian case Joint work with Joseph Yukich (Lehigh University, USA) & Tomasz Schreiber (Toru´n University, Poland) default Outline Random polytopes: an overview Uniform polytopes Gaussian polytopes Expectation asymptotics Main results: variance asymptotics Sketch of proof: Gaussian case default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K50, K ball K50, K square default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K100, K ball K100, K square default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K500, K ball K500, K square default Uniform polytopes Poissonian model K := convex body of Rd Pλ, λ > 0:= Poisson point process of intensity measure λdx Kλ := Conv(Pλ ∩ K) K500, K ball K500, K square default Gaussian polytopes Binomial model Φd (x) := 1 (2π)d/2 e− x 2/2, x ∈ Rd, d ≥ 2 (Xk, k ∈ N∗):= independent and with density Φd Kn := Conv(X1, · · · , Xn) Poissonian model Pλ, λ > 0:= Poisson point process of intensity measure λΦd(x)dx Kλ := Conv(Pλ) default Gaussian polytopes K50 K100 K500 default Gaussian polytopes: spherical shape K50 K100 K500 default Asymptotic spherical shape of the Gaussian polytope Geffroy (1961) : dH(Kn, B(0, 2 log(n))) → n→∞ 0 a.s. K50000 default Expectation asymptotics Considered functionals fk(·) := number of k-dimensional faces, 0 ≤ k ≤ d Vol(·) := volume B. Efron’s relation (1965): Ef0(Kn) = n 1 − EVol(Kn−1) Vol(K) Uniform polytope, K smooth E[fk(Kλ)] ∼ λ→∞ cd,k ∂K κ 1 d+1 s ds λ d−1 d+1 κs := Gaussian curvature of ∂K Uniform polytope, K polytope E[fk(Kλ)] ∼ λ→∞ c′ d,kF(K) logd−1 (λ) F(K) := number of flags of K Gaussian polytope E[fk(Kλ)] ∼ λ→∞ c′′ d,k log d−1 2 (λ) A. R´enyi & R. Sulanke (1963), H. Raynaud (1970), R. Schneider & J. Wieacker (1978), F. Affentranger & R. Schneider (1992) default Outline Random polytopes: an overview Main results: variance asymptotics Uniform model, K smooth Uniform model, K polytope Gaussian model Sketch of proof: Gaussian case default Uniform model, K smooth K := convex body of Rd with volume 1 and with a C3 boundary κ := Gaussian curvature of ∂K lim λ→∞ λ−(d−1)/(d+1) Var[fk(Kλ)] = ck,d ∂K κ(z)1/(d+1) dz lim λ→∞ λ(d+3)/(d+1) Var [Vol(Kλ)] = c′ d ∂K κ(z)1/(d+1) dz (ck,d , c′ d explicit positive constants) M. Reitzner (2005): Var[fk (Kλ)] = Θ(λ(d−1)/(d+1) ) default Uniform model, K polytope K := simple polytope of Rd with volume 1 i.e. each vertex of K is included in exactly d facets. lim λ→∞ log−(d−1) (λ)Var[fk(Kλ)] = cd,kf0(K) lim λ→∞ λ2 log−(d−1) (λ)Var[Vol(Kλ)] = c′ d,k f0(K) (ck,d , c′ k,d explicit positive constants) I. B´ar´any & M. Reitzner (2010): Var[fk (Kλ)] = Θ(log(d−1) (λ)) default Gaussian model lim λ→∞ log− d−1 2 (λ)Var[fk(Kλ)] = ck,d lim λ→∞ log−k+ d+3 2 (λ)Var[Vol(Kλ)] = c′ k,d E Vol(Kλ) Vol(B(0, 2 log(n))) = λ→∞ 1 − d log(log(λ)) 4 log(λ) + O 1 log(λ) (ck,d , c′ k,d explicit positive constants) D. Hug & M. Reitzner (2005), I. B´ar´any & V. 
Vu (2007): Var[fk (Kλ)] = Θ(log(d−1)/2 (λ)) default Outline Random polytopes: an overview Main results: variance asymptotics Sketch of proof: Gaussian case Calculation of the expectation of fk(Kλ) Calculation of the variance of fk(Kλ) Scaling transform default Calculation of the expectation of fk(Kλ) 1. Decomposition: E[fk(Kλ)] = E   x∈Pλ ξ(x, Pλ)   ξ(x, Pλ) := 1 k+1 #k-face containing x if x extreme 0 if not 2. Mecke-Slivnyak formula E[fk(Kλ)] = λ E[ξ(x, Pλ ∪ {x})]Φd (x)dx 3. Limit of the expectation of one score default Calculation of the variance of fk(Kλ) Var[fk (Kλ)] = E   x∈Pλ ξ2 (x, Pλ) + x=y∈Pλ ξ(x, Pλ)ξ(y, Pλ)   − (E[fk (Kλ)]) 2 = λ E[ξ2 (x, Pλ ∪ {x})]Φd(x)dx + λ2 E[ξ(x, Pλ ∪ {x, y})ξ(y, Pλ ∪ {x, y})]Φd (x)Φd (y)dxdy − λ2 E[ξ(x, Pλ ∪ {x})]E[ξ(y, Pλ ∪ {y})]Φd (x)Φd (y)dxdy = λ E[ξ2 (x, Pλ ∪ {x})]Φd(x)dx + λ2 ”Cov”(ξ(x, Pλ ∪ {x}), ξ(y, Pλ ∪ {y}))Φd (x)Φd (y)dxdy default Scaling transform Question : Limits of E[ξ(x, Pλ)] and ”Cov”(ξ(x, Pλ), ξ(y, Pλ)) ? Answer : definition of limit scores in a new space ◮ Critical radius Rλ := 2 log λ − log(2 · (2π)d · log λ) ◮ Scaling transform : Tλ : Rd \ {0} −→ Rd−1 × R x −→ Rλ exp−1 d−1 x |x|, R2 λ(1 − |x| Rλ ) expd−1 : Rd−1 ≃ Tu0 Sd−1 → Sd−1 exponential map at u0 ∈ Sd−1 ◮ Image of a score : ξ(λ)(Tλ(x), Tλ(Pλ)) := ξ(x, Pλ) ◮ Convergence of Pλ : Tλ(Pλ) D → P o`u P : Poisson point process in Rd−1 × R of intensity measure ehdvdh default Action of the scaling transform Π↑ := {(v, h) ∈ Rd−1 × R : h ≥ v 2 2 } Π↓ := {(v, h) ∈ Rd−1 × R : h ≤ − v 2 2 } Half-space Translate of Π↓ Sphere containing O Translate of ∂Π↑ Convexity Parabolic convexity Extreme point (x + Π↑) not fully covered k-face of Kλ Parabolic k-face RλVol Vol default Limiting picture Ψ := x∈P(x + Π↑) In red : image of the balls of diameter [0, x] where x is extreme default Limiting picture Φ := x∈Rd−1×R:x+Π↓∩P=∅(x + Π↓) In green : image of the boundary of the convex hull Kλ default Thank you for your attention!
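The expectation results quoted above, in particular Efron's relation E f0(Kn) = n(1 − E Vol(K_{n−1})/Vol(K)), are easy to probe by simulation. Here is a minimal Monte-Carlo sketch in the planar unit disc using scipy's convex hull; the sample size and replication count are arbitrary illustrative choices.

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)

def uniform_disc(n):
    """n i.i.d. points uniform in the planar unit disc (rejection sampling)."""
    pts = np.empty((0, 2))
    while len(pts) < n:
        cand = rng.uniform(-1.0, 1.0, size=(2 * n, 2))
        pts = np.vstack([pts, cand[(cand ** 2).sum(axis=1) <= 1.0]])
    return pts[:n]

n, reps = 200, 1000
f0 = np.array([len(ConvexHull(uniform_disc(n)).vertices) for _ in range(reps)])
area = np.array([ConvexHull(uniform_disc(n - 1)).volume for _ in range(reps)])  # in 2D, .volume is the area

print("mean f0(K_n):                ", f0.mean())
print("Efron n(1 - E|K_{n-1}|/|K|): ", n * (1.0 - area.mean() / np.pi))
print("empirical Var f0(K_n):       ", f0.var(ddof=1))
```

The two displayed expectations should agree up to Monte-Carlo error, and rerunning with larger n shows the variance growing slowly, consistent with the Θ(λ^((d−1)/(d+1))) rate recalled above.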

Laurent Decreusefond, Aurélien Vasseur

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14260
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_21
Authors = Aurélien Vasseur, Laurent Decreusefond
Keywords = Ginibre point process, Poisson point process, Stein’s method, Stochastic geometry, β-Ginibre point process
Abstract
The characteristic independence property of Poisson point processes gives an intuitive way to explain why a sequence of point processes becoming less and less repulsive can converge to a Poisson point process. The aim of this paper is to show this convergence for sequences built by superposing, thinning or rescaling determinantal processes. We use Papangelou intensities and Stein’s method to prove this result with a topology based on total variation distance.


Watch the video
Asymptotics of superposition of point processes

I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications 2nd conference on Geometric Science of Information Aurélien VASSEUR Asymptotics of some Point Processes Transformations Ecole Polytechnique, Paris-Saclay, October 28, 2015 1/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Mobile network in Paris - Motivation −2000 0 2000 4000 100020003000 −2000 0 2000 4000 100020003000 Figure: On the left, positions of all BS in Paris. On the right, locations of BS for one frequency band. 2/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Table of Contents I-Generalities on point processes Correlation function, Papangelou intensity and repulsiveness Determinantal point processes II-Kantorovich-Rubinstein distance Convergence dened by dKR dKR(PPP, Φ) ≤ "nice" upper bound III-Applications to transformations of point processes Superposition Thinning Rescaling 3/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Framework Y a locally compact metric space µ a diuse and locally nite measure of reference on Y NY the space of congurations on Y NY the space of nite congurations on Y 4/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Correlation function - Papangelou intensity Correlation function ρ of a point process Φ: E[ α∈NY α⊂Φ f (α)] = +∞ k=0 1 k! ˆ Yk f · ρ({x1, . . . , xk})µ(dx1) . . . µ(dxk) ρ(α) ≈ probability of nding a point in at least each point of α Papangelou intensity c of a point process Φ: E[ x∈Φ f (x, Φ \ {x})] = ˆ Y E[c(x, Φ)f (x, Φ)]µ(dx) c(x, ξ) ≈ conditionnal probability of nding a point in x given ξ 5/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Point process Properties Intensity measure: A ∈ FY → ´ A ρ({x})µ(dx) ρ({x}) = E[c(x, Φ)] If Φ is nite, then: IP(|Φ| = 1) = ˆ Y c(x, ∅)µ(dx) IP(|Φ| = 0). 6/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Poisson point process Properties Φ PPP with intensity M(dy) = m(y)dy Correlation function: ρ(α) = x∈α m(x) Papangelou intensity: c(x, ξ) = m(x) 7/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Repulsive point process Denition Point process repulsive if φ ⊂ ξ =⇒ c(x, ξ) ≤ c(x, φ) Point process weakly repulsive if c(x, ξ) ≤ c(x, ∅) 8/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Determinantal point process Denition Determinantal point process DPP(K, µ): ρ({x1, · · · , xk}) = det(K(xi , xj ), 1 ≤ i, j ≤ k) Proposition Papangelou intensity of DPP(K, µ): c(x0, {x1, · · · , xk}) = det(J(xi , xj ), 0 ≤ i, j ≤ k) det(J(xi , xj ), 1 ≤ i, j ≤ k) where J = (I − K)−1K. 
9/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Ginibre point process Denition Ginibre point process on B(0, R): K(x, y) = 1 π e−1 2 (|x|2 +|y|2 ) exy 1{x∈B(0,R)}1{y∈B(0,R)} β-Ginibre point process on B(0, R): Kβ(x, y) = 1 π e − 1 2β (|x|2 +|y|2 ) e 1 β xy 1{x∈B(0,R)} 1{y∈B(0,R)} 10/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process β-Ginibre point processes 11/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Kantorovich-Rubinstein distance Total variation distance: dTV(ν1, ν2) := sup A∈FY ν1(A),ν2(A)<∞ |ν1(A) − ν2(A)| F : NY → IR is 1-Lipschitz (F ∈ Lip1) if |F(φ1) − F(φ2)| ≤ dTV (φ1, φ2) for all φ1, φ2 ∈ NY Kantorovich-Rubinstein distance: dKR(IP1, IP2) = sup F∈Lip1 ˆ NY F(φ) IP1(dφ) − ˆ NY F(φ) IP2(dφ) Convergence in K.-R. distance =⇒ strictly Convergence in law 12/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Upper bound theorem Theorem (L. Decreusefond, AV) Φ a nite point process on Y ζM a PPP with nite control measure M(dy) = m(y)µ(dy). Then, we have: dKR(IPΦ, IPζM ) ≤ ˆ Y ˆ NY |m(y) − c(y, φ)|IPΦ(dφ)µ(dy). 13/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Superposition of weakly repulsive point processes Φn,1, . . . , Φn,n: n independent, nite and weakly repulsive point processes on Y Φn := n i=1 Φn,i Rn := ´ Y | n i=1 ρn,i (x) − m(x)|µ(dx) ζM a PPP with control measure M(dx) = m(x)µ(dx) 14/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Superposition of weakly repulsive point processes Proposition (LD, AV) Φn = n i=1 Φn,i ζM a PPP with control measure M(dx) = m(x)µ(dx) dKR(IPΦn , IPζM ) ≤ Rn + max 1≤i≤n ˆ Y ρn,i (x)µ(dx) 15/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Consequence Corollary (LD, AV) f pdf on [0; 1] such that f (0+) := limx→0+ f (x) ∈ IR Λ compact subset of IR+ X1, . . . , Xn i.i.d. with pdf fn = 1 n f (1 n ·) Φn = {X1, . . . , Xn} ∩ Λ dKR(Φn, ζ) ≤ ˆ Λ f 1 n x − f (0+) dx + 1 n ˆ Λ f 1 n x dx where ζ is the PPP(f (0+)) reduced to Λ. 16/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning β-Ginibre point processes Proposition (LD, AV) Φn the βn-Ginibre process reduced to a compact set Λ ζ the PPP with intensity 1/π on Λ dKR(IPΦn , IPζ) ≤ Cβn 17/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Kallenberg's theorem Theorem (O. 
Kallenberg) Φn a nite point process on Y pn : Y → [0; 1) uniformly −−−−−→ 0 Φn the pn-thinning of Φn γM a Cox process (pnΦn) law −−→ M ⇐⇒ (Φn) law −−→ γM 18/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Polish distance (fn) a sequence in the space of real continuous functions with compact support generating FY d∗(ν1, ν2) = n≥1 1 2n Ψ(|ν1(fn) − ν2(fn)|) with Ψ(x) = x 1 + x d∗ KR the Kantorovich-Rubinstein distance associated to the distance d∗ 19/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Thinned point processes Proposition (LD, AV) Φn a nite point process on Y pn : Y → [0; 1) Φn the pn-thinning of Φn γM a Cox process Then, we have: d∗ KR(IPΦn , IPγM ) ≤ 2E[ x∈Φn p2 n(x)] + d∗ KR(IPM, IPpnΦn ). 20/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning References L.Decreusefond, and A.Vasseur, Asymptotics of superposition of point processes, 2015. H.O. Georgii, and H.J. Yoo, Conditional intensity and gibbsianness of determinantal point processes, J. Statist. Phys. (118), January 2004. J.S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins, and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015. A.F. Karr, Point Processes and their Statistical Inference, Ann. Probab. 15 (1987), no. 3, 12261227. 21/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Thank you ... ... for your attention. Questions? 22/22 Aurélien VASSEUR Télécom ParisTech
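The rescaling corollary quoted in the talk (i.i.d. points with density f_n = (1/n) f(·/n), restricted to a compact window, converging to a Poisson process of intensity f(0+)) can be checked with a few lines of simulation. The sketch below uses the triangular density f(u) = 2(1 − u) on [0, 1], so f(0+) = 2, and the window Λ = [0, 5]; both are arbitrary illustrative choices. It compares the distribution of the number of points in Λ with the Poisson(10) law and does not compute the Kantorovich-Rubinstein bound itself.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def sample_f(size):
    """i.i.d. draws from f(u) = 2(1 - u) on [0, 1], by inverse-cdf sampling."""
    return 1.0 - np.sqrt(1.0 - rng.random(size))

n, reps = 10_000, 5_000
window = 5.0                                     # compact window Lambda = [0, 5]

counts = np.empty(reps, dtype=int)
for r in range(reps):
    x = n * sample_f(n)                          # X_i ~ f_n(x) = (1/n) f(x/n)
    counts[r] = np.count_nonzero(x <= window)

# empirical law of the count in Lambda vs Poisson(f(0+) * |Lambda|) = Poisson(10)
ks = np.arange(counts.max() + 1)
emp = np.bincount(counts, minlength=ks.size) / reps
tv = 0.5 * (np.abs(emp - poisson.pmf(ks, 2.0 * window)).sum()
            + poisson.sf(counts.max(), 2.0 * window))
print("mean count:", counts.mean(), " approx. TV distance to Poisson(10):", round(tv, 4))
```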

Nicolas Chenavier

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14258
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_19
Authors = Nicolas Chenavier
Keywords = Extreme values, Poisson point process, Random tessellations
Abstract
Let m be a random tessellation in R^d, d ≥ 1, observed in the window W_ρ = ρ^{1/d}[0, 1]^d, ρ > 0, and let f be a geometrical characteristic. We investigate the asymptotic behaviour of the maximum of f(C) over all cells C ∈ m with nucleus in W_ρ as ρ goes to infinity. When the normalized maximum converges, we show that its asymptotic distribution depends on the so-called extremal index. Two examples of extremal indices are provided for Poisson-Voronoi and Poisson-Delaunay tessellations.


Watch the video
The extremal index for a random tessellation

Random tessellations Main problem Extremal index The extremal index for a random tessellation Nicolas Chenavier Université Littoral Côte d’Opale October 28, 2015 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Plan 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Random tessellations Definition A (convex) random tessellation m in Rd is a partition of the Euclidean space into random polytopes (called cells). We will only consider the particular case where m is a : Poisson-Voronoi tessellation ; Poisson-Delaunay tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Poisson-Voronoi tessellation X, Poisson point process in Rd ; ∀x ∈ X, CX(x) := {y ∈ Rd , |y − x| ≤ |y − x |, x ∈ X} (Voronoi cell with nucleus x) ; mPVT := {CX(x), x ∈ X}, Poisson-Voronoi tessellation ; ∀CX(x) ∈ mPVT , we let z(CX(x)) := x. x CX(x) Mosaique de Poisson-Voronoi Figure: Poisson-Voronoi tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Poisson-Delaunay tessellation X, Poisson point process in Rd ; ∀x, x ∈ X, x and x define an edge if CX(x) ∩ CX(x ) = ∅ ; mPDT , Poisson-Delaunay tessellation ; ∀C ∈ mPDT , we let z(C) as the circumcenter of C. x x z(C) Mosaique de Poisson-Delaunay Figure: Poisson-Delaunay tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Typical cell Definition Let m be a stationary random tessellation. The typical cell of m is a random polytope C in Rd which distribution given as follows : for each bounded translation-invariant function g : {polytopes} → R, we have E [g(C)] := 1 N(B) E     C∈m, z(C)∈B g(C)     , where : B ⊂ R is any Borel subset with finite and non-empty volume ; N(B) is the mean number of cells with nucleus in B. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Main problem Framework : m = mPVT , mPDT ; Wρ := [0, ρ]d , with ρ > 0 ; g : {polytopes} → R, geometrical characteristic. Aim : asymptotic behaviour, when ρ → ∞, of Mg,ρ = max C∈m, z(C)∈Wρ g(C)? Figure: Voronoi cell maximizing the area in the square. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Objective and applications Objective : find ag,ρ > 0, bg,ρ ∈ R s.t. P Mg,ρ ≤ ag,ρt + bg,ρ converges, as ρ → ∞, for each t ∈ R. Applications : regularity of the tessellation ; discrimination of point processes and tessellations ; Poisson-Voronoi approximation. Approximation de Poisson-Voronoi Figure: Poisson-Voronoi approximation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Asymptotics under a local correlation condition Notation : let vρ := ag,ρt + bρ be a threshold such that ρd · P (g(C) > vρ) −→ ρ→∞ τ, for some τ := τ(t) ≥ 0. Local Correlation Condition (LCC) ρd (log ρ)d · E      (C1,C2)=∈m2, z(C1),z(C2)∈[0,log ρ]d 1g(C1)>vρ,g(C2)>vρ      −→ ρ→∞ 0. Theorem Under (LCC), we have : P (Mg,ρ ≤ vρ) −→ ρ→∞ e−τ . 
Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Definition of the extremal index Proposition Assume that for all τ ≥ 0, there exists a threshold v (τ) ρ depending on ρ such that ρd · P(g(C) > v (τ) ρ ) −→ ρ→∞ τ. Then there exists θ ∈ [0, 1] such that, for all τ ≥ 0, lim ρ→∞ P(Mg,ρ ≤ v(τ) ρ ) = e−θτ , provided that the limit exists. Definition According to Leadbetter, we say that θ ∈ [0, 1] is the extremal index if, for each τ ≥ 0, we have : ρd · P g(C) > v(τ) ρ −→ ρ→∞ τ and lim ρ→∞ P(Mg,ρ ≤ v(τ) ρ ) = e−θτ . Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Example 1 Framework : m := mPVT : Poisson-Voronoi tessellation ; g(C) := r(C) : inradius of any cell C := CX(x) with x ∈ X, i.e. r(C) := r (CX(x)) := max{r ∈ R+ : B(x, r) ⊂ CX(x)}. rmin,PVT (ρ) := minx∈X∩Wρ r (CX(x)). Extremal index : θ = 1/2 for each d ≥ 1. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Minimum of inradius for a Poisson-Voronoi tessellation (b) Typical Poisson−Voronoï cell with a small inradii x y −1.0 −0.5 0.0 0.5 1.0 −1.0−0.50.00.51.0 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Example 2 Framework : m := mPDT : Poisson-Delaunay tessellation ; g(C) := R(C) : circumradius of any cell C, i.e. R(C) := min{r ∈ R+ : B(x, r) ⊃ C}. Rmax,PDT (ρ) := maxC∈mPDT :z(C)∈Wρ R(C). Extremal index : θ = 1; 1/2; 35/128 for d = 1; 2; 3. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Maximum of circumradius for a Poisson-Delaunay tessellation (d) Typical Poisson−Delaunay cell with a large circumradii x y −15 −10 −5 0 5 10 15 −15−10−5051015 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Work in progress Joint work with C. Robert (ISFA, Lyon 1) : new characterization of the extremal index (not based on classical block and run estimators appearing in the classical Extreme Value Theory) ; simulation and estimation for the extremal index and cluster size distribution (for Poisson-Voronoi and Poisson-Delaunay tessellations). Nicolas Chenavier The extremal index for a random tessellation
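As a complement to the minimum-of-inradii example above, here is a small Monte Carlo sketch (not from the talk). It relies on the fact that the inradius of the Voronoi cell with nucleus x is half the distance from x to its nearest neighbour in X, so no tessellation has to be built explicitly; the window convention follows the abstract (a cube of volume ρ), and the margin, unit intensity and replication count are illustrative choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def min_voronoi_inradius(rho, d, rng, margin=3.0):
    """One realisation of the minimal inradius over Poisson-Voronoi cells
    (unit intensity) whose nucleus lies in a cubic window of volume rho.
    The inradius of the cell with nucleus x equals half the distance from
    x to its nearest neighbour in the Poisson process."""
    side = rho ** (1.0 / d)
    lo, hi = -margin, side + margin                # margin reduces edge effects
    n = rng.poisson((hi - lo) ** d)
    points = rng.uniform(lo, hi, size=(n, d))
    dist, _ = cKDTree(points).query(points, k=2)   # dist[:, 1] = nearest-neighbour distance
    inradius = dist[:, 1] / 2.0
    in_window = np.all((points >= 0) & (points <= side), axis=1)
    return inradius[in_window].min()

rng = np.random.default_rng(0)
mins = np.array([min_voronoi_inradius(rho=500.0, d=2, rng=rng) for _ in range(200)])
print("empirical mean and sd of r_min:", mins.mean(), mins.std())
```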

Frank Nielsen, Gaëtan Hadjeres

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14264
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_63
Authors = Frank Nielsen, Gaëtan Hadjeres
Keywords =
Abstract
We generalize the O(dn/ε²)-time (1 + ε)-approximation algorithm for the smallest enclosing Euclidean ball [2,10] to point sets in hyperbolic geometry of arbitrary dimension. We guarantee an O(1/ε²) convergence time by using a closed-form formula to compute the geodesic α-midpoint between any two points. Those results allow us to apply the hyperbolic k-center clustering for statistical location-scale families or for multivariate spherical normal distributions by using their Fisher information matrix as the underlying Riemannian hyperbolic metric.


Watch the video
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry Frank Nielsen1 Ga¨etan Hadjeres2 ´Ecole Polytechnique 1 Sony Computer Science Laboratories, Inc 1,2 Conference on Geometric Science of Information c 2015 Frank Nielsen - Ga¨etan Hadjeres 1 The Minimum Enclosing Ball problem Finding the Minimum Enclosing Ball (or the 1-center) of a finite point set P = {p1, . . . , pn} in the metric space (X, dX (., .)) consists in finding c ∈ X such that c = argminc ∈X max p∈P dX (c , p) Figure : A finite point set P and its minimum enclosing ball MEB(P) c 2015 Frank Nielsen - Ga¨etan Hadjeres 2 The approximating minimum enclosing ball problem In a euclidean setting, this problem is well-defined: uniqueness of the center c∗ and radius R∗ of the MEB computationally intractable in high dimensions. We fix an > 0 and focus on the Approximate Minimum Enclosing Ball problem of finding an -approximation c ∈ X of MEB(P) such that dX (c, p) ≤ (1 + )R∗ ∀p ∈ P. c 2015 Frank Nielsen - Ga¨etan Hadjeres 3 The approximating minimum enclosing ball problem: prior work Approximate solution in the euclidean case are given by Badoiu and Clarkson’s algorithm [Badoiu and Clarkson, 2008]: Initialize center c1 ∈ P Repeat 1/ 2 times the following update: ci+1 = ci + fi − ci i + 1 where fi ∈ P is the farthest point from ci . How to deal with point sets whose underlying geometry is not euclidean ? c 2015 Frank Nielsen - Ga¨etan Hadjeres 4 The approximating minimum enclosing ball problem: prior work This algorithm has been generalized to dually flat manifolds [Nock and Nielsen, 2005] Riemannian manifolds [Arnaudon and Nielsen, 2013] Applying these results to hyperbolic geometry give the existence and uniqueness of MEB(P), but give no explicit bounds on the number of iterations assume that we are able to precisely cut geodesics. c 2015 Frank Nielsen - Ga¨etan Hadjeres 5 The approximating minimum enclosing ball problem: our contribution We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic α-midpoints, we obtain a intrinsic (1 + )-approximation algorithm to the approximate minimum enclosing ball problem a O(1/ 2) convergence time guarantee a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric c 2015 Frank Nielsen - Ga¨etan Hadjeres 6 Model of d-dimensional hyperbolic geometry: The Poincar´e ball model The Poincar´e ball model (Bd , ρ(., .)) consists in the open unit ball Bd = {x ∈ Rd : x < 1} together with the hyperbolic distance ρ (p, q) = arcosh 1 + 2 p − q 2 (1 − p 2) (1 − q 2) , ∀p, q ∈ Bd . This distance induces on the metric space (Bd , ρ) a Riemannian structure. c 2015 Frank Nielsen - Ga¨etan Hadjeres 7 Geodesics in the Poincar´e ball model Shorter paths between two points (geodesics) are exactly straight (euclidean) lines passing through the origin circle arcs orthogonal to the unit sphere Figure : “Straight” lines in the Poincar´e ball model c 2015 Frank Nielsen - Ga¨etan Hadjeres 8 Circles in the Poincar´e ball model Circles in the Poincar´e ball model look like euclidean circles but with different center Figure : Difference between euclidean MEB (in blue) and hyperbolic MEB (in red) for the set of blue points in hyperbolic Poincar´e disk (in black). The red cross is the hyperbolic center of the red circle while the pink one is its euclidean center. 
c 2015 Frank Nielsen - Ga¨etan Hadjeres 9 Translations in the Poincar´e ball model Tp (x) = 1 − p 2 x + x 2 + 2 x, p + 1 p p 2 x 2 + 2 x, p + 1 Figure : Tiling of the hyperbolic plane by squares c 2015 Frank Nielsen - Ga¨etan Hadjeres 10 Closed-form formula for computing α-midpoints A point m is the α-midpoint p#αq of two points p, q for α ∈ [0, 1] if m belongs to the geodesic joining the two points p, q m verifies ρ (p, mα) = αρ (p, q) . c 2015 Frank Nielsen - Ga¨etan Hadjeres 11 Closed-form formula for computing α-midpoints A point m is the α-midpoint p#αq of two points p, q for α ∈ [0, 1] if m belongs to the geodesic joining the two points p, q m verifies ρ (p, mα) = αρ (p, q) . For the special case p = (0, . . . , 0), q = (xq, 0, . . . , 0), we have p#αq := (xα, 0, . . . , 0) with xα = cα,q − 1 cα,q + 1 , where cα,q := eαρ(p,q) = 1 + xq 1 − xq α . c 2015 Frank Nielsen - Ga¨etan Hadjeres 11 Closed-form formula for computing α-midpoints Noting that p#αq = Tp (T−p (p) #αT−p (q)) ∀p, q ∈ Bd we obtain a closed-form formula for computing p#αq how to compute p#αq in linear time O(d) that these transformations are exact. c 2015 Frank Nielsen - Ga¨etan Hadjeres 12 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius For a fixed radius r > R∗, we can find c ∈ Bd such that ρ (c, P) ≤ (1 + )r ∀p ∈ P with Algorithm 1: (1 + )-approximation of EHB(P, r) 1: c0 := p1 2: t := 0 3: while ∃p ∈ P such that p /∈ B (ct, (1 + ) r) do 4: let p ∈ P be such a point 5: α := ρ(ct ,p)−r ρ(ct ,p) 6: ct+1 := ct#αp 7: t := t+1 8: end while 9: return ct c 2015 Frank Nielsen - Ga¨etan Hadjeres 13 Idea of the proof By the hyperbolic law of cosines : ch (ρt) ≥ ch (h) ch (ρt+1) ch (ρ1) ≥ ch (h)T ≥ ch ( r)T . ct+1 ct c∗ pt h > r ρt+1 ρt r ≤ rr θ θ Figure : Update of ct c 2015 Frank Nielsen - Ga¨etan Hadjeres 14 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius The EHB(P, r) algorithm is a O(1/ 2)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + )r in less than 4/ 2 iterations. c 2015 Frank Nielsen - Ga¨etan Hadjeres 15 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius The EHB(P, r) algorithm is a O(1/ 2)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + )r in less than 4/ 2 iterations. Our error with the true MEHB center c∗ verifies ρ (c, c∗ ) ≤ arcosh ch ((1 + ) r) ch (R∗) c 2015 Frank Nielsen - Ga¨etan Hadjeres 15 (1 + + 2 /4)-approximation of MEHB(P) In fact, as R∗ is unknown in general, the EHB algorithm returns for any r: an (1 + )-approximation of EHB(P) if r ≥ R∗ the fact that r < R∗ if the result obtained after more than 4/ 2 iterations is not good enough. c 2015 Frank Nielsen - Ga¨etan Hadjeres 16 (1 + + 2 /4)-approximation of MEHB(P) In fact, as R∗ is unknown in general, the EHB algorithm returns for any r: an (1 + )-approximation of EHB(P) if r ≥ R∗ the fact that r < R∗ if the result obtained after more than 4/ 2 iterations is not good enough. This suggests to implement a dichotomic search in order to compute an approximation of the minimal hyperbolic enclosing ball. We obtain a O(1 + + 2/4)-approximation of MEHB(P) in O N 2 log 1 iterations. 
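A minimal Python sketch of the fixed-radius routine described above, using the Poincaré-ball distance, the translation T_p and the closed-form α-midpoint as written on the slides; the data, the choice of r and the tolerance are illustrative, and this is not the authors' reference implementation.

```python
import numpy as np

def rho(p, q):
    """Hyperbolic distance in the Poincare ball model."""
    d2 = np.sum((p - q) ** 2)
    return np.arccosh(1.0 + 2.0 * d2 / ((1.0 - p @ p) * (1.0 - q @ q)))

def translate(p, x):
    """Hyperbolic translation T_p(x) from the slides (maps the origin to p)."""
    xx, xp, pp = x @ x, x @ p, p @ p
    return ((1.0 - pp) * x + (xx + 2.0 * xp + 1.0) * p) / (pp * xx + 2.0 * xp + 1.0)

def alpha_midpoint(p, q, alpha):
    """Geodesic alpha-midpoint p #_alpha q, i.e. rho(p, m) = alpha * rho(p, q),
    computed as T_p(0 #_alpha T_{-p}(q)) with the closed form along an axis."""
    qp = translate(-p, q)
    nq = np.linalg.norm(qp)
    c = ((1.0 + nq) / (1.0 - nq)) ** alpha
    return translate(p, (c - 1.0) / (c + 1.0) * qp / nq)

def enclosing_ball_center(points, r, eps):
    """Algorithm 1 of the talk: (1+eps)-approximate hyperbolic enclosing ball
    of radius r; needing more than 4/eps^2 updates certifies r < R*."""
    c = points[0]
    for _ in range(int(np.ceil(4.0 / eps ** 2))):
        dists = np.array([rho(c, p) for p in points])
        i = int(np.argmax(dists))
        if dists[i] <= (1.0 + eps) * r:
            return c, True
        c = alpha_midpoint(c, points[i], (dists[i] - r) / dists[i])
    return c, False

rng = np.random.default_rng(1)
pts = rng.uniform(-0.25, 0.25, size=(50, 2))   # a point set in the Poincare disk
# r = 0.75 exceeds R* here: every point is within hyperbolic distance < 0.74 of the origin.
c, ok = enclosing_ball_center(pts, r=0.75, eps=0.05)
print("converged:", ok, " max distance to centre:", max(rho(c, p) for p in pts))
```

The dichotomic search of Algorithm 2 then only needs to wrap this routine with a shrinking interval for r.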
c 2015 Frank Nielsen - Ga¨etan Hadjeres 16 (1 + + 2 /4)-approximation of MEHB(P) algorithm Algorithm 2: (1 + )-approximation of MEHB(P) 1: c := p1 2: rmax := ρ (c, P); rmin = rmax 2 ; tmax := +∞ 3: r := rmax; 4: repeat 5: ctemp := Alg1 P, r, 2 , interrupt if t > tmax in Alg1 6: if call of Alg1 has been interrupted then 7: rmin := r 8: else 9: rmax := r ; c := ctemp 10: end if 11: dr := rmax−rmin 2 ; r := rmin + dr ; tmax := log(ch(1+ /2)r)−log(ch(rmin)) log(ch(r /2)) 12: until 2dr < rmin 2 13: return c c 2015 Frank Nielsen - Ga¨etan Hadjeres 17 Experimental results The number of iterations does not depend on d. Figure : Number of α-midpoint calculations as a function of in logarithmic scale for different values of d. c 2015 Frank Nielsen - Ga¨etan Hadjeres 18 Experimental results The running time is approximately O(dn 2 ) (vertical translation in logarithmic scale). Figure : execution time as a function of in logarithmic scale for different values of d. c 2015 Frank Nielsen - Ga¨etan Hadjeres 19 Applications Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the following subfamilies N µ, σ2In of n-variate normal distributions with scalar covariance matrix (In is the n × n identity matrix), N µ, diag σ2 1, . . . , σ2 n of n-variate normal distributions with diagonal covariance matrix N(µ0, Σ) of d-variate normal distributions with fixed mean µ0 and arbitrary positive definite covariance matrix Σ are statistical manifolds whose Fisher information metric is hyperbolic. c 2015 Frank Nielsen - Ga¨etan Hadjeres 20 Applications In particular, our results apply to the two-dimensional location-scale subfamily: Figure : MEHB (D) of probability density functions (left) in the (µ, σ) superior half-plane (right). P = {A, B, C}. c 2015 Frank Nielsen - Ga¨etan Hadjeres 21 Openings Plugging the EHB and MEHB algorithms to compute clusters centers in the approximation algorithm by [Gonzalez, 1985], we obtain approximate algorithms for covering in hyperbolic spaces the k-center problem in O kNd 2 log 1 c 2015 Frank Nielsen - Ga¨etan Hadjeres 22 Algorithm 3: Gonzalez farthest-first traversal approximation algo- rithm 1: C1 := P, i = 0 2: while i ≤ k do 3: ∀j ≤ i, compute cj := MEB(Cj ) 4: ∀j ≤ i, set fj := argmaxp∈P ρ(p, cj ) 5: find f ∈ {fj } whose distance to its cluster center is maximal 6: create cluster Ci containing f 7: add to Ci all points whose distance to f is inferior to the distance to their cluster center 8: increment i 9: end while 10: return {Ci }i c 2015 Frank Nielsen - Ga¨etan Hadjeres 23 Openings The computation of the minimum enclosing hyperbolic ball does not necessarily involve all points p ∈ P. Core-sets in hyperbolic geometry the MEHB obtained by the algorithm is an -core-set differences with the euclidean setting: core-sets are of size at most 1/ [Badoiu and Clarkson, 2008] c 2015 Frank Nielsen - Ga¨etan Hadjeres 24 Thank you! c 2015 Frank Nielsen - Ga¨etan Hadjeres 25 Bibliography I Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93–104. Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Comput. Geom., 40(1):14–22. Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306. Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649–656. Springer. c 2015 Frank Nielsen - Ga¨etan Hadjeres 26

Germain Van Bever, Radka Sabolova, Frank Critchley, Paul Marriott

Creative Commons Attribution-ShareAlike 4.0 International

Computational Information Geometry... ...in mixture modelling Computational Information Geometry: mixture modelling Germain Van Bever1 , R. Sabolová1 , F. Critchley1 & P. Marriott2 . 1 The Open University (EPSRC grant EP/L010429/1), United Kingdom 2 University of Waterloo, USA GSI15, 28-30 October 2015, Paris Germain Van Bever CIG for mixtures 1/19 Computational Information Geometry... ...in mixture modelling Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 2/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 3/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Generalities The use of geometry in statistics gave birth to many different approaches. Traditionally, Information geometry refers to the application of differential geometry to statistical theory and practice. The main ingredients of IG in exponential families (Amari, 1985) are 1 the manifold of parameters M, 2 the Riemannian (Fisher information) metric g, and 3 the set of affine connections { −1 , +1 } (mixture and exponential connections). These allow to define notions of curvature, dimension reduction or information loss and invariant higher order expansions. Two affine structures (maps on M) are used simultaneously: -1: Mixture affine geometry on probability measures: λf(x) + (1 − λ)g(x). +1: Exponential affine geometry on probability measures: C(λ)f(x)λ g(x)(1−λ) Germain Van Bever CIG for mixtures 4/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Computational Information Geometry This talk is about Computational Information Geometry (CIG, Critchley and Marriott, 2014). 1 In CIG, the multinomial model provides, modulo, discretization, a universal model. It therefore moves from the manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex. 2 It provides a unifying framework for different geometries. 3 Tractability of the geometry allows for efficient algorithms in a computational framework. It is inherently finite and discrete. The impact of discretization is studied. A working model will be a subset of the simplex. Germain Van Bever CIG for mixtures 5/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Multinomial distributions X ∼ Mult(π0, . . . , πk), π = (π0, . . . , πk) ∈ int(∆k ), with ∆k := π : πi ≥ 0, k i=0 πi = 1 . In this case, π(0) = (π1 , . . . , πk ) is the mean parameter, while η = log(π(0) /π0) is the natural parameter. Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005). 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 mixed geodesics in -1-space π1 π2 -6 -4 -2 0 2 4 6 -6-4-20246 mixed geodesics in +1-space η1 η2 Germain Van Bever CIG for mixtures 6/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Restricting to the multinomials families Under regular exponential families with compact support, the cost of discretization on the components of Information Geometry is bounded! The same holds true for the MLE and the log-likelihood function. 
The log-likelihood (x, π) = k i=0 ni log(πi) is (i) strictly concave (in the −1-representation) on the observed face (counts ni > 0), (ii) strictly decreasing in the normal direction towards the unobserved face (ni = 0), and, otherwise, (iii) constant. Considering an infinite-dimensional simplex allows to remove the compactness assumption (Critchley and Marriott, 2014). Germain Van Bever CIG for mixtures 7/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Binomial subfamilies A (discrete) example: Binomial distributions as a subfamily of multinomial distributions. Let X ∼ Bin(k, p). Then, X can be seen as a subfamily of M = {X|X ∼ Mult(π0, . . . , πk)} , with πi(p) = k i pi (1 − p)k−i . Figure: Left: Embedded binomial (k = 2) in the 2-simplex. Right: Embedded binomial (k = 3) in the 3-simplex. Germain Van Bever CIG for mixtures 8/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 9/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Mixture distributions The generic mixture distribution is f(x; Q) = f(x; θ)dQ(θ), that is, a mixture of (regular) parametric distributions. Regularity: same support S, abs. cont. with respect to measure ν. Mixture distributions arise naturally in many statistical problems, including Overdispersed models Random effects ANOVA Random coefficient regression models and measurement error models Graphical models and many more Germain Van Bever CIG for mixtures 10/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Hard mixture problems Inference in the class of mixture distributions generates well-known difficulties: Identifiability issues: Without imposing constraints on the mixing distribution Q, there may exist Q1 and Q2 such that f(x; Q1) = f(x; θ)dQ1(θ) = f(x; θ)dQ2(θ) = f(x; Q2). Byproduct: parametrisation issues. Byproduct: multimodal likelihood functions. Boundary problems. Byproduct: singularities in the likelihood function. Germain Van Bever CIG for mixtures 11/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions NPMLE Finite mixtures are essential to the geometry. Lindsay argues that nonparametric estimation of Q is necessary. Also, Theorem The loglikelihood (Q) = n s=1 log Ls(Q) = n s=1 log f(xs; θ)dQ(θ) , has a unique maximum over the space of all distribution functions Q. Furthermore, the maximiser ˆQ is a discrete distribution with no more than D distinct points of support, where D is the number of distinct points in (x1, . . . , xn). The likelihood on the space of mixtures is therefore defined on the convex hull of the image of θ → (L1(θ), . . . , LD(θ)). Finding the NPMLE amounts to maximize a concave function over this convex set. Germain Van Bever CIG for mixtures 12/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Limits to convex geometry Knowing the shape of the likelihood on the whole simplex (and not only on the observed face) give extra insight. 
Convex geometry correctly captures the −1-geometry of the simplex but NOT the 0 and +1 geometries (for example, Fisher information requires to know the full sample space). Understanding the (C)IG of mixtures in the simplex will therefore provide extra tools (and algorithms) in mixture modelling. In this talk, we mention results on 1 (−1)-dimensionality of exponential families in the simplex. 2 convex polytopes approximation algorithms: Information geometry can give efficient approximation of high dimensional convex hulls by polytopes Germain Van Bever CIG for mixtures 13/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Local mixture models (IG) Parametric vs nonparametric dilemma. Geometric analysis allows low-dimensional approximation in local setups. Theorem (Marriott, 2002) If f(x; θ) is a n-dim exponential family with regularity conditions, Qλ(θ) is a local mixing around θ0, then f(x; Qλ) = f(x; θ)dQλ(θ) has the expansion f(x; Qλ) − f(x; θ0) − n i=1 λi ∂ ∂θi f(x; θ0) − n i,j=1 λij ∂2 ∂θi∂θj f(x; θ0) = O(λ−3 ). This is equivalent to f(x; Qλ) + O(λ−3 ) ∈ T2 Mθ0 . If the density f(x; θ) and all its derivatives are bounded, then the approximation will be uniform in x. Germain Van Bever CIG for mixtures 14/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Dimensionality in CIG It is therefore possible to approximate mixture distributions with low-dimensional families. In contrast, the (−1)−representation of any generic exponential family on the simplex will always have full dimension. The following result is even more general. Theorem (VB et al.) The −1-convex hull of an open subset of a exponential subfamily of M with tangent dimension k − d has dimension at least k − d. Corollary (Critchley and Marriott, 2014) The −1-convex hull of an open subset of a generic one dimensional subfamily of M is of full dimension. The tangent dimension is the maximal number of different components of any (+1) tangent vector to the exponential family. Generic ↔ tangent dimension= k, i.e. the tangent vector has distinct components. Germain Van Bever CIG for mixtures 15/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Example: Mixture of binomials As mentioned, IG gives efficient approximation by polytopes. IG maximises concave function on (convex) polytopes. Example: toxicological data (Kupper and Haseman, 1978). ‘simple one-parameter binomial [...] models generally provides poor fits to this type of binary data’. Germain Van Bever CIG for mixtures 16/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Approximation in CIG Define the norm ||π||π0 = k i=1 π2 i /πi,0 (preferred point metric, Critchley et al., 1993). Let π(θ) be an exponential family and ∪Si be a polytope surface. Define the distance function as d(π(θ), π0) := inf π∈∪Si ||π(θ) − π||π0 . Theorem (Anaya-Izquierdo et al.) Let ∪Si be such that d(π(θ)) ≤ for all θ. Then (ˆπNP MLE ) − (ˆπ) ≤ N||(ˆπG − ˆπNP MLE )||ˆπ + o( ), where (ˆπG )i = ni/N and ˆπ is the NPMLE on ∪Si. Germain Van Bever CIG for mixtures 17/19 Computational Information Geometry... 
...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Summary High-dimensional (extended) multinomial space is used as a proxy for the ‘space of all models’. This computational approach encompasses Amari’s information geometry and Lindsay’s convex geometry... ...while having a tractable and mostly explicit geometry, which allows for a computational theory. Future work Converse of the dimensionality result (−1 to +1) Long term aim: implementing geometric theories within a R package/software. Germain Van Bever CIG for mixtures 18/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions References: Amari, S-I (1985), Differential-geometrical methods in statistics, Springer-Verlag. Anaya-Izquierdo, K., Critchley, F., Marriott, P. and Vos, P. (2012), Computational information geometry: theory and practice, Arxiv report, 1209.1988v1. Critchley, F., Marriott, P. and Salmon, M. (1993), Preferred point geometry and statistical manifolds, The Annals of Statistics, 21, 3, 1197-1224. Critchley, F. and Marriott, P. (2014), Computational Information Geometry in Statistics: Theory and Practice, Entropy, 16, 2454-2471. Csiszár, I. and Matúš, F. (2005), Closures of exponential families, The Annals of Probabilities, 33, 2, 582-600. Kupper L.L., and Haseman J.K., (1978), The Use of a Correlated Binomial Model for the Analysis of Certain Toxicological Experiments, Biometrics, 34, 1, 69-76. Marriott, P. (2002), On the local geometry of mixture models, Biometrika, 89, 1, 77-93. Germain Van Bever CIG for mixtures 19/19
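To connect Lindsay's NPMLE result above with a computation, here is a small sketch (not from the talk): it maximises the concave mixture log-likelihood over weights on a fixed grid of binomial success probabilities with the standard EM fixed-point update, so it returns the NPMLE restricted to that grid. The grid, the simulated data and the iteration count are illustrative choices.

```python
import numpy as np
from scipy.stats import binom

def npmle_binomial_mixture(x, k, grid, n_iter=2000):
    """EM for the weights w of the mixture sum_j w_j Bin(k, grid[j]) fitted
    to counts x; the log-likelihood is concave in w, so the fixed point is
    the NPMLE restricted to the chosen support grid."""
    L = binom.pmf(np.asarray(x)[:, None], k, grid[None, :])   # likelihood matrix
    w = np.full(grid.size, 1.0 / grid.size)
    for _ in range(n_iter):
        post = L * w                                  # E-step: responsibilities
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                         # M-step: new weights
    return w

rng = np.random.default_rng(0)
k = 10
# Data from a 2-component binomial mixture (overdispersed relative to one binomial).
p_true = rng.choice([0.2, 0.7], size=300, p=[0.6, 0.4])
x = rng.binomial(k, p_true)

grid = np.linspace(0.01, 0.99, 50)
w = npmle_binomial_mixture(x, k, grid)
keep = w > 1e-3
print("estimated support points:", np.round(grid[keep], 2))
print("estimated weights:", np.round(w[keep], 2))
```

Consistent with the theorem quoted above, the estimated mixing distribution concentrates on a small number of support points.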

Vahed Maroufy, Paul Marriott

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14263
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_62
Authors = Paul Marriott, Vahed Maroufy
Keywords = Computational information geometry, Computing boundaries, Embedded manifolds, Local mixture models, Polytopes, Ruled and developable surfaces
Abstract
Local mixture models give an inferentially tractable but still flexible alternative to general mixture models. Their parameter space naturally includes boundaries; near these the behaviour of the likelihood is not standard. This paper shows how convex and differential geometries help in characterising these boundaries. In particular the geometry of polytopes, ruled and developable surfaces is exploited to develop efficient inferential algorithms.


Watch the video
Computing Boundaries in Local Mixture Models

Computing Boundaries in Local Mixture Models Computing Boundaries in Local Mixture Models Vahed Maroufy & Paul Marriott Department of Statistics and Actuarial Science University of Waterloo October 28 GSI 2015, Paris Computing Boundaries in Local Mixture Models Outline Outline 1 Influence of boundaries on parameter inference 2 Local mixture models (LMM) 3 Parameter space and boundaries Hard boundaries and Soft boundaries 4 Computing the boundaries for LMMs 5 Summary and future direction Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. 
(2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models LMMs Local Mixture Models Definition Marriott (2002) g(x; µ, λ) = f (x; µ) + k j=2 λj f (j) (x; µ), λ ∈ Λµ ⊂ Rk−1 Properties Anaya-Izquierdo and Marriott (2007) g is identifiable in all parameters and the parametrization (µ, λ) is orthogonal at λ = 0 The log likelihood function of g is a concave function of λ at a fixed µ0 Λµ is convex Approximate continuous mixture models when mixing is “small” M f (x, µ) dQ(µ) Family of LMMs is richer that Family of mixtures Computing Boundaries in Local Mixture Models LMMs Local Mixture Models Definition Marriott (2002) g(x; µ, λ) = f (x; µ) + k j=2 λj f (j) (x; µ), λ ∈ Λµ ⊂ Rk−1 Properties Anaya-Izquierdo and Marriott (2007) g is identifiable in all parameters and the parametrization (µ, λ) is orthogonal at λ = 0 The log likelihood function of g is a concave function of λ at a fixed µ0 Λµ is convex Approximate continuous mixture models when mixing is “small” M f (x, µ) dQ(µ) Family of LMMs is richer that Family of mixtures Computing Boundaries in Local Mixture Models Example and Motivation Example LMM of Normal f (x; µ) = φ(x; µ, σ2 ), (σ2 is known). g(x; µ, λ) = φ(x; µ, σ2 ) 1 + k j=2 λj pj (x) , λ ∈ Λµ pj (x) polynomial of degree j. Why we care about λ and Λµ? They are interpretable    µ (2) g = σ2 + 2λ2 µ (3) g = 6λ3 µ (4) g = µ (4) φ + 12σ2 λ2 + 24λ4 (1) λ represents the mixing distribution Q via its moments in M f (x, µ) dQ(µ) Computing Boundaries in Local Mixture Models Example and Motivation Example LMM of Normal f (x; µ) = φ(x; µ, σ2 ), (σ2 is known). g(x; µ, λ) = φ(x; µ, σ2 ) 1 + k j=2 λj pj (x) , λ ∈ Λµ pj (x) polynomial of degree j. Why we care about λ and Λµ? They are interpretable    µ (2) g = σ2 + 2λ2 µ (3) g = 6λ3 µ (4) g = µ (4) φ + 12σ2 λ2 + 24λ4 (1) λ represents the mixing distribution Q via its moments in M f (x, µ) dQ(µ) Computing Boundaries in Local Mixture Models Example and Motivation The costs for all these good properties and flexibility are Hard boundary =⇒ Positivity (boundary of Λµ) Soft boundary =⇒ Mixture behavior We compute them for two models here: Poisson and Normal We fix k = 4 Computing Boundaries in Local Mixture Models Boundaries Hard boundary Λµ = λ | 1 + k j=2 λj qj (x; µ) ≥ 0, ∀x ∈ S , Λµ is intersection of half-spaces so convex Hard boundary is constructed by a set of (hyper-)planes Soft boundary Definition For a density function f (x; µ) with k finite moments let, Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). and for compact M define C = convhull{Mr (f )|µ ∈ M} Then, the boundary of C is called the soft boundary. Computing Boundaries in Local Mixture Models Boundaries Hard boundary Λµ = λ | 1 + k j=2 λj qj (x; µ) ≥ 0, ∀x ∈ S , Λµ is intersection of half-spaces so convex Hard boundary is constructed by a set of (hyper-)planes Soft boundary Definition For a density function f (x; µ) with k finite moments let, Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). and for compact M define C = convhull{Mr (f )|µ ∈ M} Then, the boundary of C is called the soft boundary. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ | A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. 
Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ | A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Normal model let y = x−µ σ2 Λµ = {λ | (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 ≥ 0, ∀y ∈ R}. We need a more geometric tools to compute this boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Ruled and developable surfaces Definition Ruled surface: Γ(x, γ) = α(x) + γ · β(x), x ∈ I ⊂ R, γ ∈ Rk Developable surface: β(x), α (x) and β (x) are coplanar for all x ∈ I. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Definition The family of planes, A = {λ ∈ R3 | a(x) · λ + d(x) = 0, x ∈ R}, each determined by an x ∈ R, is called a one-parameter infinite family of planes. Each element of the set {λ ∈ R3 |a(x) · λ + d(x) = 0, a (x) · λ + d (x) = 0, x ∈ R} is called a characteristic line of the surface at x and the union is called the envelope of the family. A characteristic line is the intersection of two consecutive planes The envelope is a developable surface Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Hard boundary of for Normal LMM (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 = 0, ∀y ∈ R . λ2 λ3 λ4 λ4 λ3 λ2 Figure : Left: The hard boundary for the normal LMM (shaded) as a subset of a self intersecting ruled surface (unshaded); Right: slice through λ4 = 0.2. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Soft boundary of for Normal LMM recap : Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). For visualization purposes let k = 3, (µ ∈ M, fix σ) M3(f ) = (µ, µ2 + σ2 , µ3 + 3µσ2 ), M3(g) = (µ, µ2 + σ2 + 2λ2, µ3 + 3µσ2 + 6µλ2 + 6λ3). Figure : the 3-D curve ϕ(µ); Middle: the bounding ruled surface γa(µ, u); Right: the convex subspace restricted to soft boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Ruled surface parametrization Two boundary surfaces, each constructed by a curve and a set of lines attached to it. γa(µ, u) = ϕ(µ) + u La(µ) γb(µ, u) = ϕ(µ) + u Lb(µ) where for M = [a, b] and ϕ(µ) = M3(f ) La(µ): lines between ϕ(a) and ϕ(µ) Lb(µ): lines between ϕ(µ) and ϕ(b) Computing Boundaries in Local Mixture Models Summary Summary Understanding these boundaries is important if we want to exploit the nice statistical properties of LMM The boundaries described in this paper have both discrete aspects and smooth aspects The two example discussed represent the structure for almost all exponential family models It is a interesting problem to design optimization algorithms on these boundaries for finding boundary maximizers of likelihood Computing Boundaries in Local Mixture Models References Anaya-Izquierdo, K., Critchley, F., and Marriott, P. (2013). when are first order asymptotics adequate? a diagnostic. Stat, 3(1):17–22. Anaya-Izquierdo, K. and Marriott, P. (2007). Local mixture models of exponential families. Bernoulli, 13:623–640. Barvinok, A. (2013). 
Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, rnt078. Batyrev, V. V. (1992). Toric varieties and smooth convex approximations of a polytope. RIMS Kokyuroku, 776:20. Boroczky, K. and Fodor, F. (2008). Approximating 3-dimensional convex bodies by polytopes with a restricted number of edges. Contributions to Algebra and Geometry, 49(1):177–193. Fukuda, K. (2004). From the zonotope construction to the minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261–1272. Geyer, C. J. (2009). Likelihood inference in exponential familes and direction of recession. Electronic Journal of Statistics, 3:259–289. Ghomi, M. (2001). Strictly convex submanifolds and hypersurfaces of positive curvature. Journal of Differential Geometry, 57(2):239–271. Ghomi, M. (2004). Optimal smoothing for convex polytopes. Bulletin of the London Mathematical Society, 36(4):483–492. Marriott, P. (2002). On the local geometry of mixture models. Biometrika, 89:77–93. Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446–484. Computing Boundaries in Local Mixture Models END Thank You
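As a concrete companion to the normal-LMM hard boundary above, the following sketch (not from the paper) tests whether a candidate λ = (λ2, λ3, λ4) satisfies the positivity constraint (y²−1)λ2 + (y³−3y)λ3 + (y⁴−6y²+3)λ4 + 1 ≥ 0 for all real y, by locating the global minimum of the quartic at its real critical points; the test values are illustrative.

```python
import numpy as np

def in_hard_boundary(lam2, lam3, lam4, tol=1e-12):
    """Check whether (lam2, lam3, lam4) lies in Lambda_mu for the normal LMM
    with k = 4, i.e. whether the quartic
    (y^2-1)lam2 + (y^3-3y)lam3 + (y^4-6y^2+3)lam4 + 1 is >= 0 for all real y."""
    # Coefficients of the quartic q(y), highest degree first.
    q = np.array([lam4, lam3, lam2 - 6.0 * lam4, -3.0 * lam3,
                  1.0 - lam2 + 3.0 * lam4])
    if lam4 < 0 or (lam4 == 0 and lam3 != 0):
        return False                         # quartic is unbounded below
    if lam4 == 0 and lam3 == 0:
        # q reduces to lam2*(y^2 - 1) + 1: nonnegative iff lam2 >= 0 and 1 - lam2 >= 0.
        return lam2 >= 0 and 1.0 - lam2 >= -tol
    # With lam4 > 0 the global minimum is attained at a real root of q'(y).
    crit = np.roots(np.polyder(q))
    crit = crit[np.abs(crit.imag) < 1e-9].real
    return np.polyval(q, crit).min() >= -tol

print(in_hard_boundary(0.1, 0.0, 0.05))    # inside the hard boundary
print(in_hard_boundary(-0.5, 0.3, 0.02))   # outside: the quartic goes negative
```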

Emmanuel Kalunga, Sylvain Chevallier, Quentin Barthélemy, Karim Djouani, Yskandar Hamam, Eric Monacelli

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14265
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_64
Authors = Emmanuel Kalunga, Eric Monacelli, Karim Djouani, Quentin Barthélemy, Sylvain Chevallier, Yskandar Hamam
Keywords = Brain-Computer Interfaces, Information geometry, Riemannian means, Steady State Visually Evoked Potentials
Abstract
Brain Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain signal processing. Most of the state-of-the-art approaches deal with covariance matrices, and indeed Riemannian geometry has provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applicative contribution aims at assessing the impact of several distances on a real EEG dataset, as the invariances embedded in those distances have an influence on the classification accuracy. Euclidean and Riemannian distances and means are compared both in terms of quality of results and of computational load.


Watch the video
From Euclidean to Riemannian Means Information Geometry for SSVEP Classification

From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al. F’SATI - Tshawne University of Technology (South Africa) LISV - Université de Versailles Saint-Quentin (France) Mensia Technologies (France) sylvain.chevallier@uvsq.fr 28 October 2015 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Cerebral interfaces Context Rehabilitation and disability compensation ) Out-of-the-lab solutions ) Open to a wider population Problem Intra-subject variabilities ) Online methods, adaptative algorithms Inter-subject variabilities ) Good generalization, fast convergence Opportunities New generation of BCI (Congedo & Barachant) • Growing interest in EEG community • Large community, available datasets • Challenging situations and problems S. Chevallier 28/10/2015 GSI 2 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Outline Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances S. Chevallier 28/10/2015 GSI 3 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction based on brain activity Brain-Computer Interface (BCI) for non-muscular communication • Medical applications • Possible applications for wider population Recording at what scale ? • Neuron !LFP • Neuronal group !ECoG !SEEG • Brain !EEG !MEG !IRMf !TEP S. Chevallier 28/10/2015 GSI 4 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction loop BCI loop 1 Acquisition 2 Preprocessing 3 Translation 4 User feedback S. Chevallier 28/10/2015 GSI 5 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Electroencephalography Most BCI rely on EEG ) Efficient to capture brain waves • Lightweight system • Low cost • Mature technologies • High temporal resolution • No trepanation S. Chevallier 28/10/2015 GSI 6 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Origins of EEG • Local field potentials • Electric potential difference between dendrite and soma • Maxwell’s equation • Quasi-static approximation • Volume conduction effect • Sensitive to conductivity of brain skull • Sensitive to tissue anisotropies S. Chevallier 28/10/2015 GSI 7 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Experimental paradigms Different brain signals for BCI : • Motor imagery : (de)synchronization in premotor cortex • Evoked responses : low amplitude potentials induced by stimulus Steady-State Visually Evoked Potentials 8 electrodes in occipital region SSVEP stimulation LEDs 13 Hz 17 Hz 21 Hz • Neural synchronization with visual stimulation • No learning required, based on visual attention • Strong induced activation S. 
Chevallier 28/10/2015 GSI 8 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances BCI Challenges Limitations • Data scarsity ) A few sources are non-linearly mixed on all electrodes • Individual variabilities ) Effect of mental fatigue • Inter-session variabilities ) Electronic impedances, localizations of electrodes • Inter-individual variabilities ) State of the art approaches fail with 20% of subjects Desired properties : • Online systems ) Continously adapt to the user’s variations • No calibration phase ) Non negligible cognitive load, raises fatigue • Generic model classifiers and transfert learning ) Use data from one subject to enhance the results for another S. Chevallier 28/10/2015 GSI 9 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Spatial covariance matrices Common approach : spatial filtering • Efficient on clean datasets • Specific to each user and session ) Require user calibration • Two step training with feature selection ) Overfitting risk, curse of dimensionality Working with covariance matrices • Good generalization across subjects • Fast convergence • Existing online algorithms • Efficient implementations S. Chevallier 28/10/2015 GSI 10 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Covariance matrices for EEG • An EEG trial : X 2 RC⇥N , C electrodes, N time samples • Assuming that X ⇠ N(0, ⌃) • Covariance matrices ⌃ belong to MC = ⌃ 2 RC⇥C : ⌃ = ⌃| and x| ⌃x > 0, 8x 2 RC \0 • Mean of the set {⌃i }i=1,...,I is ¯⌃ = argmin⌃2MC PI i=1 dm (⌃i , ⌃) • Each EEG class is represented by its mean • Classification based on those means • How to obtain a robust and efficient algorithm ? Congedo, 2013 S. Chevallier 28/10/2015 GSI 11 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Minimum distance to Riemannian mean Simple and robust classifier • Compute the center ⌃ (k) E of each of the K classes • Assign a given unlabelled ˆ⌃ to the closest class k⇤ = argmin k (ˆ⌃, ⌃ (k) E ) Trajectories on tangent space at mean of all trials ¯⌃µ −4 −2 0 2 4 −4 −2 0 2 4 6 Resting class 13Hz class 21Hz class 17Hz class Delay S. Chevallier 28/10/2015 GSI 12 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Riemannian potato Removing outliers and artifacts Reject any ⌃i that lies too far from the mean of all trials ¯⌃µ z( i ) = i µ > zth , i is d(⌃i , ¯⌃), µ and are the mean and standard deviation of distances { i } I i=1 Raw matrices Riemannian potato filtering S. Chevallier 28/10/2015 GSI 13 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Covariance matrices for EEG-based BCI Riemannian approaches in BCI : • Achieve state of the art results ! performing like spatial filtering or sensor-space methods • Rely on simpler algorithms ! less error-prone, computationally efficient What are the reason of this success ? • Invariances embedded with Riemannian distances ! invariance to rescaling, normalization, whitening ! invariance to electrode permutation or positionning • Equivalent to working in an optimal source space ! spatial filtering are sensitive to outliers and user-specific ! no question on "sensors or sources" methods ) What are the most desirable invariances for EEG ? S. 
Chevallier 28/10/2015 GSI 14 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Considered distances and divergences Euclidean dE(⌃1, ⌃2) = k⌃1 ⌃2kF Log-Euclidean dLE(⌃1, ⌃2) = klog(⌃1) log(⌃2)kF V. Arsigny et al., 2006, 2007 Affine-invariant dAI(⌃1, ⌃2) = klog(⌃ 1 1 ⌃2)kF T. Fletcher & S. Joshi, 2004 , M. Moakher, 2005 ↵-divergence d↵ D(⌃1, ⌃2) 1<↵<1 = 4 1 ↵2 log det( 1 ↵ 2 ⌃1+ 1+↵ 2 ⌃2) det(⌃1) 1 ↵ 2 det(⌃2) 1+↵ 2 Z. Chebbi & M. Moakher, 2012 Bhattacharyya dB(⌃1, ⌃2) = ⇣ log det 1 2 (⌃1+⌃2) (det(⌃1) det(⌃2))1/2 ⌘1/2 Z. Chebbi & M. Moakher, 2012 S. Chevallier 28/10/2015 GSI 15 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Experimental results • Euclidean distances yield the lowest results ! Usually attributed to the invariance under inversion that is not guaranteed ! Displays swelling effect • Riemannian approaches outperform state-of-the-art methods (CCA+SVM) • ↵-divergence shows the best performances ! but requires a costly optimisation to find the best ↵ value • Bhattacharyya has the lowest computational cost and a good accuracy −1 −0.5 0 0.5 1 20 30 40 50 60 70 80 90 Accuracy(%) Alpha values (α) −1 −0.5 0 0.5 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 CPUtime(s) S. Chevallier 28/10/2015 GSI 16 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Conclusion Working with covariance matrices in BCI • Achieves very good results • Simple algorithms work well : MDM, Riemannian potato • Need for robust and online methods Interesting applications for IG : • Many freely available datasets • Several competitions • Many open source toolboxes for manipulating EEG Several open questions : • Handling electrodes misplacements and others artifacts • Missing data and covariance matrices of lower rank • Inter- and intra-individual variabilities S. Chevallier 28/10/2015 GSI 17 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Thank you ! S. Chevallier 28/10/2015 GSI 18 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction loop BCI loop 1 Acquisition 2 Preprocessing 3 Translation 4 User feedback First systems in early ’70 S. Chevallier 28/10/2015 GSI 19 / 19
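A minimal sketch of the Minimum Distance to Mean classifier discussed above, written with the log-Euclidean metric because its Fréchet mean has a closed form (the talk compares this distance against the Euclidean, affine-invariant, α-divergence and Bhattacharyya ones). The toy 8-channel data and the plain sample covariance estimator are placeholders, not the talk's SSVEP pipeline.

```python
import numpy as np
from scipy.linalg import logm, expm

def spatial_covariance(trial):
    """Sample spatial covariance of one EEG trial X (channels x time samples)."""
    x = trial - trial.mean(axis=1, keepdims=True)
    return x @ x.T / (x.shape[1] - 1)

def log_euclidean_mean(covs):
    """Frechet mean under d_LE(A, B) = ||log A - log B||_F (closed form)."""
    return expm(np.mean([logm(c) for c in covs], axis=0))

def d_log_euclidean(a, b):
    return np.linalg.norm(logm(a) - logm(b), "fro")

def mdm_fit(trials, labels):
    """Minimum Distance to Mean: one covariance centre per class."""
    return {k: log_euclidean_mean([spatial_covariance(t)
                                   for t, y in zip(trials, labels) if y == k])
            for k in set(labels)}

def mdm_predict(centres, trial):
    c = spatial_covariance(trial)
    return min(centres, key=lambda k: d_log_euclidean(c, centres[k]))

# Toy data standing in for 8-channel trials: two classes with different
# (random but fixed) spatial mixing of latent sources.
rng = np.random.default_rng(0)
mixers = {0: rng.normal(size=(8, 8)), 1: rng.normal(size=(8, 8))}
trials = [mixers[k] @ rng.normal(size=(8, 256)) for k in (0, 1) for _ in range(20)]
labels = [k for k in (0, 1) for _ in range(20)]

centres = mdm_fit(trials, labels)
pred = [mdm_predict(centres, t) for t in trials]
print("training accuracy:", np.mean(np.array(pred) == np.array(labels)))
```

Swapping in another distance only requires replacing d_log_euclidean and the corresponding mean, which is the comparison carried out in the talk.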

Paul Marriott, Radka Sabolova, Germain Van Bever, Frank Critchley

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14262
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_61
Authors = Frank Critchley, Germain Van Bever, Paul Marriott, Radka Sabolova
Keywords =
Abstract
We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald – equivalently, the Pearson χ² and score statistics – are unworkable in this context, but that the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.


Watch the video
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling R. Sabolová1 , P. Marriott2 , G. Van Bever1 & F. Critchley1 . 1 The Open University (EPSRC grant EP/L010429/1), United Kingdom 2 University of Waterloo, Canada GSI 2015, October 28th 2015 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Key points In CIG, the multinomial model ∆k = (π0, . . . , πk) : πi ≥ 0, i πi = 1 provides a universal model. 1 goodness-of-fit testing in large sparse extended multinomial contexts 2 Cressie-Read power divergence λ-family - equivalent to Amari’s α-family asymptotic properties of two test statistics: Pearson’s χ2-test and deviance simulation study for other statistics within power divergence family 3 k-asymptotics instead of N-asymptotics Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Big data Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008): . . . the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. . . . Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science. continuous data - High dimensional low sample size data (HDLSS) discrete data databases image analysis Sparsity (N << k) changes everything! Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Image analysis - example Figure: m1 = 10, m2 = 10 Dimension of a state space: k = 2m1m2 − 1 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Sparsity changes everything S. Fienberg, A. Rinaldo (2012): Maximum Likelihood Estimation in Log-Linear Models Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Extended multinomial distribution Let n = (ni) ∼ Mult(N, (πi)), i = 0, 1, . . . , k, where each πi≥0. 
Goodness-of-fit test H0 : π = π∗ . Pearson’s χ2 test (Wald, score statistic) W := k i=0 (π∗ i − ni/N)2 π∗ i ≡ 1 N2 k i=0 n2 i π∗ i − 1. Rule of thumb (for accuracy of χ2 k asymptotic approximation) Nπi ≥ 5 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Performance of Pearson’s χ2 test on the boundary - example 0 50 100 150 200 0.000.010.020.030.040.05 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 02000400060008000 (b) Sample of Wald Statistic Index WaldStatistic Figure: N = 50, k = 200, exponentially decreasing πi Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Performance of Pearson’s χ2 test on the boundary - theory Theorem For k > 1 and N ≥ 6, the first three moments of W are: E(W) = k N , var(W) = π(−1) − (k + 1)2 + 2k(N − 1) N3 and E[{W − E(W)}3 ] given by π(−2) − (k + 1)3 − (3k + 25 − 22N) π(−1) − (k + 1)2 + g(k, N) N5 where g(k, N) = 4(N − 1)k(k + 2N − 5) > 0 and π(a) := i πa i . In particular, for fixed k and N, as πmin → 0 var(W) → ∞ and γ(W) → +∞ where γ(W) := E[{W − E(W)}3 ]/{var(W)}3/2 . Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary The deviance statistic Define the deviance D via D/2 = {0≤i≤k:ni>0} {ni log(ni/N) − log(πi)} = {0≤i≤k:ni>0} ni log(ni/N) + log 1 πi = {0≤i≤k:ni>0} ni log(ni/µi), where µi := E(ni) = Nπi. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Distribution of deviance let {n∗ i , i = 0, . . . , k} be mutually independent, with n∗ i ∼ Po(µi) then N∗ := k i=0 n∗ i ∼ Po(N) and ni = (n∗ i |N∗ = N) ∼ Mult(N, πi) define S∗ := N∗ D∗ /2 = k i=0 n∗ i n∗ i log(n∗ i /µi) Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Distribution of deviance let {n∗ i , i = 0, . . . , k} be mutually independent, with n∗ i ∼ Po(µi) then N∗ := k i=0 n∗ i ∼ Po(N) and ni = (n∗ i |N∗ = N) ∼ Mult(N, πi) define S∗ := N∗ D∗ /2 = k i=0 n∗ i n∗ i log(n∗ i /µi) define ν, τ and ρ via N ν := E(S∗ ) = N k i=0 E(n∗ i log {n∗ i /µi}) , N ρτ √ N · τ2 := cov(S∗ ) = N k i=0 Ci · k i=0 Vi , where Ci := Cov(n∗ i , n∗ i log(n∗ i /µi)) and Vi := V ar(n∗ i log(n∗ i /µi)). Then under equicontinuity D/2 D −−−−→ k→∞ N1(ν, τ2 (1 − ρ2 )). 
Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity near the boundary 0 50 100 150 200 0.000.010.020.030.040.05 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 0500150025003500 (b) Sample of Wald Statistic Index WaldStatistic 0 200 400 600 800 1000 5060708090100110 (c) Sample of Deviance Statistic Index Deviance Figure: Stability of sampling distributions - Pearson’s χ2 and deviance statistic, N = 50, k = 200, exponentially decreasing πi Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Asymptotic approximations normal approximation can be improved χ2 approximation, correction for skewness symmetrised deviance statistics 40 60 80 100 120 5060708090 Normal Approximation Deviance quantiles Normalquantiles 60 80 100 120 5060708090100 Chi−squared Approximation Deviance quantiles Chi−squaredquantiles 40 60 80 100 120 5060708090 Symmetrised Deviance Symmetric Deviance quantiles Normalquantiles Figure: Quality of k-asymptotics approximations near the boundary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments does k-asymptotic approximation hold uniformly across the simplex? rewrite deviance as D∗ /2 = {0≤i≤k:n∗ i >0} n∗ i log(n∗ i /µi) = Γ∗ + ∆∗ where Γ∗ := k i=0 αin∗ i and ∆∗ := {0≤i≤k:n∗ i >1} n∗ i log n∗ i ≥ 0 and αi := − log µi. how well is the moment generating function of the (standardised) Γ∗ approximated by that of a (standard) normal? Mγ(t) = exp − E(Γ∗ )t V ar(Γ∗) exp   k i=0    ∞ h=1 (−1)h h! µi(log µi)h t V ar(Γ∗) h      Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments maximise skewness k i=0 µi(log µi)3 for fixed E(Γ∗ ) = − k i=0 µi log(µi) and V ar(Γ∗ ) = k i=0 µi(log µi)2 . Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments maximise skewness k i=0 µi(log µi)3 for fixed E(Γ∗ ) = − k i=0 µi log(µi) and V ar(Γ∗ ) = k i=0 µi(log µi)2 . solution: distribution with three distinct values for µi 0 50 100 150 200 0.0000.0020.0040.006 (a) Null distribution Rank of cell probability Cellprobability (b) Sample of Wald Statistic (out1) WaldStatistic 160 180 200 220 240 260 280 300 050100150200 (c) Sample of Deviance Statistic outDeviance 110 115 120 125 130 135 050100150200 Figure: Worst case solution for normality of Γ∗ Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness Worst case for asymptotic normality? Where? Why? Pearson χ2 boundary ’unstable’ deviance centre discreteness D∗ /2 = {0≤i≤k:n∗ i >0} n∗ i (log n∗ i − logµi) = Γ∗ + ∆∗ For the distribution of any discrete random variable to be well approximated by a continuous one, it is necessary that it have a large number of support points, close together. 
Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness, continued 0 50 100 150 200 0.0000.0010.0020.0030.0040.005 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 115120125130135 (b) Sample of Deviance Statistic Index Deviance −3 −2 −1 0 1 2 3 −101234 (c) QQplot Deviance Statistic Theoretical Quantiles StandardisedDeviance Figure: Behaviour at the centre of the simplex, N = 30, k = 200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness, continued 0 50 100 150 200 0.0000.0010.0020.0030.0040.005 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 150160170180190 (b) Sample of Deviance Statistic Index Deviance −3 −2 −1 0 1 2 3 −2−10123 (c) QQplot Deviance Statistic Theoretical Quantiles StandardisedDeviance Figure: Behaviour at the centre of the simplex, N = 60, k = 200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Comparison of performance of different test statistics belonging to power divergence family as we are approaching the boundary (exponentially decreasing values of π) 2NIλ (ni/N, π∗ ) = 2 λ(λ + 1) k i=1 ni ni Nπ∗ i λ − 1 , where α = 1 + 2λ α = 3 Pearson’s χ2 statistic α = 7/3 Cressie-Read recommendation α = 1 deviance α = 0 Hellinger statistic α = −1 Kullback MDI α = −3 Neyman χ2 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Numerical comparison of test statistics belonging to power divergence family 0 50 100 150 200 0.000.020.04 Index pi.base Pearson's χ2 , α= 3 Frequency 0 1000 2000 3000 4000 0200400600800 Cressie-Read, α= 7/3 Frequency 0 100 200 300 400 500 0100300500 deviance, α= 1 Frequency 40 60 80 100 050100150 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Numerical comparison of test statistics belonging to power divergence family 0 50 100 150 200 0.000.020.04 Index pi.base Hellinger distance, α= 0 Frequency 60 80 100 120 140 050100150 Kullback MDI, α= -1 Frequency 30 40 50 60 70 80 90 050100150 Neyman χ2 , α= -3 Frequency 10 15 20 25 050100200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Summary - key points 1 goodness-of-fit testing in large sparse extended multinomial contexts 2 k-asymptotics instead of N-asymptotics 3 Cressie-Read power divergence λ-family asymptotic properties of two test statistics: Pearson’s χ2 
statistic and deviance simulation study for other statistics within power divergence family Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary References A. Agresti (2002): Categorical Data Analysis. Wiley: Hoboken NJ. K. Anaya-Izquierdo, F. Critchley, and P. Marriott (2014): When are first order asymptotics adequate? a diagnostic. STAT, 3: 17 – 22. K. Anaya-Izquierdo, F. Critchley, P. Marriott, and P. Vos (2013): Computational information geometry: foundations. Proceedings of GSI 2013, LNCS. F. Critchley and Marriott P (2014): Computational information geometry in statistics: theory and practice. Entropy, 16: 2454 – 2471. S.E. Fienberg and A. Rinaldo (2012): Maximum likelihood estimation in log-linear models. Annals of Statistics, 40: 996 – 1023. L. Holst (1972): Asymptotic normality and efficiency for certain goodnes-of-fit tests, Biometrika, 59: 137 – 145. C. Morris (1975): Central limit theorems for multinomial sums, Annals of Statistics, 3: 165 – 188. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling
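A minimal numerical sketch of the two statistics compared in this talk, assuming NumPy and the N = 50, k = 200 setting with exponentially decreasing cell probabilities used on the slides (the decay rate and sample size of the simulation are my own choices, not the authors'): it simulates sparse multinomial samples under the null and evaluates the Wald form of Pearson's chi-squared and the deviance, illustrating how unstable the former becomes near the boundary of the simplex while the latter remains comparatively well behaved.

import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 200
pi = np.exp(-0.05 * np.arange(k + 1))   # exponentially decreasing pi_i; the rate 0.05 is an arbitrary choice
pi /= pi.sum()                          # normalize so that sum_i pi_i = 1

def wald_statistic(n, pi, N):
    # Pearson chi^2 in Wald form: W = (1/N^2) sum_i n_i^2 / pi_i - 1
    return (n.astype(float) ** 2 / pi).sum() / N ** 2 - 1.0

def deviance(n, pi, N):
    # D = 2 sum_{i: n_i > 0} n_i log(n_i / (N pi_i))
    nz = n > 0
    return 2.0 * (n[nz] * np.log(n[nz] / (N * pi[nz]))).sum()

W = np.empty(1000)
D = np.empty(1000)
for s in range(1000):
    n = rng.multinomial(N, pi)
    W[s] = wald_statistic(n, pi, N)
    D[s] = deviance(n, pi, N)

# E(W) = k/N under the null, but var(W) blows up as pi_min -> 0, whereas the
# deviance sample stays comparatively stable (cf. the figures in the talk).
print("Wald statistic:  mean %6.1f  sd %8.1f" % (W.mean(), W.std()))
print("Deviance:        mean %6.1f  sd %8.1f" % (D.mean(), D.std()))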

Hiroto Inoue

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14266
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_65
Authors = Hiroto Inoue
Keywords =
Abstract
We consider the geodesic equation on the elliptical model, which is a generalization of the normal model. More precisely, we characterize this manifold from the group-theoretical viewpoint, formulate Eriksen’s procedure for obtaining geodesics on the normal model, and give an alternative proof of it.


Watch the video
Group Theoretical Study on Geodesics for the Elliptical Models

Group Theoretical Study on Geodesics for the Elliptical Models Hiroto Inoue Kyushu University, Japan October 28, 2015 GSI2015, ´Ecole Polytechnique, Paris-Saclay, France Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 1 / 14 Overview 1 Eriksen’s construction of geodesics on normal model Problem 2 Reconsideration of Eriksen’s argument Embedding Nn → Sym+ n+1(R) 3 Geodesic equation on Elliptical model 4 Future work Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 2 / 14 Eriksen’s construction of geodesics on normal model Let Sym+ n (R) be the set of n-dimensional positive-definite matrices. The normal model Nn = (M, ds2) is a Riemannian manifold defined by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 ). The geodesic equation on Nn is ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0. (1) The solution of this geodesic equation has been obtained by Eriksen. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 3 / 14 Theorem ([Eriksen 1987]) For any x ∈ Rn, B ∈ Symn(R), define a matrix exponential Λ(t) by Λ(t) =   ∆ δ Φ tδ tγ tΦ γ Γ   := exp(−tA), A :=   B x 0 tx 0 −tx 0 −x −B   ∈ Mat2n+1. (2) Then, the curve (µ(t), Σ(t)) := (−∆−1δ, ∆−1) is the geodesic on Nn satisfiying the initial condition (µ(0), Σ(0)) = (0, In), ( ˙µ(0), ˙Σ(0)) = (x, B). (proof) We see that by the definition, (µ(t), Σ(t)) satisfies the geodesic equation. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 4 / 14 Problem 1 Explain Eriksen’s theorem, to clarify the relation between the normal model and symmetric spaces. 2 Extend Eriksen’s theorem to the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 5 / 14 Reconsideration of Eriksen’s argument Sym+ n+1(R) Notice that the positive-definite symmetric matrices Sym+ n+1(R) is a symmetric space by G/K Sym+ n+1(R) gK → g · tg, where G = GLn+1(R), K = O(n + 1). This space G/K has the G-invariant Riemannian metric ds2 = 1 2 tr (S−1 dS)2 . Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 6 / 14 Embedding Nn → Sym+ n+1(R) Put an affine subgroup GA := P µ 0 1 P ∈ GLn(R), µ ∈ Rn ⊂ GLn+1(R). Define a Riemannian submanifold as the orbit GA · In+1 = {g · t g| g ∈ GA} ⊂ Sym+ n+1(R). Theorem (Ref. [Calvo, Oller 2001]) We have the following isometry Nn ∼ −→ GA · In+1 ⊂ Sym+ n+1(R), (Σ, µ) → Σ + µtµ µ tµ 1 . (3) Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 7 / 14 Embedding Nn → Sym+ n+1(R) By using the above embedding, we get a simpler expression of the metric and the geodesic equation. Nn ∼= GA · In+1 ⊂ Sym+ n+1(R) coordinate (Σ, µ) → S = Σ + µtµ µ tµ 1 metric ds2 = (tdµ)Σ−1(dµ) +1 2tr((Σ−1dΣ)2) ⇔ ds2 = 1 2 tr (S−1dS)2 geodesic eq. ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0 ⇔ (In, 0)(S−1 ˙S) = (B, x) Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 8 / 14 Reconsideration of Eriksen’s argument We can interpret the Eriksen’s argument as follows. Differential equation Geodesic equation Λ−1 ˙Λ = −A −→ (In, 0)(S−1 ˙S) = (B, x) A =   B x 0 t x 0 −t x 0 −x −B   −→ e−tA =   ∆ δ ∗ t δ ∗ ∗ ∗ ∗   −→ S := ∆ δ t δ −1 ∈ ∈ ∈ {A : JAJ = −A} −→ {Λ : JΛJ = Λ−1 } −→ Essential! Nn ∼= GA · In+1 ∩ ∩ ∩ sym2n+1(R) −→ exp Sym+ 2n+1(R) −→ projection Sym+ n+1(R) Here J =   In 1 In  . Hiroto Inoue (Kyushu Uni.) 
Group Theoretical Study on Geodesics October 28, 2015 9 / 14 Geodesic equation on Elliptical model Definition Let us define a Riemannian manifold En(α) = (M, ds2) by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 )+ 1 2 dα tr(Σ−1 dΣ) 2 . (4) where dα = (n + 1)α2 + 2α, α ∈ C. Then En(0) = Nn. The geodesic equation on En(α) is    ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ− dα ndα + 1 t ˙µΣ−1 ˙µΣ = 0. (5) This is equivalent to the geodesic equation on the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 10 / 14 Geodesic equation on Elliptical model The manifold En(α) is also embedded into positive-definite symmetric matrices Sym+ n+1(R), ref. [Calvo, Oller 2001], and we have simpler expression of the geodesic equation. En(α) ∼= ∃GA(α) · In+1 ⊂ Sym+ n+1(R) coordinate (Σ, µ) → S = |Σ|α Σ + µtµ µ tµ 1 metric (4) ⇔ ds2 = 1 2 tr (S−1dS)2 geodesic eq. (5) ⇔ (In, 0)(S−1 ˙S) = (C, x) − α(log |S|) (In, 0) |A| = det A Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 11 / 14 Geodesic equation on Elliptical model But, in general, we do not ever construct any submanifold N ⊂ Sym+ 2n+1(R) such that its projection is En(α): Differential equation Geodesic equation Λ−1 ˙Λ = −A −→ (In, 0)(S−1 ˙S) = (C, x) − α(log |S|) (In, 0) Λ(t) −→ S(t) ∈ ∈ N −→ En(α) ∼= GA(α) · In+1 ∩ ∩ Sym+ 2n+1(R) −→ projection Sym+ n+1(R) The geodesic equation on elliptical model has not been solved. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 12 / 14 Future work 1 Extend Eriksen’s theorem for elliptical models (ongoing) 2 Find Eriksen type theorem for general symmetric spaces G/K Sketch of the problem: For a projection p : G/K → G/K, find a geodesic submanifold N ⊂ G/K, such that p|N maps all the geodesics to the geodesics: ∀Λ(t): Geodesic −→ p(Λ(t)): Geodesic ∈ ∈ N −→ p|N p(N) ∩ ∩ G/K −→ p:projection G/K Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 13 / 14 References Calvo, M., Oller, J.M. A distance between elliptical distributions based in an embedding into the Siegel group, J. Comput. Appl. Math. 145, 319–334 (2002). Eriksen, P.S. Geodesics connected with the Fisher metric on the multivariate normal manifold, pp. 225–229. Proceedings of the GST Workshop, Lancaster (1987). Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 14 / 14
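Eriksen's theorem quoted in this talk is directly computable. The sketch below (my own illustration, assuming NumPy and SciPy, not the author's code) builds the block matrix A from (x, B), takes Lambda(t) = exp(-tA) with scipy.linalg.expm, and reads off the geodesic (mu(t), Sigma(t)) = (-Delta^{-1} delta, Delta^{-1}), so that t = 0 recovers the initial condition (0, I_n).

import numpy as np
from scipy.linalg import expm

def eriksen_geodesic(x, B, t):
    """Geodesic (mu(t), Sigma(t)) on the normal model N_n with mu(0) = 0,
    Sigma(0) = I_n, mu'(0) = x, Sigma'(0) = B, via Lambda(t) = exp(-t A)."""
    n = len(x)
    x = np.asarray(x, dtype=float).reshape(n, 1)
    A = np.zeros((2 * n + 1, 2 * n + 1))
    A[:n, :n] = B                 # upper-left block B
    A[:n, n:n + 1] = x            # column x
    A[n:n + 1, :n] = x.T          # row x^T
    A[n:n + 1, n + 1:] = -x.T     # row -x^T
    A[n + 1:, n:n + 1] = -x       # column -x
    A[n + 1:, n + 1:] = -B        # lower-right block -B
    Lam = expm(-t * A)
    Delta = Lam[:n, :n]           # upper-left n x n block of Lambda(t)
    delta = Lam[:n, n]            # next column
    Sigma = np.linalg.inv(Delta)  # Sigma(t) = Delta^{-1}
    mu = -Sigma @ delta           # mu(t) = -Delta^{-1} delta
    return mu, Sigma

# Quick check in n = 2 (x and B are arbitrary test values; B must be symmetric):
x = np.array([0.3, -0.1])
B = np.array([[0.2, 0.05], [0.05, -0.1]])
mu0, Sigma0 = eriksen_geodesic(x, B, 0.0)   # should return (0, I_2)
mu1, Sigma1 = eriksen_geodesic(x, B, 1.0)
print(mu0, "\n", Sigma0)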

Shinto Eguchi, Osamu Komori

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14267
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_66
Authors = Osamu Komori, Shinto Eguchi
Keywords =
Abstract
We introduce a class of paths, or one-parameter models, connecting two arbitrary probability density functions (pdf’s). The class is derived by employing the Kolmogorov-Nagumo average between the two pdf’s. There is a variety of such path connectedness on the space of pdf’s, since the Kolmogorov-Nagumo average is applicable to any convex and strictly increasing function. Information-geometric insight is provided for understanding probabilistic properties of statistical methods associated with this path connectedness. The one-parameter model is extended to a multidimensional model, on which statistical inference is characterized by sufficient statistics.


Watch the video
Path connectedness on a space of probability density functions

Path connectedness on a space of probability density functions Osamu Komori1 , Shinto Eguchi2 University of Fukui1 , Japan The Institute of Statistical Mathematics2 , Japan Ecole Polytechnique, Paris-Saclay (France) October 28, 2015 Komori, O. (University of Fukui) GSI2015 October 28, 2015 1 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 2 / 18 Setting Terminology . . X : data space P : probability measure on X FP: space of probability density functions associated with P We consider a path connecting f and g, where f, g ∈ FP, and investigate the property from a viewpoint of information geometry. Komori, O. (University of Fukui) GSI2015 October 28, 2015 3 / 18 Kolmogorov-Nagumo (K-N) average Let ϕ : (0, ∞) → R be an monotonic increasing and concave continuous function. Then for f and g in Fp The Kolmogorov-Nagumo (K-N) average . . ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) ) for 0 ≤ t ≤ 1. Remark 1 . . ϕ−1 is monotone increasing, convex and continuous on (0, ∞) Komori, O. (University of Fukui) GSI2015 October 28, 2015 4 / 18 ϕ-path Based on K-N average, we consider ϕ-path connecting f and g in FP: ϕ-path . . ft(x, ϕ) = ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) − κt ) , where κt ≤ 0 is a normalizing factor, where the equality holds if t = 0 or t = 1. Komori, O. (University of Fukui) GSI2015 October 28, 2015 5 / 18 Existence of κt Theorem 1 . . There uniquely exists κt such that ∫ X ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) − κt ) dP(x) = 1 Proof From the convexity of ϕ−1 , we have 0 ≤ ∫ ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) ) dP(x) ≤ ∫ {(1 − t)f(x) + tg(x)}dP(x) ≤ 1 And we observe that limc→∞ ϕ−1 (c) = +∞ since ϕ−1 is monotone increasing. Hence the continuity of ϕ−1 leads to the existence of κt satisfying the equation above. Komori, O. (University of Fukui) GSI2015 October 28, 2015 6 / 18 Illustration of ϕ-path Komori, O. (University of Fukui) GSI2015 October 28, 2015 7 / 18 Examples of ϕ-path Example 1 . 1 ϕ0(x) = log(x). The ϕ0-path is given by ft(x, ϕ0) = exp((1 − t) log f(x) + t log g(x) − κt), where κt = log ∫ exp((1 − t) log f(x) + t log g(x))dP(x). 2 ϕη(x) = log(x + η) with η ≥ 0. The ϕη-path is given by ft(x, ϕη) = exp [ (1 − t) log{ f(x) + η} + t log{g(x) + η} − κt ] , where κt = log [ ∫ exp{(1 − t) log{f(x) + η} + t log{g(x) + η}}dP(x) − η ] . 3 ϕβ(x) = (xβ − 1)/β with β ≤ 1. The ϕβ-path is given by ft(x, ϕβ) = {(1 − t)f(x)β + tg(x)β − κt} 1 β , where κt does not have an explicit form. Komori, O. (University of Fukui) GSI2015 October 28, 2015 8 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 9 / 18 Extended expectation For a function a(x): X → R, we consider Extended expectation . . E(ϕ) f {a(X)} = ∫ X 1 ϕ′(f(x)) a(x)dP(x) ∫ X 1 ϕ′(f(x)) dP(x) , where ϕ: (0, ∞) → R is a generator function. Remark 2 If ϕ(t) = log t, then E(ϕ) reduces to the usual expectation. Komori, O. (University of Fukui) GSI2015 October 28, 2015 10 / 18 Properties of extended expectation We note that 1 E(ϕ) f (c) = c for any constant c. 2 E(ϕ) f {ca(X)} = cE(ϕ) f {a(X)} for any constant c. 3 E(ϕ) f {a(X) + b(X)} = E(ϕ) f {a(X)} + E(ϕ) f {b(X)}. 4 E(ϕ) f {a(X)2 } ≥ 0 with equality if and only if a(x) = 0 for P-almost everywhere x in X. Remark 3 If we define f(ϕ) (x) = 1/ϕ′ ( f(x))/ ∫ X 1/ϕ′ (f(x))dP(x), then E(ϕ) f {a(X)} = Ef(ϕ) {a(X)}. 
Komori, O. (University of Fukui) GSI2015 October 28, 2015 11 / 18 Tangent space of FP Let Hf be a Hilbert space with the inner product defined by ⟨a, b⟩f = E(ϕ) f {a(X)b(X)}, and the tangent space Tangent space associated with extended expectation . . Tf = {a ∈ Hf : ⟨a, 1⟩f = 0}. For a statistical model M = { fθ(x)}θ∈Θ we have E(ϕ) fθ {∂iϕ(fθ(X))} = 0 for all θ of Θ, where ∂i = ∂/∂θi with θ = (θi)i=1,··· ,p. Further, E(ϕ) fθ {∂i∂jϕ(fθ(X))} = E(ϕ) fθ { ϕ′′ ( fθ(X)) ϕ′(fθ(X))2 ∂iϕ(fθ(X))∂iϕ(fθ(X)) } . Komori, O. (University of Fukui) GSI2015 October 28, 2015 12 / 18 Parallel displacement A(ϕ) t Define A(ϕ) t (x) in Tft by the solution for a differential equation ˙A(ϕ) t (x) − E(ϕ) ft { A(ϕ) t ˙ft ϕ′′ ( ft) ϕ′(ft) } = 0, where ft is a path connecting f and g such that f0 = f and f1 = g. ˙A(ϕ) t (x) is the derivative of A(ϕ) t (x) with respect to t. Theorem 2 The geodesic curve {ft}0≤t≤1 by the parallel displacement A(ϕ) t is the ϕ-path. Komori, O. (University of Fukui) GSI2015 October 28, 2015 13 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 14 / 18 U-divergence Assume that U(s) is a convex and increasing function of a scalar s and let ξ(t) = argmaxs{st − U(s)} . Then we have U-divergence . . DU(f, g) = ∫ {U(ξ(g)) − fξ(g)}dP − ∫ {U(ξ(f)) − fξ( f)}dP. In fact, U-divergence is the difference of the cross entropy CU( f, g) with the diagonal entropy CU( f, f), where CU(f, g) = ∫ {U(ξ(g)) − fξ(g)}dP. Komori, O. (University of Fukui) GSI2015 October 28, 2015 15 / 18 Connections based on U-divergence For a manifold of finite dimension M = { fθ(x) : θ ∈ Θ} and vector fields X and Y on M, the Riemannian metric is G(U) (X, Y)(f) = ∫ X f Yξ( f)dP for f ∈ M and linear connections ∇(U) and ∇∗(U) are G(U) (∇(U) X Y, Z)(f) = ∫ XY f Zξ(f)dP and G(U) (∇∗ X (U) Y, Z)(f) = ∫ Z f XYξ(f)dP. See Eguchi (1992) for details. Komori, O. (University of Fukui) GSI2015 October 28, 2015 16 / 18 Equivalence between ∇∗ -geodesic and ξ-path Let ∇(U) and ∇∗(U) be linear connections associated with U-divergence DU, and let C(ϕ) = {ft(x, ϕ) : 0 ≤ t ≤ 1} be the ϕ path connecting f and g of FP. Then, we have Theorem 3 A ∇(U) -geodesic curve connecting f and g is equal to C(id) , where id denotes the identity function; while a ∇∗(U) -geodesic curve connecting f and g is equal to C(ξ) , where ξ(t) = argmaxs{st − U(s)}. Komori, O. (University of Fukui) GSI2015 October 28, 2015 17 / 18 Summary 1 We consider ϕ-path based on Kolmogorov-Nagumo average. 2 The relation between U-divergence and ϕ-path was investigated (ϕ corresponds to ξ). 3 The idea of ϕ-path can be applied to probability density estimation as well as classification problems. 4 Divergence associated with ϕ-path can be considered, where a special case would be Bhattacharyya divergence. Komori, O. (University of Fukui) GSI2015 October 28, 2015 18 / 18
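As a concrete illustration of Example 1 in this talk, the following sketch (assuming NumPy; the two endpoint densities are Gaussians chosen arbitrarily, and the integral with respect to P is approximated on a grid) computes the phi-path for phi(x) = log x, with the normalizing constant kappa_t obtained by renormalization. As stated on the slides, kappa_t <= 0 with equality at t = 0 and t = 1.

import numpy as np

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

f = gaussian(x, -2.0, 1.0)   # endpoint density f (arbitrary choice)
g = gaussian(x, 3.0, 2.0)    # endpoint density g (arbitrary choice)

def phi_path_log(f, g, t):
    # phi(x) = log x: f_t = exp((1 - t) log f + t log g - kappa_t)
    unnorm = np.exp((1.0 - t) * np.log(f) + t * np.log(g))
    Z = unnorm.sum() * dx        # = exp(kappa_t), approximated on the grid
    return unnorm / Z, np.log(Z)

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    ft, kappa_t = phi_path_log(f, g, t)
    print("t = %.2f   kappa_t = %+.4f   integral of f_t = %.6f" % (t, kappa_t, ft.sum() * dx))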

Monta Sakamoto, Hiroshi Matsuzoe

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14272
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_79
Authors = Hiroshi Matsuzoe, Monta Sakamoto
Keywords =
Abstract
In anomalous statistical physics, deformed algebraic structures are important objects. Heavy-tailed probability distributions, such as Student’s t-distributions, are characterized by deformed algebras. In addition, deformed algebras induce deformations of expectations and of independence of random variables. Hence, a generalization of independence for the multivariate Student’s t-distribution is studied in this paper. Even if two random variables that follow univariate Student’s t-distributions are independent, their joint probability distribution is not a bivariate Student’s t-distribution. It is shown that a bivariate Student’s t-distribution is obtained from two univariate Student’s t-distributions under q-deformed independence.


Watch the video
A generalization of independence and multivariate Student's t-distributions

A generalization of independence and multivariate Student’s t-distributions
MATSUZOE Hiroshi, Nagoya Institute of Technology, joint work with SAKAMOTO Monta (Efrei, Paris)
1 Deformed exponential family 2 Non-additive differentials and expectation functionals 3 Geometry of deformed exponential families 4 Generalization of independence 5 q-independence and Student’s t-distributions 6 Appendix
Notions of expectation and independence are determined by the choice of statistical model.
Introduction: Geometry and statistics
• Geometry for the sample space • Geometry for the parameter space • Wasserstein geometry • Optimal transport theory • A pdf is regarded as a distribution of mass • Information geometry • Convexity of entropy and free energy • Duality of estimating function
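For readers unfamiliar with the deformed algebra referred to above, the sketch below shows the standard Tsallis q-exponential and q-product. These definitions are common background and are not taken from the slides; the precise normalization of q-independence used in the paper may differ. The identity exp_q(x) composed with the q-product of exp_q(y) equals exp_q(x + y), which is what replaces exp(x)exp(y) = exp(x + y) when independence is deformed.

import numpy as np

def exp_q(x, q):
    # q-exponential; valid where 1 + (1 - q) x > 0, which holds for the values below
    return np.exp(x) if q == 1.0 else (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def q_product(a, b, q):
    # Tsallis q-product of positive numbers: [a^(1-q) + b^(1-q) - 1]^(1/(1-q))
    return a * b if q == 1.0 else (a ** (1.0 - q) + b ** (1.0 - q) - 1.0) ** (1.0 / (1.0 - q))

q = 1.5
x, y = 0.3, -0.2
print(q_product(exp_q(x, q), exp_q(y, q), q))   # q-product of the two q-exponentials
print(exp_q(x + y, q))                          # same value: exp_q(x + y)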

Tomonari Sei, Ushio Tanaka

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14271
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_78
Authors = Tomonari Sei, Ushio Tanaka
Keywords =
Abstract
The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix in order to draw a parallel coordinate plot. In this paper, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a geometrical viewpoint. It is shown that the textile set can be written as the union of two differentiable manifolds if data matrices are restricted to be full-rank.


Watch the video
Differential geometric properties of textile plot

What is textile plot? Textile set Main result Other results Summary Geometric Properties of textile plot Tomonari SEI and Ushio TANAKA University of Tokyo and Osaka Prefecture University at ´Ecole Polytechnique, Oct 28, 2015 1 / 23 What is textile plot? Textile set Main result Other results Summary Introduction The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix, Rn×p X → Y ∈ Rn×p , in order to draw a parallel coordinate plot. The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance. In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometrical point of view. It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are “generic”. 2 / 23 What is textile plot? Textile set Main result Other results Summary Introduction The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix, Rn×p X → Y ∈ Rn×p , in order to draw a parallel coordinate plot. The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance. In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometrical point of view. It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are “generic”. 2 / 23 What is textile plot? Textile set Main result Other results Summary 1 What is textile plot? 2 Textile set 3 Main result 4 Other results 5 Summary 3 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Example (Kumasaka and Shibata, 2008) Textile plot for the iris data. (150 cases, 5 attributes) Each variate is transformed by a location-scale transformation. Categorical data is quantified. Missing data is admitted. Order of axes can be maintained. Specie s Sepal.Length Sepal.W id th Petal.Length Petal.W id th setosa versicolor virginica 4.3 7.9 2 4.4 1 6.9 0.1 2.5 4 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Example (Kumasaka and Shibata, 2008) Textile plot for the iris data. (150 cases, 5 attributes) Each variate is transformed by a location-scale transformation. Categorical data is quantified. Missing data is admitted. Order of axes can be maintained. Specie s Sepal.Length Sepal.W id th Petal.Length Petal.W id th setosa versicolor virginica 4.3 7.9 2 4.4 1 6.9 0.1 2.5 4 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value. Let X = (x1, . . . , xp) ∈ Rn×p be the data matrix. Without loss of generality, assume the sample mean and sample variance of each xj are 0 and 1, respectively. The data is transformed into Y = (y1, . . . , yp), where yj = aj + bj xj , aj , bj ∈ R, j = 1, . . . , p. The coefficients aj and bj are determined by the following procedure. 5 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value. Let X = (x1, . . . , xp) ∈ Rn×p be the data matrix. 
Without loss of generality, assume the sample mean and sample variance of each xj are 0 and 1, respectively. The data is transformed into Y = (y1, . . . , yp), where yj = aj + bj xj , aj , bj ∈ R, j = 1, . . . , p. The coefficients aj and bj are determined by the following procedure. 5 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value. Let X = (x1, . . . , xp) ∈ Rn×p be the data matrix. Without loss of generality, assume the sample mean and sample variance of each xj are 0 and 1, respectively. The data is transformed into Y = (y1, . . . , yp), where yj = aj + bj xj , aj , bj ∈ R, j = 1, . . . , p. The coefficients aj and bj are determined by the following procedure. 5 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Coefficients a = (aj ) and b = (bj ) are the solution of the following minimization problem: Minimize a,b n∑ t=1 p∑ j=1 (ytj − ¯yt·)2 subject to yj = aj + bj xj , p∑ j=1 yj 2 = 1. Intuition: as horizontal as possible. Solution: a = 0 and b is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of X. yt1 yt2 yt3 yt4 yt5 yt. 6 / 23 What is textile plot? Textile set Main result Other results Summary Example (n = 100, p = 4) X ∈ R100×4. Each row ∼ N(0, Σ), Σ =   1 −0.6 0.5 0.1 −0.6 1 −0.6 −0.2 0.5 −0.6 1 0.0 0.1 −0.2 0.0 1  . −2.71 2.98 −3.93 3.27 −2.72 2.43 −2.58 2.23 −2.71 2.98 −3.93 3.27 −2.72 2.43 −2.58 2.23 (a) raw data X (b) textile plot Y 7 / 23 What is textile plot? Textile set Main result Other results Summary Our motivation The textile plot transforms the data matrix X into Y. Denote the map by Y = τ(X). What is the image τ(Rn×p)? We can show that Y ∈ τ(Rn×p) satisfies two conditions: ∃λ ≥ 0, ∀i = 1, . . . , p, p∑ j=1 yi yj = λ yi 2 and p∑ j=1 yj 2 = 1. This motivates the following definition of the textile set. 8 / 23 What is textile plot? Textile set Main result Other results Summary Our motivation The textile plot transforms the data matrix X into Y. Denote the map by Y = τ(X). What is the image τ(Rn×p)? We can show that Y ∈ τ(Rn×p) satisfies two conditions: ∃λ ≥ 0, ∀i = 1, . . . , p, p∑ j=1 yi yj = λ yi 2 and p∑ j=1 yj 2 = 1. This motivates the following definition of the textile set. 8 / 23 What is textile plot? Textile set Main result Other results Summary Our motivation The textile plot transforms the data matrix X into Y. Denote the map by Y = τ(X). What is the image τ(Rn×p)? We can show that Y ∈ τ(Rn×p) satisfies two conditions: ∃λ ≥ 0, ∀i = 1, . . . , p, p∑ j=1 yi yj = λ yi 2 and p∑ j=1 yj 2 = 1. This motivates the following definition of the textile set. 8 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. 
Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Tn,p with small p Lemma (p = 1) Tn,1 = Sn−1, the unit sphere. Lemma (p = 2) Tn,2 = A ∪ B, where A = {(y1, y2) | y1 = y2 = 1/ √ 2}, B = {(y1, y2) | y1 − y2 = y1 + y2 = 1}, each of which is diffeomorphic to Sn−1 × Sn−1. Their intersection A ∩ B is diffeomorphic to the Stiefel manifold Vn,2. → See next slide for n = p = 2 case. 10 / 23 What is textile plot? Textile set Main result Other results Summary Tn,p with small p Lemma (p = 1) Tn,1 = Sn−1, the unit sphere. Lemma (p = 2) Tn,2 = A ∪ B, where A = {(y1, y2) | y1 = y2 = 1/ √ 2}, B = {(y1, y2) | y1 − y2 = y1 + y2 = 1}, each of which is diffeomorphic to Sn−1 × Sn−1. Their intersection A ∩ B is diffeomorphic to the Stiefel manifold Vn,2. → See next slide for n = p = 2 case. 10 / 23 What is textile plot? Textile set Main result Other results Summary Example (n = p = 2) T2,2 ⊂ R4 is the union of two tori, glued along O(2). θ φ ξ η T2,2 = { 1 √ 2 ( cos θ cos φ sin θ sin φ )} ∪ { 1 2 ( cos ξ + cos η cos ξ − cos η sin ξ + sin η sin ξ − sin η )} 11 / 23 What is textile plot? Textile set Main result Other results Summary For general dimension p To state our main result, we define two concepts: noncompact Stiefel manifold and canonical form. Definition (e.g. Absil et al. (2008)) Let n ≥ p. Denote by V ∗ the set of all column full-rank matrices: V ∗ := { Y ∈ Rn×p | rank(Y) = p }. V ∗ is called the noncompact Stiefel manifold. Note that dim(V ∗) = np and V ∗ = Rn×p. The orthogonal group O(n) acts on V ∗. By the Gram-Schmidt orthonormalization, the quotient space V ∗/O(n) is identified with upper-triangular matrices with positive diagonals. → see next slide. 12 / 23 What is textile plot? Textile set Main result Other results Summary For general dimension p To state our main result, we define two concepts: noncompact Stiefel manifold and canonical form. Definition (e.g. Absil et al. (2008)) Let n ≥ p. Denote by V ∗ the set of all column full-rank matrices: V ∗ := { Y ∈ Rn×p | rank(Y) = p }. V ∗ is called the noncompact Stiefel manifold. Note that dim(V ∗) = np and V ∗ = Rn×p. 
The orthogonal group O(n) acts on V ∗. By the Gram-Schmidt orthonormalization, the quotient space V ∗/O(n) is identified with upper-triangular matrices with positive diagonals. → see next slide. 12 / 23 What is textile plot? Textile set Main result Other results Summary For general dimension p To state our main result, we define two concepts: noncompact Stiefel manifold and canonical form. Definition (e.g. Absil et al. (2008)) Let n ≥ p. Denote by V ∗ the set of all column full-rank matrices: V ∗ := { Y ∈ Rn×p | rank(Y) = p }. V ∗ is called the noncompact Stiefel manifold. Note that dim(V ∗) = np and V ∗ = Rn×p. The orthogonal group O(n) acts on V ∗. By the Gram-Schmidt orthonormalization, the quotient space V ∗/O(n) is identified with upper-triangular matrices with positive diagonals. → see next slide. 12 / 23 What is textile plot? Textile set Main result Other results Summary Noncompact Stiefel manifold and canonical form Definition (Canonical form) Let us denote by V ∗∗ the set of all matrices written as            y11 · · · y1p 0 ... ... ... ... ypp 0 · · · 0 ... ... 0 · · · 0            , yii > 0, 1 ≤ i ≤ p. We call it a canonical form. Note that V ∗∗ ⊂ V ∗ and V ∗/O(n) V ∗∗. 13 / 23 What is textile plot? Textile set Main result Other results Summary Noncompact Stiefel manifold and canonical form Definition (Canonical form) Let us denote by V ∗∗ the set of all matrices written as            y11 · · · y1p 0 ... ... ... ... ypp 0 · · · 0 ... ... 0 · · · 0            , yii > 0, 1 ≤ i ≤ p. We call it a canonical form. Note that V ∗∗ ⊂ V ∗ and V ∗/O(n) V ∗∗. 13 / 23 What is textile plot? Textile set Main result Other results Summary Restriction of unnormalized textile set V ∗: non-compact Stiefel manifold, V ∗∗: set of canonical forms. Definition Denote the restriction of Un,p to V ∗ and V ∗∗ by U∗ n,p = Un,p ∩ V ∗ , U∗∗ n,p = Un,p ∩ V ∗∗ , respectively. The group O(n) acts on U∗ n,p. The quotient space U∗ n,p/O(n) is identified with U∗∗ n,p. So it is essential to study U∗∗ n,p. 14 / 23 What is textile plot? Textile set Main result Other results Summary Restriction of unnormalized textile set V ∗: non-compact Stiefel manifold, V ∗∗: set of canonical forms. Definition Denote the restriction of Un,p to V ∗ and V ∗∗ by U∗ n,p = Un,p ∩ V ∗ , U∗∗ n,p = Un,p ∩ V ∗∗ , respectively. The group O(n) acts on U∗ n,p. The quotient space U∗ n,p/O(n) is identified with U∗∗ n,p. So it is essential to study U∗∗ n,p. 14 / 23 What is textile plot? Textile set Main result Other results Summary Restriction of unnormalized textile set V ∗: non-compact Stiefel manifold, V ∗∗: set of canonical forms. Definition Denote the restriction of Un,p to V ∗ and V ∗∗ by U∗ n,p = Un,p ∩ V ∗ , U∗∗ n,p = Un,p ∩ V ∗∗ , respectively. The group O(n) acts on U∗ n,p. The quotient space U∗ n,p/O(n) is identified with U∗∗ n,p. So it is essential to study U∗∗ n,p. 14 / 23 What is textile plot? Textile set Main result Other results Summary U∗∗ n,p for small p Let us check examples. Example (n = p = 1) U∗∗ 1,1 = {(1)}. Example (n = p = 2) Let Y = ( y11 y12 0 y22 ) with y11, y22 > 0. Then U∗∗ 2,2 = {y12 = 0} ∪ {y2 11 = y2 12 + y2 22}, union of a plane and a cone. 15 / 23 What is textile plot? Textile set Main result Other results Summary U∗∗ n,p for small p Let us check examples. Example (n = p = 1) U∗∗ 1,1 = {(1)}. Example (n = p = 2) Let Y = ( y11 y12 0 y22 ) with y11, y22 > 0. Then U∗∗ 2,2 = {y12 = 0} ∪ {y2 11 = y2 12 + y2 22}, union of a plane and a cone. 15 / 23 What is textile plot? 
Textile set Main result Other results Summary Main theorem The differential geometrical property of U∗∗ n,p is given as follows: Theorem Let n ≥ p ≥ 3. Then we have the following decomposition U∗∗ n,p = M1 ∪ M2, where each Mi is a differentiable manifold, the dimensions of which are given by dim M1 = p(p + 1) 2 − (p − 1), dim M2 = p(p + 1) 2 − p, respectively. M2 is connected while M1 may not. 16 / 23 What is textile plot? Textile set Main result Other results Summary Example U∗∗ 3,3 is the union of 4-dim and 3-dim manifolds. We look at a cross section with y11 = y22 = 1: y12 y13 y33 Union of a surface and a vertical line. 17 / 23 What is textile plot? Textile set Main result Other results Summary Corollary Let n ≥ p ≥ 3. Then we have U∗ n,p = π−1 (M1) ∪ π−1 (M2), where π denotes the map of Gram-Schmidt orthonormalization. The dimensions are dim π−1 (M1) = np − (p − 1), dim π−1 (M2) = np − p. 18 / 23 What is textile plot? Textile set Main result Other results Summary Other results We state other results. First we have n = 1 case. Lemma If n = 1, then the textile set T1,p is the union of a (p − 2)-dimensional manifold and 2(2p − 1) isolated points. Example U∗∗ 1,3 consists of a circle and 14 points: U∗∗ 1,3 = (S2 ∩ {y1 + y2 + y3 = 1}) ∪ {±( 1√ 3 , 1√ 3 , 1√ 3 ), ±( 1√ 2 , 1√ 2 , 0), ±( 1√ 2 , 0, 1√ 2 ), ±(0, 1√ 2 , 1√ 2 ), ± (1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)} . 19 / 23 What is textile plot? Textile set Main result Other results Summary Other results We state other results. First we have n = 1 case. Lemma If n = 1, then the textile set T1,p is the union of a (p − 2)-dimensional manifold and 2(2p − 1) isolated points. Example U∗∗ 1,3 consists of a circle and 14 points: U∗∗ 1,3 = (S2 ∩ {y1 + y2 + y3 = 1}) ∪ {±( 1√ 3 , 1√ 3 , 1√ 3 ), ±( 1√ 2 , 1√ 2 , 0), ±( 1√ 2 , 0, 1√ 2 ), ±(0, 1√ 2 , 1√ 2 ), ± (1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)} . 19 / 23 What is textile plot? Textile set Main result Other results Summary Differential geometrical characterization of fλ −1 (O) Fix λ ≥ 0 arbitrarily. We define the map fλ : Rn×p → Rp+1 by fλ(y1, . . . , yp) :=       ∑ j y1 yj − λ y1 2 ... ∑ j yp yj − λ yp 2 ∑ j yj 2 − 1       . Lemma We have a classification of Tn,p, namely Tn,p = λ≥0 fλ −1 (O) = 0≤λ≤n fλ −1 (O). 20 / 23 What is textile plot? Textile set Main result Other results Summary Differential geometrical characterization of fλ −1 (O) Fix λ ≥ 0 arbitrarily. We define the map fλ : Rn×p → Rp+1 by fλ(y1, . . . , yp) :=       ∑ j y1 yj − λ y1 2 ... ∑ j yp yj − λ yp 2 ∑ j yj 2 − 1       . Lemma We have a classification of Tn,p, namely Tn,p = λ≥0 fλ −1 (O) = 0≤λ≤n fλ −1 (O). 20 / 23 What is textile plot? Textile set Main result Other results Summary Differential geometrical characterization of fλ −1 (O) Lastly, we state a characterization of fλ −1 (O) from the viewpoint of differential geometry. Theorem Let λ ≥ 0. fλ −1 (O) is a regular sub-manifold of Rn×p with codimension p + 1 whenever λ > 0, y11yjj − y1j yj1 = 0, j = 2, . . . , p, ∃ ∈ { 2, . . . , p }; p∑ j=2 yij + yi (1 − 2λ) = 0, i = 1, . . . , n. 21 / 23 What is textile plot? Textile set Main result Other results Summary Present and future study Summary: We defined the textile set Tn,p and find its geometric properties. Present and future study: . 1 Characterize the classification fλ −1 (O) with induced Riemannian metric from Rnp by (global) Riemannian geometry: geodesic, curvature etc. . 2 Investigate differential geometrical and topological properties of Tn,p and fλ −1 (O), including its group action. 
3 Can one find statistical implication such as sample distribution theory? Merci beaucoup! 22 / 23 What is textile plot? Textile set Main result Other results Summary Present and future study Summary: We defined the textile set Tn,p and find its geometric properties. Present and future study: . 1 Characterize the classification fλ −1 (O) with induced Riemannian metric from Rnp by (global) Riemannian geometry: geodesic, curvature etc. . 2 Investigate differential geometrical and topological properties of Tn,p and fλ −1 (O), including its group action. 3 Can one find statistical implication such as sample distribution theory? Merci beaucoup! 22 / 23 What is textile plot? Textile set Main result Other results Summary Present and future study Summary: We defined the textile set Tn,p and find its geometric properties. Present and future study: . 1 Characterize the classification fλ −1 (O) with induced Riemannian metric from Rnp by (global) Riemannian geometry: geodesic, curvature etc. . 2 Investigate differential geometrical and topological properties of Tn,p and fλ −1 (O), including its group action. 3 Can one find statistical implication such as sample distribution theory? Merci beaucoup! 22 / 23 What is textile plot? Textile set Main result Other results Summary References . 1 Absil, P.-A., Mahony, R., and Sepulchre, R. (2008), Optimization Algorithms on Matrix Manifolds, Princeton University Press. . 2 Honda, K. and Nakano, J. (2007), 3 dimensional parallel coordinate plot, Proceedings of the Institute of Statistical Mathematics, 55, 69–83. . 3 Inselberg, A. (2009), Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications, Springer. 4 Kumasaka, N. and Shibata, R. (2008), High-dimensional data visualisation: The textile plot, Computational Statistics and Data Analysis, 52, 3616–3644. 23 / 23
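A small sketch of the textile transformation as described in this talk, for fully numeric data with no categorical variates and no missing values. This is my reading of the slides (assuming NumPy), not the authors' implementation: standardize each column, set a = 0, take b as the eigenvector of the covariance matrix belonging to its largest eigenvalue, and rescale so that the squared norms of the columns of Y sum to 1. The mixing matrix used to generate correlated test data is arbitrary.

import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
M = np.array([[1.0, -0.6,  0.5,  0.1],
              [0.0,  0.8, -0.3, -0.2],
              [0.0,  0.0,  0.7,  0.0],
              [0.0,  0.0,  0.0,  0.9]])          # arbitrary mixing to correlate the columns
X = rng.standard_normal((n, p)) @ M

def textile_transform(X):
    Xs = X - X.mean(axis=0)
    Xs = Xs / Xs.std(axis=0)                     # each column: sample mean 0, variance 1
    S = (Xs.T @ Xs) / Xs.shape[0]                # sample covariance (= correlation here)
    w, V = np.linalg.eigh(S)
    b = V[:, -1]                                 # eigenvector of the largest eigenvalue
    Y = Xs * b                                   # y_j = b_j x_j, with a_j = 0
    Y = Y / np.sqrt((Y ** 2).sum())              # enforce sum_j ||y_j||^2 = 1
    return Y

Y = textile_transform(X)
print((Y ** 2).sum())    # 1.0: the normalization constraint of the textile plot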

Damiano Brigo, John Armstrong

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14269
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_76
Authors = Damiano Brigo, John Armstrong
Keywords =
Abstract
We review the manifold projection method for stochastic nonlinear filtering in a more general setting than in our previous paper in Geometric Science of Information 2013. We still use a Hilbert space structure on a space of probability densities to project the infinite dimensional stochastic partial differential equation for the optimal filter onto a finite dimensional exponential or mixture family, respectively, with two different metrics, the Hellinger distance and the L2 direct metric. This reduces the problem to finite dimensional stochastic differential equations. In this paper we summarize a previous equivalence result between Assumed Density Filters (ADF) and Hellinger/Exponential projection filters, and introduce a new equivalence between Galerkin method based filters and Direct metric/Mixture projection filters. This result allows us to give a rigorous geometric interpretation to ADF and Galerkin filters. We also discuss the different finite-dimensional filters obtained when projecting the stochastic partial differential equation for either the normalized (Kushner-Stratonovich) or a specific unnormalized (Zakai) density of the optimal filter.
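As a small illustration of the L2 direct metric mentioned in the abstract (a sketch under my own assumptions: NumPy, arbitrary Gaussian component densities, grid integration, not the authors' code), the matrix gamma_ij = integral of (q_i - q_{m+1})(q_j - q_{m+1}) dx for a simple mixture family is computed below. As discussed later in the talk, it does not depend on the mixture weights theta, which is what makes the mixture-family/direct-metric pairing convenient.

import numpy as np

x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# fixed component densities q_1, ..., q_{m+1} (arbitrary illustrative choices)
q = [gaussian(x, m_, s_) for m_, s_ in [(-3.0, 1.0), (0.0, 1.5), (3.0, 1.0)]]
m = len(q) - 1

gamma = np.empty((m, m))
for i in range(m):
    for j in range(m):
        gamma[i, j] = ((q[i] - q[m]) * (q[j] - q[m])).sum() * dx

print(gamma)   # constant in theta: no recomputation is needed along the filter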


Watch the video
Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters

Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters GSI 2015, Oct 28, 2015, Paris Damiano Brigo Dept. of Mathematics, Imperial College, London www.damianobrigo.it — Joint work with John Armstrong Dept. of Mathematics, King’s College, London — Full paper to appear in MCSS, see also arXiv.org D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 1 / 37 Inner Products, Metrics and Projections Spaces of densities Spaces of probability densities Consider a parametric family of probability densities S = {p(·, θ), θ ∈ Θ ⊂ Rm }, S1/2 = { p(·, θ), θ ∈ Θ ⊂ Rm }. If S (or S1/2) is a subset of a function space having an L2 structure (⇒ inner product, norm & metric), then we may ask whether p(·, θ) → θ Rm , ( p(·, θ) → θ respectively) is a Chart of a m-dim manifold (?) S (S1/2). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 2 / 37 Inner Products, Metrics and Projections Spaces of densities Spaces of probability densities Consider a parametric family of probability densities S = {p(·, θ), θ ∈ Θ ⊂ Rm }, S1/2 = { p(·, θ), θ ∈ Θ ⊂ Rm }. If S (or S1/2) is a subset of a function space having an L2 structure (⇒ inner product, norm & metric), then we may ask whether p(·, θ) → θ Rm , ( p(·, θ) → θ respectively) is a Chart of a m-dim manifold (?) S (S1/2). The topology & differential structure in the chart is the L2 structure, but two possibilities: S : d2(p1, p2) = p1 − p2 (L2 direct distance), p1,2 ∈ L2 S1/2 : dH( √ p1, √ p2) = √ p1 − √ p2 (Hellinger distance), p1,2 ∈ L1 where · is the norm of Hilbert space L2. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 2 / 37 Inner Products, Metrics and Projections Manifolds, Charts and Tangent Vectors Tangent vectors, metrics and projection If ϕ : θ → p(·, θ) (θ → p(·, θ) resp.) is the inverse of a chart then { ∂ϕ(·, θ) ∂θ1 , · · · , ∂ϕ(·, θ) ∂θm } are linearly independent L2(λ) vector that span Tangent Space at θ. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 3 / 37 Inner Products, Metrics and Projections Manifolds, Charts and Tangent Vectors Tangent vectors, metrics and projection If ϕ : θ → p(·, θ) (θ → p(·, θ) resp.) is the inverse of a chart then { ∂ϕ(·, θ) ∂θ1 , · · · , ∂ϕ(·, θ) ∂θm } are linearly independent L2(λ) vector that span Tangent Space at θ. The inner product of 2 basis elements is defined (L2 structure) ∂p(·, θ) ∂θi ∂p(·, θ) ∂θj = 1 4 ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 γij(θ) . ∂ √ p ∂θi ∂ √ p ∂θj = 1 4 1 p(x, θ) ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 gij(θ) . γ(θ): direct L2 matrix (d2); g(θ): famous Fisher-Rao matrix (dH) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 3 / 37 Inner Products, Metrics and Projections Manifolds, Charts and Tangent Vectors Tangent vectors, metrics and projection If ϕ : θ → p(·, θ) (θ → p(·, θ) resp.) is the inverse of a chart then { ∂ϕ(·, θ) ∂θ1 , · · · , ∂ϕ(·, θ) ∂θm } are linearly independent L2(λ) vector that span Tangent Space at θ. The inner product of 2 basis elements is defined (L2 structure) ∂p(·, θ) ∂θi ∂p(·, θ) ∂θj = 1 4 ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 γij(θ) . ∂ √ p ∂θi ∂ √ p ∂θj = 1 4 1 p(x, θ) ∂p(x, θ) ∂θi ∂p(x, θ) ∂θj dx = 1 4 gij(θ) . γ(θ): direct L2 matrix (d2); g(θ): famous Fisher-Rao matrix (dH) d2 ort. projection: Πγ θ [v] = m i=1 [ m j=1 γij (θ) v, ∂p(·, θ) ∂θj ] ∂p(·, θ) ∂θi (dH proj. analogous inserting √ · and replacing γ with g) D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 3 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dXt = ft (Xt ) dt + σt (Xt ) dWt , X0, (signal) dYt = bt (Xt ) dt + dVt , Y0 = 0 (noisy observation) (1) These are Itˆo SDE’s. We use both Itˆo and Stratonovich (Str) SDE’s. Str SDE’s are necessary to deal with manifolds, since second order Itˆo terms not clear in terms of manifolds [16], although we are working on a direct projection of Ito equations with good optimality properties (John Armstrong) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 4 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dXt = ft (Xt ) dt + σt (Xt ) dWt , X0, (signal) dYt = bt (Xt ) dt + dVt , Y0 = 0 (noisy observation) (1) These are Itˆo SDE’s. We use both Itˆo and Stratonovich (Str) SDE’s. Str SDE’s are necessary to deal with manifolds, since second order Itˆo terms not clear in terms of manifolds [16], although we are working on a direct projection of Ito equations with good optimality properties (John Armstrong) The nonlinear filtering problem consists in finding the conditional probability distribution πt of the state Xt given the observations up to time t, i.e. πt (dx) := P[Xt ∈ dx | Yt ], where Yt := σ(Ys , 0 ≤ s ≤ t). Assume πt has a density pt : then pt satisfies the Str SPDE: D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 4 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. We need finite dimensional approximations. We can project SPDE according to either the L2 direct metric (γ(θ)) or, by deriving the analogous equation for √ pt , according to the Hellinger metric (g(θ)). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. 
Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. We need finite dimensional approximations. We can project SPDE according to either the L2 direct metric (γ(θ)) or, by deriving the analogous equation for √ pt , according to the Hellinger metric (g(θ)). Projection transforms the SPDE to a finite dimensional SDE for θ via the chain rule (hence Str calculus): dp(·, θt ) = m j=1 ∂p(·,θ) ∂θj ◦ dθj(t). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Nonlinear filtering problem The nonlinear filtering problem for diffusion signals dpt = L∗ t pt dt − 1 2 pt [|bt |2 − Ept {|bt |2 }] dt + d k=1 pt [bk t − Ept {bk t }] ◦ dYk t . with the forward operator L∗ t φ = − n i=1 ∂ ∂xi [fi t φ] + 1 2 n i,j=1 ∂2 ∂xi ∂xj [aij t φ] ∞-dimensional SPDE. Solutions for even toy systems the like cubic sensor, f = 0, σ = 1, b = x3, do not belong in any finite dim p(·, θ) [19]. We need finite dimensional approximations. We can project SPDE according to either the L2 direct metric (γ(θ)) or, by deriving the analogous equation for √ pt , according to the Hellinger metric (g(θ)). Projection transforms the SPDE to a finite dimensional SDE for θ via the chain rule (hence Str calculus): dp(·, θt ) = m j=1 ∂p(·,θ) ∂θj ◦ dθj(t). With Ito calculus we would have terms ∂2p(·,θ) ∂θi ∂θj d θi, θj (not tang vec) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 5 / 37 Nonlinear Projection Filtering Projection Filters Projection filter in the metrics h (L2) and g (Fisher) dθi t =   m j=1 γij (θt ) L∗ t p(x, θt ) ∂p(x, θt ) ∂θj dx − m j=1 γij (θt ) 1 2 |bt (x)|2 ∂p ∂θj dx   dt + d k=1 [ m j=1 γij (θt ) bk t (x) ∂p(x, θt ) ∂θj dx] ◦ dYk t , θi 0 . The above is the projected equation in d2 metric and Πγ . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 6 / 37 Nonlinear Projection Filtering Projection Filters Projection filter in the metrics h (L2) and g (Fisher) dθi t =   m j=1 γij (θt ) L∗ t p(x, θt ) ∂p(x, θt ) ∂θj dx − m j=1 γij (θt ) 1 2 |bt (x)|2 ∂p ∂θj dx   dt + d k=1 [ m j=1 γij (θt ) bk t (x) ∂p(x, θt ) ∂θj dx] ◦ dYk t , θi 0 . The above is the projected equation in d2 metric and Πγ . Instead, using the Hellinger distance & the Fisher metric with projection Πg dθi t =   m j=1 gij (θt ) L∗ t p(x, θt ) p(x, θt ) ∂p(x, θt ) ∂θj dx − m j=1 gij (θt ) 1 2 |bt (x)|2 ∂p ∂θj dx   dt + d k=1 [ m j=1 gij (θt ) bk t (x) ∂p(x, θt ) ∂θj dx] ◦ dYk t , θi 0 . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 6 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact One can define both a local and global filtering error through dH D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. 
The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact One can define both a local and global filtering error through dH Alternative coordinates, expectation param., η = Eθ[c] = ∂θψ(θ). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Exponential Families Choosing the family/manifold: Exponential In past literature and in several papers in Bernoulli, IEEE Automatic Control etc, B. Hanzon and LeGland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θT c(x) − ψ(θ)]. Good combination: The tangent space has a simple structure: square roots do not complicate issues thanks to the exponential structure. The Fisher matrix has a simple structure: ∂2 θi ,θj ψ(θ) = gij(θ) The structure of the projection Πg is simple for exp families Special exp family with Y-function b among c(x) exponents makes filter correction step (projection of dY term) exact One can define both a local and global filtering error through dH Alternative coordinates, expectation param., η = Eθ[c] = ∂θψ(θ). Projection filter in η coincides with classical approx filter: assumed density filter (based on generalized “moment matching”) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 7 / 37 Choice of the family Mixture Families Mixture families However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 8 / 37 Choice of the family Mixture Families Mixture families However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? The answer is affirmative, and this is the mixture family. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 8 / 37 Choice of the family Mixture Families Mixture families However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? The answer is affirmative, and this is the mixture family. We define a simple mixture family as follows. Given m + 1 fixed squared integrable probability densities q = [q1, q2, . . . , qm+1]T , define ˆθ(θ) := [θ1, θ2, . . . , θm, 1 − θ1 − θ2 − . . . − θm]T for all θ ∈ Rm. We write ˆθ instead of ˆθ(θ). Mixture family (simplex): SM (q) = {ˆθ(θ)T q, θi ≥ 0 for all i, θ1 + · · · + θm < 1} D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 8 / 37 Choice of the family Mixture Families Mixture families If we consider the L2 / γ(θ) distance, the metric γ(θ) itself and the related projection become very simple. Indeed, ∂p(·, θ) ∂θi = qi −qm+1 and γij(θ) = (qi(x)−qm(x))(qj(x)−qm(x))dx (NO inline numeric integr). D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 9 / 37 Choice of the family Mixture Families Mixture families If we consider the L2 / γ(θ) distance, the metric γ(θ) itself and the related projection become very simple. Indeed, ∂p(·, θ) ∂θi = qi −qm+1 and γij(θ) = (qi(x)−qm(x))(qj(x)−qm(x))dx (NO inline numeric integr). The L2 metric does not depend on the specific point θ of the manifold. The same holds for the tangent space at p(·, θ), which is given by span{q1 − qm+1, q2 − qm+1, · · · , qm − qm+1} Also the L2 projection becomes particularly simple. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 9 / 37 Mixture Projection Filter Mixture Projection Filter Armstrong and B. (MCSS 2016 [3]) show that the mixture family + metric γ(θ) lead to a Projection filter that is the same as approximate filtering via Galerkin [5] methods. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 10 / 37 Mixture Projection Filter Mixture Projection Filter Armstrong and B. (MCSS 2016 [3]) show that the mixture family + metric γ(θ) lead to a Projection filter that is the same as approximate filtering via Galerkin [5] methods. See the full paper for the details. Summing up: Family → Exponential Basic Mixture Metric ↓ Hellinger dH Good Nothing special Fisher g(θ) ∼ADF ≈ local moment matching Direct L2 d2 Nothing special Good matrix γ(θ) (∼Galerkin) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 10 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. Specifically, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not fixed. For example for a mixture of two Gaussians we have 5 parameters. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x), param. θ, µ1, v1, µ2, v2 D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter Mixture Projection Filter However, despite the simplicity above, the mixture family has an important drawback: for all θ, filter mean is constrained min i mean of qi ≤ mean of p(·, θ) ≤ max i mean of qi As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q. Specifically, we consider a mixture of GAUSSIAN DENSITIES with MEANS AND VARIANCES in each component not fixed. For example for a mixture of two Gaussians we have 5 parameters. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x), param. 
θ, µ1, v1, µ2, v2 We are now going to illustrate the Gaussian mixture projection filter (GMPF) in a fundamental example. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 11 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical We expect a bimodal distribution D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor The quadratic sensor Consider the quadratic sensor dXt = σdWt dYt = X2 dt + σdVt . The measurements tell us nothing about the sign of X Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical We expect a bimodal distribution θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x) (red) vs eθ1x+θ2x2+θ3x3+θ4x4−ψ(θ) (pink) vs EKF (N) (blue) vs exact (green, finite diff. method, grid 1000 state & 5000 time) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 12 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 0 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 13 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 1 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 14 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 2 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 15 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 3 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 16 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 4 Projection Exact Extended Kalman Exponential D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 17 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 5 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 18 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 6 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 19 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 7 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 20 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 8 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 21 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 9 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 22 / 37 Mixture Projection Filter The quadratic sensor Simulation for the Quadratic Sensor 0 0.2 0.4 0.6 0.8 1 -8 -6 -4 -2 0 2 4 6 8 X Distribution at time 10 Projection Exact Extended Kalman Exponential D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 23 / 37 Mixture Projection Filter The quadratic sensor Comparing local approximation errors (L2 residuals) εt ε2 t = (pexact,t (x) − papprox,t (x))2 dx papprox,t (x): three possible choices. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x) (red) vs eθ1x+θ2x2+θ3x3+θ4x4−ψ(θ) (blue) vs EKF (N) (green) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 24 / 37 Mixture Projection Filter The quadratic sensor L2 residuals for the quadratic sensor 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 25 / 37 Mixture Projection Filter The quadratic sensor Comparing local approx errors (Prokhorov residuals) εt εt = inf{ : Fexact,t (x − ) − ≤ Fapprox,t (x) ≤ Fexact,t (x + ) + ∀x} with F the CDF of p’s. Levy-Prokhorov metric works well with singular densities like particles where L2 metric not ideal. θpN(µ1,v1)(x) + (1 − θ)pN(µ2,v2)(x) (red) vs eθ1x+θ2x2+θ3x3+θ4x4−ψ(θ) (green) vs best three particles (blue) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 26 / 37 Mixture Projection Filter The quadratic sensor L´evy residuals for the quadratic sensor 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0 1 2 3 4 5 6 7 8 9 10 Time ProkhorovResiduals Prokhorov Residual (L2NM) Prokhorov Residual (HE) Best possible residual (3Deltas) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 27 / 37 Mixture Projection Filter Cubic sensors Cubic sensors 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) Qualitatively similar results up to a stopping time D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 28 / 37 Mixture Projection Filter Cubic sensors Cubic sensors 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) Qualitatively similar results up to a stopping time As one approaches the boundary γij becomes singular D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 28 / 37 Mixture Projection Filter Cubic sensors Cubic sensors 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 Time Residuals Projection Residual (L2 norm) Extended Kalman Residual (L2 norm) Hellinger Projection Residual (L2 norm) Qualitatively similar results up to a stopping time As one approaches the boundary γij becomes singular The solution is to dynamically change the parameterization and even the dimension of the manifold. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 28 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” Direct L2 works well with mixture families even simpler filter equations, no inline numerical integration basic version equivalent to Galerkin methods suited also for multimodality (quadratic sensor tests, L2 global error) comparable with particle methods D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) 
Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” Direct L2 works well with mixture families even simpler filter equations, no inline numerical integration basic version equivalent to Galerkin methods suited also for multimodality (quadratic sensor tests, L2 global error) comparable with particle methods Further investigation: convergence, more on optimality? D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Conclusions Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities Projection uses overarching L2 structure Two different metrics: direct L2 and Hellinger/Fisher (L2 on √ .) Fisher works well with exponential families: multimodality, correction step exact, simplicity of implementation equivalence with Assumed Density Filters “moment matching” Direct L2 works well with mixture families even simpler filter equations, no inline numerical integration basic version equivalent to Galerkin methods suited also for multimodality (quadratic sensor tests, L2 global error) comparable with particle methods Further investigation: convergence, more on optimality? Optimality: introducing new projections (forthcoming J. Armstrong) D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 29 / 37 Conclusions and References Thanks With thanks to the organizing committee. Thank you for your attention. Questions and comments welcome D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 30 / 37 Conclusions and References References I [1] J. Aggrawal: Sur l’information de Fisher. In: Theories de l’Information (J. Kampe de Feriet, ed.), Springer-Verlag, Berlin–New York 1974, pp. 111-117. [2] Amari, S. Differential-geometrical methods in statistics, Lecture notes in statistics, Springer-Verlag, Berlin, 1985 [3] Armstrong, J., and Brigo, D. (2016). Nonlinear filtering via stochastic PDE projection on mixture manifolds in L2 direct metric, Mathematics of Control, Signals and Systems, 2016, accepted. [4] Beard, R., Kenney, J., Gunther, J., Lawton, J., and Stirling, W. (1999). Nonlinear Projection Filter based on Galerkin approximation. AIAA Journal of Guidance Control and Dynamics, 22 (2): 258-266. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 31 / 37 Conclusions and References References II [5] Beard, R. and Gunther, J. (1997). Galerkin Approximations of the Kushner Equation in Nonlinear Estimation. Working Paper, Brigham Young University. [6] Barndorff-Nielsen, O.E. (1978). Information and Exponential Families. John Wiley and Sons, New York. [7] Brigo, D. Diffusion Processes, Manifolds of Exponential Densities, and Nonlinear Filtering, In: Ole E. Barndorff-Nielsen and Eva B. Vedel Jensen, editor, Geometry in Present Day Science, World Scientific, 1999 [8] Brigo, D, On SDEs with marginal laws evolving in finite-dimensional exponential families, STAT PROBABIL LETT, 2000, Vol: 49, Pages: 127 – 134 D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 32 / 37 Conclusions and References References III [9] Brigo, D. (2011). The direct L2 geometric structure on a manifold of probability densities with applications to Filtering. 
Available on arXiv.org and damianobrigo.it [10] Brigo, D, Hanzon, B, LeGland, F, A differential geometric approach to nonlinear filtering: The projection filter, IEEE T AUTOMAT CONTR, 1998, Vol: 43, Pages: 247 – 252 [11] Brigo, D, Hanzon, B, Le Gland, F, Approximate nonlinear filtering by projection on exponential manifolds of densities, BERNOULLI, 1999, Vol: 5, Pages: 495 – 534 [12] D. Brigo, Filtering by Projection on the Manifold of Exponential Densities, PhD Thesis, Free University of Amsterdam, 1996. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 33 / 37 Conclusions and References References IV [13] Brigo, D., and Pistone, G. (1996). Projecting the Fokker-Planck Equation onto a finite dimensional exponential family. Available at arXiv.org [14] Crisan, D., and Rozovskii, B. (Eds) (2011). The Oxford Handbook of Nonlinear Filtering, Oxford University Press. [15] M. H. A. Davis, S. I. Marcus, An introduction to nonlinear filtering, in: M. Hazewinkel, J. C. Willems, Eds., Stochastic Systems: The Mathematics of Filtering and Identification and Applications (Reidel, Dordrecht, 1981) 53–75. [16] Elworthy, D. (1982). Stochastic Differential Equations on Manifolds. LMS Lecture Notes. D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 34 / 37 Conclusions and References References V [17] Hanzon, B. A differential-geometric approach to approximate nonlinear filtering. In C.T.J. Dodson, Geometrization of Statistical Theory, pages 219 – 223,ULMD Publications, University of Lancaster, 1987. [18] B. Hanzon, Identifiability, recursive identification and spaces of linear dynamical systems, CWI Tracts 63 and 64, CWI, Amsterdam, 1989 [19] M. Hazewinkel, S.I.Marcus, and H.J. Sussmann, Nonexistence of finite dimensional filters for conditional statistics of the cubic sensor problem, Systems and Control Letters 3 (1983) 331–340. [20] J. Jacod, A. N. Shiryaev, Limit theorems for stochastic processes. Grundlehren der Mathematischen Wissenschaften, vol. 288 (1987), Springer-Verlag, Berlin, D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 35 / 37 Conclusions and References References VI [21] A. H. Jazwinski, Stochastic Processes and Filtering Theory, Academic Press, New York, 1970. [22] M. Fujisaki, G. Kallianpur, and H. Kunita (1972). Stochastic differential equations for the non linear filtering problem. Osaka J. Math. Volume 9, Number 1 (1972), 19-40. [23] Kenney, J., Stirling, W. Nonlinear Filtering of Convex Sets of Probability Distributions. Presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June - 2 July 1999 [24] R. Z. Khasminskii (1980). Stochastic Stability of Differential Equations. Alphen aan den Reijn [25] R.S. Liptser, A.N. Shiryayev, Statistics of Random Processes I, General Theory (Springer Verlag, Berlin, 1978). D. Brigo and J. Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 36 / 37 Conclusions and References References VII [26] M. Murray and J. Rice - Differential geometry and statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993. [27] D. Ocone, E. Pardoux, A Lie algebraic criterion for non-existence of finite dimensionally computable filters, Lecture notes in mathematics 1390, 197–204 (Springer Verlag, 1989) [28] Pistone, G., and Sempi, C. (1995). An Infinite Dimensional Geometric Structure On the space of All the Probability Measures Equivalent to a Given one. The Annals of Statistics 23(5), 1995 D. Brigo and J. 
Armstrong (ICL and KCL) SPDE Projection Filters GSI 2015 37 / 37
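In compact notation (all symbols as in the slides: p_t is the filtering density, L*_t the forward operator, b_t the observation function, Y the observation process, and γ(θ), g(θ) the direct-L2 and Fisher metrics on the chosen family p(·, θ)), the filtering SPDE and its L2 projection onto the family read

\[
dp_t = \mathcal{L}^*_t p_t\,dt \;-\; \tfrac12\, p_t\big[\,|b_t|^2 - \mathbb{E}_{p_t}\{|b_t|^2\}\,\big]\,dt \;+\; \sum_{k=1}^{d} p_t\big[\,b^k_t - \mathbb{E}_{p_t}\{b^k_t\}\,\big]\circ dY^k_t,
\qquad
\mathcal{L}^*_t\varphi = -\sum_{i=1}^{n}\frac{\partial}{\partial x_i}\big[f^i_t\varphi\big] + \tfrac12\sum_{i,j=1}^{n}\frac{\partial^2}{\partial x_i\,\partial x_j}\big[a^{ij}_t\varphi\big],
\]
\[
d\theta^i_t = \sum_{j=1}^{m}\gamma^{ij}(\theta_t)\left[\int \mathcal{L}^*_t p(x,\theta_t)\,\frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx - \int \tfrac12\,|b_t(x)|^2\,\frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx\right]dt \;+\; \sum_{k=1}^{d}\left[\sum_{j=1}^{m}\gamma^{ij}(\theta_t)\int b^k_t(x)\,\frac{\partial p(x,\theta_t)}{\partial\theta_j}\,dx\right]\circ dY^k_t .
\]

The Hellinger/Fisher version is obtained by replacing γ^{ij} with g^{ij} and L*_t p with (L*_t p)/p in the first integral. For the simple mixture family p(·, θ) = θ̂(θ)ᵀ q one has ∂p/∂θ_i = q_i − q_{m+1}, so γ_{ij}(θ) = ∫ (q_i(x) − q_{m+1}(x))(q_j(x) − q_{m+1}(x)) dx is independent of θ and requires no inline numerical integration.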

Ali Mohammad-Djafari

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14270
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_77
Authors = Ali Mohammad-Djafari
Keywords =
Abstract
Clustering, classification and pattern recognition in a set of data are among the most important tasks in statistical research and in many applications. In this paper, we propose to model the data with a mixture of Student-t distributions via a hierarchical graphical model and to carry out these tasks in the Bayesian framework. The main advantages of this model are that it accounts for the uncertainties of variances and covariances, and that Variational Bayesian Approximation (VBA) methods yield fast algorithms able to handle large data sets.
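The supervised-classification step described in the talk assigns a sample to the class with the largest posterior probability under the fitted mixture, k* = argmax_k p(c = k | x, a, Θ, K). A minimal sketch with Gaussian components (the function name and the toy parameters below are illustrative choices, not taken from the paper):

import numpy as np
from scipy.stats import multivariate_normal

def classify(x, weights, means, covs):
    """Return (k*, posterior) for the mixture p(x, c=k) = a_k N(x | mu_k, Sigma_k)."""
    joint = np.array([a * multivariate_normal.pdf(x, mean=m, cov=S)
                      for a, m, S in zip(weights, means, covs)])
    posterior = joint / joint.sum()
    return int(np.argmax(posterior)), posterior

# toy two-class example
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
print(classify(np.array([2.5, 2.0]), weights, means, covs))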


See the video
Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model

. Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model Ali Mohammad-Djafari Laboratoire des Signaux et Syst`emes (L2S) UMR8506 CNRS-CentraleSup´elec-UNIV PARIS SUD SUPELEC, 91192 Gif-sur-Yvette, France http://lss.centralesupelec.fr Email: djafari@lss.supelec.fr http://djafari.free.fr http://publicationslist.org/djafari A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 1/20 Contents 1. Mixture models 2. Different problems related to classification and clustering Training Supervised classification Semi-supervised classification Clustering or unsupervised classification 3. Mixture of Student-t 4. Variational Bayesian Approximation 5. VBA for Mixture of Student-t 6. Conclusion A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 2/20 Mixture models General mixture model p(x|a, Θ, K) = K k=1 ak pk(xk|θk), 0 < ak < 1 Same family pk(xk|θk) = p(xk|θk), ∀k Gaussian p(xk|θk) = N(xk|µk, Σk) with θk = (µk, Σk) Data X = {xn, n = 1, · · · , N} where each element xn can be in one of these classes cn. ak = p(cn = k), a = {ak, k = 1, · · · , K}, Θ = {θk, k = 1, · · · , K} p(Xn, cn = k|a, θ) = N n=1 p(xn, cn = k|a, θ). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 3/20 Different problems Training: Given a set of (training) data X and classes c, estimate the parameters a and Θ. Supervised classification: Given a sample xm and the parameters K, a and Θ determine its class k∗ = arg max k {p(cm = k|xm, a, Θ, K)} . Semi-supervised classification (Proportions are not known): Given sample xm and the parameters K and Θ, determine its class k∗ = arg max k {p(cm = k|xm, Θ, K)} . Clustering or unsupervised classification (Number of classes K is not known): Given a set of data X, determine K and c. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 4/20 Training Given a set of (training) data X and classes c, estimate the parameters a and Θ. Maximum Likelihood (ML): (a, Θ) = arg max (a,Θ) {p(X, c|a, Θ, K)} . Bayesian: Assign priors p(a|K) and p(Θ|K) = K k=1 p(θk) and write the expression of the joint posterior laws: p(a, Θ|X, c, K) = p(X, c|a, Θ, K) p(a|K) p(Θ|K) p(X, c|K) where p(X, c|K) = p(X, c|a, Θ|K)p(a|K) p(Θ|K) da dΘ Infer on a and Θ either as the Maximum A Posteriori (MAP) or Posterior Mean (PM). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 5/20 Supervised classification Given a sample xm and the parameters K, a and Θ determine p(cm = k|xm, a, Θ, K) = p(xm, cm = k|a, Θ, K) p(xm|a, Θ, K) where p(xm, cm = k|a, Θ, K) = akp(xm|θk) and p(xm|a, Θ, K) = K k=1 ak p(xm|θk) Best class k∗: k∗ = arg max k {p(cm = k|xm, a, Θ, K)} A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 6/20 Semi-supervised classification Given sample xm and the parameters K and Θ (not the proportions a), determine the probabilities p(cm = k|xm, Θ, K) = p(xm, cm = k|Θ, K) p(xm|Θ, K) where p(xm, cm = k|Θ, K) = p(xm, cm = k|a, Θ, K)p(a|K) da and p(xm|Θ, K) = K k=1 p(xm, cm = k|Θ, K) Best class k∗, for example the MAP solution: k∗ = arg max k {p(cm = k|xm, Θ, K)} . A. 
Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 7/20 Clustering or non-supervised classification Given a set of data X, determine K and c. Determination of the number of classes: p(K = L|X) = p(X, K = L) p(X) = p(X|K = L) p(K = L) p(X) and p(X) = L0 L=1 p(K = L) p(X|K = L), where L0 is the a priori maximum number of classes and p(X|K = L) = n L k=1 akp(xn, cn = k|θk)p(a|K) p(Θ|K) da dΘ When K and c are determined, we can also determine the characteristics of those classes a and Θ. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 8/20 Mixture of Student-t model Student-t and its Infinite Gaussian Scaled Model (IGSM): T (x|ν, µ, Σ) = ∞ 0 N(x|µ, z−1 Σ) G(z| ν 2 , ν 2 ) dz where N(x|µ, Σ)= |2πΣ|−1 2 exp −1 2(x − µ) Σ−1 (x − µ) = |2πΣ|−1 2 exp −1 2Tr (x − µ)Σ−1 (x − µ) and G(z|α, β) = βα Γ(α) zα−1 exp [−βz] . Mixture of Student-t: p(x|{νk, ak, µk, Σk, k = 1, · · · , K}, K) = K k=1 ak T (xn|νk, µk, Σk). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 9/20 Mixture of Student-t model Introducing znk, zk = {znk, n = 1, · · · , N}, Z = {znk}, c = {cn, n = 1, · · · , N}, θk = {νk, ak, µk, Σk}, Θ = {θk, k = 1, · · · , K} Assigning the priors p(Θ) = k p(θk), we can write: p(X, c, Z, Θ|K) = n k akN(xn|µk, z−1 n,k Σk) G(znk|νk 2 , νk 2 ) p(θk) Joint posterior law: p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K) p(X|K) . The main task now is to propose some approximations to it in such a way that we can use it easily in all the above mentioned tasks of classification or clustering. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 10/20 Variational Bayesian Approximation (VBA) Main idea: to propose easy computational approximation q(c, Z, Θ) for p(c, Z, Θ|X, K). Criterion: KL(q : p) Interestingly, by noting that p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K)/p(X|K) we have: KL(q : p) = −F(q) + ln p(X|K) where F(q) = − ln p(X, c, Z, Θ|K) q is called free energy of q and we have the following properties: – Maximizing F(q) or minimizing KL(q : p) are equivalent and both give un upper bound to the evidence of the model ln p(X|K). – When the optimum q∗ is obtained, F(q∗) can be used as a criterion for model selection. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 11/20 VBA: choosing the good families Using KL(q : p) has the very interesting property that using q to compute the means we obtain the same values if we have used p (Conservation of the means). Unfortunately, this is not the case for variances or other moments. If p is in the exponential family, then choosing appropriate conjugate priors, the structure of q will be the same and we can obtain appropriate fast optimization algorithms. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 12/20 Hierarchical graphical model ξ0 d d‚    © αk   βk   znk   E γ0, Σ0 c Σk   µ0, η0 c µk   k0 c a   d d‚    © d d‚    © ¨ ¨¨¨ ¨¨%xn   E Figure : Graphical representation of the model. A. 
Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 13/20 VBA for mixture of Student-t In our case, noting that p(X, c, Z, Θ|K) = n k p(xn, cn, znk|ak, µk, Σk, νk) k [p(αk) p(βk) p(µk|Σk) p(Σk)] with p(xn, cn, znk|ak, µk, Σk, νk) = N(xn|µk, z−1 n,k Σk) G(znk|αk, βk) is separable, in one side for [c, Z] and in other size in components of Θ, we propose to use q(c, Z, Θ) = q(c, Z) q(Θ). A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 14/20 VBA for mixture of Student-t With this decomposition, the expression of the Kullback-Leibler divergence becomes: KL(q1(c, Z)q2(Θ) : p(c, Z, Θ|X, K) = c q1(c, Z)q2(Θ) ln q1(c, Z)q2(Θ) p(c, Z, Θ|X, K) dΘ dZ The expression of the Free energy becomes: F(q1(c, Z)q2(Θ)) = c q1(c, Z)q2(Θ) ln p(X, c, Z|Θ, K)p(Θ|K) q1(c, Z)q2(Θ) dΘ dZ A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 15/20 Proposed VBA for Mixture of Student-t priors model Using a generalized Student-t obtained by replacing G(zn,k|νk 2 , νk 2 ) by G(zn,k|αk, βk) it will be easier to propose conjugate priors for αk, βk than for νk. p(xn, cn = k, znk|ak, µk, Σk, αk, βk, K) = ak N(xn|µk, z−1 n,k Σk) G(zn,k|αk, βk). In the following, noting by Θ = {(ak, µk, Σk, αk, βk), k = 1, · · · , K}, we propose to use the factorized prior laws: p(Θ) = p(a) k [p(αk) p(βk) p(µk|Σk) p(Σk)] with the following components:    p(a) = D(a|k0), k0 = [k0, · · · , k0] = k01 p(αk) = E(αk|ζ0) = G(αk|1, ζ0) p(βk) = E(βk|ζ0) = G(αk|1, ζ0) p(µk|Σk) = N(µk|µ01, η−1 0 Σk) p(Σk) = IW(Σk|γ0, γ0Σ0) A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 16/20 Proposed VBA for Mixture of Student-t priors model where D(a|k) = Γ( l kk) l Γ(kl ) l akl −1 l is the Dirichlet pdf, E(t|ζ0) = ζ0 exp [−ζ0t] is the Exponential pdf, G(t|a, b) = ba Γ(a) ta−1 exp [−bt] is the Gamma pdf and IW(Σ|γ, γ∆) = |1 2∆|γ/2 exp −1 2Tr ∆Σ−1 ΓD(γ/2)|Σ| γ+D+1 2 . is the inverse Wishart pdf. With these prior laws and the likelihood: joint posterior law: pk(c, Z, Θ|X) = p(X, c, Z, Θ) p(X) . A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 17/20 Expressions of q q(c, Z, Θ) = q(c, Z) q(Θ) = n k[q(cn = k|znk) q(znk)] k[q(αk) q(βk) q(µk|Σk) q(Σk)] q(a). with:    q(a) = D(a|˜k), ˜k = [˜k1, · · · , ˜kK ] q(αk) = G(αk|˜ζk, ˜ηk) q(βk) = G(βk|˜ζk, ˜ηk) q(µk|Σk) = N(µk|µ, ˜η−1Σk) q(Σk) = IW(Σk|˜γ, ˜γ ˜Σ) With these choices, we have F(q(c, Z, Θ)) = ln p(X, c, Z, Θ|K) q(c,Z,Θ) = k n F1kn + k F2k F1kn = ln p(xn, cn, znk, θk) q(cn=k|znk )q(znk ) F2k = ln p(xn, cn, znk, θk) q(θk )A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 18/20 VBA Algorithm step Expressions of the updating expressions of the tilded parameters are obtained by following three steps: E step: Optimizing F with respect to q(c, Z) when keeping q(Θ) fixed, we obtain the expression of q(cn = k|znk) = ˜ak, q(znk) = G(znk|αk, βk). M step: Optimizing F with respect to q(Θ) when keeping q(c, Z) fixed, we obtain the expression of q(a) = D(a|˜k), ˜k = [˜k1, · · · , ˜kK ], q(αk) = G(αk|˜ζk, ˜ηk), q(βk) = G(βk|˜ζk, ˜ηk), q(µk|Σk) = N(µk|µ, ˜η−1Σk), and q(Σk) = IW(Σk|˜γ, ˜γ ˜Σ), which gives the updating algorithm for the corresponding tilded parameters. 
F evaluation: After each E step and M step, we can also evaluate the expression of F(q) which can be used for stopping rule of the iterative algorithm. Final value of F(q) for each value of K, noted Fk, can be used as a criterion for model selection, i.e.; the determination of the number of clusters. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 19/20 Conclusions Clustering and classification of a set of data are between the most important tasks in statistical researches for many applications such as data mining in biology. Mixture models and in particular Mixture of Gaussians are classical models for these tasks. We proposed to use a mixture of generalised Student-t distribution model for the data via a hierarchical graphical model. To obtain fast algorithms and be able to handle large data sets, we used conjugate priors everywhere it was possible. The proposed algorithm has been used for clustering, classification and discriminant analysis of some biological data (Cancer research related), but in this paper, we only presented the main algorithm. A. Mohammad-Djafari, VBA for Classification and Clustering..., GSI2015, October 28-30, 2015, Polytechnique, France 20/20
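The algorithm above rests on the infinite Gaussian scale-mixture representation of the Student-t, T(x | ν, µ, Σ) = ∫₀^∞ N(x | µ, z⁻¹Σ) G(z | ν/2, ν/2) dz. A quick numerical check of this identity in the scalar case (a sketch for illustration; not code from the paper):

import numpy as np
from scipy import stats
from scipy.integrate import quad

def student_t_scale_mixture(x, nu, mu=0.0, sigma2=1.0):
    # integrate N(x | mu, sigma2/z) * Gamma(z | shape=nu/2, rate=nu/2) over z > 0
    integrand = lambda z: (stats.norm.pdf(x, loc=mu, scale=np.sqrt(sigma2 / z))
                           * stats.gamma.pdf(z, a=nu / 2.0, scale=2.0 / nu))
    value, _ = quad(integrand, 0.0, np.inf)
    return value

x, nu = 1.3, 5.0
print(student_t_scale_mixture(x, nu))  # scale-mixture integral
print(stats.t.pdf(x, df=nu))           # closed-form Student-t density, should agree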

Barbara Opozda

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14279
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_26
Authors = Barbara Opozda
Keywords = Affine connection, Curvature tensor, Laplacian, Bochner’s technique, Ricci tensor, Sectional curvature
Abstract
Curvature properties of statistical structures are studied. The study deals with the curvature tensors of statistical connections and their duals, as well as the Ricci tensors of the connections, Laplacians and the curvature operator. Two concepts of sectional curvature are introduced. The meaning of these notions is illustrated by presenting a few exemplary theorems.
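Concretely, with ∇̂ the Levi-Civita connection of g, K = ∇ − ∇̂ the difference tensor, and R, R* the curvature tensors of the statistical connection ∇ and its dual ∇*, the two sectional curvatures introduced in the talk, alongside the usual Riemannian one, are defined for a plane π ⊂ T_xM with orthonormal basis X, Y by

\[
\hat k(\pi) = g(\hat R(X,Y)Y,\,X), \qquad
k(\pi) = g([K_X,K_Y]Y,\,X), \qquad
k^{\nabla}(\pi) = g\big(\tfrac12 (R+R^*)(X,Y)Y,\,X\big).
\]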


See the video
Curvatures of Statistical Structures

Curvatures of statistical structures Barbara Opozda Paris, October 2015 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 1 / 29 Statistical structures - statistical setting M - open subset of Rn Λ - probability space with a fixed σ-algebra p : M × Λ (x, λ) → p(x, λ) ∈ R - smooth relative to x such that px (λ) := p(x, λ) is a probability measure on Λ — probability distribution (x, λ) := log(p(x, λ)) gij (x) := Ex [(∂i )(∂j )], where Ex is the expectation relative to the probability px ∀x ∈ M, ∂1, ..., ∂n - the canonical frame on M g – Fisher information metric tensor field on M Cijk(x) = Ex [(∂i )(∂j )(∂k )] - cubic form (g, C) – statistical structure on M Barbara Opozda () Curvatures of statistical structures Paris, October 2015 2 / 29 Statistical structures (Codazzi structures)– geometric setting; three equivalent definitions M – manifold, dim M = n I) (g, C), C - totally symmetric (0, 3)-tensor field on M, that is, C(X, Y , Z) = C(Y , X, Z) = C(Y , Z, X) ∀X, Y , Z ∈ Tx M, x ∈ M C – cubic form II) (g, K), K – symmetric (1, 2)-tensor field (i.e., K(X, Y ) = K(Y , X)) and symmetric relative to g, that is, g(X, K(Y , Z)) = g(Y , K(X, Z)) is symmetric for all arguments. C(X, Y , Z) = g(X, K(Y , Z)) Barbara Opozda () Curvatures of statistical structures Paris, October 2015 3 / 29 III) (g, ), - torsion-free connection such that ( X g)(Y , Z) = ( Y g)(X, Z) (1) — statistical connection T – any tensor field of type (p, q) on M, T – of type (p, q + 1) T(X, Y1, ..., Yq) = ( X T)(Y1, ..., Yq) In particular, g(X, Y , Z) = ( X g)(Y , Z) (1) ⇔ g is a symmetric cubic form ˆ - Levi-Civita connection for g K(X, Y ) := X Y − ˆ X Y K – difference tensor g(X, Y , Z) = −2g(X, K(Y , Z)) = −2C(X, Y , Z) Barbara Opozda () Curvatures of statistical structures Paris, October 2015 4 / 29 A statistical structure is trivial if and only if K = 0 or equivalently C = 0 or equivalently = ˆ . KX Y := K(X, Y ) E := tr g K = K(e1, e1) + ... + K(en, en) = (tr Ke1 )e1 + ... + (tr Ken )en E – mean difference vector field E = 0 ⇔ tr KX = 0 ∀X ∈ TM ⇔ tr g C(X, ·, ·) = 0 ∀X ∈ TM E = 0 ⇒ trace-free statistical structure Fact. (g, ) – trace-free if and only if νg = 0, where νg – volume form determined by g Barbara Opozda () Curvatures of statistical structures Paris, October 2015 5 / 29 Examples Riemannian geometry of the second fundamental form M – locally strongly hypersurface in Rn+1 – the second fundamental form h satisfies the Codazzi equation h(X, Y , Z) = h(Y , X, Z), where is the induced connection (the Levi-Civita connection of the first fundamental form) (h, ) - statistical structure Similarly one gets statistical structures on hypersurfaces in space forms. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 6 / 29 Equiaffine geometry of hypersurfaces in the standard affine space Rn+1 M – locally strongly convex hypersurface in Rn+1 ξ – a transversal vector field D – standard flat connection on Rn+1, X, Y ∈ X(M), ξ - transversal vector field DX Y = X Y + h(X, Y )ξ − Gauss formula – induced connection, h – second fundamental form (metric tensor field) DX ξ = −SX + τ(X)ξ − Weingarten formula If τ = 0, ξ is called equiaffine. 
In this case the Codazzi equation is satisfied h(X, Y , Z) = h(Y , X, Z) (h, ) – statistical structure Barbara Opozda () Curvatures of statistical structures Paris, October 2015 7 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 8 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 8 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 8 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 8 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 9 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 9 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 9 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 10 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 10 / 29 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 10 / 29 Geometry of Lagrangian submanifolds in Kaehler manifolds N – Kaehler manifold of real dimension 2n and with complex structure J M – Lagrangian submanifold of N - n-dimensional submanifold such that JTM orthogonal to TM, i.e. JTM is the normal bundle (in the metric sense) for M ⊂ N D – the Kaehler connection on N DX Y = X Y + JK(X, Y ) g – induced metric tensor field on M (g, K) – statistical structure It is trace-free ⇔ M is minimal in N. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 11 / 29 Most of statistical structures are outside the three classes of examples. For instance, in order that a statistical structure is locally realizable on an equiaffine hypersurface it is necessary that is projectively flat. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 12 / 29 Dual connections, curvature tensors g – metric tensor field on M, – any connection Xg(Y , Z) = g( X Y , Z) + g(Y , X Z) (2) – dual connection (g, ) – statistical structure if and only if (g, ) – statistical structure R(X, Y )Z – (1, 3) - curvature tensor for If R = 0 the structure is called Hessian R(X, Y )Z – curvature tensor for g(R(X, Y )Z, W ) = −g(R(X, Y )W , Z) (3) In particular, R = 0 ⇔ R = 0. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 13 / 29 ˆ – Levi-Civita connection for g, = ˆ + K, = ˆ − K ˆR – curvature tensor for ˆ R(X, Y ) = ˆR(X, Y ) + ( ˆ X K)Y − ( ˆ Y K)X + [KX , KY ] (4) , where [KX , KY ] = KX KY − KY KX R(X, Y ) = ˆR(X, Y ) − ( ˆ X K)Y + ( ˆ Y K)X + [KX , KY ] (5) R(X, Y ) + R(X, Y ) = 2ˆR(X, Y ) + 2[KX , KY ] (6) Barbara Opozda () Curvatures of statistical structures Paris, October 2015 14 / 29 Sectional curvatures R does not have to be skew-symmetric relative to g, i.e. g(R(X, Y )Z, W ) = −g(R(X, Y )W , Z), in general. Lemma * The following conditions are equivalent: 1) g(R(X, Y )Z, W ) = −g(R(X, Y )W , Z) ∀X, Y , Z, W 2) R = R 3) ˆ K is symmetric, that is, ( ˆ K)(X, Y , Z) = ( ˆ X K)(Y , Z) = ( ˆ Y K)(X, Z) = ( ˆ K)(Y , X, Z) ∀X, Y , Z. 
For hypersurfaces in Rn+1 each of the above conditions describes an affine sphere Barbara Opozda () Curvatures of statistical structures Paris, October 2015 15 / 29 R := R+R 2 [K, K](X, Y )Z := [KX , KY ]Z R(X, Y )Z and [K, K](X, Y )Z are Riemann-curvature-like tensors – they are skew-symmetric in X, Y , satisfy the first Bianchi identity, R(X, Y ), [K, K](X, Y ) are skew-symmetric relative to g ∀X, Y π – vector plane in Tx M, X, Y – orthonormal basis of π sectional curvature for g – ˆk(π) := g(ˆR(X, Y )Y , X) sectional K-curvature – k(π) := g([K, K](X, Y )Y , X) sectional -curvature – k (π) := g(R(X, Y )Y , X) Barbara Opozda () Curvatures of statistical structures Paris, October 2015 16 / 29 In general, Schur’s lemma does not hold for k and k. We have, however, Lemma Assume that M is connected, dim M > 2 and the sectional - curvature (the sectional K-curvature) is point-wise constant. If one of the equivalent conditions in Lemma * holds then the sectional -curvature (the sectional K-curvature) is constant on M. sectional K-curvature The easiest situation which should be taken into account is when the sectional K-curvature is constant for all vector planes in Tx M. In this respect we have Barbara Opozda () Curvatures of statistical structures Paris, October 2015 17 / 29 Theorem If the sectional K-curvature is constant and equal to A for all vector planes in Tx M then there is an orthonormal basis e1, ..., en of Tx M and numbers λ1, ..., λn, µ1, ..., µn−1 such that Ke1 =       λ1 µ1 ... µ1       Kei =              µ1 ... µi−1 µ1 · · · µi−1 λi µi ... µi              Ken =       µ1 ... µn−1 µ1 · · · µn−1 λn       Barbara Opozda () Curvatures of statistical structures Paris, October 2015 18 / 29 continuation of the theorem Moreover µi = λi − λ2 i − 4Ai−1 2 , Ai = Ai−1 − µ2 i , for i = 1, ..., n − 1 where A0 = A. The above representation of K is not unique, in general. If additionally tr g K = 0 then A 0, λn = 0 and λi , µi for i = 1, ..., n − 1 are expressed as follows λi = (n − i) −Ai−1 n − i + 1 , µi = − −Ai−1 n − i + 1 . In particular, in the last case the numbers λi , µi depend only on A and the dimension of M. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 19 / 29 Example 1. Ke1 =       λ λ/2 ... λ/2       Kei =              λ/2 ... 0 λ/2 · · · 0 0 0 ... 0              Ken =       λ/2 ... 0 λ/2 · · · 0 0       The sectional K-curvature is constant = λ2/4 Barbara Opozda () Curvatures of statistical structures Paris, October 2015 20 / 29 Example 2. K-curvature vanishes, i.e. [K, K] = 0. There is an orthonormal frame e1, ..., e1 such that Ke1 =       λ1 0 ... 0       Kei =              0 ... 0 0 · · · 0 λi 0 ... 0              Ken =       0 ... 0 0 · · · 0 λn       Barbara Opozda () Curvatures of statistical structures Paris, October 2015 21 / 29 Some theorems on the sectional K-curvature (g, K) – trace-free if E = tr g K = 0 Theorem Let (g, K) be a trace-free statistical structure on M with symmetric ˆ K. If the sectional K-curvature is constant then either K = 0 (the statistical structure is trivial) or ˆR = 0 and ˆ K = 0. Theorem Let ˆ K = 0. Each of the following conditions implies that ˆR = 0: 1) the sectional K-curvature is negative, 2) [K,K]=0 and K is non-degenerate, i.e. X → KX is a monomorphism. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 22 / 29 Theorem K is as in Example 1. 
at each point of M, ˆ K is symmetric, div E is constant on M (E = tr g K). Then the sectional curvature for g by any plane containing E is non-positive. Moreover, if M is connected it is constant. If ˆ E = 0 then ˆ K = 0 and the sectional curvature (of g) by any plane containing E vanishes. Theorem If the sectional K-curvature is non-positive on M and [K, K] · K = 0 then the sectional K-curvature vanishes on M. Corollary If (g, K) is a Hessian structure on M with non-negative sectional curvature of g and such that ˆR · K = 0 then ˆR = 0. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 23 / 29 Theorem The sectional K-curvature is negative on M, ˆR · K = 0. Then ˆR = 0. Theorem Let M be a Lagrangian submanifold of N, where N is a Kaehler manifold of constant holomorphic curvature 4c, the sectional curvature of the first fundamental form g on M is smaller than c on M and ˆR · K = 0, where K is the second fundamental tensor of M ⊂ N. Then ˆR = 0. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 24 / 29 -sectional curvature All affine spheres are statistical manifolds of constant sectional -curvature A Riemann curvature-like tensor defines the curvature operator. For instance, for the curvature tensor R = (R + R)/2 we have the curvature operator R : Λ2TM → Λ2TM given by g(R(X ∧ Y ), Z ∧ W ) = g(R(Z, W )Y , X) A curvature operator is symmetric relative to the canonical extension of g to the bundle Λ2TM. Hence it is diagonalizable. In particular, it can be positive definite, negative definite etc. The assumption that R is positive definite is stronger than the assumption that the sectional -curvature is positive. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 25 / 29 Theorem Let M be a connected compact oriented manifold and (g, ) be a trace-free statistical structure on M. If R = R and the curvature operator determined by the curvature tensor ˆR is positive definite on M then the sectional -curvature is constant. Theorem Let M be a connected compact oriented manifold and (g, ) be a trace-free statistical structure on M. If the curvature operator for R = R+R 2 is positive on M then the Betti numbers b1(M) = ... = bn−1(M) = 0. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 26 / 29 sectional curvature for g ˆk(π) = g(ˆR(X, Y )Y , X), X, Y – an orthonormal basis for π Theorem Let M be a compact manifold equipped with a trace-free statistical structure (g, ) such that R = R. If the sectional curvature ˆk for g is positive on M then the structure is trivial, that is = ˆ . In the 2-dimensional case we have Theorem Let M be a compact surface equipped with a trace-free statistical structure (g, ). If M is of genus 0 and R = R then the structure is trivial. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 27 / 29 B. Opozda, Bochner’s technique for statistical manifolds, Annals of Global Analysis and Geometry, DOI 10.1007/s10455-015-9475-z B. Opozda, A sectional curvature for statistical structures, arXiv:1504.01279[math.DG] Barbara Opozda () Curvatures of statistical structures Paris, October 2015 28 / 29 Hessian structures (g, ) – Hessian if R = 0. Then R = 0 and ˆR = −[K, K]. (g, ) is Hessian if and only if ˆ K is symmetric and ˆR = −[K, K]. All Hessian structure are locally realizable on affine hypersurfaces in Rn+1 equipped with Calabi’s structure. If they are trace-free they are locally realizable on improper affine spheres. 
If the difference tensor is as in Example 1. and the structure is Hessian then K = 0. Barbara Opozda () Curvatures of statistical structures Paris, October 2015 29 / 29
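For reference, the basic identities used throughout the slides can be written compactly: the dual connection ∇* is defined by X g(Y,Z) = g(∇_X Y, Z) + g(Y, ∇*_X Z), and with ∇ = ∇̂ + K, ∇* = ∇̂ − K one has

\[
R(X,Y) = \hat R(X,Y) + (\hat\nabla_X K)_Y - (\hat\nabla_Y K)_X + [K_X, K_Y],
\qquad
R(X,Y) + R^*(X,Y) = 2\hat R(X,Y) + 2[K_X, K_Y].
\]

In particular, the structure is Hessian when R = 0, in which case R* = 0 and R̂ = −[K, K].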

Hideyuki Ishi

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14281
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_28
Authors = Hideyuki Ishi
Keywords = Hessian metric, Homogeneous cone, Left-symmetric algebra
Abstract
Based on the theory of compact normal left-symmetric algebras (clans), we realize every homogeneous cone as a set of positive definite real symmetric matrices, on which homogeneous Hessian metrics as well as a transitive group action are described efficiently.
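As a toy illustration of the target of the construction, a cone of positive definite real symmetric matrices carrying a transitive group action, the snippet below checks, in the simplest case of 2×2 positive definite matrices, that the lower-triangular group acting by X ↦ T X Tᵀ moves the identity to any prescribed point of the cone. This generic example is only meant to fix ideas; it is not the clan-based realization of the paper.

import numpy as np

def act(T, X):
    # group action X -> T X T^T on symmetric matrices
    return T @ X @ T.T

I = np.eye(2)
X = np.array([[2.0, 0.5],
              [0.5, 1.0]])                # an arbitrary point of the cone
T = np.linalg.cholesky(X)                 # lower triangular with positive diagonal
print(np.allclose(act(T, I), X))          # True: the action carries I to X
print(np.all(np.linalg.eigvalsh(act(T, I)) > 0.0))  # result stays positive definite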


See the video
Matrix realization of a homogeneous cone

Michel Boyom, Jamali Mohammed, Shahid Hasan

Creative Commons Attribution-ShareAlike 4.0 International
OAI = oai:www.see.asso.fr:GSI2015:14282
DOI = http://dx.doi.org/10.1007/978-3-319-25040-3_29
Authors = Jamali Mohammed, Michel Boyom, Shahid Hasan
Keywords =
Abstract
In this article, we derive an inequality satisfied by the squared norm of the imbedding curvature tensor of Multiply CR-warped product statistical submanifolds N of holomorphic statistical space forms M. Furthermore, we prove that under certain geometric conditions, N and M become Einstein.


See the video
Multiply CR-Warped Product Statistical Submanifolds of a Holomorphic Statistical Space Form

News

SEE information: 120 participants already!
Press release: GSI2015 registration is open!

Venue

Ecole Polytechnique, Paris-Saclay (France)

École Polytechnique
Route de Saclay
91128 Palaiseau
France