GSI2015

About

LIX Colloquium 2015

As with GSI'13, the objective of this SEE conference, GSI'15, hosted by Ecole Polytechnique, is to bring together pure and applied mathematicians and engineers with a common interest in geometric tools and their applications to information analysis.
It emphasizes the active participation of young researchers in discussing emerging areas of collaborative research on “Information Geometry Manifolds and Their Advanced Applications”.
Current and ongoing applications of Information Geometry Manifolds in applied mathematics include Advanced Signal/Image/Video Processing, Complex Data Modeling and Analysis, Information Ranking and Retrieval, Coding, Cognitive Systems, Optimal Control, Statistics on Manifolds, Machine Learning, Speech/Sound Recognition and Natural Language Processing, all of which are also highly relevant to industry.
The conference will therefore be organized around priority themes and topics of mutual interest, with the aim to:
  • Provide an overview of the most recent state of the art
  • Exchange mathematical information/knowledge/expertise in the area
  • Identify research areas/applications for future collaboration
  • Identify academic & industry labs expertise for further collaboration
This conference will be an interdisciplinary event and will unify skills from Geometry, Probability and Information Theory. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.

Authors will be invited to submit a paper to the special issue “Differential Geometrical Theory of Statistics” of the journal Entropy, an international and interdisciplinary open-access journal of entropy and information studies published monthly online by MDPI.

Provisional Topics of Special Sessions:

  • Manifold/Topology Learning
  • Riemannian Geometry in Manifold Learning
  • Optimal Transport theory and applications in Imagery/Statistics
  • Shape Space & Diffeomorphic mappings
  • Geometry of distributed optimization
  • Random Geometry/Homology
  • Hessian Information Geometry
  • Topology and Information
  • Information Geometry Optimization
  • Divergence Geometry
  • Optimization on Manifold
  • Lie Groups and Geometric Mechanics/Thermodynamics
  • Quantum Information Geometry
  • Infinite Dimensional Shape spaces
  • Geometry on Graphs
  • Bayesian and Information geometry for inverse problems
  • Geometry of Time Series and Linear Dynamical Systems
  • Geometric structure of Audio Processing  
  • Lie groups in Structural Biology
  • Computational Information Geometry

Committees

Secretary

Webmaster

Program chairs

Scientific committee

Sponsors and Organizers

Documents


Opening Session (chaired by Frédéric Barbaresco)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
SEE-GSI'15 Opening session

Geometric Science of Information SEE/SMAI GSI’15 Conference LIX Colloquium 2015 Frédéric BARBARESCO* & Frank Nielsen** GSI’15 General Chairmen (*) President of SEE ISIC Club (Ingéniérie des Systèmes d’Information de Communications) (**) LIX Department, Ecole Polytechnique Société de l'électricité, de l'électronique et des technologies de l'information et de la communication Flash-back GSI’13 Ecole des Mines de Paris Hirohiko Shima Jean-Louis Koszul Shin-Ichi Amari SEE at a glance • Meeting place for science, industry and society • An officialy recognised non-profit organisation • About 2000 members and 5000 individuals involved • Large participation from industry (~50%) • 19 «Clubs techniques» and 12 «Groupes régionaux» • Organizes conferences and seminars • Initiates/attracts International Conferences in France • Institutional French member of IFAC and IFIP • Awards (Glavieux/Brillouin Prize, Général Ferrié Prize, Néel Prize, Jerphagnon Prize, Blanc-Lapierre Prize,Thévenin Prize), grades and medals (Blondel, Ampère) • Publishes 3 periodical publications (REE, …) & 3 monographs each year • Web: http://www.see.asso.fr and LinkedIn SEE group • SEE Presidents: Louis de Broglie, Paul Langevin, … 1883-2015: From SIE & SFE to SEE: 132 years of Sciences Société de l'électricité, de l'électronique et des technologies de l'information et de la communication 1881 Exposition Internationale d’Electricité 1883: SIE Société Internationale des Electriciens 1886: SFE Société Française des Electriciens 2013: SEE 17 rue de l'Amiral Hamelin 75783 Paris Cedex 16 Louis de Broglie Paul Langevin GSI’15 Sponsors GSI Logo: Adelard of Bath • He left England toward the end of the 11th century for Tours in France • Adelard taught for a time at Laon, leaving Laon for travel no later than 1109. • After Laon, he travelled to Southern Italy and Sicily no later than 1116. • Adelard also travelled extensively throughout the "lands of the Crusades": Greece, West Asia, Sicily, Spain, and potentially Palestine. The frontispiece of an Adelard of Bath Latin translation of Euclid's Elements, c. 1309– 1316; the oldest surviving Latin translation of the Elements is a 12th-century translation by Adelard from an Arabic version Adelard of Bath was the first to translate Euclid’s Elements in Latin Adelard of Bath has introduced the word « Algorismus » in Latin after his translation of Al Khuwarizmi SMAI/SEE GSI’15 • More than 150 attendees from 15 different countries • 85 scientific presentations on 3 days • 3 keynote speakers • Mathilde MARCOLLI (CallTech): “From Geometry and Physics to Computational Linguistics” • Tudor RATIU (EPFL): “Symmetry methods in geometric mechanics” • Marc ARNAUDON (Bordeaux University): “Stochastic Euler-Poincaré reduction” • 1 Short Course • Chaired by Roger BALIAN • Dominique SPEHNER (Grenoble University): “Geometry on the set of quantum states and quantum correlations” • 1 Guest speaker • Charles-Michel MARLE (UPMC): “Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. 
Application to Hamiltonian systems” • Social events: • Welcome cocktail at Ecole Polytechnique • Diner in Versailles Palace Gardens GSI’15 Topics • GSI’15 federates skills from Geometry, Probability and Information Theory: • Dimension reduction on Riemannian manifolds • Optimal Transport and applications in Imagery/Statistics • Shape Space & Diffeomorphic mappings • Random Geometry/Homology • Hessian Information Geometry • Topological forms and Information • Information Geometry Optimization • Information Geometry in Image Analysis • Divergence Geometry • Optimization on Manifold • Lie Groups and Geometric Mechanics/Thermodynamics • Computational Information Geometry • Lie Groups: Novel Statistical and Computational Frontiers • Geometry of Time Series and Linear Dynamical systems • Bayesian and Information Geometry for Inverse Problems • Probability Density Estimation GSI’15 Program GSI’15 Proceedings • Publication by SPRINGER in « Lecture Notes in Computer Science » LNCS vol. 9389 (800 pages), ISBN 978-3-319-25039-7 • http://www.springer.com/us/book/9783319250397 GSI’15 Special Issue • Authors will be solicited to submit a paper in a special Issue "Differential Geometrical Theory of Statistics” in ENTROPY Journal, an international and interdisciplinary open access journal of entropy and information studies published monthly online by MDPI • http://www.mdpi.com/journal/entropy/special_issues/entropy-statistics • A book could be edited by MDPI: e.g. Ecole Polytechnique • Special thanks to « LIX » Department A product of the French Revolution and the Age of Enlightenment, École Polytechnique has a rich history that spans over 220 years. https://www.polytechnique.edu/en/history Henri Poincaré – X1873 Paris-Saclay University in Top 8 World Innovation Hubs http://www.technologyreview.com/news/517626/ infographic-the-worlds-technology-hubs/ A new Grammar of Information “Mathematics is the art of giving the same name to different things” – Henri Poincaré GROUP EVERYWHERE Elie Cartan Henri Poincaré METRIC EVERYWHERE Maurice Fréchet Misha Gromov “the problems addressed by Elie Cartan are among the most important, most abstract and most general dealing with mathematics; group theory is, so to speak, the whole mathematics, stripped of its material and reduced to pure form. This extreme level of abstraction has probably made my presentation a little dry; to assess each of the results, I would have had virtually render him the material which he had been stripped; but this refund can be made in a thousand different ways; and this is the only form that can be found as well as a host of various garments, which is the common link between mathematical theories that are often surprised to find so near” H. Poincaré Elie Cartan: Group Everywhere (Henri Poincaré review of Cartan’s Works) Maurice Fréchet: Metric Everywhere • Maurice Fréchet made major contributions to the topology of point sets and introduced the entire concept of metric spaces. • His dissertation opened the entire field of functionals on metric spaces and introduced the notion of compactness. • He has extended Probability in Metric space 1948 (Annales de l’IHP) Les éléments aléatoires de nature quelconque dans un espace distancié Extension of Probability/Statistic in abstract/Metric space GSI’15 & Geometric Mechanics • The master of geometry during the last century, Elie Cartan, was the son of Joseph Cartan who was the village blacksmith. 
• Elie recalled that his childhood had passed under “blows of the anvil, which started every morning from dawn”. • We can imagine easily that the child, Elie Cartan, watching his father Joseph “coding curvature” on metal between the hammer and the anvil, insidiously influencing Elie’s mind with germinal intuition of fundamental geometric concepts. • The etymology of the word “Forge”, that comes from the late XIV century, “a smithy”, from Old French forge “forge, smithy” (XII century), earlier faverge, from Latin fabrica “workshop, smith’s shop”, from faber (genitive fabri) “workman in hard materials, smith”. HAMMER = The CoderANVIL = Curvature Libraries Bigorne Bicorne Venus at the Forge of Vulcan, Le Nain Brothers, Musée Saint-Denis, Reims From Homo Sapiens to Homo Faber “Intelligence is the faculty of manufacturing artificial objects, especially tools to make tools, and of indefinitely varying the manufacture.” Henri Bergson Into the Flaming Forge of Vulcan, Diego Velázquez, Museo Nacional del Prado Geometric Thermodynamics & Statistical Physics Enjoy all « Geometries » (Dinner at Versailles Palace Gardens) Restaurant of GSI’15 Gala Dinner André Le Nôtre Landscape Geometer of Versailles the Apex of “Le Jardin à la française” Louis XIV Patron of Science The Royal Academy of Sciences was established in 1666 On 1st September 1715, 300 years ago, Louis XIV passed away at the age of 77, having reigned for 72 years Keynote Speakers Prof. Mathilde MARCOLLI (CALTECH, USA) From Geometry and Physics to Computational Linguistics Abstact: I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework. Biography: • Laurea in Physics, University of Milano, 1993 • Master of Science, Mathematics, University of Chicago, 1994 • PhD, Mathematics, University of Chicago, 1997 • Moore Instructor, Massachusetts Institute of Technology, 1997-2000 • Associate Professor (C3), Max Planck Institute for Mathematics, 2000-2008 • Professor, California Institute of Technology, 2008-present • Distinguished Visiting Research Chair, Perimeter Institute for Theoretical Physics, 2013-present . Talk chaired by Daniel Bennequin Keynote Speakers Prof. Marc ARNAUDON (Bordeaux University, France) Stochastic Euler-Poincaré reduction Abstact: We will prove a Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation. Biography: Marc Arnaudon was born in France in 1965. He graduated from Ecole Normale Supérieure de Paris, France, in 1991. He received the PhD degree in mathematics and the Habilitation à diriger des Recherches degree from Strasbourg University, France, in January 1994 and January 1998 respectively. After postdoctoral research and teaching at Strasbourg, he began in September 1999 a full professor position in the Department of Mathematics at Poitiers University, France, where he was the head of the Probability Research Group. 
In January 2013 he left Poitiers and joined the Department of Mathematics of Bordeaux University, France, where he is a full professor in mathematics. Talk chaired by Frank Nielsen Keynote Speakers Prof. Tudor RATIU (EPFL, Switzerland) Symmetry methods in geometric mechanics Abstact: The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics. Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena in for Hamiltonian systems are based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity. Biography: • BA in Mathematics, University of Timisoara, Romania, 1973 • MA in Applied Mathematics, University of Timisoara, Romania, 1974 • Ph.D. in Mathematics, University of California, Berkeley, 1980 • T.H. Hildebrandt Research Assistant Professor, University of Michigan, Ann Arbor, USA 1980-1983 • Associate Professor of Mathematics, University of Arizona, Tuscon, USA 1983- 1988 • Professor of Mathematics, University of California, Santa Cruz, USA, 1988-2001 • Chaired Professor of Mathematics, Ecole Polytechnique Federale de Lausanne, Switzerland, 1998 - present • Professor of Mathematics, Skolkovo Institute of Science and Technonology, Moscow, Russia, 2014 - present Talk chaired by Xavier Pennec Short Course Prof. Dominique SPEHNER (Grenoble University) Geometry on the set of quantum states and quantum correlations Abstact: I will show that the set of states of a quantum system with a finite- dimensional Hilbert space can be equipped with various Riemannian distances having nice properties from a quantum information viewpoint, namely they are contractive under all physically allowed operations on the system. The corresponding metrics are quantum analogs of the Fisher metric and have been classified by D. Petz. Two distances are particularly relevant physically: the Bogoliubov-Kubo-Mori distance studied by R. Balian, Y. Alhassid and H. Reinhardt, and the Bures distance studied by A. Uhlmann and by S.L. Braunstein and C.M. Caves. The latter gives the quantum Fisher information playing an important role in quantum metrology. A way to measure the amount of quantum correlations (entanglement or quantum discord) in bipartite systems (that is, systems composed of two parties) with the help of these distances will be also discussed. Biography: • Diplôme d'Études Approfondies (DEA) in Theoretical Physics at the École Normale Supérieure de Lyon, 1994 • Civil Service (Service National de la Coopération), Technion Institute of Technology, Haifa, Israel, 1995-1996 • PhD in Theoretical Physics, Université Paul Sabatier, Toulouse, France, 1996- 2000. 
• Postdoctoral fellow, Pontificia Universidad Católica, Santiago, Chile, 2000-2001 • Research Associate, University of Duisburg-Essen, Germany, 2001-2005 • Maître de Conférences, Université Joseph Fourier, Grenoble, France, 2005-present • Habilitation à diriger des Recherches (HDR), Université Grenoble Alpes, 2015 • Member of the Institut Fourier (since 2005) and the Laboratoire de Physique et Modélisation des Milieux Condensés (since 2013) of Université Grenoble Alpes, France Talk chaired by Roger Balian Guest Speakers Prof. Charles-Michel MARLE (UPMC, France) Actions of Lie groups and Lie algebras on symplectic and Poisson manifolds. Application to Hamiltonian systems Abstract: I will present some tools in Symplectic and Poisson Geometry in view of their applications in Geometric Mechanics and Mathematical Physics. Lie group and Lie algebra actions on symplectic and Poisson manifolds, momentum maps and their equivariance properties, and first integrals associated to symmetries of Hamiltonian systems will be discussed, as well as reduction methods taking advantage of symmetries. Biography: Charles-Michel Marle was born in 1934. He studied at Ecole Polytechnique (1953-1955), Ecole Nationale Supérieure des Mines de Paris (1957-1958) and Ecole Nationale Supérieure du Pétrole et des Moteurs (1957-1958). He obtained a doctor's degree in Mathematics at the University of Paris in 1968. From 1959 to 1969 he worked as a research engineer at the Institut Français du Pétrole. He joined the Université de Besançon as Associate Professor in 1969, and the Université Pierre et Marie Curie, first as Associate Professor (1975) and then as full Professor (1981). His research work was first about fluid flows through porous media, then about Differential Geometry, Hamiltonian systems and applications in Mechanics and Mathematical Physics. Talk chaired by Frédéric Barbaresco

Keynote speech by Matilde Marcolli (chaired by Daniel Bennequin)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure and language evolution, within the context of Chomsky's Principles and Parameters framework.
 
From Geometry and Physics to Computational Linguistics

From Geometry and Physics to Computational Linguistics Matilde Marcolli Geometric Science of Information, Paris, October 2015 Matilde Marcolli Geometry, Physics, Linguistics A Mathematical Physicist’s adventures in Linguistics Based on: 1 Alexander Port, Iulia Gheorghita, Daniel Guth, John M.Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134 2 Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504 3 Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342 4 Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you Matilde Marcolli Geometry, Physics, Linguistics What is Linguistics? • Linguistics is the scientific study of language - What is Language? (langage, lenguaje, ...) - What is a Language? (lange, lengua,...) Similar to ‘What is Life?’ or ‘What is an organism?’ in biology • natural language as opposed to artificial (formal, programming, ...) languages • The point of view we will focus on: Language is a kind of Structure - It can be approached mathematically and computationally, like many other kinds of structures - The main purpose of mathematics is the understanding of structures Matilde Marcolli Geometry, Physics, Linguistics • How are di↵erent languages related? What does it mean that they come in families? • How do languages evolve in time? Phylogenetics, Historical Linguistics, Etymology • How does the process of language acquisition work? (Neuroscience) • Semiotic viewpoint (mathematical theory of communication) • Discrete versus Continuum (probabilistic methods, versus discrete structures) • Descriptive or Predictive? to be predictive, a science needs good mathematical models Matilde Marcolli Geometry, Physics, Linguistics A language exists at many di↵erent levels of structure An Analogy: Physics looks very di↵erent at di↵erent scales: General Relativity and Cosmology ( 1010 m) Classical Physics (⇠ 1 m) Quantum Physics ( 10 10 m) Quantum Gravity (10 35 m) Despite dreams of a Unified Theory, we deal with di↵erent mathematical models for di↵erent levels of structure Matilde Marcolli Geometry, Physics, Linguistics Similarly, we view language at di↵erent “scales”: units of sound (phonology) words (morphology) sentences (syntax) global meaning (semantics) We expect to be dealing with di↵erent mathematical structures and di↵erent models at these various di↵erent levels Main level I will focus on: Syntax Matilde Marcolli Geometry, Physics, Linguistics Linguistics view of syntax kind of looks like this... Alexander Calder, Mobile, 1960 Matilde Marcolli Geometry, Physics, Linguistics Modern Syntactic Theory: • grammaticality: judgement on whether a sentence is well formed (grammatical) in a given language, i-language gives people the capacity to decide on grammaticality • generative grammar: produce a set of rules that correctly predict grammaticality of sentences • universal grammar: ability to learn grammar is built in the human brain, e.g. properties like distinction between nouns and verbs are universal ... is universal grammar a falsifiable theory? 
Matilde Marcolli Geometry, Physics, Linguistics Principles and Parameters (Government and Binding) (Chomsky, 1981) • principles: general rules of grammar • parameters: binary variables (on/o↵ switches) that distinguish languages in terms of syntactic structures • Example of parameter: head-directionality (head-initial versus head-final) English is head-initial, Japanese is head-final VP= verb phrase, TP= tense phrase, DP= determiner phrase Matilde Marcolli Geometry, Physics, Linguistics ...but not always so clear-cut: German can use both structures auf seine Kinder stolze Vater (head-final) or er ist stolz auf seine Kinder (head-initial) AP= adjective phrase, PP= prepositional phrase • Corpora based statistical analysis of head-directionality (Haitao Liu, 2010): a continuum between head-initial and head-final Matilde Marcolli Geometry, Physics, Linguistics Examples of Parameters Head-directionality Subject-side Pro-drop Null-subject Problems • Interdependencies between parameters • Diachronic changes of parameters in language evolution Matilde Marcolli Geometry, Physics, Linguistics Dependent parameters • null-subject parameter: can drop subject Example: among Latin languages, Italian and Spanish have null-subject (+), French does not (-) it rains, piove, llueve, il pleut • pro-drop parameter: can drop pronouns in sentences • Pro-drop controls Null-subject How many independent parameters? Geometry of the space of syntactic parameters? Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Syntax • Alexander Port, Iulia Gheorghita, Daniel Guth, John M.Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134 Databases of Syntactic Parameters of World Languages: 1 Syntactic Structures of World Languages (SSWL) http://sswl.railsplayground.net/ 2 TerraLing http://www.terraling.com/ 3 World Atlas of Language Structures (WALS) http://wals.info/ Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Data Sets how data cluster around topological shapes at di↵erent scales Matilde Marcolli Geometry, Physics, Linguistics Vietoris–Rips complexes • set X = {x↵} of points in Euclidean space EN, distance d(x, y) = kx yk = ( PN j=1(xj yj )2)1/2 • Vietoris-Rips complex R(X, ✏) of scale ✏ over field K: Rn(X, ✏) is K-vector space spanned by all unordered (n + 1)-tuples of points {x↵0 , x↵1 , . . . , x↵n } in X where all pairs have distances d(x↵i , x↵j )  ✏ Matilde Marcolli Geometry, Physics, Linguistics • inclusion maps R(X, ✏1) ,! R(X, ✏2) for ✏1 < ✏2 induce maps in homology by functoriality Hn(X, ✏1) ! Hn(X, ✏2) barcode diagrams: births and deaths of persistent generators Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Syntactic Parameters • Data: 252 languages from SSWL with 115 parameters • if consider all world languages together too much noise in the persistent topology: subdivide by language families • Principal Component Analysis: reduce dimensionality of data • compute Vietoris–Rips complex and barcode diagrams Persistent H0: clustering of data in components – language subfamilies Persistent H1: clustering of data along closed curves (circles) – linguistic meaning? 
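To make the H0 part of this pipeline concrete, here is a minimal sketch (not the authors' code): languages are represented as binary parameter vectors, the Hamming distance plays the role of the Euclidean distance above, and a union-find structure records the scales at which clusters merge, i.e. the death times of the H0 barcode. The parameter vectors below are invented for illustration; a real analysis would use SSWL data, apply PCA, and compute H1 with a persistent homology library.

```python
from itertools import combinations

def hamming(u, v):
    """Hamming distance between two binary syntactic parameter vectors."""
    return sum(a != b for a, b in zip(u, v))

def h0_barcode(vectors):
    """Death times of H0 classes in the Vietoris-Rips filtration.

    All components are born at scale 0; a component dies at the scale of the
    edge that merges it into another component.
    """
    n = len(vectors)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sort all edges by the scale at which they enter the complex.
    edges = sorted(
        (hamming(vectors[i], vectors[j]), i, j) for i, j in combinations(range(n), 2)
    )
    deaths = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:              # this edge merges two components
            parent[ri] = rj
            deaths.append(eps)    # one H0 class dies at this scale
    return deaths                 # n - 1 finite bars; one class persists forever

# Toy data: 5 "languages" described by 6 binary syntactic parameters.
langs = [(1,0,1,1,0,0), (1,0,1,0,0,0), (0,1,0,1,1,1), (0,1,0,1,1,0), (1,1,1,1,0,0)]
print(h0_barcode(langs))
```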
Matilde Marcolli Geometry, Physics, Linguistics Sources of Persistent H1 • “Hopf bifurcation” type phenomenon • two di↵erent branches of a tree closing up in a loop two di↵erent types of phenomena of historical linguistic development within a language family Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Indo-European Languages • Two persistent generators of H0 (Indo-Iranian, European) • One persistent generator of H1 Matilde Marcolli Geometry, Physics, Linguistics Persistent Topology of Niger–Congo Languages • Three persistent components of H0 (Mande, Atlantic-Congo, Kordofanian) • No persistent H1 Matilde Marcolli Geometry, Physics, Linguistics The origin of persistent H1 of Indo-European Languages? Naive guess: the Anglo-Norman bridge ... but lexical not syntactic Matilde Marcolli Geometry, Physics, Linguistics Answer: No, it is not the Anglo-Norman bridge! Persistent topology of the Germanic+Latin languages Matilde Marcolli Geometry, Physics, Linguistics Answer: It’s all because of Ancient Greek! Persistent topology with Hellenic (and Indo-Iranic) branch removed Matilde Marcolli Geometry, Physics, Linguistics Syntactic Parameters as Dynamical Variables • Example: Word Order: SOV, SVO, VSO, VOS, OVS, OSV Very uneven distribution across world languages Matilde Marcolli Geometry, Physics, Linguistics • Word order distribution: a neuroscience explanation? - D. Kemmerer, The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca’s area, Language and Linguistics Compass, 6 (2012) N.1, 50–66. • Internal reasons for diachronic switch? - F.Antinucci, A.Duranti, L.Gebert, Relative clause structure, relative clause perception, and the change from SOV to SVO, Cognition, Vol.7 (1979) N.2 145–176. Matilde Marcolli Geometry, Physics, Linguistics Changes over time in Word Order • Ancient Greek: switched from Homeric to Classical - A. Taylor, The change from SOV to SVO in Ancient Greek, Language Variation and Change, 6 (1994) 1–37 • Sanskrit: di↵erent word orders allowed, but prevalent one in Vedic Sanskrit is SOV (switched at least twice by influence of Dravidian languages) - F.J. Staal, Word Order in Sanskrit and Universal Grammar, Springer, 1967 • English: switched from Old English (transitional between SOV and SVO) to Middle English (SVO) - J. McLaughlin, Old English Syntax: a handbook, Walter de Gruyter, 1983. Syntactic Parameters are Dynamical in Language Evolution Matilde Marcolli Geometry, Physics, Linguistics Spin Glass Models of Syntax • Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504 – focus on linguistic change caused by language interactions – think of syntactic parameters as spin variables – spin interaction tends to align (ferromagnet) – strength of interaction proportional to bilingualism (MediaLab) – role of temperature parameter: probabilistic interpretation of parameters – not all parameters are independent: entailment relations – Metropolis–Hastings algorithm: simulate evolution Matilde Marcolli Geometry, Physics, Linguistics The Ising Model of spin systems on a graph G • configurations of spins s : V (G) ! 
{±1} • magnetic field B and correlation strength J: Hamiltonian H(s) = J X e2E(G):@(e)={v,v0} sv sv0 B X v2V (G) sv • first term measures degree of alignment of nearby spins • second term measures alignment of spins with direction of magnetic field Matilde Marcolli Geometry, Physics, Linguistics Equilibrium Probability Distribution • Partition Function ZG ( ) ZG ( ) = X s:V (G)!{±1} exp( H(s)) • Probability distribution on the configuration space: Gibbs measure PG, (s) = e H(s) ZG ( ) • low energy states weight most • at low temperature (large ): ground state dominates; at higher temperature ( small) higher energy states also contribute Matilde Marcolli Geometry, Physics, Linguistics Average Spin Magnetization MG ( ) = 1 #V (G) X s:V (G)!{±1} X v2V (G) sv P(s) • Free energy FG ( , B) = log ZG ( , B) MG ( ) = 1 #V (G) 1 ✓ @FG ( , B) @B ◆ |B=0 Ising Model on a 2-dimensional lattice • 9 critical temperature T = Tc where phase transition occurs • for T > Tc equilibrium state has m(T) = 0 (computed with respect to the equilibrium Gibbs measure PG, • demagnetization: on average as many up as down spins • for T < Tc have m(T) > 0: spontaneous magnetization Matilde Marcolli Geometry, Physics, Linguistics Syntactic Parameters and Ising/Potts Models • characterize set of n = 2N languages Li by binary strings of N syntactic parameters (Ising model) • or by ternary strings (Potts model) if take values ±1 for parameters that are set and 0 for parameters that are not defined in a certain language • a system of n interacting languages = graph G with n = #V (G) • languages Li = vertices of the graph (e.g. language that occupies a certain geographic area) • languages that have interaction with each other = edges E(G) (geographical proximity, or high volume of exchange for other reasons) Matilde Marcolli Geometry, Physics, Linguistics graph of language interaction (detail) from Global Language Network of MIT MediaLab, with interaction strengths Je on edges based on number of book translations (or Wikipedia edits) Matilde Marcolli Geometry, Physics, Linguistics • if only one syntactic parameter, would have an Ising model on the graph G: configurations s : V (G) ! {±1} set the parameter at all the locations on the graph • variable interaction energies along edges (some pairs of languages interact more than others) • magnetic field B and correlation strength J: Hamiltonian H(s) = X e2E(G):@(e)={v,v0} NX i=1 Je sv,i sv0,i • if N parameters, configurations s = (s1, . . . , sN) : V (G) ! {±1}N • if all N parameters are independent, then it would be like having N non-interacting copies of a Ising model on the same graph G (or N independent choices of an initial state in an Ising model on G) Matilde Marcolli Geometry, Physics, Linguistics Metropolis–Hastings • detailed balance condition P(s)P(s ! s0) = P(s0)P(s0 ! s) for probabilities of transitioning between states (Markov process) • transition probabilities P(s ! s0) = ⇡A(s ! s0) · ⇡(s ! s0) with ⇡(s ! s0) conditional probability of proposing state s0 given state s and ⇡A(s ! s0) conditional probability of accepting it • Metropolis–Hastings choice of acceptance distribution (Gibbs) ⇡A(s ! s0 ) = ⇢ 1 if H(s0) H(s)  0 exp( (H(s0) H(s))) if H(s0) H(s) > 0. satisfying detailed balance • selection probabilities ⇡(s ! 
s0) single-spin-flip dynamics • ergodicity of Markov process ) unique stationary distribution Matilde Marcolli Geometry, Physics, Linguistics Example: Single parameter dynamics Subject-Verb parameter Initial configuration: most languages in SSWL have +1 for Subject-Verb; use interaction energies from MediaLab data Matilde Marcolli Geometry, Physics, Linguistics Equilibrium: low temperature all aligned to +1; high temperature: Temperature: fluctuations in bilingual users between di↵erent structures (“code-switching” in Linguistics) Matilde Marcolli Geometry, Physics, Linguistics Entailment relations among parameters • Example: {p1, p2} = {Strong Deixis, Strong Anaphoricity} p1 p2 `1 +1 +1 `2 1 0 `3 +1 +1 `4 +1 1 {`1, `2, `3, `4} = {English, Welsh, Russian, Bulgarian} Matilde Marcolli Geometry, Physics, Linguistics Modeling Entailment • variables: S`,p1 = exp(⇡iX`,p1 ) 2 {±1}, S`,p2 2 {±1, 0} and Y`,p2 = |S`,p2 | 2 {0, 1} • Hamiltonian H = HE + HV HE = Hp1 + Hp2 = X `,`02languages J``0 ⇣ S`,p1 ,S`0,p1 + S`,p2 ,S`0,p2 ⌘ HV = X ` HV ,` = X ` J` X`,p1 ,Y`,p2 J` > 0 anti-ferromagnetic • two parameters: temperature as before and coupling energy of entailment • if freeze p1 and evolution for p2: Potts model with external magnetic field Matilde Marcolli Geometry, Physics, Linguistics Acceptance probabilities ⇡A(s ! s ± 1 (mod 3)) = ⇢ 1 if H  0 exp( H) if H > 0. H := min{H(s + 1 (mod 3)), H(s 1 (mod 3))} H(s) Equilibrium configuration (p1, p2) HT/HE HT/LE LT/HE LT/LE `1 (+1, 0) (+1, 1) (+1, +1) (+1, 1) `2 (+1, 1) ( 1, 1) (+1, +1) (+1, 1) `3 ( 1, 0) ( 1, +1) (+1, +1) ( 1, 0) `4 (+1, +1) ( 1, 1) (+1, +1) ( 1, 0) Matilde Marcolli Geometry, Physics, Linguistics Average value of spin p1 left and p2 right in low entailment energy case Matilde Marcolli Geometry, Physics, Linguistics Syntactic Parameters in Kanerva Networks • Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342 – Address two issues: relative prevalence of di↵erent syntactic parameters and “degree of recoverability” (as sign of underlying relations between parameters) – If corrupt information about one parameter in data of group of languages can recover it from the data of the other parameters? – Answer: di↵erent parameters have di↵erent degrees of recoverability – Used 21 parameters and 165 languages from SSWL database Matilde Marcolli Geometry, Physics, Linguistics Kanerva networks (sparse distributed memories) • P. Kanerva, Sparse Distributed Memory, MIT Press, 1988. 
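As a concrete illustration of the spin-glass dynamics just described (the Kanerva-network material continues below), here is a minimal single-parameter sketch: one Ising spin per language on an interaction graph, Hamiltonian H(s) = -Σ_e J_e s_v s_v', and single-spin-flip Metropolis-Hastings updates. The graph, couplings and temperature are toy values, not the MediaLab interaction data used in the paper.

```python
import math, random

# Toy interaction graph: vertices are languages, edge weights J_e are interaction
# strengths (e.g. proportional to bilingual population or translation volume).
edges = {("en", "fr"): 1.0, ("fr", "it"): 0.8, ("it", "es"): 0.9, ("en", "de"): 0.6}
langs = sorted({v for e in edges for v in e})

def energy(spins):
    """H(s) = -sum_e J_e s_v s_v' : aligned neighbouring spins lower the energy."""
    return -sum(J * spins[a] * spins[b] for (a, b), J in edges.items())

def metropolis(spins, beta, n_steps, seed=0):
    """Single-spin-flip Metropolis-Hastings targeting the Gibbs measure exp(-beta*H)/Z."""
    rng = random.Random(seed)
    s = dict(spins)
    for _ in range(n_steps):
        v = rng.choice(langs)                      # propose flipping one language's parameter
        dH = energy({**s, v: -s[v]}) - energy(s)   # energy change caused by the flip
        if dH <= 0 or rng.random() < math.exp(-beta * dH):
            s[v] = -s[v]                           # accept with the Metropolis probability
    return s

# Arbitrary initial assignment of the binary parameter (+1 / -1) to each language.
rng0 = random.Random(1)
init = {v: rng0.choice([-1, 1]) for v in langs}
print(metropolis(init, beta=2.0, n_steps=5000))    # low temperature: spins tend to align
```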
• field F2 = {0, 1}, vector space FN 2 large N • uniform random sample of 2k hard locations with 2k << 2N • median Hamming distance between hard locations • Hamming spheres of radius slightly larger than median value (access sphere) • writing to network: storing datum X 2 FN 2 , each hard location in access sphere of X gets i-th coordinate (initialized at zero) incremented depending on i-th entry ot X • reading at a location: i-th entry determined by majority rule of i-th entries of all stored data in hard locations within access sphere Kanerva networks are good at reconstructing corrupted data Matilde Marcolli Geometry, Physics, Linguistics Procedure • 165 data points (languages) stored in a Kanerva Network in F21 2 (choice of 21 parameters) • corrupting one parameter at a time: analyze recoverability • language bit-string with a single corrupted bit used as read location and resulting bit string compared to original bit-string (Hamming distance) • resulting average Hamming distance used as score of recoverability (lowest = most easily recoverable parameter) Matilde Marcolli Geometry, Physics, Linguistics Parameters and frequencies 01 Subject-Verb (0.64957267) 02 Verb-Subject (0.31623933) 03 Verb-Object (0.61538464) 04 Object-Verb (0.32478634) 05 Subject-Verb-Object (0.56837606) 06 Subject-Object-Verb (0.30769232) 07 Verb-Subject-Object (0.1923077) 08 Verb-Object-Subject (0.15811966) 09 Object-Subject-Verb (0.12393162) 10 Object-Verb-Subject (0.10683761) 11 Adposition-Noun-Phrase (0.58974361) 12 Noun-Phrase-Adposition (0.2905983) 13 Adjective-Noun (0.41025642) 14 Noun-Adjective (0.52564102) 15 Numeral-Noun (0.48290598) 16 Noun-Numeral (0.38034189) 17 Demonstrative-Noun (0.47435898) 18 Noun-Demonstrative (0.38461539) 19 Possessor-Noun (0.38034189) 20 Noun-Possessor (0.49145299) A01 Attributive-Adjective-Agreement (0.46581197) Matilde Marcolli Geometry, Physics, Linguistics Matilde Marcolli Geometry, Physics, Linguistics Overall e↵ect related to relative prevalence of a parameter Matilde Marcolli Geometry, Physics, Linguistics More refined e↵ect after normalizing for prelavence (syntactic dependencies) Matilde Marcolli Geometry, Physics, Linguistics • Overall e↵ect relating recoverability in a Kanerva Network to prevalence of a certain parameter among languages (depends only on frequencies: see in random data with assigned frequencies) • Additional e↵ects (that deviate from random case) which detect possible dependencies among syntactic parameters: increased recoverability beyond what e↵ect based on frequency • Possible neuroscience implications? Kanerva Networks as models of human memory (parameter prevalence linked to neuroscience models) • More refined data if divided by language families? Matilde Marcolli Geometry, Physics, Linguistics Phylogenetic Linguistics (WORK IN PROGRESS) • Constructing family trees for languages (sometimes possibly graphs with loops) • Main information about subgrouping: shared innovation a specific change with respect to other languages in the family that only happens in a certain subset of languages - Example: among Mayan languages: Huastecan branch characterized by initial w becoming voiceless before a vowel and ts becoming t, q becoming k, ... Quichean branch by velar nasal becoming velar fricative, ´c becoming ˇc (prepalatal a↵ricate to palato-alveolar)... 
Known result by traditional Historical Linguistics methods: Matilde Marcolli Geometry, Physics, Linguistics Mayan Language Tree Matilde Marcolli Geometry, Physics, Linguistics Computational Methods for Phylogenetic Linguistics • Peter Foster, Colin Renfrew, Phylogenetic methods and the prehistory of languages, McDonald Institute Monographs, 2006 • Several computational methods for constructing phylogenetic trees available from mathematical and computational biology • Phylogeny Programs http://evolution.genetics.washington.edu/phylip/software.html • Standardized lexical databases: Swadesh list (100 words, or 207 words) Matilde Marcolli Geometry, Physics, Linguistics • Use Swadesh lists of languages in a given family to look for cognates: - without additional etymological information (keep false positives) - with additional etymological information (remove false positives) • Two further choices about loan words: - remove loan words - keep loan words • Keeping loan words produces graphs that are not trees • Without loan words it should produce trees, but small loops still appear due to ambiguities (di↵erent possible trees matching same data) ... more precisely: coding of lexical data ... Matilde Marcolli Geometry, Physics, Linguistics Coding of lexical data • After compiling lists of cognate words for pairs of languages within a given family (with/without lexical information and loan words) • Produce a binary string S(L1, L2) = (s1, . . . , sN) for each pair of languages L1, L2, with entry 0 or 1 at the i-th word of the lexical list of N words if cognates for that meaning exist in the two languages or not (important to pay attention to synonyms) • lexical Hamming distance between two languages d(L1, L2) = #{i 2 {1, . . . , N} | si = 1} counts words in the list that do not have cognates in L1 and L2 Matilde Marcolli Geometry, Physics, Linguistics Distance-matrix method of phylogenetic inference • after producing a measure of “genetic distance” Hamming metric dH(La, Lb) • hierarchical data clustering: collecting objects in clusters according to their distance • simplest method of tree construction: neighbor joining (1) - create a (leaf) vertex for each index a (ranging over languages in given family) (2) - given distance matrix D = (Dab) distances between each pair Dab = dH(La, Lb) construct a new matrix Q-test Q = (Qab) with Qab = (n 2)Dab nX k=1 Dak nX k=1 Dbk this matrix Q decides first pairs of vertices to join Matilde Marcolli Geometry, Physics, Linguistics (3) - identify entries Qab with lowest values: join each such pair (a, b) of leaf vertices to a newly created vertex vab (4) - set distances to new vertex by d(a, vab) = 1 2 Dab + 1 2(n 2) nX k=1 Dak nX k=1 Dbk ! d(b, vab) = Dab d(a, vab) d(k, vab) = 1 2 (Dak + Dbk Dab) (5) - remove a and b and keep vab and all the remaining vertices and the new distances, compute new Q matrix and repeat until tree is completed Matilde Marcolli Geometry, Physics, Linguistics Neighborhood-Joining Method for Phylogenetic Inference Matilde Marcolli Geometry, Physics, Linguistics Example of a neighbor-joining lexical linguistic phylogenetic tree from Delmestri-Cristianini’s paper Matilde Marcolli Geometry, Physics, Linguistics N. Saitou, M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol Biol Evol. Vol.4 (1987) N. 4, 406-425. R. Mihaescu, D. Levy, L. Pachter, Why neighbor-joining works, arXiv:cs/0602041v3 A. Delmestri, N. 
Cristianini, Linguistic Phylogenetic Inference by PAM-like Matrices, Journal of Quantitative Linguistics, Vol.19 (2012) N.2, 95-120. F. Petroni, M. Serva, Language distance and tree reconstruction, J. Stat. Mech. (2008) P08012 Matilde Marcolli Geometry, Physics, Linguistics Syntactic Phylogenetic Trees (instead of lexical) • instead of coding lexical data based on cognate words, use binary variables of syntactic parameters • Hamming distance between binary string of parameter values • shown recently that one gets an accurate reconstruction of the phylogenetic tree of Indo-European languages from syntactic parameters only • G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin, Towards a syntactic phylogeny of modern Indo-European languages, Journal of Historical Linguistics 3 (2013) N.1, 122–152. • G. Longobardi, C. Guardiano, Evidence for syntax as a signal of historical relatedness, Lingua 119 (2009) 1679–1706. Matilde Marcolli Geometry, Physics, Linguistics Work in Progress • Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you – Assembled a phylogenetic tree of world languages using the SSWL database of syntactic parameters – Ongoing comparison with specific historical linguistic reconstruction of phylogenetic trees – Comparison with Computational Linguistic reconstructions based on lexical data (Swadesh lists) and on phonetical analysis – not all linguistic families have syntactic parameters mapped with same level of completeness... di↵erent levels of accuracy in reconstruction Matilde Marcolli Geometry, Physics, Linguistics
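As a companion to the distance-matrix method above, the following sketch computes syntactic Hamming distances between binary parameter strings and performs the Q-test of the first neighbor-joining step, i.e. it decides which pair of leaves to join first. The parameter strings are invented for illustration; a full implementation would also create the new vertex, update the distances and iterate, as in steps (3)-(5).

```python
def hamming(u, v):
    """Syntactic distance: number of parameters on which two languages differ."""
    return sum(a != b for a, b in zip(u, v))

def q_matrix(D):
    """Q-test matrix of neighbor joining: Q_ab = (n-2) D_ab - sum_k D_ak - sum_k D_bk."""
    n = len(D)
    row = [sum(D[a]) for a in range(n)]
    return [[(n - 2) * D[a][b] - row[a] - row[b] if a != b else 0.0
             for b in range(n)] for a in range(n)]

# Toy binary syntactic parameter strings for four languages.
params = {
    "L1": (1, 0, 1, 1, 0, 0, 1),
    "L2": (1, 0, 1, 0, 0, 0, 1),
    "L3": (0, 1, 0, 1, 1, 1, 0),
    "L4": (0, 1, 0, 1, 1, 0, 0),
}
names = list(params)
D = [[hamming(params[a], params[b]) for b in names] for a in names]
Q = q_matrix(D)

# First join: the off-diagonal pair with the smallest Q value.
pair = min((Q[a][b], a, b) for a in range(len(names)) for b in range(len(names)) if a < b)
print("join first:", names[pair[1]], names[pair[2]])
```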

Random Geometry/Homology (chaired by Laurent Decreusefond/Frédéric Chazal)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
Let m be a random tessellation in R^d, d ≥ 1, observed in the window W_ρ = ρ^{1/d}[0, 1]^d, ρ > 0, and let f be a geometrical characteristic. We investigate the asymptotic behaviour of the maximum of f(C) over all cells C ∈ m with nucleus in W_ρ as ρ goes to infinity. When the normalized maximum converges, we show that its asymptotic distribution depends on the so-called extremal index. Two examples of extremal indices are provided for Poisson-Voronoi and Poisson-Delaunay tessellations.
 
The extremal index for a random tessellation

Random tessellations Main problem Extremal index The extremal index for a random tessellation Nicolas Chenavier Université Littoral Côte d’Opale October 28, 2015 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Plan 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Random tessellations Definition A (convex) random tessellation m in Rd is a partition of the Euclidean space into random polytopes (called cells). We will only consider the particular case where m is a : Poisson-Voronoi tessellation ; Poisson-Delaunay tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Poisson-Voronoi tessellation X, Poisson point process in Rd ; ∀x ∈ X, CX(x) := {y ∈ Rd , |y − x| ≤ |y − x |, x ∈ X} (Voronoi cell with nucleus x) ; mPVT := {CX(x), x ∈ X}, Poisson-Voronoi tessellation ; ∀CX(x) ∈ mPVT , we let z(CX(x)) := x. x CX(x) Mosaique de Poisson-Voronoi Figure: Poisson-Voronoi tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Poisson-Delaunay tessellation X, Poisson point process in Rd ; ∀x, x ∈ X, x and x define an edge if CX(x) ∩ CX(x ) = ∅ ; mPDT , Poisson-Delaunay tessellation ; ∀C ∈ mPDT , we let z(C) as the circumcenter of C. x x z(C) Mosaique de Poisson-Delaunay Figure: Poisson-Delaunay tessellation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Typical cell Definition Let m be a stationary random tessellation. The typical cell of m is a random polytope C in Rd which distribution given as follows : for each bounded translation-invariant function g : {polytopes} → R, we have E [g(C)] := 1 N(B) E     C∈m, z(C)∈B g(C)     , where : B ⊂ R is any Borel subset with finite and non-empty volume ; N(B) is the mean number of cells with nucleus in B. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Main problem Framework : m = mPVT , mPDT ; Wρ := [0, ρ]d , with ρ > 0 ; g : {polytopes} → R, geometrical characteristic. Aim : asymptotic behaviour, when ρ → ∞, of Mg,ρ = max C∈m, z(C)∈Wρ g(C)? Figure: Voronoi cell maximizing the area in the square. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Objective and applications Objective : find ag,ρ > 0, bg,ρ ∈ R s.t. P Mg,ρ ≤ ag,ρt + bg,ρ converges, as ρ → ∞, for each t ∈ R. Applications : regularity of the tessellation ; discrimination of point processes and tessellations ; Poisson-Voronoi approximation. Approximation de Poisson-Voronoi Figure: Poisson-Voronoi approximation. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Asymptotics under a local correlation condition Notation : let vρ := ag,ρt + bρ be a threshold such that ρd · P (g(C) > vρ) −→ ρ→∞ τ, for some τ := τ(t) ≥ 0. Local Correlation Condition (LCC) ρd (log ρ)d · E      (C1,C2)=∈m2, z(C1),z(C2)∈[0,log ρ]d 1g(C1)>vρ,g(C2)>vρ      −→ ρ→∞ 0. Theorem Under (LCC), we have : P (Mg,ρ ≤ vρ) −→ ρ→∞ e−τ . 
Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index 1 Random tessellations 2 Main problem 3 Extremal index Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Definition of the extremal index Proposition Assume that for all τ ≥ 0, there exists a threshold v (τ) ρ depending on ρ such that ρd · P(g(C) > v (τ) ρ ) −→ ρ→∞ τ. Then there exists θ ∈ [0, 1] such that, for all τ ≥ 0, lim ρ→∞ P(Mg,ρ ≤ v(τ) ρ ) = e−θτ , provided that the limit exists. Definition According to Leadbetter, we say that θ ∈ [0, 1] is the extremal index if, for each τ ≥ 0, we have : ρd · P g(C) > v(τ) ρ −→ ρ→∞ τ and lim ρ→∞ P(Mg,ρ ≤ v(τ) ρ ) = e−θτ . Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Example 1 Framework : m := mPVT : Poisson-Voronoi tessellation ; g(C) := r(C) : inradius of any cell C := CX(x) with x ∈ X, i.e. r(C) := r (CX(x)) := max{r ∈ R+ : B(x, r) ⊂ CX(x)}. rmin,PVT (ρ) := minx∈X∩Wρ r (CX(x)). Extremal index : θ = 1/2 for each d ≥ 1. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Minimum of inradius for a Poisson-Voronoi tessellation (b) Typical Poisson−Voronoï cell with a small inradii x y −1.0 −0.5 0.0 0.5 1.0 −1.0−0.50.00.51.0 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Example 2 Framework : m := mPDT : Poisson-Delaunay tessellation ; g(C) := R(C) : circumradius of any cell C, i.e. R(C) := min{r ∈ R+ : B(x, r) ⊃ C}. Rmax,PDT (ρ) := maxC∈mPDT :z(C)∈Wρ R(C). Extremal index : θ = 1; 1/2; 35/128 for d = 1; 2; 3. Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Maximum of circumradius for a Poisson-Delaunay tessellation (d) Typical Poisson−Delaunay cell with a large circumradii x y −15 −10 −5 0 5 10 15 −15−10−5051015 Nicolas Chenavier The extremal index for a random tessellation Random tessellations Main problem Extremal index Work in progress Joint work with C. Robert (ISFA, Lyon 1) : new characterization of the extremal index (not based on classical block and run estimators appearing in the classical Extreme Value Theory) ; simulation and estimation for the extremal index and cluster size distribution (for Poisson-Voronoi and Poisson-Delaunay tessellations). Nicolas Chenavier The extremal index for a random tessellation
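A minimal Monte Carlo sketch of Example 1 (not the author's code): for a Poisson-Voronoi tessellation the inradius of the cell with nucleus x is half the distance from x to its nearest neighbour in the point process, so the minimum inradius over a window can be simulated without building the tessellation. Intensity, window size and number of replicates are arbitrary toy choices, and boundary effects are ignored; a serious experiment would correct for them and compare the empirical tail with the θ = 1/2 prediction.

```python
import math, random

def poisson(lam, rng):
    """Poisson(lam) sample via exponential inter-arrival times (fine for moderate lam)."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(1.0)
        if t > lam:
            return n
        n += 1

def min_voronoi_inradius(rho, dim=2, rng=random.Random(0)):
    """Minimum Voronoi-cell inradius for a Poisson process of intensity 1 observed in
    the window [0, rho^(1/dim)]^dim.  The inradius of the cell with nucleus x equals
    half the distance from x to its nearest neighbour; points outside the window are
    not simulated, so boundary cells are treated only approximately."""
    side = rho ** (1.0 / dim)
    n = poisson(rho, rng)
    if n < 2:
        return float("inf")
    pts = [tuple(rng.uniform(0.0, side) for _ in range(dim)) for _ in range(n)]
    best = float("inf")
    for i, p in enumerate(pts):
        nn2 = min(sum((a - b) ** 2 for a, b in zip(p, q))
                  for j, q in enumerate(pts) if j != i)
        best = min(best, 0.5 * math.sqrt(nn2))
    return best

# Empirical summary of the minimum inradius r_min,PVT(rho) over a few replicates.
samples = [min_voronoi_inradius(rho=500.0) for _ in range(20)]
print(min(samples), sum(samples) / len(samples), max(samples))
```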

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
A model of two-type (or two-color) interacting random balls is introduced. Each colored random set is a union of random balls, and the interaction relies on the volume of the intersection between the two random sets. This model is motivated by the detection and quantification of co-localization between two proteins. Simulation and inference are discussed. Since the individual balls cannot all be identified (a ball may, for example, be contained in another one), standard inference methods such as likelihood or pseudo-likelihood are not available, and we apply the Takacs-Fiksel method with a specific choice of test functions.
 
A two-color interacting random balls model for co-localization analysis of proteins

A testing procedure A model for co-localization Estimation A two-color interacting random balls model for co-localization analysis of proteins. Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, Nantes INRIA Rennes, Serpico team Joint work with C. Kervrann (INRIA Rennes, Serpico team). GSI’15, 28-30 October 2015. A testing procedure A model for co-localization Estimation Introduction : some data Vesicular trafficking analysis and colocalization quantification by TIRF microscopy (1px = 100 nanometer) [SERPICO team, INRIA] ? =⇒ Langerin proteins (left) and Rab11 GTPase proteins (right). Is there colocalization ? ⇔ Is there some spatial dependencies between the two types of proteins ? A testing procedure A model for co-localization Estimation Image pre-processing After segmentation Superposition : ? ⇒ After a Gaussian weights thresholding Superposition : ? ⇒ A testing procedure A model for co-localization Estimation The problem of co-localization can be described as follows : We observe two binary images in a domain Ω : First image (green) : realization of a random set Γ1 ∩ Ω Second image (red) : realization of a random set Γ2 ∩ Ω −→ Is there some dependencies between Γ1 and Γ2 ? −→ If so, can we quantify/model this dependency ? A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation Testing procedure Let a generic point o ∈ Rd and p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2), p12 = P(o ∈ Γ1 ∩ Γ2). If Γ1 and Γ2 are independent, then p12 = p1p2. A testing procedure A model for co-localization Estimation Testing procedure Let a generic point o ∈ Rd and p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2), p12 = P(o ∈ Γ1 ∩ Γ2). If Γ1 and Γ2 are independent, then p12 = p1p2. A natural measure of departure from independency is ˆp12 − ˆp1 ˆp2 where ˆp1 = |Ω|−1 x∈Ω 1Γ1 (x), ˆp2 = |Ω|−1 x∈Ω 1Γ2 (x), ˆp12 = |Ω|−1 x∈Ω 1Γ1∩Γ2 (x). A testing procedure A model for co-localization Estimation Testing procedure Assume Γ1 and Γ2 are m-dependent stationary random sets. If Γ1 is independent of Γ2, then as |Ω| tends to infinity, T := |Ω| ˆp12 − ˆp1 ˆp2 x∈Ω y∈Ω ˆC1(x − y) ˆC2(x − y) → N(0, 1) where ˆC1 and ˆC2 are the empirical covariance functions of Γ1 ∩ Ω and Γ2 ∩ Ω respectively. Hence to test the null hypothesis of independence between Γ1 and Γ2 p-value = 2(1 − Φ(|T|)) where Φ is the c.d.f. of the standard normal distribution. A testing procedure A model for co-localization Estimation Some simulations Simulations when Γ1 and Γ2 are union of random balls A testing procedure A model for co-localization Estimation Some simulations Simulations when Γ1 and Γ2 are union of random balls Independent case (and each color ∼ Poisson) Number of p−values < 0.05 over 100 realizations : 4. A testing procedure A model for co-localization Estimation Some simulations Dependent case (see later for the model) Number of p−values < 0.05 over 100 realizations : 100. A testing procedure A model for co-localization Estimation Some simulations Independent case, larger radii Number of p−values < 0.05 over 100 realizations : 5. A testing procedure A model for co-localization Estimation Some simulations Dependent case, larger radii and "small" dependence Number of p−values < 0.05 over 100 realizations : 97. 
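A rough numerical sketch of the testing procedure (not the authors' code): given two binary images, estimate p̂1, p̂2 and p̂12, estimate the covariance functions over a small lag window, and form the normalized statistic T with its Gaussian p-value. The reduction of the double sum over the window to |Ω| times a sum over lags relies on the m-dependence assumption; the maximal lag and the toy images below are arbitrary choices.

```python
import math
import numpy as np

def covariance_lags(img, max_lag):
    """Empirical covariance C(h) of a binary image for lags |h1|, |h2| <= max_lag."""
    p = img.mean()
    H, W = img.shape
    c = {}
    for dy in range(-max_lag, max_lag + 1):
        for dx in range(-max_lag, max_lag + 1):
            a = img[max(0, dy):H + min(0, dy), max(0, dx):W + min(0, dx)]
            b = img[max(0, -dy):H + min(0, -dy), max(0, -dx):W + min(0, -dx)]
            c[(dy, dx)] = ((a - p) * (b - p)).mean()
    return c

def colocalization_test(img1, img2, max_lag=5):
    """Approximate version of the statistic T: assumes covariances are negligible
    beyond max_lag (m-dependence), so the double sum over the window reduces to
    |Omega| * sum_h C1(h) * C2(h)."""
    n = img1.size
    p1, p2, p12 = img1.mean(), img2.mean(), (img1 * img2).mean()
    c1 = covariance_lags(img1, max_lag)
    c2 = covariance_lags(img2, max_lag)
    var = sum(c1[h] * c2[h] for h in c1)            # sum over the retained lags
    T = math.sqrt(n) * (p12 - p1 * p2) / math.sqrt(max(var, 1e-12))
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(T) / math.sqrt(2.0))))
    return T, p_value

# Toy example: two independent random binary images (a large p-value is expected).
rng = np.random.default_rng(0)
img1 = (rng.random((200, 200)) < 0.3).astype(float)
img2 = (rng.random((200, 200)) < 0.3).astype(float)
print(colocalization_test(img1, img2))
```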
A testing procedure A model for co-localization Estimation Real Data Depending on the pre-processing : T = 9.9 T = 17 p − value = 0 p − value = 0 A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution µ on [Rmin, Rmax]. A testing procedure A model for co-localization Estimation We view each set Γ1 and Γ2 as a union of random balls. We model the superposition of the two images, i.e. Γ1 ∪ Γ2. The reference model is a two-type (two colors) Boolean model with equiprobable marks, where the radii follow some distribution µ on [Rmin, Rmax]. Notation : (ξ, R)i : ball centered at ξ with radius R and color i ∈ {1, 2}. → viewed as a marked point, marked by R and i. xi : collection of all marked points with color i. Hence Γi = (ξ,R)i∈xi (ξ, R)i x = x1 ∪ x2 : collection of all marked points. A testing procedure A model for co-localization Estimation Example : three realizations of the reference process A testing procedure A model for co-localization Estimation The model We consider a density on any bounded domain Ω with respect to the reference model f(x) ∝ zn1 1 zn2 2 eθ |Γ1∩ Γ2| where n1 : number of green balls and n2 : number of red balls. This density depends on 3 parameters z1 : rules the mean number of green balls z2 : rules the mean number of red balls θ : interaction parameter. If θ > 0 : attraction (co-localization) between Γ1 and Γ2 If θ = 0 : back to the reference model, up to the intensities (independence between Γ1 and Γ2). A testing procedure A model for co-localization Estimation Simulation Realizations can be generated by a standard birth-death Metropolis-Hastings algorithm. Examples : A testing procedure A model for co-localization Estimation 1 A testing procedure 2 A model for co-localization 3 Estimation problem A testing procedure A model for co-localization Estimation Estimation problem Aim : Assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in f(x) = 1 c(z1, z2, θ) zn1 1 zn2 2 eθ |Γ1∩ Γ2| , where c(z1, z2, θ) is the normalizing constant. A testing procedure A model for co-localization Estimation Estimation problem Aim : Assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in f(x) = 1 c(z1, z2, θ) zn1 1 zn2 2 eθ |Γ1∩ Γ2| , where c(z1, z2, θ) is the normalizing constant. Issue : The number of balls n1 and n2 is not observed. ⇒ likelihood or pseudo-likelihood based inference is not feasible. = A testing procedure A model for co-localization Estimation An equilibrium equation Consider, for any non-negative function h, C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) and for i = 1, 2, Ii(θ; h) = Rmax Rmin Ω h((ξ, R)i, x) λ((ξ, R)i, x) 2zi dξ µ(dR). Denoting by z∗ 1 , z∗ 2 and θ∗ the true unknown values of the parameters, we know from the Georgii-Nguyen-Zessin equation that for any h E(C(z∗ 1 , z∗ 2 , θ∗ ; h)) = 0. 
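To make the model concrete, here is a minimal sketch of the two-color Boolean reference process and of the interaction term |Γ1 ∩ Γ2| evaluated on a pixel grid; this is the quantity entering the unnormalized density z1^n1 z2^n2 exp(θ |Γ1 ∩ Γ2|). A birth-death Metropolis-Hastings sampler for the full model would accept or reject proposals using ratios of this density. Intensity, radius law and window size are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

def boolean_model(intensity, r_min, r_max, side):
    """Two-type Boolean reference model: Poisson number of balls, uniform centers,
    radii uniform on [r_min, r_max], equiprobable colors 1 and 2."""
    n = rng.poisson(intensity * side * side)
    centers = rng.uniform(0.0, side, size=(n, 2))
    radii = rng.uniform(r_min, r_max, size=n)
    colors = rng.integers(1, 3, size=n)
    return centers, radii, colors

def rasterize(centers, radii, side, px=256):
    """Indicator image of the union of balls on a px-by-px grid."""
    ys, xs = np.mgrid[0:px, 0:px] * (side / px)
    img = np.zeros((px, px), dtype=bool)
    for (cx, cy), r in zip(centers, radii):
        img |= (xs - cx) ** 2 + (ys - cy) ** 2 <= r ** 2
    return img

def interaction_volume(centers, radii, colors, side, px=256):
    """|Gamma_1 intersect Gamma_2| approximated by counting pixels lying in both unions."""
    g1 = rasterize(centers[colors == 1], radii[colors == 1], side, px)
    g2 = rasterize(centers[colors == 2], radii[colors == 2], side, px)
    pixel_area = (side / px) ** 2
    return np.logical_and(g1, g2).sum() * pixel_area

centers, radii, colors = boolean_model(intensity=0.05, r_min=0.5, r_max=2.5, side=50.0)
print("balls:", len(radii), " |G1 ∩ G2| ≈", interaction_volume(centers, radii, colors, side=50.0))
```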
A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. Recall that C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) To be able to compute (1), we must find test functions hk such that S(h) is computable A testing procedure A model for co-localization Estimation Takacs Fiksel estimator Given K test functions (hk)1≤k≤K, the Takacs-Fiksel estimator is defined by (ˆz1, ˆz2, ˆθ) := arg min z1,z2,θ K k=1 C(z1, z2, θ; hk)2 . (1) Consistency and asymptotic normality studied in Coeurjolly et al. 2012. Recall that C(z1, z2, θ; h) = S(h) − z1I1(θ; h) − z2I2(θ; h) where S(h) = (ξ,R)∈x,ξ∈Ω h((ξ, R), x\(ξ, R)) To be able to compute (1), we must find test functions hk such that S(h) is computable How many ? At least K = 3 because 3 parameters to estimate. A testing procedure A model for co-localization Estimation A first possibility : h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} where S(ξ, R) is the sphere {y, ||y − ξ|| = R}. ⇓ ⇓ ⇓ ⇓ A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? = A testing procedure A model for co-localization Estimation What about S(h1) = (ξ,R)∈x,ξ∈Ω h1((ξ, R), x\(ξ, R)) ? = ⇒ S(h1) = P(Γ1) (the perimeter of Γ1) A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). A testing procedure A model for co-localization Estimation So, for h1((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1)c 1{i=1} S(h1) = P(Γ1) and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable. Similarly, Let h2((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ2)c 1{i=2} then S(h2) = P(Γ2). Let h3((ξ, R)i, x) = Length S(ξ, R) ∩ (Γ1 ∪ Γ2)c then S(h3) = P(Γ1 ∪ Γ2). A testing procedure A model for co-localization Estimation Simulations with test functions h1, h2 and h3 over 100 realizations θ = 0.2 (and small radii) θ = 0.05 (and large radii) Frequency 0.15 0.20 0.25 0.30 05101520 Frequency 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 010203040 A testing procedure A model for co-localization Estimation Real Data We assume the law of the radii is uniform on [Rmin, Rmax]. 
(each image is embedded in [0, 250] × [0, 280]) Rmin = 0.5, Rmax = 2.5 Rmin = 0.5, Rmax = 10 ˆθ = 0.45 ˆθ = 0.03 A testing procedure A model for co-localization Estimation Conclusion The testing procedure allows to detect co-localization between two binary images is easy and fast to implement does not depend too much on the image pre-processing The model for co-localization relies on geometric features (area of intersection) can be fitted by the Takacs-Fiksel method allows to compare the degree of co-localization θ between two pairs of images if the laws of radii are similar

Creative Commons Attribution-ShareAlike 4.0 International

See the video
The characteristic independence property of Poisson point processes gives an intuitive way to explain why a sequence of point processes becoming less and less repulsive can converge to a Poisson point process. The aim of this paper is to show this convergence for sequences built by superposing, thinning or rescaling determinantal processes. We use Papangelou intensities and Stein’s method to prove this result with a topology based on total variation distance.
 
Asymptotics of superposition of point processes

I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications 2nd conference on Geometric Science of Information Aurélien VASSEUR Asymptotics of some Point Processes Transformations Ecole Polytechnique, Paris-Saclay, October 28, 2015 1/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Mobile network in Paris - Motivation −2000 0 2000 4000 100020003000 −2000 0 2000 4000 100020003000 Figure: On the left, positions of all BS in Paris. On the right, locations of BS for one frequency band. 2/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Table of Contents I-Generalities on point processes Correlation function, Papangelou intensity and repulsiveness Determinantal point processes II-Kantorovich-Rubinstein distance Convergence dened by dKR dKR(PPP, Φ) ≤ "nice" upper bound III-Applications to transformations of point processes Superposition Thinning Rescaling 3/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Framework Y a locally compact metric space µ a diuse and locally nite measure of reference on Y NY the space of congurations on Y NY the space of nite congurations on Y 4/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Correlation function - Papangelou intensity Correlation function ρ of a point process Φ: E[ α∈NY α⊂Φ f (α)] = +∞ k=0 1 k! ˆ Yk f · ρ({x1, . . . , xk})µ(dx1) . . . µ(dxk) ρ(α) ≈ probability of nding a point in at least each point of α Papangelou intensity c of a point process Φ: E[ x∈Φ f (x, Φ \ {x})] = ˆ Y E[c(x, Φ)f (x, Φ)]µ(dx) c(x, ξ) ≈ conditionnal probability of nding a point in x given ξ 5/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Point process Properties Intensity measure: A ∈ FY → ´ A ρ({x})µ(dx) ρ({x}) = E[c(x, Φ)] If Φ is nite, then: IP(|Φ| = 1) = ˆ Y c(x, ∅)µ(dx) IP(|Φ| = 0). 6/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Poisson point process Properties Φ PPP with intensity M(dy) = m(y)dy Correlation function: ρ(α) = x∈α m(x) Papangelou intensity: c(x, ξ) = m(x) 7/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Repulsive point process Denition Point process repulsive if φ ⊂ ξ =⇒ c(x, ξ) ≤ c(x, φ) Point process weakly repulsive if c(x, ξ) ≤ c(x, ∅) 8/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Determinantal point process Denition Determinantal point process DPP(K, µ): ρ({x1, · · · , xk}) = det(K(xi , xj ), 1 ≤ i, j ≤ k) Proposition Papangelou intensity of DPP(K, µ): c(x0, {x1, · · · , xk}) = det(J(xi , xj ), 0 ≤ i, j ≤ k) det(J(xi , xj ), 1 ≤ i, j ≤ k) where J = (I − K)−1K. 
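To make the determinant formula for the Papangelou intensity concrete, here is a small numerical sketch of mine for a determinantal process on a finite ground set, where K is an ordinary symmetric matrix with eigenvalues in [0, 1) and J = (I - K)^{-1} K is computed exactly; the toy kernel and the chosen indices are purely illustrative (in particular, this is not the Ginibre kernel).

```python
# Sketch (mine): Papangelou intensity of a determinantal point process on a
# finite ground set, so that K is a matrix and J = (I - K)^{-1} K is exact.
# The random toy kernel below is illustrative only.
import numpy as np

rng = np.random.default_rng(1)

def papangelou_dpp(J, new, config):
    """c(new, config) = det(J on {new} ∪ config) / det(J on config)."""
    idx = [new] + list(config)
    num = np.linalg.det(J[np.ix_(idx, idx)])
    den = np.linalg.det(J[np.ix_(config, config)]) if config else 1.0
    return num / den

m = 6
A = rng.normal(size=(m, m))
K = A @ A.T
K = 0.9 * K / np.linalg.eigvalsh(K).max()        # symmetric, eigenvalues in (0, 0.9]
J = np.linalg.solve(np.eye(m) - K, K)            # J = (I - K)^{-1} K

print(papangelou_dpp(J, new=0, config=[3, 5]))   # conditional intensity given {3, 5}
print(papangelou_dpp(J, new=0, config=[]))       # equals J[0, 0], the "empty" value
```

Because J is positive semi-definite, the ratio of determinants is a Schur complement of J and can only decrease as points are added to the conditioning configuration, which is exactly the repulsiveness property recalled on the earlier slides.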
9/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process Ginibre point process Denition Ginibre point process on B(0, R): K(x, y) = 1 π e−1 2 (|x|2 +|y|2 ) exy 1{x∈B(0,R)}1{y∈B(0,R)} β-Ginibre point process on B(0, R): Kβ(x, y) = 1 π e − 1 2β (|x|2 +|y|2 ) e 1 β xy 1{x∈B(0,R)} 1{y∈B(0,R)} 10/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Framework Determinantal point process β-Ginibre point processes 11/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Kantorovich-Rubinstein distance Total variation distance: dTV(ν1, ν2) := sup A∈FY ν1(A),ν2(A)<∞ |ν1(A) − ν2(A)| F : NY → IR is 1-Lipschitz (F ∈ Lip1) if |F(φ1) − F(φ2)| ≤ dTV (φ1, φ2) for all φ1, φ2 ∈ NY Kantorovich-Rubinstein distance: dKR(IP1, IP2) = sup F∈Lip1 ˆ NY F(φ) IP1(dφ) − ˆ NY F(φ) IP2(dφ) Convergence in K.-R. distance =⇒ strictly Convergence in law 12/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Upper bound theorem Theorem (L. Decreusefond, AV) Φ a nite point process on Y ζM a PPP with nite control measure M(dy) = m(y)µ(dy). Then, we have: dKR(IPΦ, IPζM ) ≤ ˆ Y ˆ NY |m(y) − c(y, φ)|IPΦ(dφ)µ(dy). 13/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Superposition of weakly repulsive point processes Φn,1, . . . , Φn,n: n independent, nite and weakly repulsive point processes on Y Φn := n i=1 Φn,i Rn := ´ Y | n i=1 ρn,i (x) − m(x)|µ(dx) ζM a PPP with control measure M(dx) = m(x)µ(dx) 14/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Superposition of weakly repulsive point processes Proposition (LD, AV) Φn = n i=1 Φn,i ζM a PPP with control measure M(dx) = m(x)µ(dx) dKR(IPΦn , IPζM ) ≤ Rn + max 1≤i≤n ˆ Y ρn,i (x)µ(dx) 15/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Consequence Corollary (LD, AV) f pdf on [0; 1] such that f (0+) := limx→0+ f (x) ∈ IR Λ compact subset of IR+ X1, . . . , Xn i.i.d. with pdf fn = 1 n f (1 n ·) Φn = {X1, . . . , Xn} ∩ Λ dKR(Φn, ζ) ≤ ˆ Λ f 1 n x − f (0+) dx + 1 n ˆ Λ f 1 n x dx where ζ is the PPP(f (0+)) reduced to Λ. 16/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning β-Ginibre point processes Proposition (LD, AV) Φn the βn-Ginibre process reduced to a compact set Λ ζ the PPP with intensity 1/π on Λ dKR(IPΦn , IPζ) ≤ Cβn 17/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Kallenberg's theorem Theorem (O. 
Kallenberg) Φn a nite point process on Y pn : Y → [0; 1) uniformly −−−−−→ 0 Φn the pn-thinning of Φn γM a Cox process (pnΦn) law −−→ M ⇐⇒ (Φn) law −−→ γM 18/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Polish distance (fn) a sequence in the space of real continuous functions with compact support generating FY d∗(ν1, ν2) = n≥1 1 2n Ψ(|ν1(fn) − ν2(fn)|) with Ψ(x) = x 1 + x d∗ KR the Kantorovich-Rubinstein distance associated to the distance d∗ 19/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning Thinned point processes Proposition (LD, AV) Φn a nite point process on Y pn : Y → [0; 1) Φn the pn-thinning of Φn γM a Cox process Then, we have: d∗ KR(IPΦn , IPγM ) ≤ 2E[ x∈Φn p2 n(x)] + d∗ KR(IPM, IPpnΦn ). 20/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Application to superposition Application to β-Ginibre point processes Application to thinning References L.Decreusefond, and A.Vasseur, Asymptotics of superposition of point processes, 2015. H.O. Georgii, and H.J. Yoo, Conditional intensity and gibbsianness of determinantal point processes, J. Statist. Phys. (118), January 2004. J.S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins, and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015. A.F. Karr, Point Processes and their Statistical Inference, Ann. Probab. 15 (1987), no. 3, 12261227. 21/22 Aurélien VASSEUR Télécom ParisTech I-Generalities on point processes II-Kantorovich-Rubinstein distance III-Applications Thank you ... ... for your attention. Questions? 22/22 Aurélien VASSEUR Télécom ParisTech
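The thinning results above rest on the independent p(x)-thinning operation, which is simple enough to state in code; the sketch below is mine and purely illustrative.

```python
# Illustrative sketch (mine) of independent p(x)-thinning: each point of a
# configuration is kept with probability p(x), independently of the others.
import numpy as np

rng = np.random.default_rng(2)

def thin(points, p):
    """Return the p-thinning of a configuration (iterable of points)."""
    return [x for x in points if rng.random() < p(x)]

pts = rng.uniform(0, 1, size=(1000, 2))          # a dense toy configuration
kept = thin(list(pts), lambda x: 0.05)           # constant retention probability
print(len(kept))                                 # about 50 points remain on average
```

Kallenberg's theorem, and the quantitative bound stated above, say that when the retention probabilities tend to zero uniformly while the retained intensity measures converge, such thinned processes converge to a Cox (in particular Poisson) limit.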

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Random polytopes have constituted some of the central objects of stochastic geometry for more than 150 years. They are in general generated as convex hulls of a random set of points in the Euclidean space. The study of such models requires the use of ingredients coming from both convex geometry and probability theory. In the last decades, the study has been focused on their asymptotic properties and in particular expectation and variance estimates. In several joint works with Tomasz Schreiber and J. E. Yukich, we have investigated the scaling limit of several models (uniform model in the unit-ball, uniform model in a smooth convex body, Gaussian model) and have deduced from it limiting variances for several geometric characteristics including the number of k-dimensional faces and the volume. In this paper, we survey the most recent advances on these questions and we emphasize the particular cases of random polytopes in the unit-ball and Gaussian polytopes.
 
Asymptotic properties of random polytopes

Asymptotic properties of random polytopes Pierre Calka 2nd conference on Geometric Science of Information ´Ecole Polytechnique, Paris-Saclay, 28 October 2015 default Outline Random polytopes: an overview Main results: variance asymptotics Sketch of proof: Gaussian case Joint work with Joseph Yukich (Lehigh University, USA) & Tomasz Schreiber (Toru´n University, Poland) default Outline Random polytopes: an overview Uniform polytopes Gaussian polytopes Expectation asymptotics Main results: variance asymptotics Sketch of proof: Gaussian case default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K50, K ball K50, K square default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K100, K ball K100, K square default Uniform polytopes Binomial model K := convex body of Rd (Xk,k ∈ N∗):= independent and uniformly distributed in K Kn := Conv(X1, · · · , Xn), n ≥ 1 K500, K ball K500, K square default Uniform polytopes Poissonian model K := convex body of Rd Pλ, λ > 0:= Poisson point process of intensity measure λdx Kλ := Conv(Pλ ∩ K) K500, K ball K500, K square default Gaussian polytopes Binomial model Φd (x) := 1 (2π)d/2 e− x 2/2, x ∈ Rd, d ≥ 2 (Xk, k ∈ N∗):= independent and with density Φd Kn := Conv(X1, · · · , Xn) Poissonian model Pλ, λ > 0:= Poisson point process of intensity measure λΦd(x)dx Kλ := Conv(Pλ) default Gaussian polytopes K50 K100 K500 default Gaussian polytopes: spherical shape K50 K100 K500 default Asymptotic spherical shape of the Gaussian polytope Geffroy (1961) : dH(Kn, B(0, 2 log(n))) → n→∞ 0 a.s. K50000 default Expectation asymptotics Considered functionals fk(·) := number of k-dimensional faces, 0 ≤ k ≤ d Vol(·) := volume B. Efron’s relation (1965): Ef0(Kn) = n 1 − EVol(Kn−1) Vol(K) Uniform polytope, K smooth E[fk(Kλ)] ∼ λ→∞ cd,k ∂K κ 1 d+1 s ds λ d−1 d+1 κs := Gaussian curvature of ∂K Uniform polytope, K polytope E[fk(Kλ)] ∼ λ→∞ c′ d,kF(K) logd−1 (λ) F(K) := number of flags of K Gaussian polytope E[fk(Kλ)] ∼ λ→∞ c′′ d,k log d−1 2 (λ) A. R´enyi & R. Sulanke (1963), H. Raynaud (1970), R. Schneider & J. Wieacker (1978), F. Affentranger & R. Schneider (1992) default Outline Random polytopes: an overview Main results: variance asymptotics Uniform model, K smooth Uniform model, K polytope Gaussian model Sketch of proof: Gaussian case default Uniform model, K smooth K := convex body of Rd with volume 1 and with a C3 boundary κ := Gaussian curvature of ∂K lim λ→∞ λ−(d−1)/(d+1) Var[fk(Kλ)] = ck,d ∂K κ(z)1/(d+1) dz lim λ→∞ λ(d+3)/(d+1) Var [Vol(Kλ)] = c′ d ∂K κ(z)1/(d+1) dz (ck,d , c′ d explicit positive constants) M. Reitzner (2005): Var[fk (Kλ)] = Θ(λ(d−1)/(d+1) ) default Uniform model, K polytope K := simple polytope of Rd with volume 1 i.e. each vertex of K is included in exactly d facets. lim λ→∞ log−(d−1) (λ)Var[fk(Kλ)] = cd,kf0(K) lim λ→∞ λ2 log−(d−1) (λ)Var[Vol(Kλ)] = c′ d,k f0(K) (ck,d , c′ k,d explicit positive constants) I. B´ar´any & M. Reitzner (2010): Var[fk (Kλ)] = Θ(log(d−1) (λ)) default Gaussian model lim λ→∞ log− d−1 2 (λ)Var[fk(Kλ)] = ck,d lim λ→∞ log−k+ d+3 2 (λ)Var[Vol(Kλ)] = c′ k,d E Vol(Kλ) Vol(B(0, 2 log(n))) = λ→∞ 1 − d log(log(λ)) 4 log(λ) + O 1 log(λ) (ck,d , c′ k,d explicit positive constants) D. Hug & M. Reitzner (2005), I. B´ar´any & V. 
Vu (2007): Var[fk (Kλ)] = Θ(log(d−1)/2 (λ)) default Outline Random polytopes: an overview Main results: variance asymptotics Sketch of proof: Gaussian case Calculation of the expectation of fk(Kλ) Calculation of the variance of fk(Kλ) Scaling transform default Calculation of the expectation of fk(Kλ) 1. Decomposition: E[fk(Kλ)] = E   x∈Pλ ξ(x, Pλ)   ξ(x, Pλ) := 1 k+1 #k-face containing x if x extreme 0 if not 2. Mecke-Slivnyak formula E[fk(Kλ)] = λ E[ξ(x, Pλ ∪ {x})]Φd (x)dx 3. Limit of the expectation of one score default Calculation of the variance of fk(Kλ) Var[fk (Kλ)] = E   x∈Pλ ξ2 (x, Pλ) + x=y∈Pλ ξ(x, Pλ)ξ(y, Pλ)   − (E[fk (Kλ)]) 2 = λ E[ξ2 (x, Pλ ∪ {x})]Φd(x)dx + λ2 E[ξ(x, Pλ ∪ {x, y})ξ(y, Pλ ∪ {x, y})]Φd (x)Φd (y)dxdy − λ2 E[ξ(x, Pλ ∪ {x})]E[ξ(y, Pλ ∪ {y})]Φd (x)Φd (y)dxdy = λ E[ξ2 (x, Pλ ∪ {x})]Φd(x)dx + λ2 ”Cov”(ξ(x, Pλ ∪ {x}), ξ(y, Pλ ∪ {y}))Φd (x)Φd (y)dxdy default Scaling transform Question : Limits of E[ξ(x, Pλ)] and ”Cov”(ξ(x, Pλ), ξ(y, Pλ)) ? Answer : definition of limit scores in a new space ◮ Critical radius Rλ := 2 log λ − log(2 · (2π)d · log λ) ◮ Scaling transform : Tλ : Rd \ {0} −→ Rd−1 × R x −→ Rλ exp−1 d−1 x |x|, R2 λ(1 − |x| Rλ ) expd−1 : Rd−1 ≃ Tu0 Sd−1 → Sd−1 exponential map at u0 ∈ Sd−1 ◮ Image of a score : ξ(λ)(Tλ(x), Tλ(Pλ)) := ξ(x, Pλ) ◮ Convergence of Pλ : Tλ(Pλ) D → P o`u P : Poisson point process in Rd−1 × R of intensity measure ehdvdh default Action of the scaling transform Π↑ := {(v, h) ∈ Rd−1 × R : h ≥ v 2 2 } Π↓ := {(v, h) ∈ Rd−1 × R : h ≤ − v 2 2 } Half-space Translate of Π↓ Sphere containing O Translate of ∂Π↑ Convexity Parabolic convexity Extreme point (x + Π↑) not fully covered k-face of Kλ Parabolic k-face RλVol Vol default Limiting picture Ψ := x∈P(x + Π↑) In red : image of the balls of diameter [0, x] where x is extreme default Limiting picture Φ := x∈Rd−1×R:x+Π↓∩P=∅(x + Π↓) In green : image of the boundary of the convex hull Kλ default Thank you for your attention!
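As a quick empirical companion to the expectation and variance asymptotics above, here is a small Monte Carlo sketch of mine for the uniform model in the unit disk, using scipy's ConvexHull; the sample sizes, and the use of n rather than n - 1 points in the Efron check, are conveniences of mine.

```python
# Monte Carlo sketch (mine) of the quantities studied above for the uniform
# model in the unit disk: the number of vertices f0(K_n) and the area of K_n,
# plus a check of Efron's relation E f0(K_n) = n (1 - E Vol(K_{n-1}) / Vol(K)).
# (Using n points instead of n - 1 in the check; negligible at this size.)
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(3)

def uniform_disk(n):
    """n i.i.d. uniform points in the unit disk via polar sampling."""
    r = np.sqrt(rng.uniform(0, 1, n))
    t = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack((r * np.cos(t), r * np.sin(t)))

def f0_and_area(n):
    hull = ConvexHull(uniform_disk(n))
    return len(hull.vertices), hull.volume       # in 2D, .volume is the area

n, reps = 500, 200
samples = np.array([f0_and_area(n) for _ in range(reps)])
print("mean f0  :", samples[:, 0].mean())
print("var  f0  :", samples[:, 0].var(ddof=1))
print("Efron rhs:", n * (1 - samples[:, 1].mean() / np.pi))   # Vol(K) = pi here
```

Repeating this for growing n gives a crude empirical view of the λ^{(d-1)/(d+1)} (here n^{1/3}) growth of both the mean and the variance of f0 predicted by the smooth-body results above.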

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Asymmetric information distances are used to define asymmetric norms and quasimetrics on the statistical manifold and its dual space of random variables. Quasimetric topology, generated by the Kullback-Leibler (KL) divergence, is considered as the main example, and some of its topological properties are investigated.
 
Asymmetric Topologies on Statistical Manifolds

Asymmetric Topologies on Statistical Manifolds Roman V. Belavkin School of Science and Technology Middlesex University, London NW4 4BT, UK GSI2015, October 28, 2015 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 1 / 16 Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 2 / 16 Sources and Consequences of Asymmetry Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 3 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q| = inf{α−1 > 0 : D[q + α(p − q), q] ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q| = inf{α−1 > 0 : D[q + α(p − q), q] ≤ 1} sup x {Ep−q{x} : Eq{ex − 1 − x} ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Asymmetric Information Distances Kullback-Leibler divergence D[p, q] = Eq{ln(p/q)} D[p1⊗p2, q1⊗q2] = D[p1, q1]+D[p2, q2] ln : (R+, ×) → (R, +) q Asymmetry of the KL-divergence D[p, q] = D[q, p] D[q + (p − q), q] = D[q − (p − q), q] p − q = inf{α−1 > 0 : D[q + α|(p − q)|, q] ≤ 1} sup x {Ep−q{x} : Eq{e|x| − 1 − |x|} ≤ 1} Roman 
Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 4 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). 
Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. An asymmetric seminormed space may fail to be a topological vector space, because y → αy can be discontinuous (Borodin, 2001). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Functional Analysis in Asymmetric Spaces Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)) Every topological space with a countable base is quasi-pseudometrizable. An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2). Dual quasimetrics ρ(x, y) and ρ−1(x, y) = ρ(y, x) induce two different topologies. There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy. This gives 14 notions of completeness (with respect to ρ or ρ−1). Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness. An asymmetric seminormed space may fail to be a topological vector space, because y → αy can be discontinuous (Borodin, 2001). Practically all other results have to be reconsidered (e.g. Baire category theorem, Alaoglu-Bourbaki, etc). (Cobzas, 2013). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 5 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 
+ u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Random Variables as the Source of Asymmetry M◦ := {x : x, y ≤ 1, ∀ y ∈ M} M Minkowski functional: µM◦ (x) = inf{α > 0 : x/α ∈ M◦ } M◦ {y : D∗[x, 0] ≤ 1} D∗[x, 0] = ex − 1 − x, z Support function = sM(x) = sup{ x, y : y ∈ M} M = {u : D[(1 + u)z, z] ≤ 1} D = (1 + u) ln(1 + u) − u, z Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 6 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 
1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Sources and Consequences of Asymmetry Examples Example (St. Peterbourgh lottery) x = 2n, q = 2−n, n ∈ N. Eq{x} = ∞ n=1(2n/2n) → ∞ Ep{x} < ∞ for all biased p = 2−(1+α)n, α > 0. 2n /∈ dom Eq{ex}, −2n ∈ dom Eq{ex} 0 /∈ Int(dom Eq{ex}) Example (Error minimization) Minimize x = 1 2 a − b 2 2 subject to DKL[w, q ⊗ p] ≤ λ, a, b ∈ Rn. Ew{x} < ∞ minimized at w ∝ e−βxq ⊗ p. Maximization of x has no solution. 1 2 a − b 2 2 /∈ dom Eq⊗p{ex}, −1 2 a − b 2 2 ∈ dom Eq⊗p{ex} 0 /∈ Int(dom Eq⊗p{ex}) Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 7 / 16 Method: Symmetric Sandwich Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 8 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µM◦ ≤ µ(−M◦ ) ∨ µM◦ µ(−M)co ∧ µM ≤ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Method: Symmetric Sandwich s[−A ∩ A] ≤ sA ≤ s[−A ∪ A] µco [−A◦ ∪ A◦] ≤ µA◦ ≤ µ[−A◦ ∩ A◦] s[−A ∩ A] = s(−A)co ∧ sA = inf{sA(z) + sA(z − y) : z ∈ Y } s[−A ∪ A] = s(−A) ∨ sA µ(−M◦ )co ∧ µM◦ ≤ µM◦ µM ≤ µ(−M) ∨ µM Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 9 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper 
Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x|∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ u|ϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Method: Symmetric Sandwich Lower and upper Luxemburg (Orlicz) norms −2 −1 0 1 2 ϕ∗ (x) = ex − 1 − x ϕ∗ +(x) = ϕ∗ (|x|) /∈ ∆2 ϕ∗ −(x) = ϕ∗ (−|x|) ∈ ∆2 x|∗ ϕ = µ{x : ϕ∗ (x), z ≤ 1} −2 −1 0 1 2 ϕ(u) = (1 + u) ln(1 + u) − u ϕ+(u) = ϕ(|u|) ∈ ∆2 ϕ−(u) = ϕ(−|u|) /∈ ∆2 u|ϕ = µ{u : ϕ(u), z ≤ 1} Proposition · ∗ ϕ+, · ∗ ϕ− are Luxemburg norms and x ∗ ϕ− ≤ x|∗ ϕ ≤ x ∗ ϕ+ · ϕ+, · ϕ− are Luxemburg norms and u ϕ+ ≤ u|ϕ ≤ u ϕ− Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 10 / 16 Results Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 11 / 16 Results KL Induces Hausdorff (T2) Asymmetric Topology Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is Hausdorff. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results KL Induces Hausdorff (T2) Asymmetric Topology Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is Hausdorff. Proof. u ϕ+ ≤ u|ϕ (resp. x ϕ− ≤ x|ϕ) implies (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is finer than normed space (Y, · ϕ+) (resp. (X, · ∗ ϕ−)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 12 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · |ϕ) (resp. (X, · |∗ ϕ)). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Separable Subspaces Theorem (Y, · ϕ+) (resp. (X, · ∗ ϕ−)) is a separable Orlicz subspace of (Y, · |ϕ) (resp. (X, · |∗ ϕ)). Proof. ϕ+(u) = (1 + |u|) ln(1 + |u|) − |u| ∈ ∆2 (resp. ϕ∗ −(x) = e−|x| − 1 + |x| ∈ ∆2). Note that ϕ− /∈ ∆2 and ϕ∗ + /∈ ∆2. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 13 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. 3 Right K-sequentially complete: right K-Cauchy yn ρ → y. 
Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Completeness Theorem (Y, · |ϕ) (resp. (X, · |∗ ϕ)) is 1 Bi-Complete: ρs-Cauchy yn ρs → y. 2 ρ-sequentially complete: ρs-Cauchy yn ρ → y. 3 Right K-sequentially complete: right K-Cauchy yn ρ → y. Proof. ρs(y, z) = z − y|ϕ ∨ y − z|ϕ ≤ y − z ϕ−, where (Y, · ϕ−) is Banach. Then use theorems of Reilly et al. (1982) and Chen et al. (2007). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 14 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Contain a separable Orlicz subspace. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. Contain a separable Orlicz subspace. Total boundedness, compactness? Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 Results Summary and Further Questions Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are: Hausdorff. Bi-complete, ρ-sequentially complete and right K-sequentially complete. 
Contain a separable Orlicz subspace. Total boundedness, compactness? Other asymmetric information distances (e.g. Renyi divergence). Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 15 / 16 References Sources and Consequences of Asymmetry Method: Symmetric Sandwich Results Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16 Results Borodin, P. A. (2001). The Banach-Mazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3–4), 298–305. Chen, S.-A., Li, W., Zou, D., & Chen, S.-B. (2007, Aug). Fixed point theorems in quasi-metric spaces. In Machine learning and cybernetics, 2007 international conference on (Vol. 5, p. 2499-2504). IEEE. Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkh¨auser. Fletcher, P., & Lindgren, W. F. (1982). Quasi-uniform spaces (Vol. 77). New York: Marcel Dekker. Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasi-pseudo-metric spaces. Monatshefte f¨ur Mathematik, 93, 127–140. Roman Belavkin (Middlesex University) Asymmetric Topologies October 28, 2015 16 / 16
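To make the central constructions of the talk concrete, here is a small numerical sketch of mine. It shows the asymmetry of the KL divergence (written with the standard convention D[p, q] = sum_i p_i log(p_i / q_i)) and evaluates the asymmetric gauge ||u|_phi = inf{ t > 0 : <phi(u/t), z> <= 1 }, with phi(u) = (1 + u) log(1 + u) - u, by bisection for a discrete reference measure z. All numerical choices are illustrative.

```python
# Sketch (mine): asymmetry of the KL divergence, and the asymmetric gauge
# ||u|_phi = inf{ t > 0 : <phi(u/t), z> <= 1 },  phi(u) = (1+u)*log(1+u) - u,
# computed by bisection for a discrete reference measure z. Illustrative only.
import numpy as np

def kl(p, q):
    """KL divergence sum p_i log(p_i / q_i), standard convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def phi(u):
    u = np.asarray(u, float)
    out = np.full(u.shape, np.inf)                # phi = +inf below u = -1
    ok = u > -1
    out[ok] = (1 + u[ok]) * np.log1p(u[ok]) - u[ok]
    out[np.isclose(u, -1)] = 1.0                  # limiting value at u = -1
    return out

def gauge(u, z, lo=1e-6, hi=1e6, iters=100):
    """Smallest t with sum(z * phi(u / t)) <= 1, by bisection on a log scale."""
    ok = lambda t: np.sum(z * phi(u / t)) <= 1
    if not ok(hi):
        return np.inf
    for _ in range(iters):
        mid = np.sqrt(lo * hi)
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

q = np.array([0.5, 0.3, 0.2])
p = np.array([0.1, 0.6, 0.3])
print(kl(p, q), kl(q, p))                         # D[p, q] != D[q, p]

z = q
u = np.array([1.0, -0.5, -1.75])                  # centred direction: sum(z * u) = 0
print(gauge(u, z), gauge(-u, z))                  # the gauge differs for u and -u
```

The last two printed values differ markedly: in the direction u the gauge is dominated by the one-sided domain constraint 1 + u/t > 0, while in the direction -u it is not, which is the asymmetry that motivates the sandwich between the lower and upper Luxemburg norms above.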

Computational Information Geometry (chaired by Frank Nielsen, Paul Marriott)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald – equivalently, the Pearson χ2 and score statistics – are unworkable in this context, but that the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.
 
Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling

Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling R. Sabolová1 , P. Marriott2 , G. Van Bever1 & F. Critchley1 . 1 The Open University (EPSRC grant EP/L010429/1), United Kingdom 2 University of Waterloo, Canada GSI 2015, October 28th 2015 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Key points In CIG, the multinomial model ∆k = (π0, . . . , πk) : πi ≥ 0, i πi = 1 provides a universal model. 1 goodness-of-fit testing in large sparse extended multinomial contexts 2 Cressie-Read power divergence λ-family - equivalent to Amari’s α-family asymptotic properties of two test statistics: Pearson’s χ2-test and deviance simulation study for other statistics within power divergence family 3 k-asymptotics instead of N-asymptotics Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Big data Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008): . . . the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. . . . Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science. continuous data - High dimensional low sample size data (HDLSS) discrete data databases image analysis Sparsity (N << k) changes everything! Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Image analysis - example Figure: m1 = 10, m2 = 10 Dimension of a state space: k = 2m1m2 − 1 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Sparsity changes everything S. Fienberg, A. Rinaldo (2012): Maximum Likelihood Estimation in Log-Linear Models Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Extended multinomial distribution Let n = (ni) ∼ Mult(N, (πi)), i = 0, 1, . . . , k, where each πi≥0. 
Goodness-of-fit test H0 : π = π∗ . Pearson’s χ2 test (Wald, score statistic) W := k i=0 (π∗ i − ni/N)2 π∗ i ≡ 1 N2 k i=0 n2 i π∗ i − 1. Rule of thumb (for accuracy of χ2 k asymptotic approximation) Nπi ≥ 5 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Performance of Pearson’s χ2 test on the boundary - example 0 50 100 150 200 0.000.010.020.030.040.05 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 02000400060008000 (b) Sample of Wald Statistic Index WaldStatistic Figure: N = 50, k = 200, exponentially decreasing πi Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Performance of Pearson’s χ2 test on the boundary - theory Theorem For k > 1 and N ≥ 6, the first three moments of W are: E(W) = k N , var(W) = π(−1) − (k + 1)2 + 2k(N − 1) N3 and E[{W − E(W)}3 ] given by π(−2) − (k + 1)3 − (3k + 25 − 22N) π(−1) − (k + 1)2 + g(k, N) N5 where g(k, N) = 4(N − 1)k(k + 2N − 5) > 0 and π(a) := i πa i . In particular, for fixed k and N, as πmin → 0 var(W) → ∞ and γ(W) → +∞ where γ(W) := E[{W − E(W)}3 ]/{var(W)}3/2 . Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary The deviance statistic Define the deviance D via D/2 = {0≤i≤k:ni>0} {ni log(ni/N) − log(πi)} = {0≤i≤k:ni>0} ni log(ni/N) + log 1 πi = {0≤i≤k:ni>0} ni log(ni/µi), where µi := E(ni) = Nπi. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Distribution of deviance let {n∗ i , i = 0, . . . , k} be mutually independent, with n∗ i ∼ Po(µi) then N∗ := k i=0 n∗ i ∼ Po(N) and ni = (n∗ i |N∗ = N) ∼ Mult(N, πi) define S∗ := N∗ D∗ /2 = k i=0 n∗ i n∗ i log(n∗ i /µi) Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Distribution of deviance let {n∗ i , i = 0, . . . , k} be mutually independent, with n∗ i ∼ Po(µi) then N∗ := k i=0 n∗ i ∼ Po(N) and ni = (n∗ i |N∗ = N) ∼ Mult(N, πi) define S∗ := N∗ D∗ /2 = k i=0 n∗ i n∗ i log(n∗ i /µi) define ν, τ and ρ via N ν := E(S∗ ) = N k i=0 E(n∗ i log {n∗ i /µi}) , N ρτ √ N · τ2 := cov(S∗ ) = N k i=0 Ci · k i=0 Vi , where Ci := Cov(n∗ i , n∗ i log(n∗ i /µi)) and Vi := V ar(n∗ i log(n∗ i /µi)). Then under equicontinuity D/2 D −−−−→ k→∞ N1(ν, τ2 (1 − ρ2 )). 
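As a quick check of the contrast drawn above between the Wald/Pearson statistic and the deviance, here is a small simulation sketch; the decay rate of the null probabilities and all other numerical choices are mine, chosen only to imitate the "exponentially decreasing π_i" setting of the slides.

```python
# Simulation sketch (mine) contrasting the Wald / Pearson chi-squared statistic
# W = (1/N^2) * sum_i n_i^2 / pi_i - 1 with the deviance
# D = 2 * sum_{n_i > 0} n_i * log(n_i / (N * pi_i)) under a sparse multinomial
# null with exponentially decaying cell probabilities.
import numpy as np

rng = np.random.default_rng(4)

def wald(counts, pi, N):
    return np.sum(counts ** 2 / pi) / N ** 2 - 1

def deviance(counts, pi, N):
    pos = counts > 0
    return 2 * np.sum(counts[pos] * np.log(counts[pos] / (N * pi[pos])))

k, N, reps = 200, 50, 1000
pi = np.exp(-0.05 * np.arange(k + 1))
pi /= pi.sum()                                    # null probabilities, k + 1 cells

W = np.array([wald(rng.multinomial(N, pi), pi, N) for _ in range(reps)])
D = np.array([deviance(rng.multinomial(N, pi), pi, N) for _ in range(reps)])

print("Wald     mean %6.2f  sd %7.2f  max %8.2f" % (W.mean(), W.std(), W.max()))
print("Deviance mean %6.2f  sd %7.2f  max %8.2f" % (D.mean(), D.std(), D.max()))
```

With these illustrative settings the Wald statistic has the expected mean k/N but a heavy right tail driven by the smallest cells, while the deviance is comparatively stable, which is the behaviour the figures above display near the boundary of the simplex.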
Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity near the boundary 0 50 100 150 200 0.000.010.020.030.040.05 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 0500150025003500 (b) Sample of Wald Statistic Index WaldStatistic 0 200 400 600 800 1000 5060708090100110 (c) Sample of Deviance Statistic Index Deviance Figure: Stability of sampling distributions - Pearson’s χ2 and deviance statistic, N = 50, k = 200, exponentially decreasing πi Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Asymptotic approximations normal approximation can be improved χ2 approximation, correction for skewness symmetrised deviance statistics 40 60 80 100 120 5060708090 Normal Approximation Deviance quantiles Normalquantiles 60 80 100 120 5060708090100 Chi−squared Approximation Deviance quantiles Chi−squaredquantiles 40 60 80 100 120 5060708090 Symmetrised Deviance Symmetric Deviance quantiles Normalquantiles Figure: Quality of k-asymptotics approximations near the boundary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments does k-asymptotic approximation hold uniformly across the simplex? rewrite deviance as D∗ /2 = {0≤i≤k:n∗ i >0} n∗ i log(n∗ i /µi) = Γ∗ + ∆∗ where Γ∗ := k i=0 αin∗ i and ∆∗ := {0≤i≤k:n∗ i >1} n∗ i log n∗ i ≥ 0 and αi := − log µi. how well is the moment generating function of the (standardised) Γ∗ approximated by that of a (standard) normal? Mγ(t) = exp − E(Γ∗ )t V ar(Γ∗) exp   k i=0    ∞ h=1 (−1)h h! µi(log µi)h t V ar(Γ∗) h      Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments maximise skewness k i=0 µi(log µi)3 for fixed E(Γ∗ ) = − k i=0 µi log(µi) and V ar(Γ∗ ) = k i=0 µi(log µi)2 . Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and higher moments maximise skewness k i=0 µi(log µi)3 for fixed E(Γ∗ ) = − k i=0 µi log(µi) and V ar(Γ∗ ) = k i=0 µi(log µi)2 . solution: distribution with three distinct values for µi 0 50 100 150 200 0.0000.0020.0040.006 (a) Null distribution Rank of cell probability Cellprobability (b) Sample of Wald Statistic (out1) WaldStatistic 160 180 200 220 240 260 280 300 050100150200 (c) Sample of Deviance Statistic outDeviance 110 115 120 125 130 135 050100150200 Figure: Worst case solution for normality of Γ∗ Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness Worst case for asymptotic normality? Where? Why? Pearson χ2 boundary ’unstable’ deviance centre discreteness D∗ /2 = {0≤i≤k:n∗ i >0} n∗ i (log n∗ i − logµi) = Γ∗ + ∆∗ For the distribution of any discrete random variable to be well approximated by a continuous one, it is necessary that it have a large number of support points, close together. 
Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness, continued 0 50 100 150 200 0.0000.0010.0020.0030.0040.005 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 115120125130135 (b) Sample of Deviance Statistic Index Deviance −3 −2 −1 0 1 2 3 −101234 (c) QQplot Deviance Statistic Theoretical Quantiles StandardisedDeviance Figure: Behaviour at the centre of the simplex, N = 30, k = 200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Uniformity and discreteness, continued 0 50 100 150 200 0.0000.0010.0020.0030.0040.005 (a) Null distribution Rank of cell probability Cellprobability 0 200 400 600 800 1000 150160170180190 (b) Sample of Deviance Statistic Index Deviance −3 −2 −1 0 1 2 3 −2−10123 (c) QQplot Deviance Statistic Theoretical Quantiles StandardisedDeviance Figure: Behaviour at the centre of the simplex, N = 60, k = 200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Comparison of performance of different test statistics belonging to power divergence family as we are approaching the boundary (exponentially decreasing values of π) 2NIλ (ni/N, π∗ ) = 2 λ(λ + 1) k i=1 ni ni Nπ∗ i λ − 1 , where α = 1 + 2λ α = 3 Pearson’s χ2 statistic α = 7/3 Cressie-Read recommendation α = 1 deviance α = 0 Hellinger statistic α = −1 Kullback MDI α = −3 Neyman χ2 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Numerical comparison of test statistics belonging to power divergence family 0 50 100 150 200 0.000.020.04 Index pi.base Pearson's χ2 , α= 3 Frequency 0 1000 2000 3000 4000 0200400600800 Cressie-Read, α= 7/3 Frequency 0 100 200 300 400 500 0100300500 deviance, α= 1 Frequency 40 60 80 100 050100150 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Numerical comparison of test statistics belonging to power divergence family 0 50 100 150 200 0.000.020.04 Index pi.base Hellinger distance, α= 0 Frequency 60 80 100 120 140 050100150 Kullback MDI, α= -1 Frequency 30 40 50 60 70 80 90 050100150 Neyman χ2 , α= -3 Frequency 10 15 20 25 050100200 Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Outline 1 Introduction 2 Pearson’s χ2 versus the deviance 3 Other test statistics from power divergence family 4 Summary Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary Summary - key points 1 goodness-of-fit testing in large sparse extended multinomial contexts 2 k-asymptotics instead of N-asymptotics 3 Cressie-Read power divergence λ-family asymptotic properties of two test statistics: Pearson’s χ2 
statistic and deviance simulation study for other statistics within power divergence family Radka Sabolová Geometry of GOF Testing in HDLSS Modelling Introduction Pearson’s χ2 versus the deviance Other test statistics from power divergence family Summary References A. Agresti (2002): Categorical Data Analysis. Wiley: Hoboken NJ. K. Anaya-Izquierdo, F. Critchley, and P. Marriott (2014): When are first order asymptotics adequate? a diagnostic. STAT, 3: 17 – 22. K. Anaya-Izquierdo, F. Critchley, P. Marriott, and P. Vos (2013): Computational information geometry: foundations. Proceedings of GSI 2013, LNCS. F. Critchley and Marriott P (2014): Computational information geometry in statistics: theory and practice. Entropy, 16: 2454 – 2471. S.E. Fienberg and A. Rinaldo (2012): Maximum likelihood estimation in log-linear models. Annals of Statistics, 40: 996 – 1023. L. Holst (1972): Asymptotic normality and efficiency for certain goodnes-of-fit tests, Biometrika, 59: 137 – 145. C. Morris (1975): Central limit theorems for multinomial sums, Annals of Statistics, 3: 165 – 188. Radka Sabolová Geometry of GOF Testing in HDLSS Modelling
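The Cressie-Read power-divergence family summarised in the slides above is straightforward to evaluate directly. Below is a minimal sketch, not taken from the paper, that computes 2N I^λ(n/N, π) for the listed values of α = 1 + 2λ (Pearson's χ², the Cressie-Read recommendation, the deviance, the Hellinger statistic, the Kullback MDI and Neyman's χ²), handling λ = 0 and λ = −1 as the usual limiting cases. The cell counts and the uniform null probabilities are made-up illustrative data; in such a sparse table the negative-λ members come out infinite because of the empty cells, which echoes the sparsity issues discussed in the talk.

```python
import numpy as np

def power_divergence(counts, pi, lam):
    """Cressie-Read statistic 2*N*I^lambda(n/N, pi); alpha = 1 + 2*lambda.

    lam = 0 (deviance) and lam = -1 (Kullback MDI) are the usual limits.
    Negative lam requires every cell to be observed, otherwise +inf is returned.
    """
    counts = np.asarray(counts, dtype=float)
    expected = counts.sum() * np.asarray(pi, dtype=float)
    if lam < 0 and np.any(counts == 0):
        return np.inf
    if lam == 0:
        observed = counts > 0          # 0 * log(0) is taken as 0
        return 2.0 * np.sum(counts[observed] *
                            np.log(counts[observed] / expected[observed]))
    if lam == -1:
        return 2.0 * np.sum(expected * np.log(expected / counts))
    ratio = counts / expected
    return 2.0 / (lam * (lam + 1.0)) * np.sum(counts * (ratio ** lam - 1.0))

# Made-up sparse multinomial sample: k = 200 cells, N = 30, uniform null.
rng = np.random.default_rng(0)
k, N = 200, 30
pi0 = np.full(k, 1.0 / k)
counts = rng.multinomial(N, pi0)

for name, alpha in [("Pearson chi^2", 3.0), ("Cressie-Read", 7.0 / 3.0),
                    ("deviance", 1.0), ("Hellinger", 0.0),
                    ("Kullback MDI", -1.0), ("Neyman chi^2", -3.0)]:
    lam = (alpha - 1.0) / 2.0
    print(f"{name:14s} alpha={alpha:5.2f}: {power_divergence(counts, pi0, lam):10.3f}")
```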

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Local mixture models give an inferentially tractable but still flexible alternative to general mixture models. Their parameter space naturally includes boundaries; near these, the behaviour of the likelihood is not standard. This paper shows how convex and differential geometries help in characterising these boundaries. In particular, the geometry of polytopes and of ruled and developable surfaces is exploited to develop efficient inferential algorithms.
 
Computing Boundaries in Local Mixture Models

Computing Boundaries in Local Mixture Models Computing Boundaries in Local Mixture Models Vahed Maroufy & Paul Marriott Department of Statistics and Actuarial Science University of Waterloo October 28 GSI 2015, Paris Computing Boundaries in Local Mixture Models Outline Outline 1 Influence of boundaries on parameter inference 2 Local mixture models (LMM) 3 Parameter space and boundaries Hard boundaries and Soft boundaries 4 Computing the boundaries for LMMs 5 Summary and future direction Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models Boundary influence When boundary exits: MLE does not exist =⇒ find the Extended MLE MLE exists, but does not satisfy the regular properties Examples Binomial distribution, logistic regression, contingency table, log-linear and graphical models Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. 
(2013) Computing boundary is a hard problem, Fukuda (2004) Many mathematical results in the literature polytope approximation, Boroczky and Fodor (2008), Barvinok (2013) smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004) Computing Boundaries in Local Mixture Models LMMs Local Mixture Models Definition Marriott (2002) g(x; µ, λ) = f (x; µ) + k j=2 λj f (j) (x; µ), λ ∈ Λµ ⊂ Rk−1 Properties Anaya-Izquierdo and Marriott (2007) g is identifiable in all parameters and the parametrization (µ, λ) is orthogonal at λ = 0 The log likelihood function of g is a concave function of λ at a fixed µ0 Λµ is convex Approximate continuous mixture models when mixing is “small” M f (x, µ) dQ(µ) Family of LMMs is richer that Family of mixtures Computing Boundaries in Local Mixture Models LMMs Local Mixture Models Definition Marriott (2002) g(x; µ, λ) = f (x; µ) + k j=2 λj f (j) (x; µ), λ ∈ Λµ ⊂ Rk−1 Properties Anaya-Izquierdo and Marriott (2007) g is identifiable in all parameters and the parametrization (µ, λ) is orthogonal at λ = 0 The log likelihood function of g is a concave function of λ at a fixed µ0 Λµ is convex Approximate continuous mixture models when mixing is “small” M f (x, µ) dQ(µ) Family of LMMs is richer that Family of mixtures Computing Boundaries in Local Mixture Models Example and Motivation Example LMM of Normal f (x; µ) = φ(x; µ, σ2 ), (σ2 is known). g(x; µ, λ) = φ(x; µ, σ2 ) 1 + k j=2 λj pj (x) , λ ∈ Λµ pj (x) polynomial of degree j. Why we care about λ and Λµ? They are interpretable    µ (2) g = σ2 + 2λ2 µ (3) g = 6λ3 µ (4) g = µ (4) φ + 12σ2 λ2 + 24λ4 (1) λ represents the mixing distribution Q via its moments in M f (x, µ) dQ(µ) Computing Boundaries in Local Mixture Models Example and Motivation Example LMM of Normal f (x; µ) = φ(x; µ, σ2 ), (σ2 is known). g(x; µ, λ) = φ(x; µ, σ2 ) 1 + k j=2 λj pj (x) , λ ∈ Λµ pj (x) polynomial of degree j. Why we care about λ and Λµ? They are interpretable    µ (2) g = σ2 + 2λ2 µ (3) g = 6λ3 µ (4) g = µ (4) φ + 12σ2 λ2 + 24λ4 (1) λ represents the mixing distribution Q via its moments in M f (x, µ) dQ(µ) Computing Boundaries in Local Mixture Models Example and Motivation The costs for all these good properties and flexibility are Hard boundary =⇒ Positivity (boundary of Λµ) Soft boundary =⇒ Mixture behavior We compute them for two models here: Poisson and Normal We fix k = 4 Computing Boundaries in Local Mixture Models Boundaries Hard boundary Λµ = λ | 1 + k j=2 λj qj (x; µ) ≥ 0, ∀x ∈ S , Λµ is intersection of half-spaces so convex Hard boundary is constructed by a set of (hyper-)planes Soft boundary Definition For a density function f (x; µ) with k finite moments let, Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). and for compact M define C = convhull{Mr (f )|µ ∈ M} Then, the boundary of C is called the soft boundary. Computing Boundaries in Local Mixture Models Boundaries Hard boundary Λµ = λ | 1 + k j=2 λj qj (x; µ) ≥ 0, ∀x ∈ S , Λµ is intersection of half-spaces so convex Hard boundary is constructed by a set of (hyper-)planes Soft boundary Definition For a density function f (x; µ) with k finite moments let, Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). and for compact M define C = convhull{Mr (f )|µ ∈ M} Then, the boundary of C is called the soft boundary. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ | A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. 
Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Poisson model Λµ = λ | A2(x) λ2 + A3(x)λ3 + A4(x) λ4 + 1 ≥ 0, ∀x ∈ Z+ , Figure : Left: slice through λ2 = −0.1; Right: slice through λ3 = 0.3. Theorem For a LMM of a Poisson distribution, for each µ, the space Λµ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope. Computing Boundaries in Local Mixture Models Computing hard boundary Normal model let y = x−µ σ2 Λµ = {λ | (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 ≥ 0, ∀y ∈ R}. We need a more geometric tools to compute this boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Ruled and developable surfaces Definition Ruled surface: Γ(x, γ) = α(x) + γ · β(x), x ∈ I ⊂ R, γ ∈ Rk Developable surface: β(x), α (x) and β (x) are coplanar for all x ∈ I. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Definition The family of planes, A = {λ ∈ R3 | a(x) · λ + d(x) = 0, x ∈ R}, each determined by an x ∈ R, is called a one-parameter infinite family of planes. Each element of the set {λ ∈ R3 |a(x) · λ + d(x) = 0, a (x) · λ + d (x) = 0, x ∈ R} is called a characteristic line of the surface at x and the union is called the envelope of the family. A characteristic line is the intersection of two consecutive planes The envelope is a developable surface Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Hard boundary of for Normal LMM (y2 − 1)λ2 + (y3 − 3y)λ3 + (y4 − 6y2 + 3)λ4 + 1 = 0, ∀y ∈ R . λ2 λ3 λ4 λ4 λ3 λ2 Figure : Left: The hard boundary for the normal LMM (shaded) as a subset of a self intersecting ruled surface (unshaded); Right: slice through λ4 = 0.2. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Soft boundary of for Normal LMM recap : Mk (f ) := (Ef (X), Ef (X2 ), · · · , Ef (Xk )). For visualization purposes let k = 3, (µ ∈ M, fix σ) M3(f ) = (µ, µ2 + σ2 , µ3 + 3µσ2 ), M3(g) = (µ, µ2 + σ2 + 2λ2, µ3 + 3µσ2 + 6µλ2 + 6λ3). Figure : the 3-D curve ϕ(µ); Middle: the bounding ruled surface γa(µ, u); Right: the convex subspace restricted to soft boundary. Computing Boundaries in Local Mixture Models Ruled and developable surfaces Boundaries for Normal LMM Ruled surface parametrization Two boundary surfaces, each constructed by a curve and a set of lines attached to it. γa(µ, u) = ϕ(µ) + u La(µ) γb(µ, u) = ϕ(µ) + u Lb(µ) where for M = [a, b] and ϕ(µ) = M3(f ) La(µ): lines between ϕ(a) and ϕ(µ) Lb(µ): lines between ϕ(µ) and ϕ(b) Computing Boundaries in Local Mixture Models Summary Summary Understanding these boundaries is important if we want to exploit the nice statistical properties of LMM The boundaries described in this paper have both discrete aspects and smooth aspects The two example discussed represent the structure for almost all exponential family models It is a interesting problem to design optimization algorithms on these boundaries for finding boundary maximizers of likelihood Computing Boundaries in Local Mixture Models References Anaya-Izquierdo, K., Critchley, F., and Marriott, P. (2013). when are first order asymptotics adequate? a diagnostic. Stat, 3(1):17–22. Anaya-Izquierdo, K. and Marriott, P. (2007). Local mixture models of exponential families. Bernoulli, 13:623–640. Barvinok, A. (2013). 
Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, rnt078. Batyrev, V. V. (1992). Toric varieties and smooth convex approximations of a polytope. RIMS Kokyuroku, 776:20. Boroczky, K. and Fodor, F. (2008). Approximating 3-dimensional convex bodies by polytopes with a restricted number of edges. Contributions to Algebra and Geometry, 49(1):177–193. Fukuda, K. (2004). From the zonotope construction to the minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261–1272. Geyer, C. J. (2009). Likelihood inference in exponential familes and direction of recession. Electronic Journal of Statistics, 3:259–289. Ghomi, M. (2001). Strictly convex submanifolds and hypersurfaces of positive curvature. Journal of Differential Geometry, 57(2):239–271. Ghomi, M. (2004). Optimal smoothing for convex polytopes. Bulletin of the London Mathematical Society, 36(4):483–492. Marriott, P. (2002). On the local geometry of mixture models. Biometrika, 89:77–93. Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446–484. Computing Boundaries in Local Mixture Models END Thank You
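As a concrete illustration of the hard-boundary condition quoted above for the normal LMM with k = 4 (positivity of (y²−1)λ2 + (y³−3y)λ3 + (y⁴−6y²+3)λ4 + 1 for all y), the following sketch simply checks a candidate λ by minimising the left-hand side over a dense grid of y values. This is a naive numerical check written for illustration only, not the polytope or ruled-surface construction developed in the paper, and the candidate λ values are made up.

```python
import numpy as np

def lmm_constraint(y, lam2, lam3, lam4):
    """Left-hand side of the normal-LMM positivity constraint (k = 4),
    in the standardised variable y used in the slides."""
    return ((y ** 2 - 1.0) * lam2 + (y ** 3 - 3.0 * y) * lam3
            + (y ** 4 - 6.0 * y ** 2 + 3.0) * lam4 + 1.0)

def inside_hard_boundary(lam2, lam3, lam4, y_max=20.0, n_grid=200001):
    """Crude membership test for Lambda_mu: check the constraint on a grid.

    For lam4 > 0 the quartic term dominates for large |y|, so a finite grid
    is a reasonable, though not rigorous, stand-in for 'for all y'."""
    y = np.linspace(-y_max, y_max, n_grid)
    return bool(lmm_constraint(y, lam2, lam3, lam4).min() >= 0.0)

# Made-up candidate parameter vectors (lambda2, lambda3, lambda4).
for lam in [(0.0, 0.0, 0.0),        # the base normal density, always inside
            (-0.1, 0.05, 0.2),
            (-0.5, 0.3, 0.05)]:
    print(lam, "inside Lambda_mu" if inside_hard_boundary(*lam) else "outside")
```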

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We generalize the O(dn/ε²)-time (1 + ε)-approximation algorithm for the smallest enclosing Euclidean ball [2,10] to point sets in hyperbolic geometry of arbitrary dimension. We guarantee an O(1/ε²) convergence time by using a closed-form formula to compute the geodesic α-midpoint between any two points. These results allow us to apply hyperbolic k-center clustering to statistical location-scale families or to multivariate spherical normal distributions by using their Fisher information matrix as the underlying Riemannian hyperbolic metric.
 
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry Frank Nielsen1 Ga¨etan Hadjeres2 ´Ecole Polytechnique 1 Sony Computer Science Laboratories, Inc 1,2 Conference on Geometric Science of Information c 2015 Frank Nielsen - Ga¨etan Hadjeres 1 The Minimum Enclosing Ball problem Finding the Minimum Enclosing Ball (or the 1-center) of a finite point set P = {p1, . . . , pn} in the metric space (X, dX (., .)) consists in finding c ∈ X such that c = argminc ∈X max p∈P dX (c , p) Figure : A finite point set P and its minimum enclosing ball MEB(P) c 2015 Frank Nielsen - Ga¨etan Hadjeres 2 The approximating minimum enclosing ball problem In a euclidean setting, this problem is well-defined: uniqueness of the center c∗ and radius R∗ of the MEB computationally intractable in high dimensions. We fix an > 0 and focus on the Approximate Minimum Enclosing Ball problem of finding an -approximation c ∈ X of MEB(P) such that dX (c, p) ≤ (1 + )R∗ ∀p ∈ P. c 2015 Frank Nielsen - Ga¨etan Hadjeres 3 The approximating minimum enclosing ball problem: prior work Approximate solution in the euclidean case are given by Badoiu and Clarkson’s algorithm [Badoiu and Clarkson, 2008]: Initialize center c1 ∈ P Repeat 1/ 2 times the following update: ci+1 = ci + fi − ci i + 1 where fi ∈ P is the farthest point from ci . How to deal with point sets whose underlying geometry is not euclidean ? c 2015 Frank Nielsen - Ga¨etan Hadjeres 4 The approximating minimum enclosing ball problem: prior work This algorithm has been generalized to dually flat manifolds [Nock and Nielsen, 2005] Riemannian manifolds [Arnaudon and Nielsen, 2013] Applying these results to hyperbolic geometry give the existence and uniqueness of MEB(P), but give no explicit bounds on the number of iterations assume that we are able to precisely cut geodesics. c 2015 Frank Nielsen - Ga¨etan Hadjeres 5 The approximating minimum enclosing ball problem: our contribution We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic α-midpoints, we obtain a intrinsic (1 + )-approximation algorithm to the approximate minimum enclosing ball problem a O(1/ 2) convergence time guarantee a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric c 2015 Frank Nielsen - Ga¨etan Hadjeres 6 Model of d-dimensional hyperbolic geometry: The Poincar´e ball model The Poincar´e ball model (Bd , ρ(., .)) consists in the open unit ball Bd = {x ∈ Rd : x < 1} together with the hyperbolic distance ρ (p, q) = arcosh 1 + 2 p − q 2 (1 − p 2) (1 − q 2) , ∀p, q ∈ Bd . This distance induces on the metric space (Bd , ρ) a Riemannian structure. c 2015 Frank Nielsen - Ga¨etan Hadjeres 7 Geodesics in the Poincar´e ball model Shorter paths between two points (geodesics) are exactly straight (euclidean) lines passing through the origin circle arcs orthogonal to the unit sphere Figure : “Straight” lines in the Poincar´e ball model c 2015 Frank Nielsen - Ga¨etan Hadjeres 8 Circles in the Poincar´e ball model Circles in the Poincar´e ball model look like euclidean circles but with different center Figure : Difference between euclidean MEB (in blue) and hyperbolic MEB (in red) for the set of blue points in hyperbolic Poincar´e disk (in black). The red cross is the hyperbolic center of the red circle while the pink one is its euclidean center. 
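For reference, the Poincaré ball distance ρ used throughout the talk takes only a few lines of code. The sketch below evaluates it on made-up points of the disk and shows it growing as one point is pushed towards the boundary.

```python
import numpy as np

def poincare_distance(p, q):
    """Hyperbolic distance rho(p, q) in the open unit ball B^d."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    num = 2.0 * np.sum((p - q) ** 2)
    den = (1.0 - np.sum(p ** 2)) * (1.0 - np.sum(q ** 2))
    return float(np.arccosh(1.0 + num / den))

# Made-up points in the Poincare disk (d = 2).
p = np.array([0.1, 0.2])
q = np.array([-0.6, 0.5])
print(poincare_distance(p, q))
print(poincare_distance(p, 0.99 * q / np.linalg.norm(q)))   # near the boundary: much larger
```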
c 2015 Frank Nielsen - Ga¨etan Hadjeres 9 Translations in the Poincar´e ball model Tp (x) = 1 − p 2 x + x 2 + 2 x, p + 1 p p 2 x 2 + 2 x, p + 1 Figure : Tiling of the hyperbolic plane by squares c 2015 Frank Nielsen - Ga¨etan Hadjeres 10 Closed-form formula for computing α-midpoints A point m is the α-midpoint p#αq of two points p, q for α ∈ [0, 1] if m belongs to the geodesic joining the two points p, q m verifies ρ (p, mα) = αρ (p, q) . c 2015 Frank Nielsen - Ga¨etan Hadjeres 11 Closed-form formula for computing α-midpoints A point m is the α-midpoint p#αq of two points p, q for α ∈ [0, 1] if m belongs to the geodesic joining the two points p, q m verifies ρ (p, mα) = αρ (p, q) . For the special case p = (0, . . . , 0), q = (xq, 0, . . . , 0), we have p#αq := (xα, 0, . . . , 0) with xα = cα,q − 1 cα,q + 1 , where cα,q := eαρ(p,q) = 1 + xq 1 − xq α . c 2015 Frank Nielsen - Ga¨etan Hadjeres 11 Closed-form formula for computing α-midpoints Noting that p#αq = Tp (T−p (p) #αT−p (q)) ∀p, q ∈ Bd we obtain a closed-form formula for computing p#αq how to compute p#αq in linear time O(d) that these transformations are exact. c 2015 Frank Nielsen - Ga¨etan Hadjeres 12 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius For a fixed radius r > R∗, we can find c ∈ Bd such that ρ (c, P) ≤ (1 + )r ∀p ∈ P with Algorithm 1: (1 + )-approximation of EHB(P, r) 1: c0 := p1 2: t := 0 3: while ∃p ∈ P such that p /∈ B (ct, (1 + ) r) do 4: let p ∈ P be such a point 5: α := ρ(ct ,p)−r ρ(ct ,p) 6: ct+1 := ct#αp 7: t := t+1 8: end while 9: return ct c 2015 Frank Nielsen - Ga¨etan Hadjeres 13 Idea of the proof By the hyperbolic law of cosines : ch (ρt) ≥ ch (h) ch (ρt+1) ch (ρ1) ≥ ch (h)T ≥ ch ( r)T . ct+1 ct c∗ pt h > r ρt+1 ρt r ≤ rr θ θ Figure : Update of ct c 2015 Frank Nielsen - Ga¨etan Hadjeres 14 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius The EHB(P, r) algorithm is a O(1/ 2)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + )r in less than 4/ 2 iterations. c 2015 Frank Nielsen - Ga¨etan Hadjeres 15 (1+ )-approximation of an hyperbolic enclosing ball of fixed radius The EHB(P, r) algorithm is a O(1/ 2)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + )r in less than 4/ 2 iterations. Our error with the true MEHB center c∗ verifies ρ (c, c∗ ) ≤ arcosh ch ((1 + ) r) ch (R∗) c 2015 Frank Nielsen - Ga¨etan Hadjeres 15 (1 + + 2 /4)-approximation of MEHB(P) In fact, as R∗ is unknown in general, the EHB algorithm returns for any r: an (1 + )-approximation of EHB(P) if r ≥ R∗ the fact that r < R∗ if the result obtained after more than 4/ 2 iterations is not good enough. c 2015 Frank Nielsen - Ga¨etan Hadjeres 16 (1 + + 2 /4)-approximation of MEHB(P) In fact, as R∗ is unknown in general, the EHB algorithm returns for any r: an (1 + )-approximation of EHB(P) if r ≥ R∗ the fact that r < R∗ if the result obtained after more than 4/ 2 iterations is not good enough. This suggests to implement a dichotomic search in order to compute an approximation of the minimal hyperbolic enclosing ball. We obtain a O(1 + + 2/4)-approximation of MEHB(P) in O N 2 log 1 iterations. 
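A minimal sketch of the closed-form α-midpoint and of the fixed-radius enclosing-ball iteration (Algorithm 1) described above, assuming the Möbius translation T_p and the special-case midpoint formula quoted in the slides. The point set, radius and tolerance are made-up values; the dichotomic search of Algorithm 2 is not attempted, and a simple iteration cap stands in for its interruption rule.

```python
import numpy as np

def rho(p, q):
    """Poincare-ball hyperbolic distance."""
    num = 2.0 * np.sum((p - q) ** 2)
    den = (1.0 - np.sum(p ** 2)) * (1.0 - np.sum(q ** 2))
    return float(np.arccosh(1.0 + num / den))

def translate(p, x):
    """Hyperbolic translation T_p(x) of the ball, with T_p(0) = p."""
    px, pp, xx = np.dot(p, x), np.dot(p, p), np.dot(x, x)
    return ((1.0 - pp) * x + (xx + 2.0 * px + 1.0) * p) / (pp * xx + 2.0 * px + 1.0)

def alpha_midpoint(p, q, alpha):
    """m = p #_alpha q, the geodesic point with rho(p, m) = alpha * rho(p, q)."""
    v = translate(-p, q)                    # send p to the origin
    r = np.linalg.norm(v)
    if r == 0.0:
        return np.array(p, dtype=float)
    c = ((1.0 + r) / (1.0 - r)) ** alpha
    return translate(p, ((c - 1.0) / (c + 1.0)) * v / r)   # move back to p

def enclosing_ball_center(points, r, eps, max_iter=10000):
    """Algorithm 1: center of a hyperbolic ball of radius (1+eps)*r covering
    all points, assuming r is at least the optimal radius R*."""
    c = np.array(points[0], dtype=float)
    for _ in range(max_iter):
        dists = np.array([rho(c, p) for p in points])
        i = int(np.argmax(dists))
        if dists[i] <= (1.0 + eps) * r:
            return c
        alpha = (dists[i] - r) / dists[i]
        c = alpha_midpoint(c, points[i], alpha)
    raise RuntimeError("did not converge; r is probably smaller than the optimal radius")

# Made-up point set in the Poincare disk.
rng = np.random.default_rng(1)
P = 0.5 * (rng.random((40, 2)) - 0.5)
c = enclosing_ball_center(P, r=1.2, eps=0.05)
print("center:", c, " max distance:", max(rho(c, p) for p in P))
```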
c 2015 Frank Nielsen - Ga¨etan Hadjeres 16 (1 + + 2 /4)-approximation of MEHB(P) algorithm Algorithm 2: (1 + )-approximation of MEHB(P) 1: c := p1 2: rmax := ρ (c, P); rmin = rmax 2 ; tmax := +∞ 3: r := rmax; 4: repeat 5: ctemp := Alg1 P, r, 2 , interrupt if t > tmax in Alg1 6: if call of Alg1 has been interrupted then 7: rmin := r 8: else 9: rmax := r ; c := ctemp 10: end if 11: dr := rmax−rmin 2 ; r := rmin + dr ; tmax := log(ch(1+ /2)r)−log(ch(rmin)) log(ch(r /2)) 12: until 2dr < rmin 2 13: return c c 2015 Frank Nielsen - Ga¨etan Hadjeres 17 Experimental results The number of iterations does not depend on d. Figure : Number of α-midpoint calculations as a function of in logarithmic scale for different values of d. c 2015 Frank Nielsen - Ga¨etan Hadjeres 18 Experimental results The running time is approximately O(dn 2 ) (vertical translation in logarithmic scale). Figure : execution time as a function of in logarithmic scale for different values of d. c 2015 Frank Nielsen - Ga¨etan Hadjeres 19 Applications Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the following subfamilies N µ, σ2In of n-variate normal distributions with scalar covariance matrix (In is the n × n identity matrix), N µ, diag σ2 1, . . . , σ2 n of n-variate normal distributions with diagonal covariance matrix N(µ0, Σ) of d-variate normal distributions with fixed mean µ0 and arbitrary positive definite covariance matrix Σ are statistical manifolds whose Fisher information metric is hyperbolic. c 2015 Frank Nielsen - Ga¨etan Hadjeres 20 Applications In particular, our results apply to the two-dimensional location-scale subfamily: Figure : MEHB (D) of probability density functions (left) in the (µ, σ) superior half-plane (right). P = {A, B, C}. c 2015 Frank Nielsen - Ga¨etan Hadjeres 21 Openings Plugging the EHB and MEHB algorithms to compute clusters centers in the approximation algorithm by [Gonzalez, 1985], we obtain approximate algorithms for covering in hyperbolic spaces the k-center problem in O kNd 2 log 1 c 2015 Frank Nielsen - Ga¨etan Hadjeres 22 Algorithm 3: Gonzalez farthest-first traversal approximation algo- rithm 1: C1 := P, i = 0 2: while i ≤ k do 3: ∀j ≤ i, compute cj := MEB(Cj ) 4: ∀j ≤ i, set fj := argmaxp∈P ρ(p, cj ) 5: find f ∈ {fj } whose distance to its cluster center is maximal 6: create cluster Ci containing f 7: add to Ci all points whose distance to f is inferior to the distance to their cluster center 8: increment i 9: end while 10: return {Ci }i c 2015 Frank Nielsen - Ga¨etan Hadjeres 23 Openings The computation of the minimum enclosing hyperbolic ball does not necessarily involve all points p ∈ P. Core-sets in hyperbolic geometry the MEHB obtained by the algorithm is an -core-set differences with the euclidean setting: core-sets are of size at most 1/ [Badoiu and Clarkson, 2008] c 2015 Frank Nielsen - Ga¨etan Hadjeres 24 Thank you! c 2015 Frank Nielsen - Ga¨etan Hadjeres 25 Bibliography I Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93–104. Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Comput. Geom., 40(1):14–22. Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306. Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649–656. Springer. c 2015 Frank Nielsen - Ga¨etan Hadjeres 26
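The farthest-first traversal of Gonzalez (1985) cited in the closing slides is equally short to prototype. The sketch below is the classic version of that algorithm (a 2-approximation for metric k-center), written against an arbitrary distance function and run here on made-up Euclidean data; substituting the hyperbolic distance and the enclosing-ball routine above would give the hyperbolic covering discussed in the talk, although the slides' Algorithm 3 is a cluster-recentering variant rather than this plain form.

```python
import numpy as np

def gonzalez_k_centers(points, k, dist):
    """Classic farthest-first traversal for the k-center problem:
    greedily pick the point farthest from the centers chosen so far."""
    centers = [0]                                        # arbitrary first center
    nearest = np.array([dist(points[0], p) for p in points])
    while len(centers) < k:
        i = int(np.argmax(nearest))                      # farthest point from all centers
        centers.append(i)
        nearest = np.minimum(nearest, [dist(points[i], p) for p in points])
    labels = [int(np.argmin([dist(points[c], p) for c in centers])) for p in points]
    return centers, labels

# Made-up data with the ordinary Euclidean metric; any metric, such as the
# Poincare-ball distance above, can be plugged in through `dist`.
euclid = lambda p, q: float(np.linalg.norm(p - q))
rng = np.random.default_rng(2)
X = rng.random((100, 2))
centers, labels = gonzalez_k_centers(X, k=3, dist=euclid)
print("center indices:", centers)
print("covering radius:", max(euclid(X[centers[l]], p) for l, p in zip(labels, X)))
```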

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Brain-Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain signal processing. Most of the state-of-the-art approaches deal with covariance matrices, and indeed Riemannian geometry has provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applicative contribution aims at assessing the impact of several distances on a real EEG dataset, as the invariances embedded in those distances have an influence on the classification accuracy. Euclidean and Riemannian distances and means are compared both in terms of quality of results and of computational load.
 
From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification

From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al. F’SATI - Tshawne University of Technology (South Africa) LISV - Université de Versailles Saint-Quentin (France) Mensia Technologies (France) sylvain.chevallier@uvsq.fr 28 October 2015 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Cerebral interfaces Context Rehabilitation and disability compensation ) Out-of-the-lab solutions ) Open to a wider population Problem Intra-subject variabilities ) Online methods, adaptative algorithms Inter-subject variabilities ) Good generalization, fast convergence Opportunities New generation of BCI (Congedo & Barachant) • Growing interest in EEG community • Large community, available datasets • Challenging situations and problems S. Chevallier 28/10/2015 GSI 2 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Outline Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances S. Chevallier 28/10/2015 GSI 3 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction based on brain activity Brain-Computer Interface (BCI) for non-muscular communication • Medical applications • Possible applications for wider population Recording at what scale ? • Neuron !LFP • Neuronal group !ECoG !SEEG • Brain !EEG !MEG !IRMf !TEP S. Chevallier 28/10/2015 GSI 4 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction loop BCI loop 1 Acquisition 2 Preprocessing 3 Translation 4 User feedback S. Chevallier 28/10/2015 GSI 5 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Electroencephalography Most BCI rely on EEG ) Efficient to capture brain waves • Lightweight system • Low cost • Mature technologies • High temporal resolution • No trepanation S. Chevallier 28/10/2015 GSI 6 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Origins of EEG • Local field potentials • Electric potential difference between dendrite and soma • Maxwell’s equation • Quasi-static approximation • Volume conduction effect • Sensitive to conductivity of brain skull • Sensitive to tissue anisotropies S. Chevallier 28/10/2015 GSI 7 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Experimental paradigms Different brain signals for BCI : • Motor imagery : (de)synchronization in premotor cortex • Evoked responses : low amplitude potentials induced by stimulus Steady-State Visually Evoked Potentials 8 electrodes in occipital region SSVEP stimulation LEDs 13 Hz 17 Hz 21 Hz • Neural synchronization with visual stimulation • No learning required, based on visual attention • Strong induced activation S. 
Chevallier 28/10/2015 GSI 8 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances BCI Challenges Limitations • Data scarsity ) A few sources are non-linearly mixed on all electrodes • Individual variabilities ) Effect of mental fatigue • Inter-session variabilities ) Electronic impedances, localizations of electrodes • Inter-individual variabilities ) State of the art approaches fail with 20% of subjects Desired properties : • Online systems ) Continously adapt to the user’s variations • No calibration phase ) Non negligible cognitive load, raises fatigue • Generic model classifiers and transfert learning ) Use data from one subject to enhance the results for another S. Chevallier 28/10/2015 GSI 9 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Spatial covariance matrices Common approach : spatial filtering • Efficient on clean datasets • Specific to each user and session ) Require user calibration • Two step training with feature selection ) Overfitting risk, curse of dimensionality Working with covariance matrices • Good generalization across subjects • Fast convergence • Existing online algorithms • Efficient implementations S. Chevallier 28/10/2015 GSI 10 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Covariance matrices for EEG • An EEG trial : X 2 RC⇥N , C electrodes, N time samples • Assuming that X ⇠ N(0, ⌃) • Covariance matrices ⌃ belong to MC = ⌃ 2 RC⇥C : ⌃ = ⌃| and x| ⌃x > 0, 8x 2 RC \0 • Mean of the set {⌃i }i=1,...,I is ¯⌃ = argmin⌃2MC PI i=1 dm (⌃i , ⌃) • Each EEG class is represented by its mean • Classification based on those means • How to obtain a robust and efficient algorithm ? Congedo, 2013 S. Chevallier 28/10/2015 GSI 11 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Minimum distance to Riemannian mean Simple and robust classifier • Compute the center ⌃ (k) E of each of the K classes • Assign a given unlabelled ˆ⌃ to the closest class k⇤ = argmin k (ˆ⌃, ⌃ (k) E ) Trajectories on tangent space at mean of all trials ¯⌃µ −4 −2 0 2 4 −4 −2 0 2 4 6 Resting class 13Hz class 21Hz class 17Hz class Delay S. Chevallier 28/10/2015 GSI 12 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Riemannian potato Removing outliers and artifacts Reject any ⌃i that lies too far from the mean of all trials ¯⌃µ z( i ) = i µ > zth , i is d(⌃i , ¯⌃), µ and are the mean and standard deviation of distances { i } I i=1 Raw matrices Riemannian potato filtering S. Chevallier 28/10/2015 GSI 13 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Covariance matrices for EEG-based BCI Riemannian approaches in BCI : • Achieve state of the art results ! performing like spatial filtering or sensor-space methods • Rely on simpler algorithms ! less error-prone, computationally efficient What are the reason of this success ? • Invariances embedded with Riemannian distances ! invariance to rescaling, normalization, whitening ! invariance to electrode permutation or positionning • Equivalent to working in an optimal source space ! spatial filtering are sensitive to outliers and user-specific ! no question on "sensors or sources" methods ) What are the most desirable invariances for EEG ? S. 
Chevallier 28/10/2015 GSI 14 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Considered distances and divergences Euclidean dE(⌃1, ⌃2) = k⌃1 ⌃2kF Log-Euclidean dLE(⌃1, ⌃2) = klog(⌃1) log(⌃2)kF V. Arsigny et al., 2006, 2007 Affine-invariant dAI(⌃1, ⌃2) = klog(⌃ 1 1 ⌃2)kF T. Fletcher & S. Joshi, 2004 , M. Moakher, 2005 ↵-divergence d↵ D(⌃1, ⌃2) 1<↵<1 = 4 1 ↵2 log det( 1 ↵ 2 ⌃1+ 1+↵ 2 ⌃2) det(⌃1) 1 ↵ 2 det(⌃2) 1+↵ 2 Z. Chebbi & M. Moakher, 2012 Bhattacharyya dB(⌃1, ⌃2) = ⇣ log det 1 2 (⌃1+⌃2) (det(⌃1) det(⌃2))1/2 ⌘1/2 Z. Chebbi & M. Moakher, 2012 S. Chevallier 28/10/2015 GSI 15 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Experimental results • Euclidean distances yield the lowest results ! Usually attributed to the invariance under inversion that is not guaranteed ! Displays swelling effect • Riemannian approaches outperform state-of-the-art methods (CCA+SVM) • ↵-divergence shows the best performances ! but requires a costly optimisation to find the best ↵ value • Bhattacharyya has the lowest computational cost and a good accuracy −1 −0.5 0 0.5 1 20 30 40 50 60 70 80 90 Accuracy(%) Alpha values (α) −1 −0.5 0 0.5 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 CPUtime(s) S. Chevallier 28/10/2015 GSI 16 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Conclusion Working with covariance matrices in BCI • Achieves very good results • Simple algorithms work well : MDM, Riemannian potato • Need for robust and online methods Interesting applications for IG : • Many freely available datasets • Several competitions • Many open source toolboxes for manipulating EEG Several open questions : • Handling electrodes misplacements and others artifacts • Missing data and covariance matrices of lower rank • Inter- and intra-individual variabilities S. Chevallier 28/10/2015 GSI 17 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Thank you ! S. Chevallier 28/10/2015 GSI 18 / 19 Brain-Computer Interfaces Spatial covariance matrices for BCI Experimental assessment of distances Interaction loop BCI loop 1 Acquisition 2 Preprocessing 3 Translation 4 User feedback First systems in early ’70 S. Chevallier 28/10/2015 GSI 19 / 19
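The distances compared in this talk are easy to prototype. The sketch below is illustrative only and is not the authors' code: it uses randomly generated SPD matrices as stand-ins for EEG spatial covariance estimates, implements the Euclidean, log-Euclidean and affine-invariant distances, and performs one Minimum Distance to Mean classification step with the log-Euclidean mean; the α-divergence and Bhattacharyya distance considered in the talk are omitted.

```python
import numpy as np

def spd_logm(S):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_expm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def d_euclidean(S1, S2):
    return np.linalg.norm(S1 - S2, "fro")

def d_log_euclidean(S1, S2):
    return np.linalg.norm(spd_logm(S1) - spd_logm(S2), "fro")

def d_affine_invariant(S1, S2):
    # square root of the sum of squared log-eigenvalues of S1^{-1} S2
    w1, V1 = np.linalg.eigh(S1)
    S1_isqrt = (V1 * (w1 ** -0.5)) @ V1.T
    w = np.linalg.eigvalsh(S1_isqrt @ S2 @ S1_isqrt)
    return np.sqrt(np.sum(np.log(w) ** 2))

def log_euclidean_mean(mats):
    return spd_expm(np.mean([spd_logm(S) for S in mats], axis=0))

def mdm_classify(S, class_means, dist):
    """Minimum Distance to Mean: assign S to the class with the closest mean."""
    return int(np.argmin([dist(S, M) for M in class_means]))

def random_spd(d, rng, scale):
    A = rng.standard_normal((d, d))
    return scale * (A @ A.T + d * np.eye(d))

# Two made-up classes of 8x8 covariance matrices (C = 8 electrodes).
rng = np.random.default_rng(3)
class0 = [random_spd(8, rng, 1.0) for _ in range(20)]
class1 = [random_spd(8, rng, 3.0) for _ in range(20)]
means = [log_euclidean_mean(class0), log_euclidean_mean(class1)]

test = random_spd(8, rng, 3.0)
print("assigned class:", mdm_classify(test, means, d_affine_invariant))
print("d_E  between class means:", d_euclidean(*means))
print("d_LE between class means:", d_log_euclidean(*means))
print("d_AI between class means:", d_affine_invariant(*means))
```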

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We consider the geodesic equation on the elliptical model, which is a generalization of the normal model. More precisely, we characterize this manifold from the group-theoretical viewpoint, formulate Eriksen's procedure for obtaining geodesics on the normal model, and give an alternative proof of it.
 
Group Theoretical Study on Geodesics for the Elliptical Models

Group Theoretical Study on Geodesics for the Elliptical Models Hiroto Inoue Kyushu University, Japan October 28, 2015 GSI2015, ´Ecole Polytechnique, Paris-Saclay, France Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 1 / 14 Overview 1 Eriksen’s construction of geodesics on normal model Problem 2 Reconsideration of Eriksen’s argument Embedding Nn → Sym+ n+1(R) 3 Geodesic equation on Elliptical model 4 Future work Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 2 / 14 Eriksen’s construction of geodesics on normal model Let Sym+ n (R) be the set of n-dimensional positive-definite matrices. The normal model Nn = (M, ds2) is a Riemannian manifold defined by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 ). The geodesic equation on Nn is ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0. (1) The solution of this geodesic equation has been obtained by Eriksen. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 3 / 14 Theorem ([Eriksen 1987]) For any x ∈ Rn, B ∈ Symn(R), define a matrix exponential Λ(t) by Λ(t) =   ∆ δ Φ tδ tγ tΦ γ Γ   := exp(−tA), A :=   B x 0 tx 0 −tx 0 −x −B   ∈ Mat2n+1. (2) Then, the curve (µ(t), Σ(t)) := (−∆−1δ, ∆−1) is the geodesic on Nn satisfiying the initial condition (µ(0), Σ(0)) = (0, In), ( ˙µ(0), ˙Σ(0)) = (x, B). (proof) We see that by the definition, (µ(t), Σ(t)) satisfies the geodesic equation. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 4 / 14 Problem 1 Explain Eriksen’s theorem, to clarify the relation between the normal model and symmetric spaces. 2 Extend Eriksen’s theorem to the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 5 / 14 Reconsideration of Eriksen’s argument Sym+ n+1(R) Notice that the positive-definite symmetric matrices Sym+ n+1(R) is a symmetric space by G/K Sym+ n+1(R) gK → g · tg, where G = GLn+1(R), K = O(n + 1). This space G/K has the G-invariant Riemannian metric ds2 = 1 2 tr (S−1 dS)2 . Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 6 / 14 Embedding Nn → Sym+ n+1(R) Put an affine subgroup GA := P µ 0 1 P ∈ GLn(R), µ ∈ Rn ⊂ GLn+1(R). Define a Riemannian submanifold as the orbit GA · In+1 = {g · t g| g ∈ GA} ⊂ Sym+ n+1(R). Theorem (Ref. [Calvo, Oller 2001]) We have the following isometry Nn ∼ −→ GA · In+1 ⊂ Sym+ n+1(R), (Σ, µ) → Σ + µtµ µ tµ 1 . (3) Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 7 / 14 Embedding Nn → Sym+ n+1(R) By using the above embedding, we get a simpler expression of the metric and the geodesic equation. Nn ∼= GA · In+1 ⊂ Sym+ n+1(R) coordinate (Σ, µ) → S = Σ + µtµ µ tµ 1 metric ds2 = (tdµ)Σ−1(dµ) +1 2tr((Σ−1dΣ)2) ⇔ ds2 = 1 2 tr (S−1dS)2 geodesic eq. ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ = 0 ⇔ (In, 0)(S−1 ˙S) = (B, x) Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 8 / 14 Reconsideration of Eriksen’s argument We can interpret the Eriksen’s argument as follows. Differential equation Geodesic equation Λ−1 ˙Λ = −A −→ (In, 0)(S−1 ˙S) = (B, x) A =   B x 0 t x 0 −t x 0 −x −B   −→ e−tA =   ∆ δ ∗ t δ ∗ ∗ ∗ ∗   −→ S := ∆ δ t δ −1 ∈ ∈ ∈ {A : JAJ = −A} −→ {Λ : JΛJ = Λ−1 } −→ Essential! Nn ∼= GA · In+1 ∩ ∩ ∩ sym2n+1(R) −→ exp Sym+ 2n+1(R) −→ projection Sym+ n+1(R) Here J =   In 1 In  . Hiroto Inoue (Kyushu Uni.) 
Group Theoretical Study on Geodesics October 28, 2015 9 / 14 Geodesic equation on Elliptical model Definition Let us define a Riemannian manifold En(α) = (M, ds2) by M = (µ, Σ) ∈ Rn × Sym+ n (R) , ds2 = (t dµ)Σ−1 (dµ) + 1 2 tr((Σ−1 dΣ)2 )+ 1 2 dα tr(Σ−1 dΣ) 2 . (4) where dα = (n + 1)α2 + 2α, α ∈ C. Then En(0) = Nn. The geodesic equation on En(α) is    ¨µ − ˙ΣΣ−1 ˙µ = 0, ¨Σ + ˙µt ˙µ − ˙ΣΣ−1 ˙Σ− dα ndα + 1 t ˙µΣ−1 ˙µΣ = 0. (5) This is equivalent to the geodesic equation on the elliptical model. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 10 / 14 Geodesic equation on Elliptical model The manifold En(α) is also embedded into positive-definite symmetric matrices Sym+ n+1(R), ref. [Calvo, Oller 2001], and we have simpler expression of the geodesic equation. En(α) ∼= ∃GA(α) · In+1 ⊂ Sym+ n+1(R) coordinate (Σ, µ) → S = |Σ|α Σ + µtµ µ tµ 1 metric (4) ⇔ ds2 = 1 2 tr (S−1dS)2 geodesic eq. (5) ⇔ (In, 0)(S−1 ˙S) = (C, x) − α(log |S|) (In, 0) |A| = det A Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 11 / 14 Geodesic equation on Elliptical model But, in general, we do not ever construct any submanifold N ⊂ Sym+ 2n+1(R) such that its projection is En(α): Differential equation Geodesic equation Λ−1 ˙Λ = −A −→ (In, 0)(S−1 ˙S) = (C, x) − α(log |S|) (In, 0) Λ(t) −→ S(t) ∈ ∈ N −→ En(α) ∼= GA(α) · In+1 ∩ ∩ Sym+ 2n+1(R) −→ projection Sym+ n+1(R) The geodesic equation on elliptical model has not been solved. Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 12 / 14 Future work 1 Extend Eriksen’s theorem for elliptical models (ongoing) 2 Find Eriksen type theorem for general symmetric spaces G/K Sketch of the problem: For a projection p : G/K → G/K, find a geodesic submanifold N ⊂ G/K, such that p|N maps all the geodesics to the geodesics: ∀Λ(t): Geodesic −→ p(Λ(t)): Geodesic ∈ ∈ N −→ p|N p(N) ∩ ∩ G/K −→ p:projection G/K Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 13 / 14 References Calvo, M., Oller, J.M. A distance between elliptical distributions based in an embedding into the Siegel group, J. Comput. Appl. Math. 145, 319–334 (2002). Eriksen, P.S. Geodesics connected with the Fisher metric on the multivariate normal manifold, pp. 225–229. Proceedings of the GST Workshop, Lancaster (1987). Hiroto Inoue (Kyushu Uni.) Group Theoretical Study on Geodesics October 28, 2015 14 / 14
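Eriksen's construction quoted above is easy to check numerically. The sketch below, which is only a sanity check and not part of the paper, builds the (2n+1)x(2n+1) matrix A from an initial velocity (x, B), computes Λ(t) = exp(−tA) with scipy, reads off (µ(t), Σ(t)) = (−∆⁻¹δ, ∆⁻¹), and verifies the initial conditions by finite differences; the values of x, B and the step size are made up.

```python
import numpy as np
from scipy.linalg import expm

def eriksen_geodesic(x, B, t):
    """Geodesic (mu(t), Sigma(t)) on the normal model N_n with
    (mu(0), Sigma(0)) = (0, I_n) and (mu'(0), Sigma'(0)) = (x, B)."""
    x = np.asarray(x, dtype=float).reshape(-1)
    B = np.asarray(B, dtype=float)           # must be symmetric
    n = x.size
    A = np.zeros((2 * n + 1, 2 * n + 1))
    A[:n, :n] = B
    A[:n, n] = x
    A[n, :n] = x
    A[n, n + 1:] = -x
    A[n + 1:, n] = -x
    A[n + 1:, n + 1:] = -B
    Lam = expm(-t * A)
    Delta = Lam[:n, :n]                      # upper-left n x n block
    delta = Lam[:n, n]                       # upper part of the middle column
    Sigma = np.linalg.inv(Delta)
    mu = -Sigma @ delta
    return mu, Sigma

# Made-up initial velocity in dimension n = 2.
x0 = np.array([0.5, -0.3])
B0 = np.array([[0.2, 0.1],
               [0.1, -0.4]])

mu0, Sigma0 = eriksen_geodesic(x0, B0, 0.0)
print(mu0, Sigma0, sep="\n")                 # should be 0 and the identity

h = 1e-5                                     # finite-difference check of the velocity
mu_h, Sigma_h = eriksen_geodesic(x0, B0, h)
print((mu_h - mu0) / h)                      # approximately x0
print((Sigma_h - Sigma0) / h)                # approximately B0
```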

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We introduce a class of paths, or one-parameter models, connecting two arbitrary probability density functions (pdf's). The class is derived by employing the Kolmogorov-Nagumo average between the two pdf's. There is a variety of such path connectedness on the space of pdf's, since the Kolmogorov-Nagumo average is applicable for any convex and strictly increasing function. Information-geometric insight is provided for understanding probabilistic properties of statistical methods associated with the path connectedness. The one-parameter model is extended to a multidimensional model, on which the statistical inference is characterized by sufficient statistics.
 
Path connectedness on a space of probability density functions

Path connectedness on a space of probability density functions Osamu Komori1 , Shinto Eguchi2 University of Fukui1 , Japan The Institute of Statistical Mathematics2 , Japan Ecole Polytechnique, Paris-Saclay (France) October 28, 2015 Komori, O. (University of Fukui) GSI2015 October 28, 2015 1 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 2 / 18 Setting Terminology . . X : data space P : probability measure on X FP: space of probability density functions associated with P We consider a path connecting f and g, where f, g ∈ FP, and investigate the property from a viewpoint of information geometry. Komori, O. (University of Fukui) GSI2015 October 28, 2015 3 / 18 Kolmogorov-Nagumo (K-N) average Let ϕ : (0, ∞) → R be an monotonic increasing and concave continuous function. Then for f and g in Fp The Kolmogorov-Nagumo (K-N) average . . ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) ) for 0 ≤ t ≤ 1. Remark 1 . . ϕ−1 is monotone increasing, convex and continuous on (0, ∞) Komori, O. (University of Fukui) GSI2015 October 28, 2015 4 / 18 ϕ-path Based on K-N average, we consider ϕ-path connecting f and g in FP: ϕ-path . . ft(x, ϕ) = ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) − κt ) , where κt ≤ 0 is a normalizing factor, where the equality holds if t = 0 or t = 1. Komori, O. (University of Fukui) GSI2015 October 28, 2015 5 / 18 Existence of κt Theorem 1 . . There uniquely exists κt such that ∫ X ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) − κt ) dP(x) = 1 Proof From the convexity of ϕ−1 , we have 0 ≤ ∫ ϕ−1 ( (1 − t)ϕ(f(x)) + tϕ(g(x)) ) dP(x) ≤ ∫ {(1 − t)f(x) + tg(x)}dP(x) ≤ 1 And we observe that limc→∞ ϕ−1 (c) = +∞ since ϕ−1 is monotone increasing. Hence the continuity of ϕ−1 leads to the existence of κt satisfying the equation above. Komori, O. (University of Fukui) GSI2015 October 28, 2015 6 / 18 Illustration of ϕ-path Komori, O. (University of Fukui) GSI2015 October 28, 2015 7 / 18 Examples of ϕ-path Example 1 . 1 ϕ0(x) = log(x). The ϕ0-path is given by ft(x, ϕ0) = exp((1 − t) log f(x) + t log g(x) − κt), where κt = log ∫ exp((1 − t) log f(x) + t log g(x))dP(x). 2 ϕη(x) = log(x + η) with η ≥ 0. The ϕη-path is given by ft(x, ϕη) = exp [ (1 − t) log{ f(x) + η} + t log{g(x) + η} − κt ] , where κt = log [ ∫ exp{(1 − t) log{f(x) + η} + t log{g(x) + η}}dP(x) − η ] . 3 ϕβ(x) = (xβ − 1)/β with β ≤ 1. The ϕβ-path is given by ft(x, ϕβ) = {(1 − t)f(x)β + tg(x)β − κt} 1 β , where κt does not have an explicit form. Komori, O. (University of Fukui) GSI2015 October 28, 2015 8 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 9 / 18 Extended expectation For a function a(x): X → R, we consider Extended expectation . . E(ϕ) f {a(X)} = ∫ X 1 ϕ′(f(x)) a(x)dP(x) ∫ X 1 ϕ′(f(x)) dP(x) , where ϕ: (0, ∞) → R is a generator function. Remark 2 If ϕ(t) = log t, then E(ϕ) reduces to the usual expectation. Komori, O. (University of Fukui) GSI2015 October 28, 2015 10 / 18 Properties of extended expectation We note that 1 E(ϕ) f (c) = c for any constant c. 2 E(ϕ) f {ca(X)} = cE(ϕ) f {a(X)} for any constant c. 3 E(ϕ) f {a(X) + b(X)} = E(ϕ) f {a(X)} + E(ϕ) f {b(X)}. 4 E(ϕ) f {a(X)2 } ≥ 0 with equality if and only if a(x) = 0 for P-almost everywhere x in X. Remark 3 If we define f(ϕ) (x) = 1/ϕ′ ( f(x))/ ∫ X 1/ϕ′ (f(x))dP(x), then E(ϕ) f {a(X)} = Ef(ϕ) {a(X)}. 
Komori, O. (University of Fukui) GSI2015 October 28, 2015 11 / 18 Tangent space of FP Let Hf be a Hilbert space with the inner product defined by ⟨a, b⟩f = E(ϕ) f {a(X)b(X)}, and the tangent space Tangent space associated with extended expectation . . Tf = {a ∈ Hf : ⟨a, 1⟩f = 0}. For a statistical model M = { fθ(x)}θ∈Θ we have E(ϕ) fθ {∂iϕ(fθ(X))} = 0 for all θ of Θ, where ∂i = ∂/∂θi with θ = (θi)i=1,··· ,p. Further, E(ϕ) fθ {∂i∂jϕ(fθ(X))} = E(ϕ) fθ { ϕ′′ ( fθ(X)) ϕ′(fθ(X))2 ∂iϕ(fθ(X))∂iϕ(fθ(X)) } . Komori, O. (University of Fukui) GSI2015 October 28, 2015 12 / 18 Parallel displacement A(ϕ) t Define A(ϕ) t (x) in Tft by the solution for a differential equation ˙A(ϕ) t (x) − E(ϕ) ft { A(ϕ) t ˙ft ϕ′′ ( ft) ϕ′(ft) } = 0, where ft is a path connecting f and g such that f0 = f and f1 = g. ˙A(ϕ) t (x) is the derivative of A(ϕ) t (x) with respect to t. Theorem 2 The geodesic curve {ft}0≤t≤1 by the parallel displacement A(ϕ) t is the ϕ-path. Komori, O. (University of Fukui) GSI2015 October 28, 2015 13 / 18 Contents 1 Kolmogorov-Nagumo (K-N) average 2 parallel displacement A(ϕ) t characterizing ϕ-path 3 U-divergence and its associated geodesic Komori, O. (University of Fukui) GSI2015 October 28, 2015 14 / 18 U-divergence Assume that U(s) is a convex and increasing function of a scalar s and let ξ(t) = argmaxs{st − U(s)} . Then we have U-divergence . . DU(f, g) = ∫ {U(ξ(g)) − fξ(g)}dP − ∫ {U(ξ(f)) − fξ( f)}dP. In fact, U-divergence is the difference of the cross entropy CU( f, g) with the diagonal entropy CU( f, f), where CU(f, g) = ∫ {U(ξ(g)) − fξ(g)}dP. Komori, O. (University of Fukui) GSI2015 October 28, 2015 15 / 18 Connections based on U-divergence For a manifold of finite dimension M = { fθ(x) : θ ∈ Θ} and vector fields X and Y on M, the Riemannian metric is G(U) (X, Y)(f) = ∫ X f Yξ( f)dP for f ∈ M and linear connections ∇(U) and ∇∗(U) are G(U) (∇(U) X Y, Z)(f) = ∫ XY f Zξ(f)dP and G(U) (∇∗ X (U) Y, Z)(f) = ∫ Z f XYξ(f)dP. See Eguchi (1992) for details. Komori, O. (University of Fukui) GSI2015 October 28, 2015 16 / 18 Equivalence between ∇∗ -geodesic and ξ-path Let ∇(U) and ∇∗(U) be linear connections associated with U-divergence DU, and let C(ϕ) = {ft(x, ϕ) : 0 ≤ t ≤ 1} be the ϕ path connecting f and g of FP. Then, we have Theorem 3 A ∇(U) -geodesic curve connecting f and g is equal to C(id) , where id denotes the identity function; while a ∇∗(U) -geodesic curve connecting f and g is equal to C(ξ) , where ξ(t) = argmaxs{st − U(s)}. Komori, O. (University of Fukui) GSI2015 October 28, 2015 17 / 18 Summary 1 We consider ϕ-path based on Kolmogorov-Nagumo average. 2 The relation between U-divergence and ϕ-path was investigated (ϕ corresponds to ξ). 3 The idea of ϕ-path can be applied to probability density estimation as well as classification problems. 4 Divergence associated with ϕ-path can be considered, where a special case would be Bhattacharyya divergence. Komori, O. (University of Fukui) GSI2015 October 28, 2015 18 / 18
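The ϕ-path construction of the talk can be prototyped on a grid. The sketch below is illustrative only: it discretises two made-up normal densities, and for a given generator ϕ finds the normalising constant κ_t by root finding so that the path density integrates to one; ϕ = log reproduces the usual exponential (geometric) path, and ϕ_β(x) = (x^β − 1)/β with β = 1/2 gives one of the power paths listed in Example 1.

```python
import numpy as np
from scipy.optimize import brentq

x = np.linspace(-10.0, 10.0, 4001)
integrate = lambda v: float(np.sum(v) * (x[1] - x[0]))        # crude quadrature on the grid
normal_pdf = lambda m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
f, g = normal_pdf(-2.0, 1.0), normal_pdf(2.0, 1.5)             # two made-up densities

def phi_path(f, g, t, phi, phi_inv):
    """f_t = phi_inv((1-t)*phi(f) + t*phi(g) - kappa_t), with kappa_t <= 0
    chosen so that f_t integrates to one (the normalisation of Theorem 1)."""
    mix = (1.0 - t) * phi(f) + t * phi(g)
    deficit = lambda kappa: integrate(phi_inv(mix - kappa)) - 1.0
    lo = -1.0
    while deficit(lo) < 0.0:        # widen the bracket until the root lies in [lo, 0]
        lo *= 2.0
    kappa = brentq(deficit, lo, 0.0)
    return phi_inv(mix - kappa), kappa

# phi = log: the exponential (geometric) path of Example 1.
ft_log, k_log = phi_path(f, g, 0.5, np.log, np.exp)

# phi_beta(x) = (x**beta - 1) / beta with beta = 1/2.
beta = 0.5
phi_b = lambda u: (u ** beta - 1.0) / beta
phi_b_inv = lambda u: np.maximum(beta * u + 1.0, 0.0) ** (1.0 / beta)
ft_beta, k_beta = phi_path(f, g, 0.5, phi_b, phi_b_inv)

print("kappa_t, log path :", k_log)
print("kappa_t, beta path:", k_beta)
print("integrals:", integrate(ft_log), integrate(ft_beta))
```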

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Computational Information Geometry: mixture modelling

Computational Information Geometry... ...in mixture modelling Computational Information Geometry: mixture modelling Germain Van Bever1 , R. Sabolová1 , F. Critchley1 & P. Marriott2 . 1 The Open University (EPSRC grant EP/L010429/1), United Kingdom 2 University of Waterloo, USA GSI15, 28-30 October 2015, Paris Germain Van Bever CIG for mixtures 1/19 Computational Information Geometry... ...in mixture modelling Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 2/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 3/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Generalities The use of geometry in statistics gave birth to many different approaches. Traditionally, Information geometry refers to the application of differential geometry to statistical theory and practice. The main ingredients of IG in exponential families (Amari, 1985) are 1 the manifold of parameters M, 2 the Riemannian (Fisher information) metric g, and 3 the set of affine connections { −1 , +1 } (mixture and exponential connections). These allow to define notions of curvature, dimension reduction or information loss and invariant higher order expansions. Two affine structures (maps on M) are used simultaneously: -1: Mixture affine geometry on probability measures: λf(x) + (1 − λ)g(x). +1: Exponential affine geometry on probability measures: C(λ)f(x)λ g(x)(1−λ) Germain Van Bever CIG for mixtures 4/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Computational Information Geometry This talk is about Computational Information Geometry (CIG, Critchley and Marriott, 2014). 1 In CIG, the multinomial model provides, modulo, discretization, a universal model. It therefore moves from the manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex. 2 It provides a unifying framework for different geometries. 3 Tractability of the geometry allows for efficient algorithms in a computational framework. It is inherently finite and discrete. The impact of discretization is studied. A working model will be a subset of the simplex. Germain Van Bever CIG for mixtures 5/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Multinomial distributions X ∼ Mult(π0, . . . , πk), π = (π0, . . . , πk) ∈ int(∆k ), with ∆k := π : πi ≥ 0, k i=0 πi = 1 . In this case, π(0) = (π1 , . . . , πk ) is the mean parameter, while η = log(π(0) /π0) is the natural parameter. Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005). 0.0 0.2 0.4 0.6 0.8 1.0 0.00.20.40.60.81.0 mixed geodesics in -1-space π1 π2 -6 -4 -2 0 2 4 6 -6-4-20246 mixed geodesics in +1-space η1 η2 Germain Van Bever CIG for mixtures 6/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Restricting to the multinomials families Under regular exponential families with compact support, the cost of discretization on the components of Information Geometry is bounded! The same holds true for the MLE and the log-likelihood function. 
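The two affine combinations just recalled (the -1 mixture combination and the +1 exponential combination) reduce to one-liners on the finite simplex. Below is a minimal sketch with made-up probability vectors; the convention here is that λ = 0 returns the first point and λ = 1 the second.

```python
import numpy as np

def mixture_geodesic(p1, p2, lam):
    """-1 (mixture) geodesic: convex combination inside the simplex."""
    return (1.0 - lam) * p1 + lam * p2

def exponential_geodesic(p1, p2, lam):
    """+1 (exponential) geodesic: normalised geometric combination,
    i.e. a straight line in the natural parameter eta = log(pi / pi_0)."""
    q = p1 ** (1.0 - lam) * p2 ** lam
    return q / q.sum()

# Two made-up points in the interior of the 2-simplex (three cells).
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.1, 0.3, 0.6])
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, mixture_geodesic(p1, p2, lam), exponential_geodesic(p1, p2, lam))
```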
The log-likelihood (x, π) = k i=0 ni log(πi) is (i) strictly concave (in the −1-representation) on the observed face (counts ni > 0), (ii) strictly decreasing in the normal direction towards the unobserved face (ni = 0), and, otherwise, (iii) constant. Considering an infinite-dimensional simplex allows to remove the compactness assumption (Critchley and Marriott, 2014). Germain Van Bever CIG for mixtures 7/19 Computational Information Geometry... ...in mixture modelling Information Geometry CIG Binomial subfamilies A (discrete) example: Binomial distributions as a subfamily of multinomial distributions. Let X ∼ Bin(k, p). Then, X can be seen as a subfamily of M = {X|X ∼ Mult(π0, . . . , πk)} , with πi(p) = k i pi (1 − p)k−i . Figure: Left: Embedded binomial (k = 2) in the 2-simplex. Right: Embedded binomial (k = 3) in the 3-simplex. Germain Van Bever CIG for mixtures 8/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Outline 1 Computational Information Geometry... Information Geometry CIG 2 ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Germain Van Bever CIG for mixtures 9/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Mixture distributions The generic mixture distribution is f(x; Q) = f(x; θ)dQ(θ), that is, a mixture of (regular) parametric distributions. Regularity: same support S, abs. cont. with respect to measure ν. Mixture distributions arise naturally in many statistical problems, including Overdispersed models Random effects ANOVA Random coefficient regression models and measurement error models Graphical models and many more Germain Van Bever CIG for mixtures 10/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Hard mixture problems Inference in the class of mixture distributions generates well-known difficulties: Identifiability issues: Without imposing constraints on the mixing distribution Q, there may exist Q1 and Q2 such that f(x; Q1) = f(x; θ)dQ1(θ) = f(x; θ)dQ2(θ) = f(x; Q2). Byproduct: parametrisation issues. Byproduct: multimodal likelihood functions. Boundary problems. Byproduct: singularities in the likelihood function. Germain Van Bever CIG for mixtures 11/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions NPMLE Finite mixtures are essential to the geometry. Lindsay argues that nonparametric estimation of Q is necessary. Also, Theorem The loglikelihood (Q) = n s=1 log Ls(Q) = n s=1 log f(xs; θ)dQ(θ) , has a unique maximum over the space of all distribution functions Q. Furthermore, the maximiser ˆQ is a discrete distribution with no more than D distinct points of support, where D is the number of distinct points in (x1, . . . , xn). The likelihood on the space of mixtures is therefore defined on the convex hull of the image of θ → (L1(θ), . . . , LD(θ)). Finding the NPMLE amounts to maximize a concave function over this convex set. Germain Van Bever CIG for mixtures 12/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Limits to convex geometry Knowing the shape of the likelihood on the whole simplex (and not only on the observed face) give extra insight. 
Convex geometry correctly captures the −1-geometry of the simplex but NOT the 0 and +1 geometries (for example, Fisher information requires to know the full sample space). Understanding the (C)IG of mixtures in the simplex will therefore provide extra tools (and algorithms) in mixture modelling. In this talk, we mention results on 1 (−1)-dimensionality of exponential families in the simplex. 2 convex polytopes approximation algorithms: Information geometry can give efficient approximation of high dimensional convex hulls by polytopes Germain Van Bever CIG for mixtures 13/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Local mixture models (IG) Parametric vs nonparametric dilemma. Geometric analysis allows low-dimensional approximation in local setups. Theorem (Marriott, 2002) If f(x; θ) is a n-dim exponential family with regularity conditions, Qλ(θ) is a local mixing around θ0, then f(x; Qλ) = f(x; θ)dQλ(θ) has the expansion f(x; Qλ) − f(x; θ0) − n i=1 λi ∂ ∂θi f(x; θ0) − n i,j=1 λij ∂2 ∂θi∂θj f(x; θ0) = O(λ−3 ). This is equivalent to f(x; Qλ) + O(λ−3 ) ∈ T2 Mθ0 . If the density f(x; θ) and all its derivatives are bounded, then the approximation will be uniform in x. Germain Van Bever CIG for mixtures 14/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Dimensionality in CIG It is therefore possible to approximate mixture distributions with low-dimensional families. In contrast, the (−1)−representation of any generic exponential family on the simplex will always have full dimension. The following result is even more general. Theorem (VB et al.) The −1-convex hull of an open subset of a exponential subfamily of M with tangent dimension k − d has dimension at least k − d. Corollary (Critchley and Marriott, 2014) The −1-convex hull of an open subset of a generic one dimensional subfamily of M is of full dimension. The tangent dimension is the maximal number of different components of any (+1) tangent vector to the exponential family. Generic ↔ tangent dimension= k, i.e. the tangent vector has distinct components. Germain Van Bever CIG for mixtures 15/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Example: Mixture of binomials As mentioned, IG gives efficient approximation by polytopes. IG maximises concave function on (convex) polytopes. Example: toxicological data (Kupper and Haseman, 1978). ‘simple one-parameter binomial [...] models generally provides poor fits to this type of binary data’. Germain Van Bever CIG for mixtures 16/19 Computational Information Geometry... ...in mixture modelling Introduction Lindsay’s convex geometry (C)IG for mixture distributions Approximation in CIG Define the norm ||π||π0 = k i=1 π2 i /πi,0 (preferred point metric, Critchley et al., 1993). Let π(θ) be an exponential family and ∪Si be a polytope surface. Define the distance function as d(π(θ), π0) := inf π∈∪Si ||π(θ) − π||π0 . Theorem (Anaya-Izquierdo et al.) Let ∪Si be such that d(π(θ)) ≤ for all θ. Then (ˆπNP MLE ) − (ˆπ) ≤ N||(ˆπG − ˆπNP MLE )||ˆπ + o( ), where (ˆπG )i = ni/N and ˆπ is the NPMLE on ∪Si. Germain Van Bever CIG for mixtures 17/19 Computational Information Geometry... 
Summary. The high-dimensional (extended) multinomial space is used as a proxy for the "space of all models". This computational approach encompasses Amari's information geometry and Lindsay's convex geometry, while having a tractable and mostly explicit geometry, which allows for a computational theory. Future work: the converse of the dimensionality result (−1 to +1); long-term aim: implementing the geometric theories within an R package/software.

References:
Amari, S.-I. (1985), Differential-Geometrical Methods in Statistics, Springer-Verlag.
Anaya-Izquierdo, K., Critchley, F., Marriott, P. and Vos, P. (2012), Computational information geometry: theory and practice, arXiv report 1209.1988v1.
Critchley, F., Marriott, P. and Salmon, M. (1993), Preferred point geometry and statistical manifolds, The Annals of Statistics, 21(3), 1197-1224.
Critchley, F. and Marriott, P. (2014), Computational information geometry in statistics: theory and practice, Entropy, 16, 2454-2471.
Csiszár, I. and Matúš, F. (2005), Closures of exponential families, The Annals of Probability, 33(2), 582-600.
Kupper, L.L. and Haseman, J.K. (1978), The use of a correlated binomial model for the analysis of certain toxicological experiments, Biometrics, 34(1), 69-76.
Marriott, P. (2002), On the local geometry of mixture models, Biometrika, 89(1), 77-93.

Bayesian and Information Geometry for Inverse Problems (chaired by Ali Mohammad-Djafari, Olivier Schwander)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
We review the manifold projection method for stochastic nonlinear filtering in a more general setting than in our previous paper in Geometric Science of Information 2013. We still use a Hilbert space structure on a space of probability densities to project the infinite dimensional stochastic partial differential equation for the optimal filter onto a finite dimensional exponential or mixture family, respectively, with two different metrics, the Hellinger distance and the L2 direct metric. This reduces the problem to finite dimensional stochastic differential equations. In this paper we summarize a previous equivalence result between Assumed Density Filters (ADF) and Hellinger/Exponential projection filters, and introduce a new equivalence between Galerkin method based filters and Direct metric/Mixture projection filters. This result allows us to give a rigorous geometric interpretation to ADF and Galerkin filters. We also discuss the different finite-dimensional filters obtained when projecting the stochastic partial differential equation for either the normalized (Kushner-Stratonovich) or a specific unnormalized (Zakai) density of the optimal filter.
 
Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters

Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters. GSI 2015, Oct 28, 2015, Paris. Damiano Brigo, Dept. of Mathematics, Imperial College London, www.damianobrigo.it. Joint work with John Armstrong, Dept. of Mathematics, King's College London. Full paper to appear in MCSS; see also arXiv.org.

Spaces of probability densities. Consider a parametric family of probability densities S = {p(·, θ), θ ∈ Θ ⊂ R^m} and S^{1/2} = {√p(·, θ), θ ∈ Θ ⊂ R^m}. If S (or S^{1/2}) is a subset of a function space having an L² structure (hence an inner product, norm and metric), then we may ask whether p(·, θ) → θ ∈ R^m (respectively √p(·, θ) → θ) is a chart of an m-dimensional manifold S (respectively S^{1/2}). The topology and differential structure in the chart is the L² structure, but there are two possibilities: on S, d₂(p₁, p₂) = ‖p₁ − p₂‖ (the L² direct distance), with p₁, p₂ ∈ L²; on S^{1/2}, d_H(√p₁, √p₂) = ‖√p₁ − √p₂‖ (the Hellinger distance), with p₁, p₂ ∈ L¹. Here ‖·‖ is the norm of the Hilbert space L².

Tangent vectors, metrics and projection. If ϕ: θ → p(·, θ) (respectively θ → √p(·, θ)) is the inverse of a chart, then {∂ϕ(·, θ)/∂θ₁, …, ∂ϕ(·, θ)/∂θ_m} are linearly independent L²(λ) vectors that span the tangent space at θ. The inner product of two basis elements is defined by the L² structure: ⟨∂p(·, θ)/∂θ_i, ∂p(·, θ)/∂θ_j⟩ = ∫ ∂p(x, θ)/∂θ_i · ∂p(x, θ)/∂θ_j dx = γ_{ij}(θ), and ⟨∂√p/∂θ_i, ∂√p/∂θ_j⟩ = (1/4) ∫ (1/p(x, θ)) ∂p(x, θ)/∂θ_i · ∂p(x, θ)/∂θ_j dx = (1/4) g_{ij}(θ). Here γ(θ) is the direct L² matrix (d₂) and g(θ) is the famous Fisher-Rao matrix (d_H). The d₂ orthogonal projection is Π^γ_θ[v] = Σ_{i=1}^{m} [Σ_{j=1}^{m} γ^{ij}(θ) ⟨v, ∂p(·, θ)/∂θ_j⟩] ∂p(·, θ)/∂θ_i (the d_H projection is analogous, inserting √· and replacing γ with g).
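A minimal numerical sketch of the two matrices above (ours, not from the slides), assuming a univariate Gaussian family p(x; μ, σ) and approximating the integrals on a grid; it computes the direct-L² matrix γ(θ) and the Fisher-Rao matrix g(θ) from finite-difference derivatives of the density. The grid, the step h and the family itself are illustrative choices.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def metric_matrices(mu, sigma, x=np.linspace(-15.0, 15.0, 20001), h=1e-5):
    """Approximate gamma_ij(theta) = int dp/dtheta_i dp/dtheta_j dx (direct L2 matrix)
    and g_ij(theta) = int (1/p) dp/dtheta_i dp/dtheta_j dx (Fisher-Rao matrix)
    by central finite differences in theta = (mu, sigma) and a Riemann sum in x."""
    dx = x[1] - x[0]
    p = gaussian_pdf(x, mu, sigma)
    dp_dmu = (gaussian_pdf(x, mu + h, sigma) - gaussian_pdf(x, mu - h, sigma)) / (2 * h)
    dp_dsigma = (gaussian_pdf(x, mu, sigma + h) - gaussian_pdf(x, mu, sigma - h)) / (2 * h)
    grads = [dp_dmu, dp_dsigma]
    gamma = np.array([[np.sum(gi * gj) * dx for gj in grads] for gi in grads])
    g = np.array([[np.sum(gi * gj / p) * dx for gj in grads] for gi in grads])
    return gamma, g

gamma, g = metric_matrices(mu=0.0, sigma=1.0)
print("gamma(theta) ~\n", gamma.round(4))
print("g(theta) ~\n", g.round(4))   # close to diag(1/sigma^2, 2/sigma^2) = diag(1, 2)
```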
Nonlinear filtering problem. The nonlinear filtering problem for diffusion signals is dX_t = f_t(X_t) dt + σ_t(X_t) dW_t, with initial condition X_0 (the signal), and dY_t = b_t(X_t) dt + dV_t, Y_0 = 0 (the noisy observation). These are Itô SDEs; we use both Itô and Stratonovich (Str) SDEs. Str SDEs are necessary to deal with manifolds, since second-order Itô terms are not clear in terms of manifolds [16], although we are working on a direct projection of Itô equations with good optimality properties (John Armstrong). The nonlinear filtering problem consists in finding the conditional probability distribution π_t of the state X_t given the observations up to time t, i.e. π_t(dx) := P[X_t ∈ dx | Y_t], where Y_t := σ(Y_s, 0 ≤ s ≤ t). Assume π_t has a density p_t; then p_t satisfies the Stratonovich SPDE (the Kushner-Stratonovich equation for the normalized density)

dp_t = L*_t p_t dt − (1/2) p_t [|b_t|² − E_{p_t}{|b_t|²}] dt + Σ_{k=1}^{d} p_t [b_t^k − E_{p_t}{b_t^k}] ∘ dY_t^k,

with the forward operator L*_t φ = − Σ_{i=1}^{n} ∂/∂x_i [f_t^i φ] + (1/2) Σ_{i,j=1}^{n} ∂²/∂x_i ∂x_j [a_t^{ij} φ]. This is an infinite-dimensional SPDE.
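In the spirit of the finite-difference reference solution used later in the slides, here is a small Python sketch (ours, not the authors' code) of the forward operator L* in one dimension, applied on a uniform grid with central differences; the grid, the drift f and the diffusion coefficient a = σ² below are illustrative assumptions.

```python
import numpy as np

def forward_operator(phi, x, f, a):
    """Apply L* phi = -d/dx [f(x) phi] + 0.5 d^2/dx^2 [a(x) phi] on a uniform grid
    with central differences (1-D sketch; boundary entries are left at zero)."""
    dx = x[1] - x[0]
    out = np.zeros_like(phi)
    fphi = f(x) * phi
    aphi = a(x) * phi
    out[1:-1] = (-(fphi[2:] - fphi[:-2]) / (2 * dx)
                 + 0.5 * (aphi[2:] - 2 * aphi[1:-1] + aphi[:-2]) / dx**2)
    return out

# Cubic-sensor-style signal with f = 0 and sigma = 1 (so a = 1): L* reduces to 0.5 * phi''.
x = np.linspace(-5.0, 5.0, 1001)
phi = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
print(forward_operator(phi, x, f=lambda x: 0 * x, a=lambda x: 1 + 0 * x)[:5])
```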
Solutions for even toy systems like the cubic sensor (f = 0, σ = 1, b = x³) do not belong to any finite-dimensional family p(·, θ) [19]. We need finite-dimensional approximations. We can project the SPDE according to either the L² direct metric (γ(θ)) or, by deriving the analogous equation for √p_t, according to the Hellinger metric (g(θ)). Projection transforms the SPDE into a finite-dimensional SDE for θ via the chain rule (hence Stratonovich calculus): dp(·, θ_t) = Σ_{j=1}^{m} ∂p(·, θ)/∂θ_j ∘ dθ_j(t). With Itô calculus we would have terms ∂²p(·, θ)/∂θ_i∂θ_j d⟨θ_i, θ_j⟩, which are not tangent vectors.

Projection filter in the direct L² and Fisher metrics. The projected equation in the d₂ metric, with projection Π^γ, is

dθ_t^i = [ Σ_{j=1}^{m} γ^{ij}(θ_t) ∫ L*_t p(x, θ_t) ∂p(x, θ_t)/∂θ_j dx − Σ_{j=1}^{m} γ^{ij}(θ_t) ∫ (1/2) |b_t(x)|² ∂p(x, θ_t)/∂θ_j dx ] dt + Σ_{k=1}^{d} [ Σ_{j=1}^{m} γ^{ij}(θ_t) ∫ b_t^k(x) ∂p(x, θ_t)/∂θ_j dx ] ∘ dY_t^k, with initial condition θ_0^i.

Instead, using the Hellinger distance and the Fisher metric with projection Π^g,

dθ_t^i = [ Σ_{j=1}^{m} g^{ij}(θ_t) ∫ (L*_t p(x, θ_t)/p(x, θ_t)) ∂p(x, θ_t)/∂θ_j dx − Σ_{j=1}^{m} g^{ij}(θ_t) ∫ (1/2) |b_t(x)|² ∂p(x, θ_t)/∂θ_j dx ] dt + Σ_{k=1}^{d} [ Σ_{j=1}^{m} g^{ij}(θ_t) ∫ b_t^k(x) ∂p(x, θ_t)/∂θ_j dx ] ∘ dY_t^k, with initial condition θ_0^i.

Choosing the family/manifold: exponential. In past literature, and in several papers in Bernoulli, IEEE Automatic Control, etc., B. Hanzon and Le Gland have developed a theory for the projection filter using the Fisher metric g and exponential families p(x, θ) := exp[θᵀ c(x) − ψ(θ)]. This is a good combination: the tangent space has a simple structure (square roots do not complicate issues thanks to the exponential structure); the Fisher matrix has a simple structure, ∂²ψ(θ)/∂θ_i∂θ_j = g_{ij}(θ); the structure of the projection Π^g is simple for exponential families; a special exponential family with the observation function b among the c(x) exponents makes the filter correction step (the projection of the dY term) exact; one can define both a local and a global filtering error through d_H; there are alternative coordinates, the expectation parameters η = E_θ[c] = ∂_θψ(θ); and the projection filter in η coincides with a classical approximate filter, the assumed density filter (based on generalized "moment matching").

Mixture families. However, exponential families do not couple as well with the metric γ(θ). Is there some important family for which the metric γ(θ) is preferable to the classical Fisher metric g(θ), in that the metric, the tangent space and the filter equations are simpler? The answer is affirmative, and this is the mixture family. We define a simple mixture family as follows. Given m + 1 fixed square-integrable probability densities q = [q₁, q₂, …, q_{m+1}]ᵀ, define θ̂(θ) := [θ₁, θ₂, …, θ_m, 1 − θ₁ − θ₂ − … − θ_m]ᵀ for all θ ∈ R^m; we write θ̂ instead of θ̂(θ). The mixture family (simplex) is S^M(q) = {θ̂(θ)ᵀ q, θ_i ≥ 0 for all i, θ₁ + … + θ_m < 1}. If we consider the L²/γ(θ) distance, the metric γ(θ) itself and the related projection become very simple. Indeed, ∂p(·, θ)/∂θ_i = q_i − q_{m+1} and γ_{ij}(θ) = ∫ (q_i(x) − q_{m+1}(x))(q_j(x) − q_{m+1}(x)) dx (no inline numerical integration). The L² metric does not depend on the specific point θ of the manifold, and the same holds for the tangent space at p(·, θ), which is given by span{q₁ − q_{m+1}, q₂ − q_{m+1}, …, q_m − q_{m+1}}. Also the L² projection becomes particularly simple.
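To illustrate why γ(θ) is constant for the simple mixture family, the following Python sketch (our own illustration, not from the paper) precomputes γ_{ij} = ∫ (q_i − q_{m+1})(q_j − q_{m+1}) dx for a few fixed Gaussian component densities on a grid; the components and grid are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# m + 1 = 4 fixed component densities q_1, ..., q_{m+1} evaluated on a grid
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
q = np.array([normal_pdf(x, mu, 1.0) for mu in (-3.0, -1.0, 1.0, 3.0)])
m = q.shape[0] - 1

# Tangent vectors q_i - q_{m+1}, i = 1..m, and the constant matrix
# gamma_ij = int (q_i - q_{m+1})(q_j - q_{m+1}) dx  (independent of theta)
tangent = q[:m] - q[m]
gamma = np.array([[np.sum(tangent[i] * tangent[j]) * dx for j in range(m)]
                  for i in range(m)])
print(gamma.round(4))
```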
Mixture projection filter. Armstrong and Brigo (MCSS 2016 [3]) show that the mixture family together with the metric γ(θ) leads to a projection filter that is the same as approximate filtering via Galerkin methods [5]; see the full paper for the details. Summing up: with the Hellinger distance and Fisher metric g(θ), the exponential family is the good match (∼ADF, i.e. local moment matching) while the basic mixture is nothing special; with the direct L² distance and matrix γ(θ), the exponential family is nothing special while the basic mixture is the good match (∼Galerkin).

However, despite the simplicity above, the mixture family has an important drawback: for all θ, the filter mean is constrained, min_i mean(q_i) ≤ mean of p(·, θ) ≤ max_i mean(q_i). As a consequence, we enrich our family to a mixture where some of the parameters are also in the core densities q. Specifically, we consider a mixture of Gaussian densities with the means and variances of each component not fixed. For example, for a mixture of two Gaussians we have five parameters: θ p_{N(µ₁,v₁)}(x) + (1 − θ) p_{N(µ₂,v₂)}(x), with parameters θ, µ₁, v₁, µ₂, v₂.
We now illustrate the Gaussian mixture projection filter (GMPF) in a fundamental example. The quadratic sensor: consider dX_t = σ dW_t, dY_t = X_t² dt + σ dV_t. The measurements tell us nothing about the sign of X; once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical, so we expect a bimodal distribution. We compare the Gaussian mixture θ p_{N(µ₁,v₁)}(x) + (1 − θ) p_{N(µ₂,v₂)}(x) (red), the exponential family exp[θ₁x + θ₂x² + θ₃x³ + θ₄x⁴ − ψ(θ)] (pink), the EKF (normal density, blue) and the exact filter (green, finite-difference method, grid of 1000 state points and 5000 time points). [Figures: simulated conditional densities for the quadratic sensor at times 0 to 10, comparing the projection filter, the exact solution, the extended Kalman filter and the exponential projection filter.]
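For reference, here is a short Euler-Maruyama simulation of the quadratic-sensor signal and observation processes (not the projection filter itself), which is the setting in which the filters above are compared; this is our own sketch and the step size, horizon and initial values are illustrative, not taken from the talk.

```python
import numpy as np

# Euler-Maruyama simulation of the quadratic-sensor system
#   dX_t = sigma dW_t,   dY_t = X_t^2 dt + sigma dV_t
rng = np.random.default_rng(0)
sigma, dt, n_steps = 1.0, 1e-3, 10_000   # illustrative values

X = np.empty(n_steps + 1)
Y = np.empty(n_steps + 1)
X[0], Y[0] = 0.5, 0.0
for t in range(n_steps):
    dW, dV = rng.normal(0.0, np.sqrt(dt), size=2)
    X[t + 1] = X[t] + sigma * dW                    # signal
    Y[t + 1] = Y[t] + X[t] ** 2 * dt + sigma * dV   # noisy observation of X^2

print("final state:", X[-1], "final observation:", Y[-1])
```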
Comparing local approximation errors (L² residuals): ε_t² = ∫ (p_exact,t(x) − p_approx,t(x))² dx, where p_approx,t(x) is one of three possible choices: the Gaussian mixture θ p_{N(µ₁,v₁)}(x) + (1 − θ) p_{N(µ₂,v₂)}(x) (red), the exponential family exp[θ₁x + θ₂x² + θ₃x³ + θ₄x⁴ − ψ(θ)] (blue), or the EKF (normal density, green). [Figure: L² residuals over time for the quadratic sensor, showing the projection residual, the extended Kalman residual and the Hellinger projection residual.]

Comparing local approximation errors (Prokhorov residuals): ε_t = inf{ε : F_exact,t(x − ε) − ε ≤ F_approx,t(x) ≤ F_exact,t(x + ε) + ε for all x}, with F the CDFs of the p's. The Lévy-Prokhorov metric works well with singular densities such as particles, where the L² metric is not ideal. Comparison: the Gaussian mixture (red) versus the exponential family (green) versus the best three particles (blue). [Figure: Lévy residuals over time for the quadratic sensor, showing the Prokhorov residual (L2NM), the Prokhorov residual (HE) and the best possible residual with three deltas.]

Cubic sensors. [Figure: residuals over time for the cubic sensor, showing the projection, extended Kalman and Hellinger projection residuals.] Qualitatively similar results hold up to a stopping time.
As one approaches the boundary, γ_{ij} becomes singular; the solution is to dynamically change the parameterization, and even the dimension, of the manifold.

Conclusions. Approximate finite-dimensional filtering by rigorous projection onto a chosen manifold of densities; the projection uses the overarching L² structure, with two different metrics, the direct L² metric and the Hellinger/Fisher metric (L² on √·). Fisher works well with exponential families: multimodality, exact correction step, simplicity of implementation, and equivalence with assumed density filters ("moment matching"). The direct L² metric works well with mixture families: even simpler filter equations, no inline numerical integration, the basic version is equivalent to Galerkin methods, it is suited also to multimodality (quadratic sensor tests, L² global error) and it is comparable with particle methods. Further investigation: convergence, and more on optimality; on optimality, new projections are being introduced (forthcoming work by J. Armstrong).

Thanks: with thanks to the organizing committee. Thank you for your attention; questions and comments welcome.

References:
[1] Aggrawal, J., Sur l'information de Fisher, in: Théories de l'Information (J. Kampé de Fériet, ed.), Springer-Verlag, Berlin-New York, 1974, pp. 111-117.
[2] Amari, S., Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, Springer-Verlag, Berlin, 1985.
[3] Armstrong, J. and Brigo, D. (2016), Nonlinear filtering via stochastic PDE projection on mixture manifolds in L2 direct metric, Mathematics of Control, Signals and Systems, accepted.
[4] Beard, R., Kenney, J., Gunther, J., Lawton, J. and Stirling, W. (1999), Nonlinear projection filter based on Galerkin approximation, AIAA Journal of Guidance, Control and Dynamics, 22(2), 258-266.
[5] Beard, R. and Gunther, J. (1997), Galerkin approximations of the Kushner equation in nonlinear estimation, working paper, Brigham Young University.
[6] Barndorff-Nielsen, O.E. (1978), Information and Exponential Families, John Wiley and Sons, New York.
[7] Brigo, D. (1999), Diffusion processes, manifolds of exponential densities, and nonlinear filtering, in: O.E. Barndorff-Nielsen and E.B. Vedel Jensen (eds.), Geometry in Present Day Science, World Scientific.
[8] Brigo, D. (2000), On SDEs with marginal laws evolving in finite-dimensional exponential families, Statistics & Probability Letters, 49, 127-134.
[9] Brigo, D. (2011), The direct L2 geometric structure on a manifold of probability densities with applications to filtering, available on arXiv.org and damianobrigo.it.
[10] Brigo, D., Hanzon, B. and Le Gland, F. (1998), A differential geometric approach to nonlinear filtering: the projection filter, IEEE Transactions on Automatic Control, 43, 247-252.
[11] Brigo, D., Hanzon, B. and Le Gland, F. (1999), Approximate nonlinear filtering by projection on exponential manifolds of densities, Bernoulli, 5, 495-534.
[12] Brigo, D. (1996), Filtering by Projection on the Manifold of Exponential Densities, PhD thesis, Free University of Amsterdam.
[13] Brigo, D. and Pistone, G. (1996), Projecting the Fokker-Planck equation onto a finite dimensional exponential family, available at arXiv.org.
[14] Crisan, D. and Rozovskii, B. (eds.) (2011), The Oxford Handbook of Nonlinear Filtering, Oxford University Press.
[15] Davis, M.H.A. and Marcus, S.I. (1981), An introduction to nonlinear filtering, in: M. Hazewinkel and J.C. Willems (eds.), Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Reidel, Dordrecht, 53-75.
[16] Elworthy, D. (1982), Stochastic Differential Equations on Manifolds, LMS Lecture Notes.
[17] Hanzon, B. (1987), A differential-geometric approach to approximate nonlinear filtering, in: C.T.J. Dodson (ed.), Geometrization of Statistical Theory, ULMD Publications, University of Lancaster, 219-223.
[18] Hanzon, B. (1989), Identifiability, recursive identification and spaces of linear dynamical systems, CWI Tracts 63 and 64, CWI, Amsterdam.
[19] Hazewinkel, M., Marcus, S.I. and Sussmann, H.J. (1983), Nonexistence of finite dimensional filters for conditional statistics of the cubic sensor problem, Systems and Control Letters, 3, 331-340.
[20] Jacod, J. and Shiryaev, A.N. (1987), Limit Theorems for Stochastic Processes, Grundlehren der Mathematischen Wissenschaften, vol. 288, Springer-Verlag, Berlin.
[21] Jazwinski, A.H. (1970), Stochastic Processes and Filtering Theory, Academic Press, New York.
[22] Fujisaki, M., Kallianpur, G. and Kunita, H. (1972), Stochastic differential equations for the non linear filtering problem, Osaka Journal of Mathematics, 9(1), 19-40.
[23] Kenney, J. and Stirling, W. (1999), Nonlinear filtering of convex sets of probability distributions, presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June-2 July 1999.
[24] Khasminskii, R.Z. (1980), Stochastic Stability of Differential Equations, Alphen aan den Rijn.
[25] Liptser, R.S. and Shiryayev, A.N. (1978), Statistics of Random Processes I: General Theory, Springer-Verlag, Berlin.
[26] Murray, M. and Rice, J. (1993), Differential Geometry and Statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall.
[27] Ocone, D. and Pardoux, E. (1989), A Lie algebraic criterion for non-existence of finite dimensionally computable filters, Lecture Notes in Mathematics 1390, 197-204, Springer-Verlag.
[28] Pistone, G. and Sempi, C. (1995), An infinite dimensional geometric structure on the space of all the probability measures equivalent to a given one, The Annals of Statistics, 23(5).

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
Clustering, classification and pattern recognition in a set of data are among the most important tasks in statistical research and in many applications. In this paper, we propose to use a mixture of Student-t distributions to model the data, via a hierarchical graphical model and the Bayesian framework, to carry out these tasks. The main advantages of this model are that it accounts for the uncertainties of variances and covariances, and that we can use Variational Bayesian Approximation (VBA) methods to obtain fast algorithms able to handle large data sets.
 
Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model

Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model. Ali Mohammad-Djafari, Laboratoire des Signaux et Systèmes (L2S), UMR8506 CNRS-CentraleSupélec-Univ Paris-Sud, Supélec, 91192 Gif-sur-Yvette, France. http://lss.centralesupelec.fr — Email: djafari@lss.supelec.fr, http://djafari.free.fr, http://publicationslist.org/djafari

Contents: 1. Mixture models. 2. Different problems related to classification and clustering (training; supervised classification; semi-supervised classification; clustering or unsupervised classification). 3. Mixture of Student-t. 4. Variational Bayesian Approximation. 5. VBA for the mixture of Student-t. 6. Conclusion.

Mixture models. General mixture model: p(x|a, Θ, K) = Σ_{k=1}^{K} a_k p_k(x_k|θ_k), with 0 < a_k < 1. Same family: p_k(x_k|θ_k) = p(x_k|θ_k) for all k. Gaussian case: p(x_k|θ_k) = N(x_k|µ_k, Σ_k) with θ_k = (µ_k, Σ_k). Data: X = {x_n, n = 1, …, N}, where each element x_n can be in one of the classes c_n; a_k = p(c_n = k), a = {a_k, k = 1, …, K}, Θ = {θ_k, k = 1, …, K}, and p(X, c|a, Θ) = Π_{n=1}^{N} p(x_n, c_n|a, Θ).

Different problems. Training: given a set of (training) data X and classes c, estimate the parameters a and Θ. Supervised classification: given a sample x_m and the parameters K, a and Θ, determine its class k* = arg max_k {p(c_m = k|x_m, a, Θ, K)}. Semi-supervised classification (proportions not known): given a sample x_m and the parameters K and Θ, determine its class k* = arg max_k {p(c_m = k|x_m, Θ, K)}. Clustering or unsupervised classification (number of classes K not known): given a set of data X, determine K and c.

Training. Given a set of (training) data X and classes c, estimate the parameters a and Θ. Maximum likelihood (ML): (â, Θ̂) = arg max_{(a,Θ)} {p(X, c|a, Θ, K)}. Bayesian: assign priors p(a|K) and p(Θ|K) = Π_{k=1}^{K} p(θ_k) and write the joint posterior law p(a, Θ|X, c, K) = p(X, c|a, Θ, K) p(a|K) p(Θ|K) / p(X, c|K), where p(X, c|K) = ∫∫ p(X, c|a, Θ, K) p(a|K) p(Θ|K) da dΘ. Infer on a and Θ either as the maximum a posteriori (MAP) or the posterior mean (PM).

Supervised classification. Given a sample x_m and the parameters K, a and Θ, determine p(c_m = k|x_m, a, Θ, K) = p(x_m, c_m = k|a, Θ, K) / p(x_m|a, Θ, K), where p(x_m, c_m = k|a, Θ, K) = a_k p(x_m|θ_k) and p(x_m|a, Θ, K) = Σ_{k=1}^{K} a_k p(x_m|θ_k). Best class: k* = arg max_k {p(c_m = k|x_m, a, Θ, K)}.

Semi-supervised classification. Given a sample x_m and the parameters K and Θ (but not the proportions a), determine the probabilities p(c_m = k|x_m, Θ, K) = p(x_m, c_m = k|Θ, K) / p(x_m|Θ, K), where p(x_m, c_m = k|Θ, K) = ∫ p(x_m, c_m = k|a, Θ, K) p(a|K) da and p(x_m|Θ, K) = Σ_{k=1}^{K} p(x_m, c_m = k|Θ, K). Best class, for example the MAP solution: k* = arg max_k {p(c_m = k|x_m, Θ, K)}.
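As a small illustration of the supervised-classification rule k* = arg max_k p(c = k|x), the following Python sketch (ours, with made-up Gaussian parameters) computes the class posterior a_k N(x|µ_k, Σ_k) / Σ_l a_l N(x|µ_l, Σ_l) and picks the best class.

```python
import numpy as np
from scipy.stats import multivariate_normal

def classify(x, a, mus, Sigmas):
    """k* = argmax_k p(c = k | x, a, Theta), with p(x, c = k) = a_k N(x | mu_k, Sigma_k)."""
    joint = np.array([a_k * multivariate_normal.pdf(x, mean=mu_k, cov=S_k)
                      for a_k, mu_k, S_k in zip(a, mus, Sigmas)])
    posterior = joint / joint.sum()
    return int(np.argmax(posterior)), posterior

# Two illustrative Gaussian classes
a = [0.6, 0.4]
mus = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.eye(2)]
k_star, post = classify(np.array([2.5, 2.0]), a, mus, Sigmas)
print("best class:", k_star, "posterior:", post.round(3))
```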
Clustering or unsupervised classification. Given a set of data X, determine K and c. Determination of the number of classes: p(K = L|X) = p(X, K = L)/p(X) = p(X|K = L) p(K = L)/p(X), with p(X) = Σ_{L=1}^{L₀} p(K = L) p(X|K = L), where L₀ is the a priori maximum number of classes and p(X|K = L) = ∫∫ Π_n Σ_{k=1}^{L} a_k p(x_n, c_n = k|θ_k) p(a|K) p(Θ|K) da dΘ. When K and c are determined, we can also determine the characteristics a and Θ of those classes.

Mixture of Student-t model. The Student-t distribution and its infinite Gaussian scaled model (IGSM): T(x|ν, µ, Σ) = ∫₀^∞ N(x|µ, z⁻¹Σ) G(z|ν/2, ν/2) dz, where N(x|µ, Σ) = |2πΣ|^{-1/2} exp[−(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ)] = |2πΣ|^{-1/2} exp[−(1/2) Tr((x − µ)(x − µ)ᵀ Σ⁻¹)] and G(z|α, β) = (β^α/Γ(α)) z^{α−1} exp[−βz]. Mixture of Student-t: p(x|{ν_k, a_k, µ_k, Σ_k, k = 1, …, K}, K) = Σ_{k=1}^{K} a_k T(x|ν_k, µ_k, Σ_k).

Introducing z_{nk}, z_k = {z_{nk}, n = 1, …, N}, Z = {z_{nk}}, c = {c_n, n = 1, …, N}, θ_k = {ν_k, a_k, µ_k, Σ_k} and Θ = {θ_k, k = 1, …, K}, and assigning the priors p(Θ) = Π_k p(θ_k), we can write p(X, c, Z, Θ|K) = Π_n Π_k [a_k N(x_n|µ_k, z_{nk}⁻¹Σ_k) G(z_{nk}|ν_k/2, ν_k/2)] p(θ_k). The joint posterior law is p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K)/p(X|K). The main task now is to propose approximations to it that can be used easily in all the above-mentioned tasks of classification or clustering.

Variational Bayesian Approximation (VBA). Main idea: propose an easily computable approximation q(c, Z, Θ) of p(c, Z, Θ|X, K), with criterion KL(q : p). Interestingly, noting that p(c, Z, Θ|X, K) = p(X, c, Z, Θ|K)/p(X|K), we have KL(q : p) = −F(q) + ln p(X|K), where F(q) = ⟨ln(p(X, c, Z, Θ|K)/q(c, Z, Θ))⟩_q is called the free energy of q, and we have the following properties: maximizing F(q) and minimizing KL(q : p) are equivalent, and both give a lower bound on the evidence of the model, ln p(X|K); when the optimum q* is obtained, F(q*) can be used as a criterion for model selection.

VBA: choosing good families. Using KL(q : p) has the very interesting property that using q to compute the means we obtain the same values as if we had used p (conservation of the means); unfortunately, this is not the case for variances or other moments. If p is in the exponential family, then, choosing appropriate conjugate priors, the structure of q will be the same and we can obtain appropriately fast optimization algorithms.

[Figure: graphical representation of the hierarchical model, with hyperparameters ξ₀, (γ₀, Σ₀), (µ₀, η₀) and k₀ at the top, parameters α_k, β_k, Σ_k, µ_k and a in the middle, and the latent variables z_{nk} and data x_n at the bottom.]
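The infinite Gaussian scale mixture representation above suggests a direct sampling scheme: draw z ~ Gamma(ν/2, ν/2) and then x ~ N(µ, Σ/z). The Python sketch below (ours, not from the paper; the parameter values are illustrative) uses this construction to sample from a multivariate Student-t.

```python
import numpy as np

def sample_student_t_igsm(nu, mu, Sigma, n_samples, rng=np.random.default_rng(0)):
    """Sample from T(x | nu, mu, Sigma) through its infinite Gaussian scale mixture:
    z ~ Gamma(shape=nu/2, rate=nu/2), then x | z ~ N(mu, Sigma / z)."""
    mu = np.asarray(mu, float)
    Sigma = np.asarray(Sigma, float)
    # numpy's gamma uses a scale parameter, so scale = 1 / rate = 2 / nu
    z = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n_samples)
    return np.array([rng.multivariate_normal(mu, Sigma / zi) for zi in z])

x = sample_student_t_igsm(nu=3.0, mu=[0.0, 0.0], Sigma=np.eye(2), n_samples=5000)
print("empirical mean:", x.mean(axis=0).round(3))   # close to mu, with heavier tails than a Gaussian
```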
VBA for the mixture of Student-t. In our case, noting that p(X, c, Z, Θ|K) = Π_n Π_k p(x_n, c_n, z_{nk}|a_k, µ_k, Σ_k, ν_k) · Π_k [p(α_k) p(β_k) p(µ_k|Σ_k) p(Σ_k)], with p(x_n, c_n, z_{nk}|a_k, µ_k, Σ_k, ν_k) = N(x_n|µ_k, z_{nk}⁻¹Σ_k) G(z_{nk}|α_k, β_k), is separable, on one side in [c, Z] and on the other side in the components of Θ, we propose to use q(c, Z, Θ) = q(c, Z) q(Θ). With this decomposition, the Kullback-Leibler divergence becomes KL(q₁(c, Z) q₂(Θ) : p(c, Z, Θ|X, K)) = Σ_c ∫∫ q₁(c, Z) q₂(Θ) ln[q₁(c, Z) q₂(Θ)/p(c, Z, Θ|X, K)] dΘ dZ, and the free energy becomes F(q₁(c, Z) q₂(Θ)) = Σ_c ∫∫ q₁(c, Z) q₂(Θ) ln[p(X, c, Z|Θ, K) p(Θ|K)/(q₁(c, Z) q₂(Θ))] dΘ dZ.

Proposed VBA prior model for the mixture of Student-t. Using a generalized Student-t, obtained by replacing G(z_{nk}|ν_k/2, ν_k/2) by G(z_{nk}|α_k, β_k), it is easier to propose conjugate priors for α_k, β_k than for ν_k: p(x_n, c_n = k, z_{nk}|a_k, µ_k, Σ_k, α_k, β_k, K) = a_k N(x_n|µ_k, z_{nk}⁻¹Σ_k) G(z_{nk}|α_k, β_k). In the following, noting Θ = {(a_k, µ_k, Σ_k, α_k, β_k), k = 1, …, K}, we propose to use the factorized prior laws p(Θ) = p(a) Π_k [p(α_k) p(β_k) p(µ_k|Σ_k) p(Σ_k)] with the following components: p(a) = D(a|k₀), k₀ = [k₀, …, k₀] = k₀1; p(α_k) = E(α_k|ζ₀) = G(α_k|1, ζ₀); p(β_k) = E(β_k|ζ₀) = G(β_k|1, ζ₀); p(µ_k|Σ_k) = N(µ_k|µ₀1, η₀⁻¹Σ_k); p(Σ_k) = IW(Σ_k|γ₀, γ₀Σ₀). Here D(a|k) = [Γ(Σ_l k_l)/Π_l Γ(k_l)] Π_l a_l^{k_l−1} is the Dirichlet pdf, E(t|ζ₀) = ζ₀ exp[−ζ₀ t] is the exponential pdf, G(t|a, b) = (b^a/Γ(a)) t^{a−1} exp[−bt] is the Gamma pdf, and IW(Σ|γ, γ∆) = |(1/2)∆|^{γ/2} exp[−(1/2) Tr(∆Σ⁻¹)] / (Γ_D(γ/2) |Σ|^{(γ+D+1)/2}) is the inverse Wishart pdf. With these prior laws and the likelihood, the joint posterior law is p(c, Z, Θ|X) = p(X, c, Z, Θ)/p(X).

Expressions of q. q(c, Z, Θ) = q(c, Z) q(Θ) = Π_n Π_k [q(c_n = k|z_{nk}) q(z_{nk})] · Π_k [q(α_k) q(β_k) q(µ_k|Σ_k) q(Σ_k)] · q(a), with q(a) = D(a|k̃), k̃ = [k̃₁, …, k̃_K]; q(α_k) = G(α_k|ζ̃_k, η̃_k); q(β_k) = G(β_k|ζ̃_k, η̃_k); q(µ_k|Σ_k) = N(µ_k|µ̃, η̃⁻¹Σ_k); q(Σ_k) = IW(Σ_k|γ̃, γ̃Σ̃). With these choices, F(q(c, Z, Θ)) = ⟨ln p(X, c, Z, Θ|K)⟩_{q(c,Z,Θ)} = Σ_k Σ_n F1_{kn} + Σ_k F2_k, with F1_{kn} = ⟨ln p(x_n, c_n, z_{nk}, θ_k)⟩_{q(c_n=k|z_{nk}) q(z_{nk})} and F2_k = ⟨ln p(x_n, c_n, z_{nk}, θ_k)⟩_{q(θ_k)}.

VBA algorithm steps. The updating expressions of the tilded parameters are obtained in three steps. E step: optimizing F with respect to q(c, Z) while keeping q(Θ) fixed, we obtain the expressions q(c_n = k|z_{nk}) = ã_k and q(z_{nk}) = G(z_{nk}|α_k, β_k). M step: optimizing F with respect to q(Θ) while keeping q(c, Z) fixed, we obtain the expressions q(a) = D(a|k̃), q(α_k) = G(α_k|ζ̃_k, η̃_k), q(β_k) = G(β_k|ζ̃_k, η̃_k), q(µ_k|Σ_k) = N(µ_k|µ̃, η̃⁻¹Σ_k) and q(Σ_k) = IW(Σ_k|γ̃, γ̃Σ̃), which gives the updating algorithm for the corresponding tilded parameters. F evaluation: after each E step and M step, we can also evaluate the expression of F(q), which can be used as a stopping rule for the iterative algorithm. The final value of F(q) for each value of K, denoted F_K, can be used as a criterion for model selection, i.e. the determination of the number of clusters.

Conclusions. Clustering and classification of a set of data are among the most important tasks in statistical research for many applications, such as data mining in biology. Mixture models, and in particular mixtures of Gaussians, are classical models for these tasks. We proposed to use a mixture of generalised Student-t distributions to model the data, via a hierarchical graphical model. To obtain fast algorithms and be able to handle large data sets, we used conjugate priors wherever possible. The proposed algorithm has been used for clustering, classification and discriminant analysis of some biological data (cancer-research related), but in this paper we only presented the main algorithm.

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix in order to draw a parallel coordinate plot. In this paper, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a geometrical viewpoint. It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are restricted to be full-rank.
 
Differential geometric properties of textile plot

Geometric Properties of the textile plot. Tomonari Sei and Ushio Tanaka, University of Tokyo and Osaka Prefecture University, at École Polytechnique, Oct 28, 2015.

Introduction. The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix into another matrix, X ∈ R^{n×p} → Y ∈ R^{n×p}, in order to draw a parallel coordinate plot. The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance. In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometrical point of view. It is shown that the textile set is written as the union of two differentiable manifolds if the data matrices are "generic".

Outline: 1. What is the textile plot? 2. Textile set. 3. Main result. 4. Other results. 5. Summary.

Textile plot: example (Kumasaka and Shibata, 2008). Textile plot for the iris data (150 cases, 5 attributes). Each variate is transformed by a location-scale transformation; categorical data are quantified; missing data are admitted; the order of the axes can be maintained. [Figure: textile plot of the iris data, with axes Species (setosa, versicolor, virginica), Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.]

Textile plot. Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value. Let X = (x_1, …, x_p) ∈ R^{n×p} be the data matrix. Without loss of generality, assume the sample mean and sample variance of each x_j are 0 and 1, respectively. The data is transformed into Y = (y_1, …, y_p), where y_j = a_j + b_j x_j, with a_j, b_j ∈ R, j = 1, …, p. The coefficients a_j and b_j are determined by the following procedure.
Without loss of generality, assume the sample mean and sample variance of each xj are 0 and 1, respectively. The data is transformed into Y = (y1, . . . , yp), where yj = aj + bj xj , aj , bj ∈ R, j = 1, . . . , p. The coefficients aj and bj are determined by the following procedure. 5 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value. Let X = (x1, . . . , xp) ∈ Rn×p be the data matrix. Without loss of generality, assume the sample mean and sample variance of each xj are 0 and 1, respectively. The data is transformed into Y = (y1, . . . , yp), where yj = aj + bj xj , aj , bj ∈ R, j = 1, . . . , p. The coefficients aj and bj are determined by the following procedure. 5 / 23 What is textile plot? Textile set Main result Other results Summary Textile plot Coefficients a = (aj ) and b = (bj ) are the solution of the following minimization problem: Minimize a,b n∑ t=1 p∑ j=1 (ytj − ¯yt·)2 subject to yj = aj + bj xj , p∑ j=1 yj 2 = 1. Intuition: as horizontal as possible. Solution: a = 0 and b is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of X. yt1 yt2 yt3 yt4 yt5 yt. 6 / 23 What is textile plot? Textile set Main result Other results Summary Example (n = 100, p = 4) X ∈ R100×4. Each row ∼ N(0, Σ), Σ =   1 −0.6 0.5 0.1 −0.6 1 −0.6 −0.2 0.5 −0.6 1 0.0 0.1 −0.2 0.0 1  . −2.71 2.98 −3.93 3.27 −2.72 2.43 −2.58 2.23 −2.71 2.98 −3.93 3.27 −2.72 2.43 −2.58 2.23 (a) raw data X (b) textile plot Y 7 / 23 What is textile plot? Textile set Main result Other results Summary Our motivation The textile plot transforms the data matrix X into Y. Denote the map by Y = τ(X). What is the image τ(Rn×p)? We can show that Y ∈ τ(Rn×p) satisfies two conditions: ∃λ ≥ 0, ∀i = 1, . . . , p, p∑ j=1 yi yj = λ yi 2 and p∑ j=1 yj 2 = 1. This motivates the following definition of the textile set. 8 / 23 What is textile plot? Textile set Main result Other results Summary Our motivation The textile plot transforms the data matrix X into Y. Denote the map by Y = τ(X). What is the image τ(Rn×p)? We can show that Y ∈ τ(Rn×p) satisfies two conditions: ∃λ ≥ 0, ∀i = 1, . . . , p, p∑ j=1 yi yj = λ yi 2 and p∑ j=1 yj 2 = 1. This motivates the following definition of the textile set. 8 / 23 What is textile plot? Textile set Main result Other results Summary Our motivation The textile plot transforms the data matrix X into Y. Denote the map by Y = τ(X). What is the image τ(Rn×p)? We can show that Y ∈ τ(Rn×p) satisfies two conditions: ∃λ ≥ 0, ∀i = 1, . . . , p, p∑ j=1 yi yj = λ yi 2 and p∑ j=1 yj 2 = 1. This motivates the following definition of the textile set. 8 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. 
Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Textile set Definition The textile set is defined by Tn,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 , ∑ j yj 2 = 1 }, The unnormalized textile set is defined by Un,p = { Y ∈ Rn×p | ∃λ ≥ 0, ∀i, ∑ j yi yj = λ yi 2 }. We are interested in mathematical properties of Tn,p and Un,p. Bad news: statistical implication such is a future work. Let us begin with small p case. 9 / 23 What is textile plot? Textile set Main result Other results Summary Tn,p with small p Lemma (p = 1) Tn,1 = Sn−1, the unit sphere. Lemma (p = 2) Tn,2 = A ∪ B, where A = {(y1, y2) | y1 = y2 = 1/ √ 2}, B = {(y1, y2) | y1 − y2 = y1 + y2 = 1}, each of which is diffeomorphic to Sn−1 × Sn−1. Their intersection A ∩ B is diffeomorphic to the Stiefel manifold Vn,2. → See next slide for n = p = 2 case. 10 / 23 What is textile plot? Textile set Main result Other results Summary Tn,p with small p Lemma (p = 1) Tn,1 = Sn−1, the unit sphere. Lemma (p = 2) Tn,2 = A ∪ B, where A = {(y1, y2) | y1 = y2 = 1/ √ 2}, B = {(y1, y2) | y1 − y2 = y1 + y2 = 1}, each of which is diffeomorphic to Sn−1 × Sn−1. Their intersection A ∩ B is diffeomorphic to the Stiefel manifold Vn,2. → See next slide for n = p = 2 case. 10 / 23 What is textile plot? Textile set Main result Other results Summary Example (n = p = 2) T2,2 ⊂ R4 is the union of two tori, glued along O(2). θ φ ξ η T2,2 = { 1 √ 2 ( cos θ cos φ sin θ sin φ )} ∪ { 1 2 ( cos ξ + cos η cos ξ − cos η sin ξ + sin η sin ξ − sin η )} 11 / 23 What is textile plot? Textile set Main result Other results Summary For general dimension p To state our main result, we define two concepts: noncompact Stiefel manifold and canonical form. Definition (e.g. Absil et al. (2008)) Let n ≥ p. Denote by V ∗ the set of all column full-rank matrices: V ∗ := { Y ∈ Rn×p | rank(Y) = p }. V ∗ is called the noncompact Stiefel manifold. Note that dim(V ∗) = np and V ∗ = Rn×p. The orthogonal group O(n) acts on V ∗. By the Gram-Schmidt orthonormalization, the quotient space V ∗/O(n) is identified with upper-triangular matrices with positive diagonals. → see next slide. 12 / 23 What is textile plot? Textile set Main result Other results Summary For general dimension p To state our main result, we define two concepts: noncompact Stiefel manifold and canonical form. Definition (e.g. Absil et al. (2008)) Let n ≥ p. Denote by V ∗ the set of all column full-rank matrices: V ∗ := { Y ∈ Rn×p | rank(Y) = p }. V ∗ is called the noncompact Stiefel manifold. Note that dim(V ∗) = np and V ∗ = Rn×p. 
The orthogonal group O(n) acts on V ∗. By the Gram-Schmidt orthonormalization, the quotient space V ∗/O(n) is identified with upper-triangular matrices with positive diagonals. → see next slide. 12 / 23 What is textile plot? Textile set Main result Other results Summary For general dimension p To state our main result, we define two concepts: noncompact Stiefel manifold and canonical form. Definition (e.g. Absil et al. (2008)) Let n ≥ p. Denote by V ∗ the set of all column full-rank matrices: V ∗ := { Y ∈ Rn×p | rank(Y) = p }. V ∗ is called the noncompact Stiefel manifold. Note that dim(V ∗) = np and V ∗ = Rn×p. The orthogonal group O(n) acts on V ∗. By the Gram-Schmidt orthonormalization, the quotient space V ∗/O(n) is identified with upper-triangular matrices with positive diagonals. → see next slide. 12 / 23 What is textile plot? Textile set Main result Other results Summary Noncompact Stiefel manifold and canonical form Definition (Canonical form) Let us denote by V ∗∗ the set of all matrices written as            y11 · · · y1p 0 ... ... ... ... ypp 0 · · · 0 ... ... 0 · · · 0            , yii > 0, 1 ≤ i ≤ p. We call it a canonical form. Note that V ∗∗ ⊂ V ∗ and V ∗/O(n) V ∗∗. 13 / 23 What is textile plot? Textile set Main result Other results Summary Noncompact Stiefel manifold and canonical form Definition (Canonical form) Let us denote by V ∗∗ the set of all matrices written as            y11 · · · y1p 0 ... ... ... ... ypp 0 · · · 0 ... ... 0 · · · 0            , yii > 0, 1 ≤ i ≤ p. We call it a canonical form. Note that V ∗∗ ⊂ V ∗ and V ∗/O(n) V ∗∗. 13 / 23 What is textile plot? Textile set Main result Other results Summary Restriction of unnormalized textile set V ∗: non-compact Stiefel manifold, V ∗∗: set of canonical forms. Definition Denote the restriction of Un,p to V ∗ and V ∗∗ by U∗ n,p = Un,p ∩ V ∗ , U∗∗ n,p = Un,p ∩ V ∗∗ , respectively. The group O(n) acts on U∗ n,p. The quotient space U∗ n,p/O(n) is identified with U∗∗ n,p. So it is essential to study U∗∗ n,p. 14 / 23 What is textile plot? Textile set Main result Other results Summary Restriction of unnormalized textile set V ∗: non-compact Stiefel manifold, V ∗∗: set of canonical forms. Definition Denote the restriction of Un,p to V ∗ and V ∗∗ by U∗ n,p = Un,p ∩ V ∗ , U∗∗ n,p = Un,p ∩ V ∗∗ , respectively. The group O(n) acts on U∗ n,p. The quotient space U∗ n,p/O(n) is identified with U∗∗ n,p. So it is essential to study U∗∗ n,p. 14 / 23 What is textile plot? Textile set Main result Other results Summary Restriction of unnormalized textile set V ∗: non-compact Stiefel manifold, V ∗∗: set of canonical forms. Definition Denote the restriction of Un,p to V ∗ and V ∗∗ by U∗ n,p = Un,p ∩ V ∗ , U∗∗ n,p = Un,p ∩ V ∗∗ , respectively. The group O(n) acts on U∗ n,p. The quotient space U∗ n,p/O(n) is identified with U∗∗ n,p. So it is essential to study U∗∗ n,p. 14 / 23 What is textile plot? Textile set Main result Other results Summary U∗∗ n,p for small p Let us check examples. Example (n = p = 1) U∗∗ 1,1 = {(1)}. Example (n = p = 2) Let Y = ( y11 y12 0 y22 ) with y11, y22 > 0. Then U∗∗ 2,2 = {y12 = 0} ∪ {y2 11 = y2 12 + y2 22}, union of a plane and a cone. 15 / 23 What is textile plot? Textile set Main result Other results Summary U∗∗ n,p for small p Let us check examples. Example (n = p = 1) U∗∗ 1,1 = {(1)}. Example (n = p = 2) Let Y = ( y11 y12 0 y22 ) with y11, y22 > 0. Then U∗∗ 2,2 = {y12 = 0} ∪ {y2 11 = y2 12 + y2 22}, union of a plane and a cone. 15 / 23 What is textile plot? 
Textile set Main result Other results Summary Main theorem The differential geometrical property of U∗∗ n,p is given as follows: Theorem Let n ≥ p ≥ 3. Then we have the following decomposition U∗∗ n,p = M1 ∪ M2, where each Mi is a differentiable manifold, the dimensions of which are given by dim M1 = p(p + 1) 2 − (p − 1), dim M2 = p(p + 1) 2 − p, respectively. M2 is connected while M1 may not. 16 / 23 What is textile plot? Textile set Main result Other results Summary Example U∗∗ 3,3 is the union of 4-dim and 3-dim manifolds. We look at a cross section with y11 = y22 = 1: y12 y13 y33 Union of a surface and a vertical line. 17 / 23 What is textile plot? Textile set Main result Other results Summary Corollary Let n ≥ p ≥ 3. Then we have U∗ n,p = π−1 (M1) ∪ π−1 (M2), where π denotes the map of Gram-Schmidt orthonormalization. The dimensions are dim π−1 (M1) = np − (p − 1), dim π−1 (M2) = np − p. 18 / 23 What is textile plot? Textile set Main result Other results Summary Other results We state other results. First we have n = 1 case. Lemma If n = 1, then the textile set T1,p is the union of a (p − 2)-dimensional manifold and 2(2p − 1) isolated points. Example U∗∗ 1,3 consists of a circle and 14 points: U∗∗ 1,3 = (S2 ∩ {y1 + y2 + y3 = 1}) ∪ {±( 1√ 3 , 1√ 3 , 1√ 3 ), ±( 1√ 2 , 1√ 2 , 0), ±( 1√ 2 , 0, 1√ 2 ), ±(0, 1√ 2 , 1√ 2 ), ± (1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)} . 19 / 23 What is textile plot? Textile set Main result Other results Summary Other results We state other results. First we have n = 1 case. Lemma If n = 1, then the textile set T1,p is the union of a (p − 2)-dimensional manifold and 2(2p − 1) isolated points. Example U∗∗ 1,3 consists of a circle and 14 points: U∗∗ 1,3 = (S2 ∩ {y1 + y2 + y3 = 1}) ∪ {±( 1√ 3 , 1√ 3 , 1√ 3 ), ±( 1√ 2 , 1√ 2 , 0), ±( 1√ 2 , 0, 1√ 2 ), ±(0, 1√ 2 , 1√ 2 ), ± (1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)} . 19 / 23 What is textile plot? Textile set Main result Other results Summary Differential geometrical characterization of fλ −1 (O) Fix λ ≥ 0 arbitrarily. We define the map fλ : Rn×p → Rp+1 by fλ(y1, . . . , yp) :=       ∑ j y1 yj − λ y1 2 ... ∑ j yp yj − λ yp 2 ∑ j yj 2 − 1       . Lemma We have a classification of Tn,p, namely Tn,p = λ≥0 fλ −1 (O) = 0≤λ≤n fλ −1 (O). 20 / 23 What is textile plot? Textile set Main result Other results Summary Differential geometrical characterization of fλ −1 (O) Fix λ ≥ 0 arbitrarily. We define the map fλ : Rn×p → Rp+1 by fλ(y1, . . . , yp) :=       ∑ j y1 yj − λ y1 2 ... ∑ j yp yj − λ yp 2 ∑ j yj 2 − 1       . Lemma We have a classification of Tn,p, namely Tn,p = λ≥0 fλ −1 (O) = 0≤λ≤n fλ −1 (O). 20 / 23 What is textile plot? Textile set Main result Other results Summary Differential geometrical characterization of fλ −1 (O) Lastly, we state a characterization of fλ −1 (O) from the viewpoint of differential geometry. Theorem Let λ ≥ 0. fλ −1 (O) is a regular sub-manifold of Rn×p with codimension p + 1 whenever λ > 0, y11yjj − y1j yj1 = 0, j = 2, . . . , p, ∃ ∈ { 2, . . . , p }; p∑ j=2 yij + yi (1 − 2λ) = 0, i = 1, . . . , n. 21 / 23 What is textile plot? Textile set Main result Other results Summary Present and future study Summary: We defined the textile set Tn,p and find its geometric properties. Present and future study: . 1 Characterize the classification fλ −1 (O) with induced Riemannian metric from Rnp by (global) Riemannian geometry: geodesic, curvature etc. . 2 Investigate differential geometrical and topological properties of Tn,p and fλ −1 (O), including its group action. 
3 Can one find statistical implication such as sample distribution theory? Merci beaucoup! 22 / 23 What is textile plot? Textile set Main result Other results Summary Present and future study Summary: We defined the textile set Tn,p and find its geometric properties. Present and future study: . 1 Characterize the classification fλ −1 (O) with induced Riemannian metric from Rnp by (global) Riemannian geometry: geodesic, curvature etc. . 2 Investigate differential geometrical and topological properties of Tn,p and fλ −1 (O), including its group action. 3 Can one find statistical implication such as sample distribution theory? Merci beaucoup! 22 / 23 What is textile plot? Textile set Main result Other results Summary Present and future study Summary: We defined the textile set Tn,p and find its geometric properties. Present and future study: . 1 Characterize the classification fλ −1 (O) with induced Riemannian metric from Rnp by (global) Riemannian geometry: geodesic, curvature etc. . 2 Investigate differential geometrical and topological properties of Tn,p and fλ −1 (O), including its group action. 3 Can one find statistical implication such as sample distribution theory? Merci beaucoup! 22 / 23 What is textile plot? Textile set Main result Other results Summary References . 1 Absil, P.-A., Mahony, R., and Sepulchre, R. (2008), Optimization Algorithms on Matrix Manifolds, Princeton University Press. . 2 Honda, K. and Nakano, J. (2007), 3 dimensional parallel coordinate plot, Proceedings of the Institute of Statistical Mathematics, 55, 69–83. . 3 Inselberg, A. (2009), Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications, Springer. 4 Kumasaka, N. and Shibata, R. (2008), High-dimensional data visualisation: The textile plot, Computational Statistics and Data Analysis, 52, 3616–3644. 23 / 23
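The solution recalled above (a = 0, b the leading eigenvector of the covariance matrix) makes the transformation straightforward to compute. The NumPy sketch below is our own illustration of that procedure for complete, non-categorical data; the normalization Σ_j ‖y_j‖² = 1 is imposed here by a final rescaling, which is an assumption on our part.

```python
import numpy as np

def textile_plot(X):
    """Sketch of the textile plot transformation (Kumasaka & Shibata, 2008).

    Assumes no categorical variates and no missing values. Each column is
    standardized to sample mean 0 and variance 1; the coefficients are a = 0
    and b = leading eigenvector of the covariance matrix, and Y is rescaled
    so that sum_j ||y_j||^2 = 1 (this last normalization is our assumption).
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # location-scale transform of each variate
    S = np.cov(Xs, rowvar=False, bias=True)     # covariance of the standardized data
    _, eigvec = np.linalg.eigh(S)
    b = eigvec[:, -1]                           # eigenvector of the largest eigenvalue
    Y = Xs * b                                  # y_j = b_j * x_j  (a_j = 0)
    Y /= np.sqrt((Y ** 2).sum())                # enforce sum_j ||y_j||^2 = 1
    return Y

# usage: Y = textile_plot(np.random.randn(100, 4)); each row of Y is then drawn
# as a polyline across the p parallel axes.
```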

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In anomalous statistical physics, deformed algebraic structures are important objects. Heavy-tailed probability distributions, such as Student's t-distributions, are characterized by deformed algebras. In addition, deformed algebras induce deformations of the expectations and of the independence of random variables. Hence, a generalization of independence for multivariate Student's t-distributions is studied in this paper. Even if two random variables that follow univariate Student's t-distributions are independent, their joint probability distribution is not a bivariate Student's t-distribution. It is shown that a bivariate Student's t-distribution is obtained from two univariate Student's t-distributions under q-deformed independence.
 
A generalization of independence and multivariate Student's t-distributions

A generalization of independence and multivariate Student's t-distributions — Matsuzoe Hiroshi (Nagoya Institute of Technology), joint works with Sakamoto Monta (Efrei, Paris).

Outline: 1. Deformed exponential family 2. Non-additive differentials and expectation functionals 3. Geometry of deformed exponential families 4. Generalization of independence 5. q-independence and Student's t-distributions 6. Appendix

Notions of expectation and independence are determined by the choice of statistical model.

Introduction: geometry and statistics
• Geometry for the sample space: Wasserstein geometry, optimal transport theory; a pdf is regarded as a distribution of mass.
• Geometry for the parameter space: information geometry; convexity of entropy and free energy; duality of estimating functions.
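The deformed algebraic structures behind such q-independence statements are the standard Tsallis q-exponential, q-logarithm and q-product. The sketch below is ours, not taken from the slides; it illustrates the q-deformed analogue of e^x e^y = e^{x+y} on which q-deformed independence is built.

```python
import numpy as np

def q_exp(x, q):
    """Tsallis q-exponential exp_q(x) = [1 + (1-q)x]_+^{1/(1-q)}; exp(x) as q -> 1."""
    if q == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - q) * x, 0.0) ** (1.0 / (1.0 - q))

def q_log(x, q):
    """Tsallis q-logarithm, the inverse of exp_q on its range."""
    if q == 1.0:
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def q_product(x, y, q):
    """q-deformed product: x (x)_q y = exp_q(ln_q x + ln_q y); ordinary product at q = 1."""
    return q_exp(q_log(x, q) + q_log(y, q), q)

# exp_q(a) (x)_q exp_q(b) = exp_q(a + b): the deformed analogue of e^a e^b = e^{a+b}.
print(q_product(q_exp(0.3, 1.5), q_exp(0.2, 1.5), 1.5), q_exp(0.5, 1.5))
```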

Hessian Information Geometry (chaired by Shun-Ichi Amari, Michel Boyom)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We define a metric and a family of α-connections in statistical manifolds, based on ϕ-divergence, which emerges in the framework of ϕ-families of probability distributions. This metric and these α-connections generalize the Fisher information metric and Amari's α-connections. We also investigate the parallel transport associated with the α-connection for α = 1.
 
New metric and connections in statistical manifolds

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Curvature properties of statistical structures are studied. The study deals with the curvature tensor of statistical connections and their duals, as well as the Ricci tensor of the connections, Laplacians and the curvature operator. Two concepts of sectional curvature are introduced. The meaning of these notions is illustrated by presenting a few exemplary theorems.
 
Curvatures of Statistical Structures

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We show that Hessian manifolds of dimensions 4 and above must have vanishing Pontryagin forms. This gives a topological obstruction to the existence of Hessian metrics. We find an additional explicit curvature identity for Hessian 4-manifolds. By contrast, we show that all analytic Riemannian 2-manifolds are Hessian.
 

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Based on the theory of compact normal left-symmetric algebra (clan), we realize every homogeneous cone as a set of positive definite real symmetric matrices, where homogeneous Hessian metrics as well as a transitive group action on the cone are described efficiently.
 

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this article, we derive an inequality satisfied by the squared norm of the imbedding curvature tensor of multiply CR-warped product statistical submanifolds N of holomorphic statistical space forms M. Furthermore, we prove that under certain geometric conditions, N and M become Einstein.
 

Topological forms and Information (chaired by Daniel Bennequin, Pierre Baudot)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this lecture we will present joint work with Ryan Thorngren on thermodynamic semirings and entropy operads, with Nicolas Tedeschi on Birkhoff factorization in thermodynamic semirings, ongoing work with Marcus Bintz on tropicalization of Feynman graph hypersurfaces and Potts model hypersurfaces, and their thermodynamic deformations, and ongoing work by the author on applications of thermodynamic semirings to models of morphology and syntax in Computational Linguistics.
 

Information Algebras and their Applications — Matilde Marcolli, Geometric Science of Information, Paris, October 2015.

Based on: M. Marcolli, R. Thorngren, Thermodynamic semirings, J. Noncommut. Geom. 8 (2014), no. 2, 337–392; M. Marcolli, N. Tedeschi, Entropy algebras and Birkhoff factorization, J. Geom. Phys. 97 (2015), 243–265.

Min-plus algebra (tropical semiring). The min-plus (or tropical) semiring is T = R ∪ {∞} with operations x ⊕ y = min{x, y} (identity ∞) and x ⊙ y = x + y (identity 0). The operations ⊕ and ⊙ are associative and commutative, have left/right identities, and ⊙ distributes over ⊕.

Thermodynamic semirings. T_{β,S} = (R ∪ {∞}, ⊕_{β,S}, ⊙) is a deformation of the tropical addition:
  x ⊕_{β,S} y = min_p { p x + (1 − p) y − (1/β) S(p) },
where β is a thermodynamic inverse-temperature parameter and S(p) = S(p, 1 − p), p ∈ [0, 1], is a binary information measure. For β → ∞ (zero temperature) one recovers the unperturbed idempotent addition ⊕; the multiplication ⊙ = + is undeformed. For S the Shannon entropy this was first considered in relation to F1-geometry in A. Connes, C. Consani, "From monoids to hyperstructures: in search of an absolute arithmetic", arXiv:1006.4810.

Khinchin axioms. Sh(p) = −C (p log p + (1 − p) log(1 − p)). Axiomatic characterization of the Shannon entropy S = Sh:
1. symmetry: S(p) = S(1 − p);
2. minima: S(0) = S(1) = 0;
3. extensivity: S(pq) + (1 − pq) S(p(1 − q)/(1 − pq)) = S(p) + p S(q).
These correspond to algebraic properties of the semiring T_{β,S}: (1) commutativity of ⊕_{β,S}; (2) left and right identity for ⊕_{β,S}; (3) associativity of ⊕_{β,S}. Hence T_{β,S} is commutative, unital and associative iff S = Sh. In the n-ary form of the Khinchin axioms, one defines from S an n-ary information measure S_n : ∆^{n−1} → R_{≥0}.

Rota–Baxter semirings.
• Weight λ > 0: there exists a ⊕-additive map T : S → S with T(f1) ⊙ T(f2) = T(T(f1) ⊙ f2) ⊕ T(f1 ⊙ T(f2)) ⊕ T(f1 ⊙ f2) ⊙ log λ.
• Weight λ < 0: there exists a ⊕-additive map T : S → S with T(f1) ⊙ T(f2) ⊕ T(f1 ⊙ f2) ⊙ log(−λ) = T(T(f1) ⊙ f2) ⊕ T(f1 ⊙ T(f2)).

Birkhoff factorization in min-plus semirings (weight +1).
• Bogolyubov–Parashchuk preparation: ψ̃(x) = min{ψ(x), ψ−(x′) + ψ(x″)} = ψ(x) ⊕ ψ−(x′) ⊙ ψ(x″), where (x′, x″) ranges over the non-primitive part of the coproduct ∆(x) = x ⊗ 1 + 1 ⊗ x + Σ x′ ⊗ x″.
• ψ− is defined inductively on lower-degree x′ in the Hopf algebra: ψ−(x) := T(ψ̃(x)) = T(ψ(x) ⊕ ψ−(x′) ⊙ ψ(x″)).
• By ⊕-linearity of T this is the same as ψ−(x) = T(ψ(x)) ⊕ T(ψ−(x′) ⊙ ψ(x″)).
• Then ψ+ is given by convolution: ψ+(x) := (ψ− ⋆ ψ)(x) = min{ψ−(x), ψ(x), ψ−(x′) + ψ(x″)} = ψ−(x) ⊕ ψ̃(x).
• Key step: associativity and commutativity of ⊕, ⊕-additivity of T, plus the Rota–Baxter identity of weight +1 give ψ−(xy) = ψ−(x) + ψ−(y), hence ψ+ is also multiplicative as a convolution.

Thermodynamic Rota–Baxter structures.
• S_{β,S} is a thermodynamic Rota–Baxter semiring of weight λ > 0 if there is a ⊕_{β,S}-additive map T : S_{β,S} → S_{β,S} with T(f1) ⊙ T(f2) = T(T(f1) ⊙ f2) ⊕_{β,S} T(f1 ⊙ T(f2)) ⊕_{β,S} T(f1 ⊙ f2) ⊙ log λ.
• Weight λ < 0: T(f1) ⊙ T(f2) ⊕_{β,S} T(f1 ⊙ f2) ⊙ log(−λ) = T(T(f1) ⊙ f2) ⊕_{β,S} T(f1 ⊙ T(f2)); as in the previous case but with ⊕ replaced by the deformed ⊕_{β,S}.
• Logarithmically related pair (R, S): T : S → S determines T′ : R → R with T′(e^{−βf}) := e^{−βT(f)} for a = e^{−βf} in Dom(log) ⊂ R. T′ is Rota–Baxter of weight λ_β on R iff T is Rota–Baxter of weight λ on S_{β,S} with S = Sh and λ_β = λ^{−β} for λ > 0, or λ_β = −|λ|^{−β} for λ < 0:
  T′(e^{−βf1}) T′(e^{−βf2}) = T′(T′(e^{−βf1}) e^{−βf2}) + T′(e^{−βf1} T′(e^{−βf2})) + λ_β T′(e^{−βf1} e^{−βf2}).
T′ is R-linear iff T is ⊕_{β,S}-linear.

Birkhoff factorization in thermodynamic Rota–Baxter semirings (weight +1). For T : S_{β,S} → S_{β,S} Rota–Baxter of weight λ = +1, the Bogolyubov–Parashchuk preparation of ψ : H → S_{β,S} is
  ψ̃_{β,S}(x) = ψ(x) ⊕_{β,S} (ψ−(x′) + ψ(x″)) = −β^{−1} log( e^{−βψ(x)} + e^{−β(ψ−(x′)+ψ(x″))} ).
Setting φ_β(x) := e^{−βψ(x)} in R, the Bogolyubov–Parashchuk preparation φ̃_β(x) = e^{−βψ̃(x)} satisfies φ̃_β(x) = φ_β(x) + T′(φ̃_β(x′)) φ_β(x″), with T′(e^{−βf}) := e^{−βT(f)} and T′(−e^{−βf}) := −T′(e^{−βf}). The Birkhoff factorization ψ_{β,+} = ψ_{β,−} ⋆_β ψ is
  ψ_{β,−}(x) = T(ψ̃_β(x)) = −β^{−1} log( e^{−βT(ψ(x))} + e^{−βT(ψ−(x′)+ψ(x″))} ),
  ψ_{β,+}(x) = −β^{−1} log( e^{−βψ_{β,−}(x)} + e^{−βψ̃_β(x)} ),
satisfying ψ_{β,±}(xy) = ψ_{β,±}(x) + ψ_{β,±}(y). In the limit β → ∞ the thermodynamic Birkhoff factorization converges to the min-plus Birkhoff factorization.

Example: Witt rings. For a commutative ring R, the Witt ring W(R) = 1 + tR[[t]]: addition is the product of formal power series, and multiplication is determined by (1 − at)^{−1} ⊙ (1 − bt)^{−1} = (1 − abt)^{−1}, a, b ∈ R. There is an injective ring homomorphism g : W(R) → R^N (ghost coordinates), given by the coefficients of t (1/α) dα/dt = Σ_{r≥1} α_r t^r for α = exp(Σ_{r≥1} α_r t^r / r), with component-wise addition and multiplication on R^N. A linear operator T : R[[t]] → R[[t]] is Rota–Baxter of weight λ iff T_W : W(R) → W(R), defined by g(T_W(α)) = T(g(α)), satisfies
  T_W(α1) ⊙ T_W(α2) = T_W(α1 ⊙ T_W(α2)) +_W T_W(T_W(α1) ⊙ α2) +_W λ T_W(α1 ⊙ α2),
with +_W the addition of W(R) and the convolution product α ⊙ γ := exp( Σ_{n≥1} (Σ_{r+ℓ=n} α_r γ_ℓ) t^n / n ) for α = exp(Σ α_r t^r/r), γ = exp(Σ γ_r t^r/r).

Example: for R = R^N, the Rota–Baxter operator of weight +1 given by partial sums, T : (a_1, a_2, …, a_n, …) ↦ (0, a_1, a_1 + a_2, …, Σ_{k=1}^{n−1} a_k, …), yields a Rota–Baxter operator T_W of weight +1 on the Witt ring W(R): T_W(α) = α ⊙ I, convolution with the multiplicative unit I = (1 − t)^{−1}. Hasse–Weil zeta functions of varieties over F_q, Z(X, t) = exp( Σ_{r≥1} #X(F_{q^r}) t^r / r ), are elements of the Witt ring: Z(X ⊔ Y, t) = Z(X, t) Z(Y, t) and Z(X × Y, t) = Z(X, t) ⊙ Z(Y, t); the Rota–Baxter operator of weight +1 is T_W(Z(X, t)) = Z(X, t) ⊙ Z(Spec(F_q), t).

Computation examples. Inclusion–exclusion "cost functions": for Γ = Γ1 ∪ Γ2 and γ = Γ1 ∩ Γ2, ψ(Γ) = ψ(Γ1) + ψ(Γ2) − ψ(γ) determines a character ψ : H → T with ψ(xy) = ψ(x) + ψ(y). For a class of machines, ψ_n(Γ) is the step-counting function of the n-th machine when it outputs on the computation Γ (Hopf algebra of flow charts). With the Rota–Baxter operator of weight +1 given by partial sums, the Bogolyubov–Parashchuk preparation is ψ̃_n(Γ) = min{ψ_n(Γ), ψ_n(Γ/γ) + Σ_{k=1}^{n−1} ψ̃_k(γ)}. A graph Γ with ψ_n(Γ) = ∞ (the n-th machine does not halt) can have ψ̃_n(Γ) < ∞ if the source of infinity was localized in γ ∖ ∂γ, so that ψ_n(Γ/γ) < ∞, and ψ_k(γ) < ∞ for all previous machines: "renormalization of computational infinities" in Manin's sense.

Polynomial countability. In perturbative quantum field theory one considers graph hypersurfaces X_Γ = {Ψ_Γ = 0} ⊂ A^{#E_Γ}, with Ψ_Γ(t) = Σ_T Π_{e ∉ E(T)} t_e, the sum running over spanning trees. For a variety X over Z with reductions X_p over F_p, the counting function is N(X, q) := #X_p(F_q); X is polynomially countable if the counting function is a polynomial P_X(q). Question: when are graph hypersurfaces X_Γ, or equivalently their complements Y_Γ = A^{#E_Γ} ∖ X_Γ, polynomially countable? Define a max-plus character ψ : H → T_max with N(Y_Γ, q) ∼ q^{ψ(Γ)} to leading order if Y_Γ is polynomially countable, and ψ(Γ) := −∞ if not. When Y_Γ is not polynomially countable, ψ̃(Γ) = max{ψ(Γ), ψ̃(γ) + ψ(Γ/γ)} = max{ψ(Γ), Σ_{j=1}^{N} ψ(γ_j) + ψ(γ_{j−1}/γ_j)} identifies chains of subgraphs and quotient graphs whose hypersurfaces are polynomially countable.

Work in progress: tropical geometry. A tropical polynomial p : R^n → R is piecewise linear,
  p(x_1, …, x_n) = ⊕_{j=1}^{m} a_j ⊙ x_1^{k_{j1}} ⊙ ⋯ ⊙ x_n^{k_{jn}} = min{ a_1 + k_{11}x_1 + ⋯ + k_{1n}x_n, …, a_m + k_{m1}x_1 + ⋯ + k_{mn}x_n },
and the tropical hypersurface is the locus where the tropical polynomial is non-differentiable. Entropical geometry: thermodynamic deformations of T,
  p_{β,S}(x_1, …, x_n) = ⊕_{β,S, j} a_j ⊙ x_1^{k_{j1}} ⊙ ⋯ ⊙ x_n^{k_{jn}} = min_{p=(p_j)} { Σ_j p_j (a_j + k_{j1}x_1 + ⋯ + k_{jn}x_n) − (1/β) S_n(p_1, …, p_n) }.
Goal: entropical geometry of the graph hypersurfaces of QFT.

Work in progress: lexicographic semirings. Min-plus type semirings are widely used as "lexicographic semirings" in computational models of morphology and syntax in Linguistics. Goal: use entropy deformations of these linguistic models, introducing an inverse-temperature parameter β in finite-state representations of n-gram models based on the tropical semiring (modelled on the thermodynamic formalism in dynamical systems). Coming soon!
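As a concrete illustration of the deformed addition ⊕_{β,S} with S the Shannon entropy, the short Python sketch below (our own, not from the slides) computes x ⊕_{β,Sh} y by direct minimization over p and compares it with the log-sum-exp expression −β⁻¹ log(e^{−βx} + e^{−βy}) that appears in the thermodynamic Birkhoff factorization above; as β grows it approaches min(x, y).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def shannon(p):
    # binary Shannon entropy, with S(0) = S(1) = 0
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def deformed_add(x, y, beta):
    """x (+)_{beta,Sh} y = min_p { p x + (1-p) y - S(p)/beta }."""
    obj = lambda p: p * x + (1 - p) * y - shannon(p) / beta
    return minimize_scalar(obj, bounds=(0.0, 1.0), method="bounded").fun

# Compare with the closed form -(1/beta) log(e^{-beta x} + e^{-beta y});
# for beta -> infinity both converge to the tropical addition min(x, y).
for beta in (1.0, 10.0, 100.0):
    closed = -np.log(np.exp(-beta * 2.0) + np.exp(-beta * 3.0)) / beta
    print(beta, deformed_add(2.0, 3.0, beta), closed)
```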

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We show that the entropy function, and hence the finite 1-logarithm, behaves much like certain derivations. We recall its cohomological interpretation as a 2-cocycle and also deduce 2n-cocycles for any n. Finally, we give some identities for finite multiple polylogarithms together with number-theoretic applications.
 

Finite polylogarithms, their multiple analogues and the Shannon entropy — Philippe Elbaz-Vincent (Université Grenoble Alpes) & Herbert Gangl (Durham University). Geometric Sciences of Information 2015, session "Topological Forms and Information", École Polytechnique (France), 28 October 2015.

Content of the talk: information theory, entropy and polylogarithms (review of past works); algebraic interpretation of the entropy function; cohomological interpretation of formal entropy functions; finite multiple polylogarithms, applications and open problems.

Information theory, entropy and polylogarithms. The Shannon entropy can be characterized in the framework of information theory, assuming that the propagation of information follows a Markovian model (Shannon, 1948). If H is the Shannon entropy, it fulfils the equation often called the Fundamental Equation of Information Theory (FEITH):
  H(x) + (1 − x) H(y/(1 − x)) − H(y) − (1 − y) H(x/(1 − y)) = 0.  (FEITH)
It is known (Aczél and Dhombres, 1989) that if g is a real function locally integrable on ]0, 1[ which fulfils FEITH, then there exists c ∈ R such that g = cH (the hypothesis can be relaxed to Lebesgue measurability).

It turns out that FEITH can be derived, in a precise formal sense (Elbaz-Vincent and Gangl, 2002), from the 5-term equation of the classical (or p-adic) dilogarithm. Cathelineau (1996) found that an appropriate derivative of the Bloch–Wigner dilogarithm coincides with the classical entropy function, and that the five-term relation satisfied by the former implies the four-term relation of the latter. More precisely, define the m-logarithm Li_m(z) = Σ_{n≥1} z^n / n^m, |z| < 1, and set D_2(z) = Im(Li_2(z)) + arg(1 − z) log|z|. Then D_2 satisfies the 5-term equation
  D_2(a) − D_2(b) + D_2(b/a) − D_2((1 − b)/(1 − a)) + D_2((1 − b^{−1})/(1 − a^{−1})) = 0,
whenever such an expression makes sense; this is the famous five-term equation for the dilogarithm (first stated by Abel).

It can be shown formally (see Cathelineau, Elbaz-Vincent and Gangl) that FEITH is an infinitesimal version of this 5-term equation. Kontsevich (1995) discovered that the truncated finite logarithm over a finite field F_p, with p prime, defined by £_1(x) = Σ_{k=1}^{p−1} x^k / k, satisfies FEITH. In our previous work, we showed how one can extend this relationship to "higher analogues" in order to produce and prove similar functional identities for finite polylogarithms from those for classical polylogarithms (using mod p reduction of p-adic polylogarithms and their infinitesimal versions). It was also shown that functional equations for finite polylogarithms often hold even as polynomial identities over finite fields.

Entropy and FEITH thus arise from the infinitesimal picture (for both archimedean and non-archimedean structures) and from the finite analogues associated to the dilogarithm. Does there exist a higher analogue of the Shannon entropy associated to m-logarithms? It could be connected to the higher degrees of the information cohomology of Baudot and Bennequin (Entropy, 2015).

Algebraic interpretation of the entropy function. Let R be a (commutative) ring and let D be a map from R to R. We say that D is a unitary derivation over R if: (1) "Leibniz's rule": for all x, y ∈ R, D(xy) = xD(y) + yD(x); (2) "additivity on partitions of unity": for all x ∈ R, D(x) + D(1 − x) = 0. Denote by Der_u(R) the set of unitary derivations over R. We say that a map f : R → R is an abstract symmetric information function of degree 1 if, for all x, y ∈ R such that x, y, 1 − x, 1 − y ∈ R^×, the functional equation FEITH holds, and for all x ∈ R we have f(x) = f(1 − x). Denote by IF_1(R) the set of abstract symmetric information functions of degree 1 over R; IF_1(R) is an R-module. Let Leib(R) be the set of Leibniz functions over R (i.e. those fulfilling the "Leibniz rule").

Proposition: there is a morphism of R-modules h : Leib(R) → IF_1(R) defined by h(ϕ) = ϕ + ϕ∘τ, with τ(x) = 1 − x, and Ker(h) = Der_u(R). Hence, if h is onto, abstract information functions are naturally associated to formal derivations. Nevertheless, h can also be 0: if R = F_q is a finite field, then Leib(F_q) = 0, but IF_1(F_q) ≠ 0 (it is generated by £_1).

Cohomological interpretation of formal entropy functions (classical in origin: Cathelineau, 1988 and Kontsevich, 1995). Proposition: let F be a finite prime field and H : F → F a function fulfilling H(x) = H(1 − x), the functional equation FEITH, and H(0) = 0. Then the function ϕ : F × F → F defined by ϕ(x, y) = (x + y) H(x/(x + y)) if x + y ≠ 0, and 0 otherwise, is a non-trivial 2-cocycle. Sketch of proof: suppose ϕ is a 2-coboundary; then there exists a map Q : F → F such that ϕ(x, y) = Q(x + y) − Q(x) − Q(y). The function ψ_λ(x) = Q(λx) − λQ(x) is an additive morphism F → F, hence entirely determined by ψ_λ(1), and λ ↦ ψ_λ(1) fulfils the Leibniz chain rule on F^×. We deduce that ϕ = 0, which is impossible, so ϕ is not a coboundary. It follows that £_1 is unique (up to a constant). In the real or complex case one uses other types of cohomological arguments (see also the relationship with Baudot and Bennequin, 2015).

Finite multiple polylogarithms. While classical polylogarithms play an important role in the theory of mixed Tate motives over a field, it is often preferable to also consider the larger class of multiple polylogarithms (cf. Goncharov's work). In a similar way it is useful to investigate their finite analogues. We are mainly concerned with finite double polylogarithms, given as functions Z/p × Z/p → Z/p by
  £_{a,b}(x, y) = Σ_{0<k<l<p} x^k y^l / (k^a l^b).
Theorem: let n > 0 be divisible by 3 and put ω = n/3 − 1. Then
  Σ_{j=0}^{ω} (ω choose j)/(2ω choose j) · £_{n−(j+1), j+1}( [a, b] − [1/a, a/b] − a^p b^p [b, 1/(ab)] + b^p [a/b, 1/b] ) = 0,
where £ is extended linearly to formal sums of pairs of arguments. Question: what is the interpretation of the multiple polylogarithms in terms of information theory?

Finite polylogarithms and Fermat's Last Theorem. Several classical criteria used by Kummer, Mirimanoff and Wieferich to prove certain cases of Fermat's Last Theorem can be rephrased in terms of functional equations and evaluations of finite (multiple) polylogarithms. For example, Mirimanoff was led to the study of the (nowadays called) Mirimanoff polynomials (cf. Ribenboim's book on FLT) ϕ_j(T) = Σ_{k=1}^{p−1} k^{j−1} T^k, which are nothing else than finite polylogarithms. The Mirimanoff congruences can be reformulated as follows: for any solution (x, y, z) of x^p + y^p + z^p = 0 in pairwise prime integers not divisible by p (a Fermat triple) and for t = −x/y, we have £_1(t) = 0 and £_j(t) £_{p−j}(t) = 0 for j = 2, …, (p − 1)/2. One can prove these congruences using an identity expressing £_{p−j−1, j+1}(1, T) in terms of £_n(T).
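Kontsevich's observation recalled above, that £_1 satisfies FEITH over F_p, can be checked by brute force for a small prime. The sketch below is our own (the prime 7 and the helper names are ours); divisions in F_p are implemented with Fermat inverses.

```python
def finite_log(x, p):
    """Truncated finite 1-logarithm L1(x) = sum_{k=1}^{p-1} x^k / k over F_p."""
    return sum(pow(x, k, p) * pow(k, p - 2, p) for k in range(1, p)) % p

def feith_holds(p):
    """Check H(x) + (1-x) H(y/(1-x)) - H(y) - (1-y) H(x/(1-y)) = 0 in F_p
    for all x, y such that 1-x and 1-y are invertible, with H = L1."""
    H = lambda t: finite_log(t % p, p)
    inv = lambda a: pow(a % p, p - 2, p)          # Fermat inverse in F_p
    for x in range(p):
        for y in range(p):
            if (1 - x) % p == 0 or (1 - y) % p == 0:
                continue
            lhs = (H(x) + (1 - x) * H(y * inv(1 - x))
                   - H(y) - (1 - y) * H(x * inv(1 - y))) % p
            if lhs != 0:
                return False
    return True

print(feith_holds(7))   # expected: True
```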

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We present a dictionary between arithmetic geometry of toric varieties and convex analysis. This correspondence allows for effective computations of arithmetic invariants of these varieties. In particular, combined with a closed formula for the integration of a class of functions over polytopes, it gives a number of new values for the height (arithmetic analog of the degree) of toric varieties, with respect to interesting metrics arising from polytopes. In some cases these heights are interpreted as the average entropy of a family of random processes.
 

"GSI'15", École Polytechnique, October 28, 2015. Heights of toric varieties, entropy and integration over polytopes — José Ignacio Burgos Gil, Patrice Philippon & Martín Sombra. Speaker: Patrice Philippon, IMJ-PRG UMR 7586 - CNRS.

Toric varieties. Toric varieties form a remarkable class of algebraic varieties, endowed with an action of a torus having one Zariski-dense open orbit. Toric divisors are those invariant under the action of the torus. Together with their toric divisors, toric varieties can be described in terms of combinatorial objects such as lattice fans, support functions or lattice polytopes (figure: a fan in R² whose support function takes the values (u1, u2) ↦ 0, −u1, −u2 on the three cones). Each cone corresponds to an affine toric variety and the fan encodes how they glue together; if the fan is complete then the toric variety is proper. The support function determines a toric divisor D on each affine toric chart. By duality, the stability set of the support function is a polytope ∆, which may be empty but which has dimension n as soon as D is nef, which is equivalent to the support function being concave. One fundamental result: if D is a toric nef divisor then deg_D(X) = n! vol_n(∆).

Heights. A height measures the complexity of objects over the field of rational numbers, say. For a/b ∈ Q^× and d = gcd(a, b):
  h(a/b) = log max(|a/d|, |b/d|) = Σ_v log max(|a|_v, |b|_v),
thanks to the product formula Π_v |d|_v = 1 for any d ∈ Q^×, where v runs over all (normalised) absolute values on Q (usual and p-adic). For points of a projective space x = (x_0 : … : x_N) ∈ P^N(Q):
  h(x) = Σ_v log ‖x‖_v = −Σ_v log ‖ℓ(x)‖_v,
where ‖·‖_v is a norm on Q^{N+1} compatible with the absolute value |·|_v on Q (usual or p-adic), and the metrics on O_{P^N}(1) are given by ‖ℓ(x)‖_v = |ℓ(x)|_v / ‖x‖_v.

On an abstract variety equipped with a divisor (X, D), defined over Q, the suitable arithmetic setting amounts to a collection of metrics on the space of rational sections of the divisor, compatible with the absolute values on Q (the collection is in bijection with the set of absolute values on Q). We denote by D̄ the resulting metrised divisor. Arithmetic intersection theory allows one to define the height of X relative to D̄ analogously to the degree deg_D(X): h_{D̄}(X) = Σ_v h_v(X), where the local heights h_v are defined through an arithmetic analogue of the Bézout formula. Local heights depend on the choice of auxiliary sections, but the global height does not.

Metrics on toric varieties. On toric divisors, a metric is said toric if it is invariant under the action of the compact sub-torus of the principal orbit. There is a bijection between toric metrics and continuous functions on the fan whose difference with the support function is bounded; the metric is semipositive iff the corresponding function is concave. By Legendre duality, semipositive toric metrics are also in bijection with continuous concave functions on the polytope associated to the toric divisor, dubbed roof functions. The roof function is the concave envelope of the graph of the function s ↦ −log ‖s‖_{v,sup}, for s running over the toric sections of the divisor and its multiples (figure: roof functions, at the places v = 2, v = ∞ and the other v, of the pull-back to P¹ of the canonical metric of P²). The support function itself corresponds to the so-called canonical metric; its roof function is the zero function on the polytope.

Heights on toric varieties. Let (X, D) be a toric variety with a toric divisor (over Q), equipped with a collection of toric metrics (a toric metrised divisor D̄). The (local) roof functions attached to the toric metrised divisor sum up to the so-called global roof function ϑ := Σ_v ϑ_v, and we have the analogue of the formula seen for the degree:
  h_{D̄}(X) = (n + 1)! ∫_∆ ϑ.

Metrics from polytopes. Let ℓ_F(x) = ⟨x, u_F⟩ + ℓ_F(0) be the linear forms defining a polytope Γ ⊂ R^n, with F running over its facets and u_F normalized so that ‖u_F‖ = vol_{n−1}(F)/(n vol_n(Γ)). Let ∆ ⊂ Γ be another polytope; the restriction to ∆ of ϑ := −(1/c) Σ_F ℓ_F log(ℓ_F) is the roof function of some (archimedean) metric on the toric variety X and divisor D defined by ∆, hence of a metrised divisor D̄. Example: the roof function of the Fubini–Study metric on P^n is −(1/2)(x_0 log x_0 + … + x_n log x_n), where x_0 = 1 − x_1 − … − x_n (dual to −(1/2) log(1 + Σ_{i=1}^n e^{−2u_i})).

Height as average entropy. Let x ∈ Γ and let β_x be the (discrete) random variable that maps y ∈ Γ to the face F of Γ such that y ∈ Cone(x, F):
  P(β_x = F) = dist(x, F) vol_{n−1}(F) / (n vol_n(Γ)).
The entropy E(β_x) = −Σ_F P(β_x = F) log P(β_x = F) satisfies
  (1/vol_n(∆)) ∫_∆ E(β_x) dvol_n(x) = (c/(n + 1)) · h_{D̄}(X)/deg_D(X).

Integration over polytopes. An aggregate of ∆ in a direction u ∈ R^n is the union of all the faces of ∆ contained in {x ∈ R^n | ⟨x, u⟩ = λ} for some λ ∈ R. Definition: let V be an aggregate in the direction of u ∈ R^n; we set recursively: if u = 0, then C_n(∆, 0, V) = vol_n(V) and C_k(∆, 0, V) = 0 for k ≠ n; if u ≠ 0, then C_k(∆, u, V) = −Σ_F (⟨u_F, u⟩/‖u‖²) C_k(F, π_F(u), V ∩ F), where the sum is over the facets F of ∆. This recursive formula implies that C_k(∆, u, V) = 0 for all k > dim(V).

Proposition [2, Prop. 6.1.4]: let ∆ ⊂ R^n be a polytope of dimension n and u ∈ R^n. Then, for any f ∈ C^n(R),
  ∫_∆ f^{(n)}(⟨x, u⟩) dvol_n(x) = Σ_{V ∈ ∆(u)} Σ_{k=0}^{dim(V)} C_k(∆, u, V) f^{(k)}(⟨V, u⟩).
The coefficients C_k(∆, u, V) are determined by this identity. Example: if ∆ = Conv(ν_0, …, ν_n) = ∩_{i=0}^{n} {x : ⟨x, u_i⟩ ≥ λ_i} is a simplex and u ∈ R^n \ {0}, then C_0(∆, u, ν_0) equals
  n! vol_n(∆) / Π_{i=1}^{n} ⟨ν_0 − ν_i, u⟩ = ε det(u_1, …, u_n)^{n−1} / Π_{i=1}^{n} det(u_1, …, u_{i−1}, u, u_{i+1}, …, u_n),
with ε the sign of (−1)^n det(u_1, …, u_n).

References:
[1] G. Everest & T. Ward, Heights of Polynomials and Entropy in Algebraic Dynamics, Universitext, Springer Verlag (1999).
[2] J.I. Burgos Gil, P. Philippon & M. Sombra, Arithmetic geometry of toric varieties. Metrics, measures and heights, Astérisque 360, Soc. Math. France, 2014.
[3] J.I. Burgos Gil, A. Moriwaki, P. Philippon & M. Sombra, Arithmetic positivity on toric varieties, J. Algebraic Geom., 2016, to appear, e-print arXiv:1210.7692v3.
[4] J.I. Burgos Gil, P. Philippon & M. Sombra, Successive minima of toric height functions, Ann. Inst. Fourier, Grenoble, 2015, to appear, e-print arXiv:1403.4048v2.
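As a small numerical sanity check of the height formula h_{D̄}(X) = (n + 1)! ∫_∆ ϑ stated above, the sketch below integrates the Fubini–Study roof function on ∆ = [0, 1] for X = P¹, assuming (as in the canonical-metric remark above) that only the archimedean roof function contributes; with these formulas the value is 2 · 1/4 = 1/2.

```python
import numpy as np
from scipy.integrate import quad

def roof(x):
    """Fubini-Study roof function on P^1 restricted to the polytope [0, 1]."""
    xlogx = lambda t: 0.0 if t <= 0.0 else t * np.log(t)
    return -0.5 * (xlogx(x) + xlogx(1.0 - x))

integral, _ = quad(roof, 0.0, 1.0)
print(2 * integral)   # (n+1)! * integral with n = 1; approximately 0.5
```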

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this paper we propose a method to characterize and estimate the variations of a random convex set Ξ0 in terms of shape, size and direction. The mean n-variogram γ(n)Ξ0 : (u1, …, un) ↦ E[νd(Ξ0 ∩ (Ξ0 − u1) ∩ ⋯ ∩ (Ξ0 − un))] of a random convex set Ξ0 on ℝd reveals information on the nth-order structure of Ξ0. In particular, we show that by considering the mean n-variograms of the dilated random sets Ξ0 ⊕ rK by a homothetic convex family {rK}r>0, it is possible to estimate certain characteristics of the nth-order structure of Ξ0; a judicious choice of K then provides relevant measures of Ξ0. Fortunately, the germ-grain model is stable under convex dilations, and the mean n-variogram of the primary grain can be estimated in several types of stationary germ-grain models through the so-called n-point probability function. Here we focus on the Boolean model. In the planar case we show how to estimate the nth-order structure of the random vector composed of the mixed volumes t(A(Ξ0), W(Ξ0, K)) of the primary grain, and we describe a procedure to do so from a realization of the Boolean model in a bounded window. We prove that this knowledge for all convex bodies K is sufficient to fully characterize the so-called difference body of the grain Ξ0 ⊕ Ξ̌0. We then discuss the choice of the element K: by choosing a ball, the mixed volumes coincide with the Minkowski functionals of Ξ0, so we obtain the moments of the random vector composed of the area and perimeter, t(A(Ξ0), U(Ξ0)); by choosing a segment oriented by θ, we obtain estimates of the moments of the random vector composed of the area and the Féret diameter in the direction θ, t(A(Ξ0), HΞ0(θ)). Finally, we evaluate the performance of the method on a Boolean model with rectangular grain for the estimation of the second-order moments of the random vectors t(A(Ξ0), U(Ξ0)) and t(A(Ξ0), HΞ0(θ)).
 

Characterization and Estimation of the Variations of a Random Convex Set by its Mean n-Variogram: Application to the Boolean Model — S. Rahmani, J.-C. Pinoli & J. Debayle, École Nationale Supérieure des Mines de Saint-Étienne, France (SPIN, PROPICE / LGF, UMR CNRS 5307), 28/10/2015.

1. Geometric stochastic modeling and objectives.
Stochastic materials: material modelling and material characterization.
Germ-grain model [Matheron 1967]: Ξ = ∪_{x_i ∈ Φ} (x_i + Ξ_i), where the Ξ_i are i.i.d. copies of a primary grain Ξ_0 and Φ is a point process. The law of Φ corresponds to the spatial distribution and the law of Ξ_0 to the granulometry. Boolean model: Φ is a Poisson point process of intensity λ.
Objectives and state of the art: geometrical characterization of Ξ_0 from measurements of Ξ ∩ M in a bounded window M, with no assumption on the shape of Ξ_0. State of the art: Miles' formulae [Miles 1967], the tangent points method [Molchanov 1995] and the minimum contrast method [Dupač & Diggle 1980] give the mean geometric parameters λ, E[A(Ξ_0)], E[U(Ξ_0)]; distribution formulae exist for a model of disks [Emery 2012].
Characterization and description of the grain. For homothetic grains: for a disk of radius r, E[r] = E[U(Ξ_0)]/(2π) and E[r²] = E[A(Ξ_0)]/π; for a square of side x, E[x] = E[U(Ξ_0)]/4 and E[x²] = E[A(Ξ_0)]; this yields a parametric distribution of the homothetic factor. For non-homothetic grains (rectangle, ellipse, …), the means of area and perimeter (Minkowski densities) are insufficient to fully characterize Ξ_0. What about the variations of these geometrical characteristics?

2. Theoretical aspects.
From the covariance of Ξ to the variations of Ξ_0. Covariance: C_Ξ(u) = P(x ∈ Ξ ∩ (Ξ + u)). Mean covariogram: γ̄_{Ξ_0}(u) = E[A(Ξ_0 ∩ (Ξ_0 + u))]. For the Boolean model they are related by
  γ̄_{Ξ_0}(u) = (1/λ) log( 1 + (C_Ξ(u) − p_Ξ²)/(1 − p_Ξ)² ).
In addition, ∫_{R²} γ̄_{Ξ_0}(u) du = E[A(Ξ_0)²].
Stability by convex dilations. With X ⊕ Y = {x + y | x ∈ X, y ∈ Y}, dilating a Boolean model with grain Ξ_0 and intensity λ by a convex K yields a Boolean model with grain Ξ_0 ⊕ K and the same intensity λ: the Boolean model is stable under convex dilations.
The proposed method. Consequently, for all r ≥ 0 we can estimate
  ζ_{0,K}(r) = E[A(Ξ_0 ⊕ rK)²] = ∫_{R²} γ̄_{Ξ_0⊕rK}(u) du.
Steiner's formula (mixed volumes): A(Ξ_0 ⊕ rK) = A(Ξ_0) + 2r W(Ξ_0, K) + r² A(K). Hence ζ_{0,K} is the polynomial
  ζ_{0,K}(r) = E[A_0²] + 4r E[A_0 W(Ξ_0, K)] + r² (4 E[W(Ξ_0, K)²] + 2 A(K) E[A_0]) + 4r³ A(K) E[W(Ξ_0, K)] + r⁴ A(K)²,
from which E[A_0²], E[A_0 W(Ξ_0, K)] and E[W(Ξ_0, K)²] can be estimated.
Generalization to nth-order moments. For n ≥ 2, the mean n-variogram is γ^{(n)}_{Ξ_0}(u_1, …, u_{n−1}) = E[A((Ξ_0 − u_1) ∩ ⋯ ∩ (Ξ_0 − u_{n−1}) ∩ Ξ_0)]; it is related to the n-point probability function (see the proceedings), and ∫⋯∫ γ^{(n)}_{Ξ_0}(u_1, …, u_{n−1}) du_1 ⋯ du_{n−1} = E[A(Ξ_0)^n]. Developing E[A(Ξ_0 ⊕ K)^n] by Steiner's formula then gives, for every convex K, the nth-order moments of (A_0, W(Ξ_0, K)).
Interpretation of the mixed area. For Ξ_0 and K convex, W(Ξ_0, K) = ½(A(Ξ_0 ⊕ K) − A(Ξ_0) − A(K)). For the unit ball B, W(Ξ_0, B) = U(Ξ_0), the perimeter; for a segment S_θ, W(Ξ_0, S_θ) = H_{Ξ_0}(θ), the Féret diameter in direction θ; and for a polygon, W(Ξ_0, Σ_{i=1}^{N} α_i S_{θ_i}) = Σ_{i=1}^{N} α_i H_{Ξ_0}(θ_i). Hence for all N and all (θ_1, …, θ_N) we obtain all moments of (H_{Ξ_0}(θ_1), …, H_{Ξ_0}(θ_N)), i.e. a characterization of the random process H_{Ξ_0}.
The Féret diameter random process H_{Ξ_0}. A trajectory of H_{Ξ_0} is the support function of the realization of Ξ_0 ⊕ Ξ̌_0; the process H_{Ξ_0} describes and characterizes Ξ_0 ⊕ Ξ̌_0. Note that Ξ_0 is isotropic iff H_{Ξ_0} is strongly stationary.

3. Practical aspects.
The simplest cases: estimation of the 1st and 2nd-order moments E[A(Ξ_0)], E[W(Ξ_0, K)], E[A(Ξ_0)²], E[A(Ξ_0) W(Ξ_0, K)] and E[W(Ξ_0, K)²]. Depending on the structuring element:
• Disk: E[A(Ξ_0)²], E[A(Ξ_0) U(Ξ_0)], E[U(Ξ_0)²].
• Segment: E[A(Ξ_0)²], E[H_{Ξ_0}(θ)²], E[A(Ξ_0) H_{Ξ_0}(θ)].
• Parallelogram: E[A(Ξ_0)²], E[H_{Ξ_0}(θ)²], E[A(Ξ_0) H_{Ξ_0}(θ)], plus the additional quantity of interest E[H_{Ξ_0}(θ_1) H_{Ξ_0}(θ_2)].
Procedure: from a realization Ξ(ω) ∩ M and radii r_1, …, r_n: dilations (Ξ(ω) ⊕ r_i K) ∩ (M ⊖ r_i K); covariances C_{Ξ⊕r_iK}; mean covariograms γ̄_{Ξ_0⊕r_iK}; integration E[A(Ξ_0 ⊕ r_i K)²] = ∫ γ̄_{Ξ_0⊕r_iK}(u) du; polynomial fitting of ζ_{0,K}, giving E[A(Ξ_0)²], E[A(Ξ_0) W(Ξ_0, K)], E[W(Ξ_0, K)²].
Statistical aspects: the estimator of the n-point probability function,
  Ĉ^{(n)}_{Ξ,M}(x_1, …, x_n) = A( (Ξ ∩ M) ⊖ {0, x_1 − x_n, …, x_{n−1} − x_n} ) / A( M ⊖ {0, x_1 − x_n, …, x_{n−1} − x_n} ),
is unbiased and strongly consistent as M → ∞. It yields a consistent, though not necessarily unbiased, estimator of the n-variogram and thus of the moments of (A(Ξ_0), W(Ξ_0, K)); the bias is small for windows M much larger than Ξ_0 (checked by simulation).

4. Test by simulation.
Experiments: several realizations of a Boolean model with rectangular grain of sides a ∼ N(40, 10) and b ∼ N(30, 10), window M of size 500 × 500, intensity λ = 100/(500 × 500), dilation radii r = 0, 1, …, 10.
Results (relative error in % versus the number of realizations): dilation with a segment gives estimates of E[H_θ(Ξ_0)²], E[A(Ξ_0) H_θ(Ξ_0)] and E[A(Ξ_0)²]; dilation with a disk gives estimates of E[U(Ξ_0)²], E[A(Ξ_0) U(Ξ_0)] and E[A(Ξ_0)²].
Conclusions and prospects. Conclusions: a theoretical estimator for the nth-order moments of the process H_{Ξ_0}, and a practical estimation of the 1st and 2nd-order moments of t(A(Ξ_0), U(Ξ_0)) and t(A(Ξ_0), H_{Ξ_0}(θ)), hence a characterization of a random particle depending on two parameters (rectangle, ellipse, …). Prospects: describing complex random convex sets by the first and second-order characteristics of the process H_{Ξ_0} (e.g. a Gaussian process); quantifying the anisotropy of the grain; bias correction.
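The last step of the procedure, reading the second-order moments off the coefficients of the fitted degree-4 polynomial ζ_{0,K}, is simple to implement. The sketch below is ours (function name and inputs are assumptions); E[A(Ξ_0)] is assumed to be estimated separately, e.g. by Miles' formulae.

```python
import numpy as np

def second_order_moments(r, zeta, area_K, mean_A0):
    """Recover E[A0^2], E[A0*W], E[W^2] from samples of
    zeta_{0,K}(r) = E[A(Xi0 (+) rK)^2], using the Steiner expansion
      zeta(r) = E[A0^2] + 4 r E[A0 W] + r^2 (4 E[W^2] + 2 A(K) E[A0])
                + 4 r^3 A(K) E[W] + r^4 A(K)^2.
    Needs at least five radii; mean_A0 = E[A(Xi0)] is estimated separately."""
    c4, c3, c2, c1, c0 = np.polyfit(r, zeta, deg=4)   # highest degree first
    EA2 = c0
    EAW = c1 / 4.0
    EW2 = (c2 - 2.0 * area_K * mean_A0) / 4.0
    return EA2, EAW, EW2
```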

Keynote speech Marc Arnaudon (chaired by Frank Nielsen)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We will prove an Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation.
 
Stochastic Euler-Poincaré reduction

Stochastic Euler-Poincaré reduction. Marc Arnaudon, Université de Bordeaux, France. GSI, École Polytechnique, 29 October 2015.

References: Arnaudon, Marc; Chen, Xin; Cruzeiro, Ana Bela: Stochastic Euler-Poincaré reduction. J. Math. Phys. 55 (2014), no. 8, 17 pp. Chen, Xin; Cruzeiro, Ana Bela; Ratiu, Tudor S.: Constrained and stochastic variational principles for dissipative equations with advected quantities. arXiv:1506.05024.

Outline. 1. Deterministic framework: Euler-Poincaré equations; diffeomorphism group on a compact Riemannian manifold; volume-preserving diffeomorphism group; Lagrangian paths; characterization of the geodesics on (G^s_V, ⟨·,·⟩_0); Euler-Poincaré equation on G^s_V. 2. Stochastic framework: semimartingales in a Lie group; stochastic Euler-Poincaré reduction; group of volume-preserving diffeomorphisms; Navier-Stokes and Camassa-Holm equations.

Euler-Lagrange equations. Let M be a Riemannian manifold and L : TM × [0, T] → R a Lagrangian on M. Let q ∈ C¹_{a,b}([0, T]; M) := {q ∈ C¹([0, T], M), q(0) = a, q(T) = b}. The action functional C : C¹_{a,b}([0, T]; M) → R is defined by C(q(·)) := ∫₀ᵀ L(q(t), q̇(t), t) dt. The critical points for C satisfy the Euler-Lagrange equation (d/dt)(∂L/∂q̇) − ∂L/∂q = 0.

Euler-Poincaré equations. Suppose that the configuration space M = G is a Lie group and L : TG → R is a left-invariant Lagrangian: ℓ(ξ) := L(e, ξ) = L(g, g·ξ) for all ξ ∈ T_e G, g ∈ G (here and in the sequel, g·ξ = T_e L_g ξ). The action functional C : C¹_{a,b}([0, T]; G) → R is defined by C(g(·)) := ∫₀ᵀ L(g(t), ġ(t)) dt = ∫₀ᵀ ℓ(ξ(t)) dt, where ξ(t) := g(t)⁻¹·ġ(t). [J.E. Marsden, T. Ratiu 1994] [J.E. Marsden, J. Scheurle 1993]: g(·) is a critical point for C if and only if it satisfies the Euler-Poincaré equation on T*_e G, (d/dt)(dℓ/dξ) − ad*_{ξ(t)}(dℓ/dξ) = 0, where ad*_ξ : T*_e G → T*_e G is the dual of ad_ξ : T_e G → T_e G: ⟨ad*_ξ η, θ⟩ = ⟨η, ad_ξ θ⟩ for η ∈ T*_e G, θ ∈ T_e G.
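As a concrete aside (not part of the original slides), the Euler-Poincaré equation above can be made tangible on G = SO(3): with the left-invariant Lagrangian ℓ(Ω) = ½⟨IΩ, Ω⟩ it reduces to Euler's rigid-body equations Π̇ = Π × Ω with Π = IΩ, the rigid-body application mentioned in the abstract. A minimal numerical sketch, with arbitrarily chosen moments of inertia:

```python
# Minimal sketch (not from the talk): the Euler-Poincare equation on SO(3)
# with l(Omega) = 1/2 <I Omega, Omega> reduces to Euler's rigid-body equations
#   dPi/dt = Pi x Omega,  Pi = I Omega.
# The inertia values below are arbitrary illustrative choices.
import numpy as np

I = np.array([1.0, 2.0, 3.0])          # principal moments of inertia (assumed)

def rhs(pi):
    omega = pi / I                      # Omega = I^{-1} Pi (diagonal inertia)
    return np.cross(pi, omega)          # Euler-Poincare / rigid-body equations

def rk4_step(pi, dt):
    k1 = rhs(pi)
    k2 = rhs(pi + 0.5 * dt * k1)
    k3 = rhs(pi + 0.5 * dt * k2)
    k4 = rhs(pi + dt * k3)
    return pi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

pi = np.array([1.0, 0.2, 0.1])          # initial body angular momentum
dt, steps = 1e-3, 10_000
for _ in range(steps):
    pi = rk4_step(pi, dt)

# The Casimir |Pi|^2 and the energy 1/2 <Pi, I^{-1} Pi> should be (nearly) conserved.
print(np.dot(pi, pi), 0.5 * np.dot(pi, pi / I))
```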
Variations. We will be interested in variations ξ(·) satisfying ξ̇(t) = ν̇(t) + ad_{ξ(t)} ν(t) for some ν ∈ C¹([0, T], T_e G), which is equivalent to the variation of g(·) with the perturbation g_ε(t) = g(t) e_{ε,ν}(t), where e_{ε,ν}(t) is the unique solution to the following ODE on G: (d/dt) e_{ε,ν}(t) = ε e_{ε,ν}(t)·ν̇(t), e_{ε,ν}(0) = e.

Diffeomorphism group on a compact Riemannian manifold. Let M be an n-dimensional compact Riemannian manifold. We define G^s := {g : M → M a bijection, g, g⁻¹ ∈ H^s(M, M)}, where H^s(M, M) denotes the manifold of Sobolev maps of class s > 1 + n/2 from M to itself. If s > 1 + n/2 then G^s is a C^∞ Hilbert manifold. G^s is a group under composition of maps; right translation is smooth, while left translation and inversion are only continuous. G^s is also a topological group (but not an infinite-dimensional Lie group).

The tangent space T_η G^s at an arbitrary η ∈ G^s is T_η G^s = {U : M → TM of class H^s, U(m) ∈ T_{η(m)} M}. The Riemannian structure on M induces the weak L², or hydrodynamic, metric ⟨·,·⟩_0 on G^s given by ⟨U, V⟩⁰_η := ∫_M ⟨U_η(m), V_η(m)⟩_m dμ_g(m), for any η ∈ G^s and U, V ∈ T_η G^s. Here U_η := U ∘ η⁻¹ ∈ T_e G^s and μ_g denotes the Riemannian volume associated with (M, g). Obviously, ⟨·,·⟩_0 is a right-invariant metric on G^s.

Let ∇ be the Levi-Civita connection associated with the Riemannian manifold (M, g). We define a right-invariant connection ∇⁰ on G^s by (∇⁰_X̃ Ỹ)(η) := ((∂/∂t)|_{t=0} Ỹ(η_t) ∘ η_t⁻¹) ∘ η + (∇_{X_η} Y_η) ∘ η, where X̃, Ỹ ∈ L(G^s), X_η := X̃ ∘ η⁻¹, Y_η := Ỹ ∘ η⁻¹ ∈ L^s(M), and η_t is a C¹ curve in G^s such that η_0 = η and (d/dt)|_{t=0} η_t = X̃(η).
Here L(G^s) denotes the set of smooth vector fields on G^s. ∇⁰ is the Levi-Civita connection associated to (G^s, ⟨·,·⟩_0).

Volume-preserving diffeomorphism group. For s > 1 + n/2, let G^s_V := {g ∈ G^s, g is volume preserving}. G^s_V is still a topological group. The tangent space at the identity is T_e G^s_V = {U ∈ T_e G^s, div(U) = 0}. The L²-metric ⟨·,·⟩_0 and its Levi-Civita connection ∇^{0,V} are defined on G^s_V by orthogonal projection. More precisely, the Levi-Civita connection on G^s_V is given by ∇^{0,V}_X Y = P_e(∇⁰_X Y), with P_e the orthogonal projection onto T_e G^s_V in the decomposition H^s(TM) = T_e G^s_V ⊕ dH^{s+1}(M).

Lagrangian paths. Consider the ODE on M: (d/dt) g_t(x) = u(t, g_t(x)), g_0(x) = x. Here u(t, ·) ∈ T_e G^s for every t > 0. For every fixed t > 0, g_t(·) ∈ G^s(M), so g ∈ C¹([0, T], G^s). If div(u(t)) = 0 for every t, then g ∈ C¹([0, T], G^s_V).

Characterization of the geodesics on (G^s_V, ⟨·,·⟩_0). [V.I. Arnold 1966] [D.G. Ebin, J.E. Marsden 1970] A Lagrangian path g ∈ C²([0, T], G^s_V) satisfying the equation above is a geodesic on (G^s_V, ⟨·,·⟩_{0,V}) (i.e. ∇^{0,V}_{ġ(t)} ġ(t) = 0) if and only if the velocity field u satisfies the Euler equation for incompressible inviscid fluids: (E) ∂u/∂t = −∇_u u − ∇p, div u = 0. Notice that the term ∇p corresponds to the use of ∇⁰ instead of ∇^{0,V}: the first system rewrites as ∂u/∂t = −∇^{0,V}_u u, div u = 0.

Euler-Poincaré equation on G^s_V. If we take ℓ : T_e G^s_V → R as ℓ(X) := ⟨X, X⟩, X ∈ T_e G^s_V, and define the action functional C : C¹_{e,e}([0, T], G^s_V) → R by C(g(·)) := ∫₀ᵀ ℓ(ġ(t)·g(t)⁻¹) dt, then a Lagrangian path g ∈ C²([0, T], G^s_V), integral path of u, is a critical point of C if and only if u satisfies the Euler equation (E). [J.E. Marsden, T. Ratiu 1994] [J.E. Marsden, J. Scheurle 1993]
[S. Shkoller 1998] If we take ℓ : T_e G^s_V → R as the H¹ metric ℓ(X) := ∫_M ⟨X, X⟩_m dμ_g(m) + α² ∫_M ⟨∇X, ∇X⟩_m dμ_g(m), X ∈ T_e G^s_V, and define the action functional C : C¹_{e,e}([0, T], G^s_V) → R in the same way as before, then a Lagrangian path g ∈ C²([0, T], G^s_V), integral path of u, is a critical point of C if and only if u satisfies the Camassa-Holm equation: ∂ν/∂t + ∇_u ν + α² (∇u)*·Δν = ∇p, ν = (1 + α²Δ)u, div(u) = 0.

Stochastic framework. Aim: to establish a stochastic Euler-Poincaré reduction theorem in a general Lie group, and to apply it to volume-preserving diffeomorphisms of a compact symmetric space. The stochastic term will correspond, for the Euler equation, to introducing viscosity.

Semimartingales. An Rⁿ-valued semimartingale ξ_t has a decomposition ξ_t(ω) = N_t(ω) + A_t(ω), where (N_t) is a local martingale and (A_t) has finite variation. If (N_t) is a martingale, then E[N_t | F_s] = N_s for t ≥ s. We are interested in semimartingales which furthermore satisfy A_t(ω) = ∫₀ᵗ a_s(ω) ds. Defining Dξ_t/dt := lim_{ε→0} E[(ξ_{t+ε} − ξ_t)/ε | F_t], we have Dξ_t/dt = a_t.

Itô formula: f(ξ_t) = f(ξ_0) + ∫₀ᵗ ⟨df(ξ_s), dN_s⟩ + ∫₀ᵗ ⟨df(ξ_s), dA_s⟩ + ½ ∫₀ᵗ Hess f(dξ_s ⊗ dξ_s). From this we see that ξ_t is a local martingale if and only if, for all f ∈ C²(Rⁿ), f(ξ_t) − f(ξ_0) − ½ ∫₀ᵗ Hess f(dξ_s ⊗ dξ_s) is a real-valued local martingale. This property becomes a definition for manifold-valued martingales.

Definition. Let a_t ∈ T_{ξ_t} M be an adapted process. If for all f ∈ C²(M), f(ξ_t) − f(ξ_0) − ∫₀ᵗ ⟨df(ξ_s), a_s⟩ ds − ½ ∫₀ᵗ Hess f(dξ_s ⊗ dξ_s) is a real-valued local martingale, then Dξ_t/dt = a_t.
Semimartingales in a Lie group. Let G be a Lie group with right-invariant metric ⟨·,·⟩ and right-invariant connection ∇. Let T_e G be the Lie algebra of G. Consider a countable family H_i, i ≥ 1, of elements of T_e G, and u ∈ C¹([0, T], T_e G). Consider the Stratonovich equation dg_t = (Σ_{i≥1} (H_i ∘ dW^i_t − ½ ∇_{H_i} H_i dt) + u(t) dt)·g_t, g_0 = e, where the (W^i_t) are independent real-valued Brownian motions. The Itô formula writes f(g_t) = f(g_0) + Σ_{i≥1} ∫₀ᵗ ⟨df(g_s), H_i⟩ dW^i_s + ∫₀ᵗ ⟨df(g_s), u(s)g_s⟩ ds + ½ Σ_{i≥1} ∫₀ᵗ Hess f(H_i(g_s), H_i(g_s)) ds. This implies that Dg_t/dt = u(t)g_t. Particular case: if (H_i) is an orthonormal basis, ∇_{H_i} H_i = 0, ∇ is the Levi-Civita connection associated to the metric and u ≡ 0, then g_t is a Brownian motion in G.

Stochastic Euler-Poincaré reduction. On the space S(G) of G-valued semimartingales define J(ξ) = ½ E[∫₀ᵀ ‖Dξ/dt‖² dt]. Perturbation: for v ∈ C¹([0, T], T_e G) satisfying v(0) = v(T) = 0 and ε > 0, let e_{ε,v}(·) ∈ C¹([0, T], G) be the flow generated by εv: (d/dt) e_{ε,v}(t) = ε v̇(t)·e_{ε,v}(t), e_{ε,v}(0) = e.

Definition. We say that g ∈ S(G) is a critical point of J if for all v ∈ C¹([0, T], T_e G) satisfying v(0) = v(T) = 0, (dJ/dε)|_{ε=0}(g_{ε,v}) = 0, where g_{ε,v}(t) = e_{ε,v}(t) g(t).

Theorem. g is a critical point of J if and only if du(t)/dt = −ad*_{ũ(t)} u(t) − K(u(t)), with ũ(t) = u(t) − ½ Σ_{i≥1} ∇_{H_i} H_i, ⟨ad*_u v, w⟩ = ⟨v, ad_u w⟩, and K : T_e G → T_e G satisfying ⟨K(u), v⟩ = −⟨u, ½ Σ_{i≥1} (ad_v(∇_{H_i} H_i) + ∇_{H_i}(ad_v(H_i)))⟩.

Remark 1. If H_i = 0 for all i ≥ 1, or ∇_u v = 0 for all u, v ∈ T_e G, then K(u) = 0 and we get the standard Euler-Poincaré equation.

Proposition. If ∇_{H_i} H_i = 0 for all i ≥ 1, then K(u) = −½ Σ_{i≥1} (∇_{H_i} ∇_{H_i} u + R(u, H_i)H_i). In particular, if (H_i) is an orthonormal basis of T_e G, then K(u) = −½ □u = −½ Δu + ½ Ric u, with □ the Hodge Laplacian.

Group of volume-preserving diffeomorphisms. Let G^s_V = {g : M → M volume-preserving bijection, such that g, g⁻¹ ∈ H^s}. Assume s > 1 + (dim M)/2. Then G^s_V is a C^∞ smooth manifold, with Lie algebra T_e G^s_V = {X ∈ H^s(M, TM), π(X) = e, div(X) = 0}. Notice that π(X) = e means that X is a vector field on M: X(x) ∈ T_x M.
On T_e G^s_V consider the two scalar products ⟨X, Y⟩_0 = ∫_M ⟨X(x), Y(x)⟩ dx and ⟨X, Y⟩_1 = ∫_M ⟨X(x), Y(x)⟩ dx + ∫_M ⟨∇X(x), ∇Y(x)⟩ dx. The Levi-Civita connection on G^s_V is given by ∇^{0,V}_X Y = P_e(∇⁰_X Y), with ∇⁰ the Levi-Civita connection of ⟨·,·⟩_0 on G^s and P_e the orthogonal projection onto T_e G^s_V: H^s(TM) = T_e G^s_V ⊕ dH^{s+1}(M). One can find (H_i)_{i≥1} such that for all i ≥ 1, ∇_{H_i} H_i = 0, div(H_i) = 0, and Σ_{i≥1} H_i² f = νΔf for f ∈ C²(M).

Corollary. (1) g is a critical point of J_{⟨·,·⟩_0} if and only if u solves the Navier-Stokes equation ∂u/∂t = −∇_u u + (ν/2)Δu − ∇p, div u = 0. (2) Assume M = T², the 2-dimensional torus. Then g is a critical point of J_{⟨·,·⟩_1} if and only if u solves the Camassa-Holm equation ∂u/∂t = −∇_u v − Σ_{j=1}² v_j ∇u_j + (ν/2)Δv − ∇p, v = u − Δu, div u = 0. For the proof, use the Itô formula and compute ad*_v(u) and K(u) in the different situations.
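As an illustrative aside (not from the talk), the "particular case" above, with (H_i) an orthonormal basis, ∇_{H_i} H_i = 0 and u ≡ 0, giving a Brownian motion in G, can be simulated for G = SO(3) with a simple geometric Euler scheme that advances g by the group exponential of a Lie-algebra increment. The scheme and step size below are assumptions made for illustration:

```python
# Minimal sketch (assumed geometric Euler scheme): simulating the particular case
# dg_t = sum_i H_i o dW^i_t . g_t with u = 0 on G = SO(3).  Per step,
# g_{n+1} = expm(sum_i H_i dW^i_n) g_n, which stays on SO(3) exactly.
import numpy as np
from scipy.linalg import expm

# standard basis of so(3) (skew-symmetric matrices)
E = [np.array([[0, 0, 0], [0, 0, -1], [0, 1, 0]], float),
     np.array([[0, 0, 1], [0, 0, 0], [-1, 0, 0]], float),
     np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], float)]

rng = np.random.default_rng(0)
g = np.eye(3)
dt, steps = 1e-3, 5_000
for _ in range(steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=3)   # independent Brownian increments
    xi = sum(w * H for w, H in zip(dW, E))      # Lie-algebra increment
    g = expm(xi) @ g                            # right-invariant update

# g should remain orthogonal with determinant 1 (a point of SO(3))
print(np.linalg.norm(g @ g.T - np.eye(3)), np.linalg.det(g))
```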

Information Geometry Optimization (chaired by Giovanni Pistone, Yann Ollivier)

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
When observing data x_1, …, x_t modelled by a probabilistic distribution p_θ(x), the maximum likelihood (ML) estimator θ^ML = arg max_θ Σ_{i=1}^t ln p_θ(x_i) cannot, in general, safely be used to predict x_{t+1}. For instance, for a Bernoulli process, if only “tails” have been observed so far, the probability of “heads” is estimated to be 0. (Thus, for the standard log-loss scoring rule, this results in infinite loss the first time “heads” appears.)
 

Laplace’s Rule of Succession in Information Geometry. Yann Ollivier, CNRS & Paris-Saclay University, France.

Sequential prediction. Sequential prediction problem: given observations x_1, …, x_t, build a probabilistic model p_{t+1} for x_{t+1}, iteratively. Example: given that w women and m men entered this room, what is the probability that the next person who enters is a woman/man? Common performance criterion for prediction: the cumulated log-loss L_T := −Σ_{t=0}^{T−1} log p_{t+1}(x_{t+1} | x_{1…t}), to be minimized. This corresponds to compression cost, and is also equal to square loss for Gaussian models.

Maximum likelihood estimator. Maximum likelihood strategy: fix a parametric model p
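A small sketch (illustrative only, not from the talk) of the failure mode described in the abstract: on a Bernoulli sequence, the maximum-likelihood predictor assigns probability 0 to an outcome it has never seen and incurs infinite log-loss, whereas Laplace's rule of succession (add-one smoothing) keeps the cumulated log-loss finite. The sequence and the convention used at n = 0 are arbitrary choices:

```python
# Minimal sketch: cumulated log-loss of the maximum-likelihood predictor vs.
# Laplace's rule of succession on a Bernoulli sequence.  The ML predictor
# assigns probability 0 to an unseen outcome, so its log-loss blows up on the
# first "surprise".
import math

xs = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]       # observed sequence (0 = tails, 1 = heads)

def ml_prob(heads, n):
    return heads / n if n > 0 else 0.5     # ML estimate (convention 1/2 at n = 0)

def laplace_prob(heads, n):
    return (heads + 1) / (n + 2)           # Laplace's rule of succession

def log_loss(prob_fn):
    loss, heads = 0.0, 0
    for n, x in enumerate(xs):
        p1 = prob_fn(heads, n)             # predicted probability of "1"
        p = p1 if x == 1 else 1.0 - p1
        loss += -math.log(p) if p > 0 else math.inf
        heads += x
    return loss

print("ML log-loss:     ", log_loss(ml_prob))       # infinite: first head had p = 0
print("Laplace log-loss:", log_loss(laplace_prob))  # finite
```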

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
A divergence function defines a Riemannian metric G and dually coupled affine connections (∇, ∇*) with respect to it on a manifold M. When M is dually flat, a canonical divergence is known, uniquely determined from {G, ∇, ∇*}. We search for a standard divergence for a general non-flat M. It is introduced via the magnitude of the inverse exponential map, where the α = −1/3 connection plays a fundamental role. The standard divergence is different from the canonical divergence.
 

GSI 2015, Paris. Standard Divergence in Manifold of Dual Affine Connections. Shun-ichi Amari (RIKEN Brain Science Institute), Nihat Ay (Max-Planck Institute for Mathematics in the Sciences).

Divergence and metric. A divergence D[p : q] ≥ 0 with D[p : p] = 0 expands locally as D[p : p + dp] = ½ Σ g_ij dξ^i dξ^j + O(|dξ|³); g is a Riemannian metric, positive-definite.

Divergence and dual affine connections. The third-order terms of D define a pair of dually coupled affine connections with coefficients Γ_ijk and Γ*_ijk.

Dual geometry (M, g, ∇, ∇*): X⟨Y, Z⟩ = ⟨∇_X Y, Z⟩ + ⟨Y, ∇*_X Z⟩, and Γ_ijk + Γ*_ijk = 2Γ⁰_ijk, with ∇⁰ the Levi-Civita connection. Canonical divergence: in the dually flat case, the canonical divergence is a Bregman divergence.

Exponential map. Along the ∇-geodesic from p to q, let X_∇(p, q) = exp_p⁻¹(q) (the inverse exponential map, log_p q). The exponential-map divergence is defined from the squared magnitude of X_∇(p, q). Theorem 1: the exponential-map divergence induces a geometry. Standard divergence: D_stan[p : q] is defined from the inverse exponential map of the α = −1/3 connection. Theorem 2: this exponential-map divergence recovers the original geometry {g, ∇, ∇*}. Remark on the dually flat case: relation between D_stan and the canonical divergence D_can; moreover D_stan[p : q] = D*_stan[q : p], with an integral representation D[p : q] = ∫₀¹ ⋯ dt along the geodesic.

Divergence and projection. Projection theorem: p̂ = argmin_{q ∈ S} D[p : q]; at the minimizer p̂ ∈ S, the gradient grad_q D[p : q] is orthogonal to S.
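As a concrete aside (not part of the slides), the statement that a dually flat manifold carries a canonical divergence of Bregman type can be checked numerically in the simplest example: on the probability simplex with the negative-entropy potential, the Bregman divergence coincides with the Kullback-Leibler divergence. A minimal sketch with arbitrary example distributions:

```python
# Minimal sketch: in a dually flat manifold the canonical divergence is a
# Bregman divergence.  On the probability simplex with the negative-entropy
# potential F(p) = sum_i p_i log p_i, the Bregman divergence equals KL(p || q).
import numpy as np

def F(p):
    return np.sum(p * np.log(p))           # negative entropy (potential)

def grad_F(p):
    return np.log(p) + 1.0

def bregman(p, q):
    return F(p) - F(q) - np.dot(grad_F(q), p - q)

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(bregman(p, q), kl(p, q))             # both print the same value (about 0.095)
```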

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
The statistical structure on a manifold M is predicated upon a special kind of coupling between the Riemannian metric g and a torsion-free affine connection ∇ on TM, such that ∇g is totally symmetric, forming, by definition, a “Codazzi pair” {∇, g}. In this paper, we first investigate various transformations of affine connections, including additive translation (by an arbitrary (1,2)-tensor K), multiplicative perturbation (through an arbitrary invertible operator L on TM), and conjugation (through a non-degenerate two-form h). We then study the Codazzi coupling of ∇ with h and its coupling with L, and the link between these two couplings. We introduce, as special cases of K-translations, various transformations that generalize traditional projective and dual-projective transformations, and study their commutativity with L-perturbation and h-conjugation transformations. Our derivations allow affine connections to carry torsion, and we investigate conditions under which torsions are preserved by the various transformations mentioned above. Our systematic approach establishes a general setting for the study of Information Geometry based on transformations and coupling relations of affine connections – in particular, we provide a generalization of the conformal-projective transformation.
 
Transformations and Coupling Relations for Affine Connections

Transformations and Coupling Relations for Affine Connections. James Tao (Harvard University, Cambridge MA), Jun Zhang (University of Michigan, Ann Arbor MI). Oct 29, 2015.

Outline. 1. Transformation of affine connections (with torsion): h-conjugation, by a two-form h; gauge transform, by an operator L; additive translation, by a (1,2)-tensor K. 2. Commutative relations and "commutativity prisms", keeping track of torsion as one goes through the transformations. 3. Transformations that preserve Codazzi coupling (g, ∇): more general than the "conformal-projective transformation"?

Statistical manifold and Codazzi coupling. On a differentiable manifold M, one independently prescribes: (1) a pseudo-Riemannian metric g; (2) an affine connection ∇. The pair (g, ∇) is said to be Codazzi-coupled if (∇_Z g)(X, Y) = (∇_X g)(Z, Y). This notion is a generalization of Levi-Civita coupling (i.e., parallelism of g with respect to ∇). It can be shown that (∇, g) is Codazzi-coupled if and only if ∇ and ∇* have the same torsion. Statistical manifold: a manifold (M, g, ∇) where (i) ∇ is torsion-free and (ii) (g, ∇) is Codazzi-coupled.

Conjugate connection. Given any (g, ∇), the conjugate connection ∇* can be defined by Z g(X, Y) = g(∇_Z X, Y) + g(X, ∇*_Z Y). It can be verified that (i) ∇* is indeed a connection and (ii) the conjugation action on ∇ is involutive: (∇*)* = ∇. Defining a connection by conjugacy with a non-degenerate two-form h can be done unambiguously only when h is symmetric or skew-symmetric; otherwise the "left conjugate" and the "right conjugate", in reference to the slots of h(·,·), will not be the same.

Gauge transformation of a connection. Let L denote a TM isomorphism. The gauge transformation of ∇ by L, denoted L(∇), is defined (for vector fields X, Y) as (L(∇))_X Y = L⁻¹(∇_X(LY)). (L, ∇) is said to be Codazzi-coupled if (∇_X L)Y = (∇_Y L)X, where (∇_X L)Y := ∇_X(LY) − L(∇_X Y). Proposition (Schwenk-Schellschmidt and Simon, 2009). Let ∇ be an affine connection and L a tangent bundle isomorphism. Then the following are equivalent: (1) (∇, L) is Codazzi-coupled; (2) ∇ and L(∇) have equal torsions; (3) (L(∇), L⁻¹) is Codazzi-coupled.

Linking g-conjugation with the L-gauge transform. Characterization Theorem. Let ∇ be a connection and ∇* its conjugate connection w.r.t. a metric g. Denote ω(X, Y) = g(LX, Y) for an arbitrary TM isomorphism L. Then ∇ω = 0 if and only if L(∇*) = ∇; explicitly written, ∇*_Z X = ∇_Z X + L(∇_Z L⁻¹)X. The proof used the identity (for any invertible operator L) C^h(X, Y, Z) = C^g(L(X), Y, Z) + g((∇_Z L)X, Y), where C(X, Y, Z) := (∇_Z g)(X, Y) and h(X, Y) := g(L(X), Y).

Translation of a connection by a K-tensor. Translation by a (1,2)-tensor: ∇_X Y → ∇_X Y + K(X, Y). It is torsion-preserving iff K is symmetric: K(X, Y) = K(Y, X). Examples of K-translations: (i) P_∨(τ): ∇_X Y ↦ ∇_X Y + τ(X)Y, the P_∨-transformation; (ii) P(τ): ∇_X Y ↦ ∇_X Y + τ(Y)X, the P-transformation; (iii) Proj(τ): ∇_X Y ↦ ∇_X Y + τ(Y)X + τ(X)Y, called the projective transformation, always torsion-preserving; (iv) D(h, V): ∇_X Y ↦ ∇_X Y − h(Y, X)V, called the "dual-projective transformation", torsion-preserving when h is symmetric.
Here τ is an arbitrary one-form, h is a non-degenerate two-form, X, Y , V are all vector fields. James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Interactions of h-conjugation, L-gauge, K-translation Let g, L, τ be as above. Let gL denote gpL¨, ¨q, ΓL denote L-gauge transformation, Cpgq denote conjugation w.r.t. g, and ¯τ be the vector field such that gpX, ¯τq “ τpXq. ‚ ‚ ‚ ‚ ‚ ‚ Ppτq Cpgq ΓL CpgLq Dpg,¯τq ΓL Cpgq DpgL,L´1p¯τqq CpgLq James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Interactions of h-conjugation, L-gauge, K-translation ‚ ‚ ‚ ‚ ‚ ‚ P_pτq Cpgq ΓL CpgLq P_p´τq ΓL Cpgq P_p´τq CpgLq James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Conformal-projective transformation (CPT) Conformal-projective transformation (CPT) is defined (Kurose, 2002) as, for any smooth functions ψ and φ, gpX, Y q ÞÑ eψ`φ gpX, Y q X Y ÞÑ X Y ´ gpX, Y q gradg ψ ` XpφqY ` Y pφqX CPT include, as special cases, projective transformation of conformal transformation of g and Levi-Civita connection dual-projective transformation of , given pg, q Codazzi transform of g and α-conformal transformation of g and It is known that CPT preserves Codazzi coupling of pg, q. We wonder whether it can be further generalized while preserving Codazzi structure. James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Conformal-projective transformation (CPT) Conformal-projective transformation (CPT) is defined (Kurose, 2002) as, for any smooth functions ψ and φ, gpX, Y q ÞÑ eψ`φ gpX, Y q X Y ÞÑ X Y ´ gpX, Y q gradg ψ ` XpφqY ` Y pφqX CPT include, as special cases, projective transformation of conformal transformation of g and Levi-Civita connection dual-projective transformation of , given pg, q Codazzi transform of g and α-conformal transformation of g and It is known that CPT preserves Codazzi coupling of pg, q. We wonder whether it can be further generalized while preserving Codazzi structure. James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Conformal-projective transformation (CPT) Conformal-projective transformation (CPT) is defined (Kurose, 2002) as, for any smooth functions ψ and φ, gpX, Y q ÞÑ eψ`φ gpX, Y q X Y ÞÑ X Y ´ gpX, Y q gradg ψ ` XpφqY ` Y pφqX CPT include, as special cases, projective transformation of conformal transformation of g and Levi-Civita connection dual-projective transformation of , given pg, q Codazzi transform of g and α-conformal transformation of g and It is known that CPT preserves Codazzi coupling of pg, q. We wonder whether it can be further generalized while preserving Codazzi structure. 
James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Conformal-projective transformation (CPT) Conformal-projective transformation (CPT) is defined (Kurose, 2002) as, for any smooth functions ψ and φ, gpX, Y q ÞÑ eψ`φ gpX, Y q X Y ÞÑ X Y ´ gpX, Y q gradg ψ ` XpφqY ` Y pφqX CPT include, as special cases, projective transformation of conformal transformation of g and Levi-Civita connection dual-projective transformation of , given pg, q Codazzi transform of g and α-conformal transformation of g and It is known that CPT preserves Codazzi coupling of pg, q. We wonder whether it can be further generalized while preserving Codazzi structure. James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections CPpV , W , Lq preserving Codazzi Structure Generalized conformal-projective transformation CPpV , W , Lq Let V and W be vector fields, and L an invertible operator. CPpV , W , Lq consists of an L-perturbation of the metric g along with a torsion-preserving transformation Dpg, W qProjp ˜V q of the connection, where ˜V is the one-form given by ˜V pXq :“ gpV , Xq for any vector field X. Proposition. (Assuming dim M ě 4) CPpV , W , Lq preserves Codazzi pairs t , gu if and only if L “ ef for some smooth function f , and V ` W “ gradg f . Take ˜V to be an arbitrary one-form, not necessarily closed, and ˜W :“ df ´ ˜V for some fixed smooth function f . CPT results when f “ φ ` ψ, in which case df “ dφ ` dψ is a natural decomposition. James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections CPpV , W , Lq preserving Codazzi Structure Generalized conformal-projective transformation CPpV , W , Lq Let V and W be vector fields, and L an invertible operator. CPpV , W , Lq consists of an L-perturbation of the metric g along with a torsion-preserving transformation Dpg, W qProjp ˜V q of the connection, where ˜V is the one-form given by ˜V pXq :“ gpV , Xq for any vector field X. Proposition. (Assuming dim M ě 4) CPpV , W , Lq preserves Codazzi pairs t , gu if and only if L “ ef for some smooth function f , and V ` W “ gradg f . Take ˜V to be an arbitrary one-form, not necessarily closed, and ˜W :“ df ´ ˜V for some fixed smooth function f . CPT results when f “ φ ` ψ, in which case df “ dφ ` dψ is a natural decomposition. James Tao (Harvard University, Cambridge MA) Jun Zhang (University of Michigan, Ann Arbor MI)Transformations and Coupling Relations for Affine Connections Recent development (Teng Fei and Jun Zhang) Let L be J (almost compatible structure) or K (almost para-complex structure): J2 “ ´id; K2 “ id. A compatible triple pg, ω, Lq satisfies: 1 gpLX, Y q ` gpX, LY q “ 0; 2 ωpLX, Y q “ ωpX, LY q; 3 ωpX, Y q “ gpLX, Y q; A manifold M is called: 1 symplectic if there exists a symplectic (skew-symmetric + non-degenerate) form ω that is closed: dω “ 0; 2 Fedosov if (i) M is symplectic and (ii) there exists a torsion-free connection parallel to ω : ω “ 0; 3 (para)K¨ahler if (i) M is symplectic and (ii) there exists an integrable L compatible with ω : ωpX, LY q “ ωpLX, Y q. 
James Tao (Harvard University, Cambridge MA), Jun Zhang (University of Michigan, Ann Arbor MI): Transformations and Coupling Relations for Affine Connections. Codazzi Structure and (Para-)Kähler Structure. Main Theorem. Let ∇ be a torsion-free connection on M, and let L denote either a J (almost complex) or a K (almost para-complex) operator on TM. Then, for the following three statements, any two imply the third: 1. ∇ is Codazzi-coupled with g; 2. ∇ is Codazzi-coupled with L; 3. ∇ω = 0. As a result, M becomes a Kähler or para-Kähler manifold. In other words, Codazzi coupling of (∇, L) turns a statistical manifold or a Fedosov manifold into a (para-)Kähler manifold, which is then both statistical and symplectic. THANK YOU FOR YOUR ATTENTION! References: Tao, J. and Zhang, J. (2015). Transformation and coupling relations for affine connections. Proceedings of GSI 2015, Springer. Fei, T. and Zhang, J. (in preparation). Interaction of Codazzi structure and (para-)Kähler structure.

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
This paper addresses the problem of online learning of finite statistical mixtures of exponential families. A short review of the Expectation-Maximization (EM) algorithm and its online extensions is given. Building on these extensions and on the description of the k-Maximum Likelihood Estimator (k-MLE), three online extensions of the latter are proposed. To illustrate them, we consider the case of mixtures of Wishart distributions, giving details and providing some experiments.
 
Online k-MLE for mixture modeling with exponential families

Online k-MLE for mixture modelling with exponential families Christophe Saint-Jean Frank Nielsen Geometry Science Information 2015 Oct 28-30, 2015 - Ecole Polytechnique, Paris-Saclay Application Context 2/27 We are interested in building a system (a model) which evolves when new data is available: x1, x2, . . . , xN, . . . The time needed for processing a new observation must be constant w.r.t the number of observations. The memory required by the system is bounded. Denote π the unknown distribution of X Outline of this talk 3/27 1 Online learning exponential families 2 Online learning of mixture of exponential families Introduction, EM, k-MLE Recursive EM, Online EM Stochastic approximations of k-MLE Experiments 3 Conclusions Reminder : (Regular) Exponential Family 4/27 Firstly, π will be approximated by a member of a (regular) exponential family (EF): EF = {f (x; θ) = exp { s(x), θ + k(x) − F(θ)|θ ∈ Θ} Terminology: λ source parameters. θ natural parameters. η expectation parameters. s(x) sufficient statistic. k(x) auxiliary carrier measure. F(θ) the log-normalizer: differentiable, strictly convex Θ = {θ ∈ RD|F(θ) < ∞} is an open convex set Almost all common distributions are EF members but uniform, Cauchy distributions. Reminder : Maximum Likehood Estimate (MLE) 5/27 Maximum Likehood Estimate for general p.d.f: ˆθ(N) = argmax θ N i=1 f (xi ; θ) = argmin θ − 1 N N i=1 log f (xi ; θ) assuming a sample χ = {x1, x2, ..., xN} of i.i.d observations. Maximum Likehood Estimate for an EF: ˆθ(N) = argmin θ − 1 N i s(xi ), θ − cst(χ) + F(θ) which is exactly solved in H, the space of expectation parameters: ˆη(N) = F(ˆθ(N) ) = 1 N i s(xi ) ≡ ˆθ(N) = ( F)−1 1 N i s(xi ) Exact Online MLE for exponential family 6/27 A recursive formulation is easily obtained Algorithm 1: Exact Online MLE for EF Input: a sequence S of observations Input: Functions s and ( F)−1 for some EF Output: a sequence of MLE for all observations seen before ˆη(0) = 0; N = 1; for xN ∈ S do ˆη(N) = ˆη(N−1) + N−1(s(xN) − ˆη(N−1)); yield ˆη(N) or yield ( F)−1(ˆη(N)); N = N + 1; Analytical expressions of ( F)−1 exist for most EF (but not all) Case of Multivariate normal distribution (MVN) 7/27 Probability density function of MVN: N(x; µ, Σ) = (2π)−d 2 |Σ|−1 2 exp−1 2 (x−µ)T Σ−1(x−µ) One possible decomposition: N(x; θ1, θ2) = exp{ θ1, x + θ2, −xxT F − 1 4 t θ1θ−1 2 θ1 − d 2 log(π) + 1 2 log |θ2|} =⇒ s(x) = (x, −xxT ) ( F)−1(η1, η2) = (−η1ηT 1 − η2)−1η1, 1 2(−η1ηT 1 − η2)−1 Case of the Wishart distribution 8/27 See details in the paper. Finite (parametric) mixture models 9/27 Now, π will be approximated by a finite (parametric) mixture f (·; θ) indexed by θ: π(x) ≈ f (x; θ) = K j=1 wj fj (x; θj ), 0 ≤ wj ≤ 1, K j=1 wj = 1 where wj are the mixing proportions, fj are the component distributions. When all fj ’s are EFs, it is called a Mixture of EFs (MEF). −5 0 5 10 0.000.050.100.150.200.25 x 0.1*dnorm(x)+0.6*dnorm(x,4,2)+0.3*dnorm(x,−2,0.5) Unknown true distribution f* Mixture distribution f Components density functions f_j Incompleteness in mixture models 10/27 incomplete observable χ = {x1, . . . , xN} deterministic ← complete unobservable χc = {y1 = (x1, z1), . . . , yN} Zi ∼ catK (w) Xi |Zi = j ∼ fj (·; θj ) For a MEF, the joint density p(x, z; θ) is an EF: log p(x, z; θ) = K j=1 [z = j]{log(wj ) + θj , sj (x) + kj (x) − Fj (θj )} = K j=1 [z = j] [z = j] sj (x) , log wj − Fj (θj ) θj + k(x, z) Expectation-Maximization (EM) [1] 11/27 The EM algorithm maximizes iteratively Q(θ; ˆθ(t), χ). 
Algorithm 2: EM algorithm Input: ˆθ(0) initial parameters of the model Input: χ(N) = {x1, . . . , xN} Output: A (local) maximizer ˆθ(t∗) of log f (χ; θ) t ← 0; repeat Compute Q(θ; ˆθ(t), χ) := Eˆθ(t) [log p(χc; θ)|χ] ; // E-Step Choose ˆθ(t+1) = argmaxθ Q(θ; ˆθ(t), χ) ; // M-Step t ← t +1; until Convergence of the complete log-likehood; EM for MEF 12/27 For a mixture, the E-Step is always explicit: ˆz (t) i,j = ˆw (t) j f (xi ; ˆθ (t) j )/ j ˆw (t) j f (xi ; ˆθ (t) j ) For a MEF, the M-Step then reduces to: ˆθ(t+1) = argmax {wj ,θj } K j=1 i ˆz (t) i,j i ˆz (t) i,j sj (xi ) , log wj − Fj (θj ) θj ˆw (t+1) j = N i=1 ˆz (t) i,j /N ˆη (t+1) j = F(ˆθ (t+1) j ) = i ˆz (t) i,j sj (xi ) i ˆz (t) i,j (weighted average of SS) k-Maximum Likelihood Estimator (k-MLE) [2] 13/27 The k-MLE introduces a geometric split χ = K j=1 ˆχ (t) j to accelerate EM : ˜z (t) i,j = [argmax j wj f (xi ; ˆθ (t) j ) = j] Equivalently, it amounts to maximize Q over partition Z [3] For a MEF, the M-Step of the k-MLE then reduces to: ˆθ(t+1) = argmax {wj ,θj } K j=1 |ˆχ (t) j | xi ∈ˆχ (t) j sj (xi ) , log wj − Fj (θj ) θj ˆw (t+1) j = |ˆχ (t) j |/N ˆη (t+1) j = F(ˆθ (t+1) j ) = xi ∈ˆχ (t) j sj (xi ) |ˆχ (t) j | (cluster-wise unweighted average of SS) Online learning of mixtures 14/27 Consider now the online setting x1, x2, . . . , xN, . . . Denote ˆθ(N) or ˆη(N) the parameter estimate after dealing N observations Denote ˆθ(0) or ˆη(0) their initial values Remark: For a fixed-size dataset χ, one may apply multiple passes (with shuffle) on χ. The increase in the likelihood function is no more guaranteed after an iteration. Stochastic approximations of EM(1) 15/27 Two main approaches to online EM-like estimation: Stochastic M-Step : Recursive EM (1984) [5] ˆθ(N) = ˆθ(N−1) + {NIc(ˆθ(N−1) }−1 θ log f (xN; ˆθ(N−1) ) where Ic is the Fisher Information matrix for the complete data: Ic(ˆθ(N−1) ) = −Eˆθ (N−1) j log p(x, z; θ) ∂θ∂θT A justification for this formula comes from the Fisher’s Identity: log f (x; θ) = Eθ[log p(x, z; θ)|x] One can recognize a second order Stochastic Gradient Ascent which requires to update and invert Ic after each iteration. Stochastic approximations of EM(2) 16/27 Stochastic E-Step : Online EM (2009) [7] ˆQ(N) (θ) = ˆQ(N−1) (θ)+α(N) Eˆθ(N−1) [log p(xN, zN; θ)|xN] − ˆQ(N−1) (θ) In case of a MEF, the algorithm works only with the cond. expectation of the sufficient statistics for complete data. ˆzN,j = Eθ(N−1) [zN,j |xN] ˆS (N) wj ˆS (N) θj = ˆS (N−1) wj ˆS (N−1) θj + α(N) ˆzN,j ˆzN,j sj (xN) − ˆS (N−1) wj ˆS (N−1) θj The M-Step is unchanged: ˆw (N) j = ˆη (N) wj = ˆS (N) wj ˆθ (N) j = ( Fj )−1 (ˆη (N) θj = ˆS (N) θj / ˆS (N) wj ) Stochastic approximations of EM(3) 17/27 Some properties: Initial values ˆS(0) may be used for introducing a ”prior”: ˆS (0) wj = wj , ˆS (0) θj = wj η (0) j Parameters constraints are automatically respected No matrix to invert ! Policy for α(N) has to be chosen (see [7]) Consistent, asymptotically equivalent to the recursive EM !! Stochastic approximations of k-MLE(1) 18/27 In order to keep previous advantages of online EM for an online k-MLE, our only choice concerns the way to affect xN to a cluster. Strategy 1 Maximize the likelihood of the complete data (xN, zN) ˜zN,j = [argmax j ˆw (N−1) j f (xN; ˆθ (N−1) j ) = j] Equivalent to Online CEM and similar to Mac-Queen iterative k-Means. 
Stochastic approximations of k-MLE(2) 19/27 Strategy 2 Maximize the likelihood of the complete data (xN, zN) after the M-Step: ˜zN,j = [argmax j ˆw (N) j f (xN; ˆθ (N) j ) = j] Similar to Hartigan’s method for k-means. Additional cost: pre-compute all possible M-Steps for the Stochastic E-Step. Stochastic approximations of k-MLE(3) 20/27 Strategy 3 Draw ˜zN,j from the categorical distribution ˜zN sampled from CatK ({pj = log( ˆw (N−1) j fj (xN; ˆθ (N−1) j ))}j ) Similar to sampling in Stochastic EM [3] The motivation is to try to break the inconsistency of k-MLE. For strategies 1 and 3, the M-Step reduces the update of the parameters for a single component. Experiments 21/27 True distribution π = 0.5N(0, 1) + 0.5N(µ2, σ2 2) Different values for µ2, σ2 for more or less overlap between components. A small subset of observations has be taken for initialization (k-MLE++ / k-MLE). Video illustrating the inconsistency of online k-MLE. Experiments on Wishart 22/27 Conclusions - Future works 23/27 On consistency: EM, Online EM are consistent k-MLE, online k-MLE (Strategies 1,2) are inconsistent (due to the Bayes error in maximizing the classification likelihood) Online stochastic k-MLE (Strategy 3) : consistency ? So, when components overlap, online EM > k-MLE > online k-MLE for parameter learning. Need to study how the dimension influences the inconstancy/convergence rate for online k-MLE. Convergence rate is lower for online methods (sub-linear convergence of the SGD) Time for an update vs sample size: online k-MLE (1,3) < online EM < online k-MLE (2) << k-MLE 24/27 online EM appears to be the best compromise !! References I 25/27 Dempster, A.P., Laird, N.M. and Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pp. 1–38, 1977. Nielsen, F.: On learning statistical mixtures maximizing the complete likelihood Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), AIP Conference Proceedings Publishing, 1641, pp. 238-245, 214. Celeux, G. and Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14(3), pp. 315-332, 1992. References II 26/27 Sam´e, A., Ambroise, C., Govaert, G.: An online classification EM algorithm based on the mixture model Statistics and Computing, 17(3), pp. 209–218, 2007. Titterington, D. M. : Recursive Parameter Estimation Using Incomplete Data. Journal of the Royal Statistical Society. Series B (Methodological), Volume 46, Number 2, pp. 257–267, 1984. Amari, S. I. : Natural gradient works efficiently in learning. Neural Computation, Volume 10, Number 2, pp. 251?276, 1998. Capp´e, O., Moulines, E.: On-line expectation-maximization algorithm for latent data models. Journal of the Royal Statistical Society. Series B (Methodological), 71(3):593-613, 2009. References III 27/27 Neal, R. M., Hinton, G. E.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In Jordan, M. I., editor, Learning in graphical models, pages 355-368. MIT Press, Cambridge, 1999. Bottou, L´eon : Online Algorithms and Stochastic Approximations. Online Learning and Neural Networks, Saad, David Eds.,Cambridge University Press, 1998.
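As a minimal illustration of the recursion at the core of these online algorithms (Algorithm 1 above), the sketch below runs the exact online MLE for a univariate Gaussian seen as an exponential family with sufficient statistic s(x) = (x, x²); in the online k-MLE strategies, each new observation is first routed to one mixture component (by maximum completed likelihood or by sampling) and only that component's statistics are updated with this same recursion. The univariate Gaussian and the function name are illustrative choices only; the talk works with multivariate normal and Wishart components.

    import numpy as np

    def online_mle_gaussian(stream):
        # Exact online MLE for a univariate Gaussian as an exponential family:
        #   eta^(N) = eta^(N-1) + (s(x_N) - eta^(N-1)) / N,  with s(x) = (x, x^2),
        # then the moment parameters are read off from eta = (E[X], E[X^2]).
        eta = np.zeros(2)
        for n, x in enumerate(stream, start=1):
            s = np.array([x, x * x])
            eta += (s - eta) / n       # constant time and bounded memory per observation
            mu = eta[0]
            var = eta[1] - mu ** 2
            yield mu, var

    rng = np.random.default_rng(0)
    stream = rng.normal(loc=2.0, scale=3.0, size=10_000)
    for mu, var in online_mle_gaussian(stream):
        pass
    print(mu, var)                     # approaches (2.0, 9.0)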

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
We discuss the optimization of the stochastic relaxation of a real-valued function, i.e., we introduce a new search space given by a statistical model and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric, thus the stochastic relaxation can be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distributions, to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which come from the dually flat structure of an exponential family. This allows us to obtain different Taylor formulæ according to the choice of the Hessian and of the geodesic used, and thus different approaches to the design of second-order methods, such as the Newton method.
 
Second-order Optimization over the Multivariate Gaussian Distribution

GSI2015 2nd conference on Geometric Science of Information 28-30 Oct 2015 Ecole Polytechnique Paris-Saclay Second-order Optimization over the Multivariate Gaussian Distribution Luigi Malag`o 1 2 Giovanni Pistone 1Shinshu University JP & INRIA Saclay FR 2de Castro Statistics, Collegio Carlo Alberto, Moncalieri IT Introduction • This is is the presentation by Giovanni of the paper with the same title in the Proceedings. • Unfortunately, Giovanni is the least qualified of the two authors to present this specific application of Information Geometry, his specific field of expertise being non-parametric Information Geometry and its applications in Probability and Statistical Physics. Luigi is currently working in Japan and could not make it. • Among the two of us, Luigi is the responsible for the idea of using gradient methods and later, Newton methods, in black box optimization. Our collaboration started with the preparation of the FOGA 2011 paper • L. Malag`o, M. Matteucci, and G. Pistone. Towards the geometry of estimation of distribution algorithms based on the exponential family. In Proceedings of the 11th workshop on Foundations of genetic algorithms, FOGA ’11, pages 230–242, New York, NY, USA, 2011. ACM Summary 1. Geometry of the Exponential Family 2. Second-Order Optimization: The Newton Method 3. Applications to the Gaussian Distribution 4. Discussion and Future Work • A short introduction for Taylor formulæ on Gaussian exponential families is provided. The binary case has been previously discussed in • L. Malag`o and G. Pistone. Combinatorial optimization with information geometry: Newton method. Entropy, 16:4260–4289, 2014. • Riemannian Newton methods are discussed in a Session of this Conference cf, • P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008. With a foreword by Paul Van Dooren • The focus of this short presenation is on a specific framework for Information Geometry we call statistical bundle. Hilbert vs Tangent vs Statistical Bundle • S. Amari. Dual connections on the Hilbert bundles of statistical models. In Geometrization of statistical theory (Lancaster, 1987), pages 123–151, Lancaster, 1987. ULDM Publ • R. E. Kass and P. W. Vos. Geometrical foundations of asymptotic inference. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York, 1997. Statistical Bundle: Gaussian case • Hα(x), x ∈ Rm , are Hermite polynomials of order 1 and 2. • E.g, m = 3, H010(x) = x2, H011(x) = x2x3, H020(x) = x2 2 − 1. • The Gaussian model with sufficient statistics B = {X1, . . . , Xn} ⊂ {Hα||α| = 1, 2}, is N =    p(x; θ) = exp   n j=1 θj Xj − ψ(θ)      • The fibers are Vp = Span (Xj − Ep [Xj ]|j = 1, . . . , n) • The statistical bundle is SN = {(p, U)|p ∈ N, U ∈ Vp} • Each U ∈ Vp, p ∈ N, is a polynomial of degree up to 2 and t → Eq etU is finite around 0, q ∈ N • Every polynomial X belongs to ∩q∈N L2 (q) Parallel transports Definition • e-transport: e Uq p : Vp U → U − Eq [U] ∈ Vq . • m-transport: for each U ∈ Vp and V ∈ Vq U, m Up qV p = e Uq pU, V q Properties • e Ur q e Uq p = e Ur p • m Ur q m Uq p = m Ur p • e Uq pU, m Uq pV q = U, V p • If q p V ∈ L2 (p), then m Up qV is its orthogonal projection onto Vp. Parallel transports in coordinates I We define on the statistical bundle SN a system of moving frames. 1. The exponential frame of the fiber SpN = Vp is the vector basis Bp = {Xj − Ep [Xj ]|j = 1, . . . , n} 2. 
Each element U ∈ Vp is uniquely written as U = n j=1 αj (U)(Xj − Ep [Xj ]) = α(U)T (X − Ep [X]) 3. The expression in the exponential frame of the scalar product is the Fisher information matrix: e Iij (p) = Xi − Ep [Xi ] , Xj − Ep [Xj ] p = Covp (Xi , Xj ) = ∂2 ∂θi 2 θj ψ(θ) 4. U → α(U) = e I (p)−1 Covp (X, U) Parallel transports in coordinates II 5. The mixture frame of the fiber SpN = Vp is e I (p) −1 Bp = n i=1 e Iij (p)(Xi − Ep [Xi ]) j = 1, . . . , n 6. Each element V ∈ Vp is uniquely written as V = n j=1 βj (V ) n i=1 e Iij (p)(Xi −Ep [Xi ]) = β(V )T e I (p)−1 (X−Ep [X]) 7. The coordinates in the mixture basis are given in matrix form by V → β(V ) = Covp (X, V ) . 8. The matrix m I (p) = e I (p)−1 is the matrix expression of the metric in the mixture frame. α(U) = m I (p)β(U), β(U) = e I (p)α(U) . Parallel transports in the moving frames • The e-transport acts on the exponential coordinates as the identity, α e Uq pU = α(U) • Equivalently, = e I (q)−1 Covq (X, U) = e I (p)−1 Covp (X, U) • The m-transport acts on the mixture coordinates as the identity, β m Uq pV = β(V ) REMARK A section or vector field of the statistical bundle is a mapping F : N p → F(p) ∈ Vp. As there are two distingushed charts on the model (exponential p → θ(p) and mixture p → η(p) = ψ(θ(p))) and two distinguished frames on each fiber, there are in general four distinguished expression of each section. Score and statistical gradient Definition t → p(t) is a curve in the model N and f : N → R. • The score of the curve t → p(t) is a curve in the statistical bundle t → (p(t), Dp(t)) ∈ SN such that for all X ∈ Span (1, X1, . . . , Xn) it holds d dt Ep(t) [X] = X − Ep(t) [X] , Dp(t) p(t) • Usually, Dp(t) = ˙p(t) p(t) = d dt log p(t) • The statistical gradient of f : N → R is a section of the statistical bundle, p → (p, grad f (p)) ∈ SN such that for each regular curve t → p(t), it holds d dt f (p(t)) = grad f (p(t)), Dp(t) p(t) Score and statistical gradient in coordinates • Let the regular curve t → p(t) be expressed in the exponential coordinates by t → θ(t). The score t → Dp(t) is expressed in the exponential frame by t → ˙θ(t) that is, Dp(t) = n j=1 ˙θj (t)(Xj − ∂ ∂θj ψ(θ(t))) • Let the regular curve t → p(t) be expressed in the mixture coordinates by t → η(t) = ψ(θ(t)). The score is expressed in the mixture frame as t → ˙η(t). • Let X be a random variable which belongs to all L2 (p), p ∈ N and f (p) = Ep [f ]. Then p → grad f (p) exists and equals the orthogonal projection of X onto Vp, namely grad(p → Ep [X]) = e I (p)−1 Covp (X, X) (X − Ep [X]), X = (X1, . . . , Xn) . • The expressions of grad f are of interest in optimization. Taylor formula in the Statistical Bundle • For a curve t → p(t) ∈ N connecting p = p(0) to q = p(1) and a function f : N → R the Taylor formula is f (q) = f (p) + d dt f (p(t)) t=o + 1 2 d2 dt2 f (p(t)) t=o + R2(f , p, q) • The first derivative is computed with the statistical gradient and the score f (q) = f (p) + grad f (p(0)), Dp(0) p + 1 2 d dt grad f (p(t)), Dp(t) p(t) t=o + R2(f , p, q) Accelleration and Hessian d dt grad f (p(t)), Dp(t) p(t) t=o = d dt e U p(0) p(t) grad f (p(t)), m U p(0) p(t)Dp(t) p(0) t=o d dt grad f (p(t)), Dp(t) p(t) t=o = d dt m U p(0) p(t) grad f (p(t)), m U p(0) p(t)Dp(t) p(0) t=o d dt grad f (p(t)), Dp(t) p(t) t=o = d dt Ep(0) p(t) p(0) grad f (p(t))Dp(t) t=o Accellerations • Let us define the acceleration at t of a curve t → p(t) ∈ N. 
The velocity is defined to be t → (p(t), Dp(t)) = p(t), d dt log (p(t)) ∈ SN • The exponential acceleration is e D2 p(t) = d ds e U p(t) p(s)Dp(s) s=t • The mixture acceleration is m D2 p(t) = d ds m U p(t) p(s)Dp(s) s=t • The Riemannian acceleration is 0 D2 p(t) = 1 2 e D2 p(t) + m D2 p(t) Covariant derivatives I • p → (p, F(p)), p → (p, G(p)), are sections of SN, with expressions in the moving frames F(p) = n j=1 αj (p)(Xj − Ep [Xj ]) , F(p) = n j=1 βj (p) n i=1 e Iij (p)(Xi − Ep [Xi ]) , G(p) = n j=1 γj (p)(Xj − Ep [Xj ]) , G(p) = n j=1 δj (p) n i=1 e Iij (p)(Xi − Ep [Xi ]) . Covariant derivatives II • The exponential covariant derivative is the vector field p → (p, e DG F(p)), where e DG F(p) = n j=1 grad αj (p), G(p) p (Xj − Ep [Xj ]) = n j=1 n i=1 γi (p)(∂i grad αj (p))(Xj − Ep [Xj ]) • The mixture covariant derivative is the vector field p → (p, m DG F(p)), where m DG F(p) = n j=1 grad βj (p), G(p) p n i=1 e Iij (p)(Xi − Ep [Xi ]) = n j=1 n k=1 γk (p) grad βj (p), Xk − Ep [Xk ] p n i=1 e Iij (p)(Xi −Ep [Xi ]) Covariant derivatives III • The Riemannian covariant derivative is the vector field p → (p, 0 DG F(p)) with 0 DG F = 1 2 (e DG F + m DG F) . Hessians • Let f : N → R be a mapping with gradient p → (p, grad f (p)). Let p → (p, G(p)) be a vector field (section) of SN. • The exponential Hessian of f is the vector field p → (p, e HessG f (p)), with e HessG f (p) = e DG grad f (p) . • The mixture Hessian of f is the vector field p → (p, m HessG f (p)), with m HessG f (p) = m DG grad f (p) . • The Riemannian Hessian of F is the vector field p → (p, 0 HessG F(p)), with 0 HessG f (p) = 0 DG grad f (p) . Taylor’s formulæ I 1. t → p(t) is the mixture geodesic connecting p = p(0) to q = p(1). f (q) = f (p) + grad f (p), Dp(0) p + 1 2 e HessDp(0)f (p), Dp(0) p + R+ 2 (p, q) R+ 2 (p, q) = 1 0 dt (1 − t) e HessDp(t)f (p(t)), Dp(t) p(t) − 1 2 e HessDp(0)f (p), Dp(0) p Taylor’s formulæ II 2. t → p(t) is the exponential geodesic connecting p = p(0) to q = p(1). f (q) = f (p) + grad f (p), Dp(0) p + 1 2 m HessDp(0)f (p), Dp(0) p + R− 2 (p, q) R− 2 (p, q) = 1 0 dt (1 − t) m HessDp(t)f (p(t)), Dp(t) p(t) − 1 2 m HessDp(0)f (p), Dp(0) p Taylor’s formulæ III 3. t → p(t) is the Riemannian geodesic connecting p = p(0) to q = p(1). f (q) = f (p) + grad f (p), Dp(0) p + 1 2 0 HessDp(0)f (p), Dp(0) p + R0 2 (p, q) where R0 2 (p, q) = 1 0 dt(1 − t) 0 HessDp(t)f (p(t)), Dp(t) p(t) − 1 2 0 HessDp(0)f (p), Dp(0) p Newton step • Let t → p(t) be the exponential geodesic starting at p = p(0) with Dp(0) = U. • Assume U is a critical point of Vp(0) U → f (p) + grad f (p(0)), U p(0) + 1 2 m HessU f (p), U p(0) that is grad f (p(0)) + m HessU f (p) = 0 • If q = p(1), then f (q) = f (p) − 1 2 0 HessU f (p), U p + R0 2 (p, q) Conclusion and work in progress • Comparisons between the Riemannian Newton method e.g., Absil et al., and the statistical bundle setup are being performed. • In particular, the use of alternative Hessians is of special interest.
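As a hedged illustration of these ingredients in the simplest case, the sketch below performs plain first-order natural-gradient descent on the stochastic relaxation θ ↦ E_θ[f(X)] for a univariate Gaussian exponential family, using the Fisher information (the Hessian of the log-normalizer, i.e. Cov_θ(s(X))) as the metric; the Newton step with exponential and mixture Hessians discussed in the talk builds on the same quantities but is not reproduced here. The objective f(x) = x², the step size and the Monte Carlo sample size are illustrative choices, not taken from the paper.

    import numpy as np

    def f(x):
        return x ** 2                      # objective to relax; its minimizer is x = 0

    def theta_to_moments(theta):
        # natural parameters (theta1, theta2) with theta2 < 0 for the Gaussian family
        var = -1.0 / (2.0 * theta[1])
        mu = theta[0] * var
        return mu, var

    def fisher_info(theta):
        # Fisher information = Hessian of the log-normalizer = Cov_theta(s(X)), s(x) = (x, x^2)
        mu, var = theta_to_moments(theta)
        return np.array([[var, 2.0 * mu * var],
                         [2.0 * mu * var, 2.0 * var ** 2 + 4.0 * mu ** 2 * var]])

    rng = np.random.default_rng(1)
    theta = np.array([0.0, -0.5])          # start from the standard Gaussian N(0, 1)
    for _ in range(200):
        mu, var = theta_to_moments(theta)
        x = rng.normal(mu, np.sqrt(var), size=2000)
        s = np.stack([x, x ** 2], axis=1)
        fx = f(x)
        # vanilla gradient of E_theta[f(X)] in natural coordinates: Cov_theta(s(X), f(X))
        grad = np.mean((s - s.mean(axis=0)) * (fx - fx.mean())[:, None], axis=0)
        nat_grad = np.linalg.solve(fisher_info(theta), grad)   # steepest descent w.r.t. the Fisher metric
        theta = theta - 0.1 * nat_grad

    print(theta_to_moments(theta))         # the mean stays near 0 and the variance shrinks toward 0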

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
We prove the equivalence of two online learning algorithms, mirror descent and natural gradient descent. Both mirror descent and natural gradient descent are generalizations of online gradient descent for the case where the parameter of interest lies on a non-Euclidean manifold. Natural gradient descent selects the steepest descent direction along a Riemannian manifold by multiplying the standard gradient by the inverse of the metric tensor. Mirror descent induces non-Euclidean structure by solving iterative optimization problems using different proximity functions. In this paper, we prove that mirror descent induced by a Bregman divergence proximity function is equivalent to the natural gradient descent algorithm on the Riemannian manifold in the dual coordinate system. We use techniques from convex analysis and connections between Riemannian manifolds, Bregman divergences and convexity to prove this result. This equivalence between natural gradient descent and mirror descent implies that (1) mirror descent is the steepest descent direction along the Riemannian manifold corresponding to the choice of Bregman divergence and (2) mirror descent with log-likelihood loss applied to parameter estimation in exponential families asymptotically achieves the classical Cramér-Rao lower bound.
 
The information geometry of mirror descent

Information geometry of mirror descent Geometric Science of Information Anthea Monod Department of Statistical Science Duke University Information Initiative at Duke G. Raskutti (UW Madison) and S. Mukherjee (Duke) 29 Oct 2015 Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 1 / 18 Optimization of large-scale problems Optimization of a function f (θ) where θ ∈ Rp. O( √ p) - convergence rate of standard subgradient descent. A problem in modern optimization, e.g. machine learning. Mirror descent [A Nemirovski, 1979. A Beck & M Teboulle, 2003]: O(log p) - convergence rate of mirror descent. Widely used tool in optimization and machine learning. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 2 / 18 Differential geometry in statistics (1) Cram´er-Rao lower bound (Rao 1945) - Lower bound on the variance of an estimator is a function of curvature. Sometimes called Cram´er-Rao-Fr´echet-Darmois lower bound. (2) Invariant (non-informative) priors (Jeffreys 1946) - An uniformative prior distribution for a parameter space is based on a differential form. (3) Information geometry (Amari 1985) - Differential geometry of probability distributions. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 3 / 18 Stochastic gradient descent Given a convex differentiable cost function, f : Θ → R. Generate a sequence of parameters {θt}∞ t=1 which incur a loss f (θt) that minimize regret at a time T, T t=1 f (θt). One solution θt+1 = θt − αt f (θt), where (αt)∞ t=0 denotes a sequence of step-sizes. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 4 / 18 Natural gradient For certain cost functions (log-likelihoods of exponential family models) the set of parameters Θ are supported on a p-dimensional Riemannian manifold, (M, H). Typically the metric tensor H = (hjk) is determined by the Fisher information matrix (I(θ))ij = EData ∂ ∂θi f (x; θ) ∂ ∂θj f (x; θ) θ , i, j = 1, . . . , p. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 5 / 18 Natural gradient Given a cost function f on the Riemannian manifold f : M → R, the natural gradient descent step is: θt+1 = θt − αtH−1 (θt) f (θt), where H−1 is the inverse of the Riemannian metric. The natural gradient algorithm steps in the direction of steepest descent along the Riemannian manifold (M, H). It requires a matrix inversion. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 6 / 18 Mirror descent Gradient descent can be written θt+1 = arg min θ∈Θ θ, f (θt) + 1 2αt θ − θt 2 2 . For a (strictly) convex proximity function Ψ : Rp × Rp → R+ mirror descent is θt+1 = arg min θ∈Θ θ, f (θt) + 1 αt Ψ(θ, θt) . Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 7 / 18 Bregman divergence Let G : Θ → R be a strictly convex twice-differentiable function the Bregman divergence is BG (θ, θ ) = G(θ) − G(θ ) − G(θ ), θ − θ . Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 8 / 18 Bregman divergences for exponential family Family G(θ) BG (θ, θ ) N(θ, Ip×p) 1 2 θ 2 2 1 2 θ − θ 2 2 Poi(eθ) exp(θ) exp (θ/θ ) − exp(θ ), θ − θ Be 1 1+e−θ log(1 + exp(θ)) log 1+eθ 1+eθ − eθ 1+eθ , θ − θ Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 9 / 18 Mirror descent Mirror descent using the Bregman divergence as the proximity function θt+1 = arg min θ θ, f (θt) + 1 αt BG (θ, θt) . 
Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 10 / 18 Convex duals The convex conjugate function for a function G is defined to be: H(µ) := sup θ∈Θ { θ, µ − G(θ)} . Let µ = g(θ) ∈ Φ be the extremal point of the dual. The dual Bregnman divergence BH : Φ × Φ → R+ is BH(µ, µ ) = H(µ) − H(µ ) − H(µ ), µ − µ . Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 11 / 18 Dual Bregman divergences for exponential family G(θ) H(µ) BH(µ, µ ) 1 2 θ 2 2 1 2 µ 2 2 1 2 µ − µ 2 2 exp(θ) µ, log µ − µ µ log µ µ log(1 + exp(θ)) η log µ (1 − µ) log 1−µ 1−µ +(1 − µ) log(1 − µ) +µ log µ µ Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 12 / 18 Manifolds in primal and dual co-ordinates BG (·, ·) induces a Riemannian manifold (Θ, 2G) in the primal co-ordinates. Φ be the image of Θ under the continuous map g = G. BH : Φ × Φ → R+ induces the same Riemannian manifold (Φ, 2H) under dual co-ordinates Φ. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 13 / 18 Equivalence Theorem (Raskutti, Mukherjee) The mirror descent step with Bregman divergence defined by G applied to function f in the space Θ is equivalent to the natural gradient step along Riemannian manifold (Φ, 2H) in dual co-ordinates. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 14 / 18 Consequences Exponential family with density: p(y | θ) = h(y) exp( θ, y − G(θ)). Consider the following mirror descent step given yt θt+1 = arg min θ θ, θBG (θ, h(yt))|θ=θt + 1 αt BG (θ, θt) . In dual coordinates one would minimize ft(µ; yt) = − log p(yt | µ) = BH(yt, µ). The natural gradient step is µt+1 = µt − αt[ 2 H(µt)]−1 BH(yt, µt), = µt+1 = µt − αt(µt − yt), the curvature of the loss BH(yt, µt) matches the metric tensor 2H(µ). Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 15 / 18 Statistical efficiency Given independent samples YT = (y1, ..., yT ) and a sequence of unbiased estimators µT is Fisher efficient if lim T→∞ EYT [(µT − µ)(µT − µ)T ] → 1 T 2 H, where 2H is the inverse of the Fisher information matrix. Theorem (Raskutti, Mukherjee) The mirror descent step applied to the log loss (??) with step-sizes αt = 1 t asymptotically achieves the Cram´er-Rao lower bound. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 16 / 18 Challenges (1) Information geometry on mixture of manifolds. (2) Proximity functions for functions over the Grassmannian. (3) EM algorithms for mixtures. Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 17 / 18 Acknowledgements Funding: Center for Systems Biology at Duke NSF DMS and CCF DARPA AFOSR NIH Anthea Monod (Duke) Information geometry of mirror descent 29 Oct 2015 18 / 18
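The equivalence theorem can be checked numerically on a one-dimensional example. The following sketch assumes the Poisson family with G(θ) = exp(θ), dual coordinate μ = exp(θ) and conjugate H(μ) = μ log μ − μ; it runs the mirror-descent recursion in the primal coordinates and the natural-gradient recursion in the dual coordinates and shows that the two trajectories coincide. The loss, step size and starting point are illustrative choices.

    import numpy as np

    # Poisson family: G(theta) = exp(theta), mu = G'(theta) = exp(theta),
    # H(mu) = mu*log(mu) - mu, with G''(theta) = exp(theta) and H''(mu) = 1/mu.

    def mirror_descent_step(theta, grad_theta, alpha):
        # One mirror-descent step with proximity B_G:
        #   grad G(theta_next) = grad G(theta) - alpha * grad f(theta)
        mu = np.exp(theta)                 # map to dual coordinates
        mu_next = mu - alpha * grad_theta(theta)
        return np.log(mu_next)             # map back to primal coordinates

    def natural_gradient_step_dual(mu, grad_mu, alpha):
        # One natural-gradient step in dual coordinates:
        #   mu_next = mu - alpha * [H''(mu)]^(-1) * grad f(mu), and [H''(mu)]^(-1) = mu here
        inv_metric = mu
        return mu - alpha * inv_metric * grad_mu(mu)

    # loss: Poisson negative log-likelihood for one observation y (up to constants)
    y = 3.0
    grad_theta = lambda th: np.exp(th) - y     # d/dtheta [exp(theta) - y*theta]
    grad_mu = lambda m: 1.0 - y / m            # d/dmu    [mu - y*log(mu)]

    theta, mu, alpha = 0.0, 1.0, 0.1           # same starting point, since mu = exp(theta)
    for _ in range(50):
        theta = mirror_descent_step(theta, grad_theta, alpha)
        mu = natural_gradient_step_dual(mu, grad_mu, alpha)

    print(np.exp(theta), mu)   # the two trajectories coincide and converge to y = 3.0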

Geometry of Time Series and Linear Dynamical Systems (chaired by Bijan Afsari, Arshia Cont)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
We present in this paper a novel non-parametric approach for clustering independent and identically distributed stochastic processes. We introduce a pre-processing step consisting of mapping multivariate i.i.d. samples from random variables to a generic non-parametric representation which factorizes the dependency and the marginal distributions apart without losing any information. An associated metric is defined, in which the balance between dependency information and distribution information is controlled by a single parameter. This mixing parameter can be learned or tuned by a practitioner; its use is illustrated on the clustering of financial time series. Experiments, implementation and results obtained on public financial time series are available online on the web portal http://www.datagrapple.com .
 
TS-GNPR Clustering Random Walk Time Series

Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Clustering Random Walk Time Series GSI 2015 - Geometric Science of Information Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat 29 October 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Context (data from www.datagrapple.com) Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What is a clustering program? Definition Clustering is the task of grouping a set of objects in such a way that objects in the same group (cluster) are more similar to each other than those in different groups. Example of a clustering program We aim at finding k groups by positioning k group centers {c1, . . . , ck} such that data points {x1, . . . , xn} minimize minc1,...,ck n i=1 mink j=1 d(xi , cj )2 But, what is the distance d between two random walk time series? Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion What are clusters of Random Walk Time Series? French banks and building materials CDS over 2006-2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Geometry of RW TS ≡ Geometry of Random Variables i.i.d. observations: X1 : X1 1 , X2 1 , . . . , XT 1 X2 : X1 2 , X2 2 , . . . , XT 2 . . . , . . . , . . . , . . . , . . . XN : X1 N, X2 N, . . . , XT N Which distances d(Xi , Xj ) between dependent random variables? Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Pitfalls of a basic distance Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N(µX , σ2 X ), Y ∼ N(µY , σ2 Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1]. E[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 + 2σX σY (1 − ρ(X, Y )) Now, consider the following values for correlation: ρ(X, Y ) = 0, so E[(X − Y )2] = (µX − µY )2 + σ2 X + σ2 Y . Assume µX = µY and σX = σY . For σX = σY 1, we obtain E[(X − Y )2] 1 instead of the distance 0, expected from comparing two equal Gaussians. ρ(X, Y ) = 1, so E[(X − Y )2] = (µX − µY )2 + (σX − σY )2. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Pitfalls of a basic distance Let (X, Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σ2 X ), Y ∼ N (µY , σ2 Y ) and whose correlation is ρ(X, Y ) ∈ [−1, 1]. 
E[(X − Y ) 2 ] = (µX − µY ) 2 + (σX − σY ) 2 + 2σX σY (1 − ρ(X, Y )) Now, consider the following values for correlation: ρ(X, Y ) = 0, so E[(X − Y )2 ] = (µX − µY )2 + σ2 X + σ2 Y . Assume µX = µY and σX = σY . For σX = σY 1, we obtain E[(X − Y )2 ] 1 instead of the distance 0, expected from comparing two equal Gaussians. ρ(X, Y ) = 1, so E[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 . 30 20 10 0 10 20 30 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 Probability density functions of Gaus- sians N(−5, 1) and N(5, 1), Gaus- sians N(−5, 3) and N(5, 3), and Gaussians N(−5, 10) and N(5, 10). Green, red and blue Gaussians are equidistant using L2 geometry on the parameter space (µ, σ). Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Sklar’s Theorem Theorem (Sklar’s Theorem (1959)) For any random vector X = (X1, . . . , XN) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as P(X1, . . . , XN) = C(P1(X1), . . . , PN(XN)), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Sklar’s Theorem Theorem (Sklar’s Theorem (1959)) For any random vector X = (X1, . . . , XN ) having continuous marginal cdfs Pi , 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as P(X1, . . . , XN ) = C(P1(X1), . . . , PN (XN )), where C, the multivariate distribution of uniform marginals, is known as the copula of X. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Copula Transform Definition (The Copula Transform) Let X = (X1, . . . , XN) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N. The random vector U = (U1, . . . , UN) := P(X) = (P1(X1), . . . , PN(XN)) is known as the copula transform. Ui , 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probability integral transform): for Pi the cdf of Xi , we have x = Pi (Pi −1 (x)) = Pr(Xi ≤ Pi −1 (x)) = Pr(Pi (Xi ) ≤ x), thus Pi (Xi ) ∼ U[0, 1]. Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Copula Transform Definition (The Copula Transform) Let X = (X1, . . . , XN ) be a random vector with continuous marginal cumulative distribution functions (cdfs) Pi , 1 ≤ i ≤ N. The random vector U = (U1, . . . , UN ) := P(X) = (P1(X1), . . . , PN (XN )) is known as the copula transform. 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 X∼U[0,1] 10 8 6 4 2 0 2 Y∼ln(X) ρ≈0.84 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 PX (X) 0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 PY(Y) ρ=1 The Copula Transform invariance to strictly increasing transformation Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Deheuvels’ Empirical Copula Transform Let (Xt 1 , . . . , Xt N ), 1 ≤ t ≤ T, be T observations from a random vector (X1, . . . , XN ) with continuous margins. Since one cannot directly obtain the corresponding copula observations (Ut 1, . . . , Ut N ) = (P1(Xt 1 ), . . . , PN (Xt N )), where t = 1, . . . , T, without knowing a priori (P1, . . . 
, PN ), one can instead Definition (The Empirical Copula Transform) estimate the N empirical margins PT i (x) = 1 T T t=1 1(Xt i ≤ x), 1 ≤ i ≤ N, to obtain the T empirical observations ( ˜Ut 1, . . . , ˜Ut N ) = (PT 1 (Xt 1 ), . . . , PT N (Xt N )). Equivalently, since ˜Ut i = Rt i /T, Rt i being the rank of observation Xt i , the empirical copula transform can be considered as the normalized rank transform. In practice x_transform = rankdata(x)/len(x) Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Generic Non-Parametric Distance d2 θ (Xi , Xj ) = θ3E |Pi (Xi ) − Pj (Xj )|2 + (1 − θ) 1 2 R dPi dλ − dPj dλ 2 dλ (i) 0 ≤ dθ ≤ 1, (ii) 0 < θ < 1, dθ metric, (iii) dθ is invariant under diffeomorphism Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Generic Non-Parametric Distance d2 0 : 1 2 R dPi dλ − dPj dλ 2 dλ = Hellinger2 d2 1 : 3E |Pi (Xi ) − Pj (Xj )|2 = 1 − ρS 2 = 2−6 1 0 1 0 C(u, v)dudv Remark: If f (x, θ) = cΦ(u1, . . . , uN; Σ) N i=1 fi (xi ; νi ) then ds2 = ds2 GaussCopula + N i=1 ds2 margins Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion The Hierarchical Block Model A model of nested partitions The nested partitions defined by the model can be seen on the distance matrix for a proper distance and the right permutation of the data points In practice, one observe and work with the above distance matrix which is identitical to the left one up to a permutation of the data Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Results: Data from Hierarchical Block Model Adjusted Rand Index Algo. 
Distance Distrib Correl Correl+Distrib HC-AL (1 − ρ)/2 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01 E[(X − Y )2 ] 0.00 ±0.00 0.09 ±0.12 0.55 ±0.05 GPR θ = 0 0.34 ±0.01 0.01 ±0.01 0.06 ±0.02 GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.56 ±0.01 GPR θ = .5 0.34 ±0.01 0.59 ±0.12 0.57 ±0.01 GNPR θ = 0 1 0.00 ±0.00 0.17 ±0.00 GNPR θ = 1 0.00 ±0.00 1 0.57 ±0.00 GNPR θ = .5 0.99 ±0.01 0.25 ±0.20 0.95 ±0.08 AP (1 − ρ)/2 0.00 ±0.00 0.99 ±0.07 0.48 ±0.02 E[(X − Y )2 ] 0.14 ±0.03 0.94 ±0.02 0.59 ±0.00 GPR θ = 0 0.25 ±0.08 0.01 ±0.01 0.05 ±0.02 GPR θ = 1 0.00 ±0.01 0.99 ±0.01 0.48 ±0.02 GPR θ = .5 0.06 ±0.00 0.80 ±0.10 0.52 ±0.02 GNPR θ = 0 1 0.00 ±0.00 0.18 ±0.01 GNPR θ = 1 0.00 ±0.01 1 0.59 ±0.00 GNPR θ = .5 0.39 ±0.02 0.39 ±0.11 1 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Results: Application to Credit Default Swap Time Series Distance matrices computed on CDS time series exhibit a hierarchical block structure Marti, Very, Donnat, Nielsen IEEE ICMLA 2015 (un)Stability of clusters with L2 distance Stability of clusters with the proposed distance Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Consistency Definition (Consistency of a clustering algorithm) A clustering algorithm A is consistent with respect to the Hierarchical Block Model defining a set of nested partitions P if the probability that the algorithm A recovers all the partitions in P converges to 1 when T → ∞. Definition (Space-conserving algorithm) A space-conserving algorithm does not distort the space, i.e. the distance Dij between two clusters Ci and Cj is such that Dij ∈ min x∈Ci ,y∈Cj d(x, y), max x∈Ci ,y∈Cj d(x, y) . Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Consistency Theorem (Consistency of space-conserving algorithms (Andler, Marti, Nielsen, Donnat, 2015)) Space-conserving algorithms (e.g., Single, Average, Complete Linkage) are consistent with respect to the Hierarchical Block Model. T = 100 T = 1000 T = 10000 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion 1 Introduction 2 Geometry of Random Walk Time Series 3 The Hierarchical Block Model 4 Conclusion Gautier Marti, Frank Nielsen Clustering Random Walk Time Series Introduction Geometry of Random Walk Time Series The Hierarchical Block Model Conclusion Discussion and questions? Avenue for research: distances on (copula,margins) clustering using multivariate dependence information clustering using multi-wise dependence information Optimal Copula Transport for Clustering Multivariate Time Series, Marti, Nielsen, Donnat, 2015 Gautier Marti, Frank Nielsen Clustering Random Walk Time Series
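For readers who want to reproduce the pre-processing step, the following sketch computes the empirical copula (normalized rank) transform and the squared generic non-parametric distance d²_θ between two samples, with the dependence part 3·E[|P_i(X_i) − P_j(X_j)|²] estimated from ranks and the distribution part estimated as a squared Hellinger distance between marginal histograms. The histogram bin count, the function name and the toy data are illustrative assumptions, not prescriptions from the paper.

    import numpy as np
    from scipy.stats import rankdata

    def gnpr_distance_sq(x, y, theta=0.5, bins=50):
        # theta = 1: pure dependence part, from the empirical copula transform (normalized ranks);
        # theta = 0: pure distribution part, squared Hellinger distance between marginal histograms.
        u = rankdata(x) / len(x)
        v = rankdata(y) / len(y)
        dep = 3.0 * np.mean((u - v) ** 2)

        lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
        px, edges = np.histogram(x, bins=bins, range=(lo, hi), density=True)
        py, _ = np.histogram(y, bins=bins, range=(lo, hi), density=True)
        width = edges[1] - edges[0]
        hell2 = 0.5 * np.sum((np.sqrt(px) - np.sqrt(py)) ** 2) * width

        return theta * dep + (1.0 - theta) * hell2   # this is d_theta squared

    rng = np.random.default_rng(0)
    z = rng.normal(size=5000)                 # two strongly dependent series with different marginals
    x = z + 0.1 * rng.normal(size=5000)
    y = 2.0 * z + 0.1 * rng.normal(size=5000)
    print(gnpr_distance_sq(x, y, theta=1.0))  # small: the series are strongly dependent
    print(gnpr_distance_sq(x, y, theta=0.0))  # larger: the marginal scales differ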

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
This paper highlights some further examples of maps that follow a recently introduced “symmetrization” structure behind the average consensus algorithm. Among others, we review generalized consensus settings and coordinate descent optimization.
 
A common symmetrization framework for random linear algorithms

Operational viewpoint on consensus inspired by quantum consensus objective covers some more linear algorithms Limit on accelerating consensus algorithms with information-theoretic links Alain Sarlette, INRIA/QUANTIC & Ghent University/SYSTeMS Operational viewpoint on consensus inspired by quantum consensus objective covers some more linear algorithms Limit on accelerating consensus algorithms with information-theoretic links the announced talk Alain Sarlette, INRIA/QUANTIC & Ghent University/SYSTeMS seems cool … in press at IEEE Trans. Automatic Control Operational viewpoint on consensus inspired by quantum consensus objective covers some more linear algorithms Limit on accelerating consensus algorithms with information-theoretic links Alain Sarlette, INRIA/QUANTIC & Ghent University/SYSTeMS Operational viewpoint on consensus inspired by quantum consensus objective covers some more linear algorithms “ Symmetrization “ L.Mazzarella, F.Ticozzi, A.S. arXiv:1311.3364 and arXiv:1303.4077 ! 
 
 
 
 Consensus: reaching agreement x1 = x2 = ... = xN 
 is the basis for many distributed computing tasks 
 Very flexible and robust convergence: as long as the network integrated over some finite T forms a 
 connected graph and α(t) ∈ [αm, αM] ⊂ (0,1)
 
 Classical consensus algorithm x1 x2 x3 x4 xN x... x... ! 
 
 
 
 ! ! 
 
 highest value can only decrease, lowest can only increase Convergence proof idea:
 shrinking convex hull xk(t+1) ( xj(t) – xk(t) ) + xk(t)α(t) Defining consensus in tensor product space? How define consensus w.r.t. correlations, entanglement,... ! How to write a consensus algorithm? Standard consensus: system states xk are directly accessible for 
 computation, can be linearly combined, copied, communicated... Quantum consensus: the whole quantum state / proba distribution 
 cannot be measured ➯ We must physically exchange “things”
 Our initial goal:
 Bringing consensus into quantum regime Consensus viewed as partial swapping Pairwise consensus interaction between agents ( j, k ): Consensus viewed as partial swapping Pairwise consensus interaction between agents ( j, k ): swap j with k stay in place Such mixture of two unitary operations: stay in place and swap
 can be easily implemented physically in quantum systems or, for that matter, in other information structures Linear action a(g,x) of G on X ! ! 
 Target : symmetrization reach a state x ∈ X where a(g,x) = x for all g ∈ G Consensus operation as discrete group action (finite) group vector space with objects “of interest” − − − Property: the projection on the symmetrization set can be written 
 Linear action a(g,x) of G on X ! ! 
 Dynamics : 
 with the defining a convex combination over G at each t Usually sg(t) ≠0 only for g belonging to a very restricted subset of G (finite) group vector space with objects “of interest” − − Consensus operation as discrete group action * The state x(t) at any time can be written as a convex combination ! ! 
 * The dynamics can then be lifted to the vector p(t) and written as ! ! ! Lift from actions to group … with p independent of x(0) ! ! 
 starting point pg(0) = δ(g,e) target pg = 1/|G| for all g … yields consensus on group weights − Possibly large number of nodes, e.g. |G| = N! for permutation group The exact values of sh(t), and even the selected interactions at each time step, need not be exactly controlled ➯ strong robustness ! 
 ! ! ! ! ! Proof: * possible by analogy with classical consensus * alternative: use entropy of p(t) as strict Lyapunov function Convergence to p holds if:– G = Permutations leads to random consensus by acting on classical state values (standard consensus) classical or quantum probability distributions G = cyclic group leads to random Fourier transform (use?) G = decoupling group links to quantum Dynamical Decoupling G = operational gates gives uniform random gate generation Consensus with antagonistic interactions Consensus towards leader value Gradient descent and coordinate descent Various applications the announced talk Non-trivial weight assignment & convergence result Solves previously not covered cases to distinguish {xk}=0 or {xk} = -{xj} Consensus with antagonistic interactions G = permutation matrices with arbitrary sign ±1 on each entry Weights sg: Birkhoff decomposition on |ajk| as for standard consensus Then swap weights to non-positive permutation if ajk <0 Non-trivial weight assignment (iterative procedure, see paper) Operator conclusions about which components of x converge to zero
 (slightly more general than standard convergence to x=0) Consensus towards leader value G = permutation matrices with arbitrary sign ±1 on each entry also other algorithms with (ajk) substochastic Gradient & Coordinate descent Search for min of f(x) by computing Assume (sorry) f(x)= xT A x In the eigenbasis of A this becomes a (if stable) substochastic iteration.
 Not a big insight… extension: k cycle through coordinates k Weights * follow from reflection matrices around nonorthogonal directions 
 * sum to 1 but may be negative ➱ Study coordinate descent convergence via symmetric but possibly 
 negative transition matrix: ∃ clear tools e.g. in consensus Gradient & Coordinate descent G = permutation matrices with arbitrary sign ±1 on each entry Search for min of f(x) by computing Assume (sorry) f(x)= xT A x In the eigenbasis of A this becomes a (if stable) substochastic iteration.
 Not a big insight… extension: k cycle through coordinates k Operational viewpoint on consensus inspired by quantum consensus objective covers some more linear algorithms Limit on accelerating consensus algorithms with information-theoretic links Alain Sarlette, INRIA/QUANTIC & Ghent University/SYSTeMS arXiv:1412.0402 Add one memory, no more [Muthukrishnan et al, 98] + k k memory Properly using one memory x(t-1)-x(t) allows to converge 
 quadratically faster What about more memories? Add one memory, no more [Muthukrishnan et al, 98] + k k memory Properly using one memory x(t-1)-x(t) allows to converge exponentially 
 quadratically faster What about more memories? Our result: if graph eigenvalues can be any in [a,b] with a,b known then more memories do not improve worst consensus eigenvalue proof: not very information-theoretic, see arXiv:1412.0402 Optimization: ! Nesterov method not further improvable by m(t-2),… ? 
 Robust control: design plant to be stable under feedback u = -k y , k in interval ! Communication theory: Interesting links = network Optimization: ! Nesterov method not further improvable by m(t-2),… ? 
 Robust control: design plant to be stable under feedback u = -k y , k in interval ! Communication theory: Interesting links = improves by taking direct
 feedback to itself into account network Optimization: ! Nesterov method not further improvable by m(t-2),… ? 
 Robust control: design plant to be stable under feedback u = -k y , k in interval ! Communication theory: Interesting links = network if network poorly known, no
 benefit to account for longer loops
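A minimal sketch of the classical starting point of this symmetrization picture: repeated pairwise interactions in which the selected pair applies a convex combination of the "stay in place" and "swap" actions, driving all states to the common average. The random pair selection, the number of steps and the value of α are illustrative assumptions; the quantum, signed and gradient/coordinate-descent variants discussed above are not covered by this toy code.

    import numpy as np

    def pairwise_consensus(x, steps=2000, alpha=0.5, seed=0):
        # At each step two agents (j, k) are picked at random and updated with the
        # partial swap (1-alpha)*identity + alpha*swap:
        #   x_j <- (1-alpha)*x_j + alpha*x_k,   x_k <- (1-alpha)*x_k + alpha*x_j.
        # The pair sum is preserved, so the states converge to the initial average.
        rng = np.random.default_rng(seed)
        x = np.array(x, dtype=float)
        n = len(x)
        for _ in range(steps):
            j, k = rng.choice(n, size=2, replace=False)
            xj, xk = x[j], x[k]
            x[j] = (1 - alpha) * xj + alpha * xk
            x[k] = (1 - alpha) * xk + alpha * xj
        return x

    x0 = np.array([1.0, 5.0, -2.0, 10.0, 0.0])
    print(x0.mean(), pairwise_consensus(x0))   # every entry approaches the initial mean 2.8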

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
Scaled Bregman distances (SBD) have turned out to be useful tools for simultaneous estimation and goodness-of-fit testing in parametric models of random data (streams, clouds). We show how SBD can additionally be used for model preselection (structure detection), i.e. for finding appropriate candidates of model (sub)classes in order to support a desired decision under uncertainty. For this, we concentrate, as an example, on the context of nonlinear recursive models with additional exogenous inputs; as special cases we include nonlinear regressions, linear autoregressive models (e.g. AR, ARIMA, SARIMA time series), and nonlinear autoregressive models with exogenous inputs (NARX). In particular, we outline a corresponding information-geometric 3D computer-graphical selection procedure. Some sample-size asymptotics is given as well.
 
New model search for nonlinear recursive models, regressions and autoregressions

New model search for nonlinear recursive models, regressions and autoregressions Wolfgang Stummer and Anna-Lena Kißlinger FAU University of Erlangen-Nürnberg Talk at GSI 2015, Palaiseau, 29/10/2015 Outline Outline • introduce a new method for model search (model preselection, structure detection) in data streams/clouds: key technical tool: density-based probability distances/divergences with “scaling” • gives much flexibility for interdisciplinary situation-based applications (also with cost functions, utility, etc.) • goal-specific handling of outliers and inliers (dampening, amplification) not directly covered today • give new general parameter-free asymptotic distributions for involved data-derived distances/divergences • outline a corresponding information-geometric 3D computer-graphical selection procedure 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 3 WHY distances between (non-)probability measures (1) • “distances” D(P, Q) between two (non-)probability measures P, Q play a prominent role in modern statistical inferences: • parameter estimation, • testing for goodness-of-fit resp. homogenity resp. independence, • clustering, • change-point detection, • Bayesian decision procedures as well as for other research fields such as • information theory, • signal processing including image and speech processing, • pattern recognition, • feature extraction, • machine learning, • econometrics, and • statistical physics. 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 4 WHY distances between (non-)probability measures (2) • suppose we want to describe the proximity/distance/closeness/similarity D(P, Q) of two (non-)probability distributions P and Q • either two “theoretical” distributions e.g. P = N(µ1, σ2 1), Q = N(µ2, σ2 2) • or two (empirical) distributions representing data (e.g. derived from frequencies, histograms, . . . ) • or one of each −→ today • P, Q may live on Rd , or on “spaces of functions with appropriate properties”: e.g. potential future scenarios of a time series, or a cont.-time stochastic process e.g. functional data • exemplary statistical uses of distances D(P, Q) −→ 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 5 WHY distances between probability measures (3) Applic. 1: plane = all probability distributions (on R, Rd , a path space, . . . ) we have a “distance” on this, say D(P, Q) e.g. P := P orig N := Pemp N := 1 N · N i=1 δXi [·] . . . empirical distribution of an iid sample X1, . . . , XN of size N from Qθtrue ; puts equal “weight” 1 N on each data point. θ = minimum distance estimator (e.g. θ = MLE for D(Pemp N , Qθ) = Kullback-Leib.) however, D(Pemp N , Qθ) may still be large −→ “bad goodness of fit” −→ test 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 6 Time Series and Nonlinear Regressions (1) in time series, the data (describing random var.) . . . , X1, X2, . . . are non-iid: e.g. autoregressive model AR(2) of order 2: Xm+1 − ψ1 · Xm − ψ2 · Xm−1 = εm+1, m ≥ k, where (εm+1)m≥k is a family of independent and identically distributed (i.i.d.) random variables on some space Y having parametric distribution Qθ (θ ∈ Θ). compact notation: take the parameter vector £ := (2, ψ1, ψ2), the backshift operator B defined by B Xm := Xm−1, the 2−polynomial ψ1 · B + ψ2 · B2 , the identity operator 1 given by 1Xm := Xm −→ left-hand side becomes F£ Xm+1, Xm, Xm−1, . . . 
, Xk = 1 − 2 j=1 ψjBj Xm+1 −→ as data-derived distribution we take the empirical distribution of left-hand side P orig N,£ [ · ] := P[ · ; Xk−1, . . . , Xk+N; £] := 1 N · N i=1 δ F£ Xk+i,Xk+i−1,...,Xk [·] with histogram-according probability mass function (relative frequencies) p £ N(y) = # i ∈ {1, . . . , N} : F£ Xk+i, . . . , Xk = y N = # i : Xk+i − γ1 · Xk+i−1 − γ2 · Xk+i−2 = y N 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 7 Time Series and Nonlinear Regressions (2) −→ 2 issues: which time series models Xi and which distances D(·, ·) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 8 Time Series and Nonlinear Regressions (3) more general: nonlinear autorecursions in the sense of F£m+1 m+1, Xm+1, Xm, Xm−1, . . . , Xk, Zk−, am+1, am, am−1, . . . , ak = εm+1, m ≥ k, • where (F£m+1 )m≥k is a sequence of nonlinear functions parametrized by £m+1 ∈ Γ, • (εm+1)m≥k are iid with parametric distribution Qθ (θ ∈ Θ), • (ak)m≥k are independent variables which are non-stochastic (deterministic) today, • the “backlog-input” Zk− denotes the additional input on X and a before k to get the recursion started. today, we assume k = −∞, and EQθ [εm+1] = 0, and that the initial data Xk as well as the backlog-input Zk− are deterministic. Special case: Xm+1 = g f£m+1 (m+1, Xm, Xm−1, . . . , Xk, Zk−, am+1, am, am−1, . . . , ak), εm+1 for some appropriate functions f£m+1 and g, e.g. g(u, v) := u + v, g(u, v) := u · v −→ (εm+1)m≥k can be interpreted as “randomness-driving innovations (noise)” 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 9 Time Series and Nonlinear Regressions (4) our general context covers in particular • NARX models = nonlinear autoregressive models with exogenous input: is the above special case with constant parameter vector £m+1 ≡ £ and additive g. Especially: • nonlinear regressions with deterministic independent variables: the only involved X is Xm+1 • AR(r) = linear autoregressive models (time series) of order r ∈ N (recall the above example with r = 2) • ARIMA(r,d,0) = linear autoregressive integrated models (time series) of order r ∈ N0 and d ∈ N0 • SARIMA(r,d,0)(R,D,0)s = linear seasonal autoregressive integrated models (time series) of order d ∈ N0 of non-seasonal differencing, order r ∈ N0 of the non-seasonal AR-part, length s ∈ N0 of a season, order D ∈ N0 of seasonal differencing and order R ∈ N0 of the seasonal AR-part. 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 10 Divergences / similarity measures (1) • so far: motiviations for “WHY to measure the proximity/distance/closeness/similarity D(P, Q)” here: P = P orig N,£ [ · ] (= empirical distribution of iid noises) Q = Qθ ( = candidate for true distribution of iid noises) • now: “HOW to measure”, which “distance” D(P, Q) to use ? • prominent examples for D(P, Q): relative entropy (Kullback-Leibler information discrimination) –> MDE = MLE !!, Hellinger distance, Pearson’s Chi-Square divergence, Csiszar’s f−divergences ... −→ all will be covered by our much more general context • DESIRE: to have a toolbox {Dφ,M(P, Q) : φ ∈ Φ, M ∈ M} which is far-reaching and flexible (reflected by different choices of the “generator” φ and the scaling measure M) should also cover robustness issues !! 
29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 11 Divergences / similarity measures (2) • from now on: probability distributions P, Q on (X, A) non-probability distribution/(σ−)finite measure M on (X, A) we assume that all three of them have densities w.r.t. a σ−finite measure λ p(x) = dP dλ (x), q(x) = dQ dλ (x) and m(x) = dM dλ (x) for a. all x ∈ X (for today: mostly X ⊂ R) • furthermore we take a “divergence (distance) generating function” φ : (0, ∞) → R which (for today) is twice differentiable, strictly convex without loss of generality we also assume φ(1) = 0 the limit φ(0) := limt↓0 φ(t) always exists (but may be ∞) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 12 Scaled Bregman Divergences (1) Definition (Stu. 07, extended in Stu. & Vajda 2012 IEEE Trans. Inf. Th.) The Bregman divergence (distance) of probability distributions P, Q scaled by the (σ−)finite measure M on (X, A) is defined by Bφ (P, Q | M) := X m(x) φ p(x) m(x) − φ q(x) m(x) − φ q(x) m(x) · p(x) m(x) − q(x) m(x) dλ(x) • if X = {x1, x2, . . . xs} where s may be infinite, and “λ is a counting measure” −→ p(·), q(·), m(·) are classical probability mass functions (“counting densities”): Bφ (P, Q | M) = s i=1 m(xi) φ p(xi) m(xi) − φ q(xi) m(xi) − φ q(xi) m(xi) · p(xi) m(xi) − q(xi) m(xi) e.g. φ(t) = (t − 1)2 −→ Bφ (P, Q | M) = s i=1 (p(xi)−q(xi))2 m(xi) weighted Pearson χ2 Ex.: P := Pemp N := 1 N · N i=1 δεi [·] . . . empirical distribution of an iid sample of size N from Qθtrue ; corresponding pmf = relative frequency p(x) := pemp N (x) := 1 N · #{j ∈ {1, . . . , N} : εj = x}; Q := Qθ where the “hypothetical candidate distribution” Qθ has pmf q(x) := qθ(x) M := W(Pemp N , Qθ) with pmf m(x) = w(pemp N (x), qθ(x)) > 0 for some funct. w(·, ·) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 13 discrete case with φ(t) = φα(t) and m(x) = wβ(p(x), q(x)) 3D presentation; exemplary goal: ≈ 0 for all α, β 10 3D presentation; exemplary goal: ≈ 0 for all α, β 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 14 Bφ (P, Q | M) with composite scalings M = W(P, Q) (1) • from now on: M = W(P, Q), i.e. m(x) = w(p(x), q(x)) for some function w(·, ·) • w(u, v) = 1 −→ unscaled/classical Bregman distance (discr.: Pardo/Vajda 97,03) e.g. for generator φ1(t) = t log t + 1 − t −→ Kullback-Leibler divergence (MLE) e.g. for the power functions φα(t) := tα−1+α−α·t α(α−1) , α = 0, 1, −→ density power divergences of Basu et al. 98, Basu et al. 2013/14/15 • new example (Kißlinger/Stu. (2015c): scaling by weighted r-th-power means: wβ,r (u, v) := (β · ur + (1 − β) · vr )1/r , β ∈ [0, 1], r ∈ R\{0} • e.g. r = 1: arithmetic-mean-scaling (mixture scaling) subcase β = 0: w0,1(u, v) = v −→ all Csiszar φ−divergences/disparities for φ2(t) one gets Pearson’s chi-square divergence subcase β = 1 and φ2(t) −→ Neyman’s chi-square divergence subcase β ∈ [0, 1] and φ2(t) −→ blended weight chi-square divergence, Lindsay 94 subcase β ∈ [0, 1] and φα(t) −→ Stu./Vajda (2012), Kißlinger/Stu. (2013, 2015a) • e.g. r = 1/2: wβ,1/2(u, v) = (β · √ u + (1 − β) · √ v)2 subcase β ∈ [0, 1] and φ2(t) −→ blended weight Hellinger distance: Lindsay (1994), Basu/Lindsay (1994) • e.g. r → 0: geometric-mean scaling wβ,0(u, v) = uβ · v1−β Kißlinger/Stu. (2015b) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 15 Some scale connectors w(u, v) (for any generator φ) (1) (a) w0,1(u, v) = v Csiszar diverg. (b) w0.45,1(u, v) = 0.45 · u + 0.55 · v . 
(c) w0.45,0.5(u, v) = (0.45 √ u + 0.55 √ v)2 (d) w0.45,0(u, v) = u0.45 · v0.55 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 16 Scale connectors w(u, v), NOT r−th power means (e) WEXPM: w0.45,˜f6 (u, v) = 1 6 log 0.45e6u + 0.55e6v (g) wmed 0.45 (u, v) = med{min{u, v}, 0.45, max{u, v}} . (j) wsmooth adj (u, v) with hin = −0.5 , hout = 0.3, δ = 10−7 , etc. (k) Parameter description for wadj (u, v) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 17 Robustness to obtain the robustness against outliers and inliers (i.e. high unusualnesses in data, surprising observations), as well as the (asymptotic) efficiency of our procedure is a question of a good choice of the scale connector w(·, ·) −→ another long paper Kiss. and Stu. 2015b −→ another talk we end up with a new transparent, far-reaching 3D computer-graphical “geometric” method called density-pair adjustment function this is vaguely a similar task to choosing a good copula in (inter-)dependence-modelling frameworks 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 18 Universal model search UMSPD (1) recall: which time series model Xi and which distance D(·, ·) now: model search in detail; basic idea (for finite discrete distributions): under the correct (“true”) model F£0 m+1 , Qθ0 we get that the sequence Fγ0 k+i (k + i, Xk+i, Xk+i−1, ..., Xk, Zk−, ak+i, ..., ak) i=1...N behaves like a size-N-sample from an iid sequence under the distribution Qθ0 , i.e. P£0 N [·] := 1 N N i=1 δF£0 k+i (k+i,Xk+i,Xk+i−1,...,Xk ,Zk−,ak+i,...,ak )[·] N→∞ −−−→ Qθ0 [·] and thus Dα,β P£0 N , Qθ0 N→∞ −−−→ 0 for a very broad family D := Dα,β(·, ·) : α ∈ [α, α] , β ∈ β, β of distances, where we use the SBDs Dα,β(P£0 N , Qθ) := Bφα P£0 N , Qθ0 || Wβ(Pemp N , Qθ0 ) for a α−family of generators φα(·) (today: the above power functions) and a β−family of scale connectors Wβ(·, ·) (today: geometric-mean scaling wβ,0(u, v) = uβ · v1−β ) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 19 Universal model search UMSPD (2) We introduce the universal model-search by probability distance (UMSPD): 1. choose F£m+1 m≥k from a principal parametric-function-family class 2. choose some prefixed class of parametric candidate distributions {Qθ : θ ∈ Θ} 3. find a parameter sequence £ := (£m+1)m≥k (often constant) and a θ ∈ Θ such that Dα,β P£ N, Qθ ≈ 0 for large enough sample size N and all (α, β) ∈ [α, α] × β, β 4. preselect the model F£m+1 , Qθ if the “3D score surface” (the “mountains”) S := {(α, β, Dα,β(P£ N, Qθ)) : α ∈ [α, α] , β ∈ β, β } is smaller than some appropriatly chosen threshold T (namely, a chisquare-quantile, see below) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 20 Universal model search UMSPD (3) Graphical implementation by plotting the 3D preselection-score surface S 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 21 Universal model search UMSPD (4) ADVANTAGE OF UMSPD: after the preselection process one can continue to work with the same Dα,β(·, ·) in order to perform amongst all preselected candidate models a statistically sound inference in terms of simultaneous exact parameter-estimation and goodness-of-fit. 
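To make the UMSPD preselection step concrete, here is a minimal numerical sketch (not the authors' code) of the discrete scaled Bregman divergence with the power generator φ_α and geometric-mean scaling w_{β,0}(u, v) = u^β v^{1−β}, evaluated as a 2N-scaled score surface over an (α, β) grid and compared with a χ²_{c−1} quantile used as the threshold T, as in the limit theorem quoted further below. The candidate pmf, the sample size and the grid values are illustrative choices only.

```python
import numpy as np
from scipy.stats import chi2

def phi(t, a):
    # power generator phi_alpha(t) = (t^a - 1 + a - a*t) / (a*(a - 1)), a not in {0, 1}
    return (t**a - 1 + a - a * t) / (a * (a - 1))

def phi_prime(t, a):
    return (t**(a - 1) - 1) / (a - 1)

def scaled_bregman(p, q, a, b):
    # B_phi(P, Q | M) in the discrete case, with geometric-mean scaling m = p^b * q^(1-b)
    # (assumes all entries of p and q are strictly positive)
    m = p**b * q**(1 - b)
    u, v = p / m, q / m
    return np.sum(m * (phi(u, a) - phi(v, a) - phi_prime(v, a) * (u - v)))

# toy data: empirical pmf of N "residuals" drawn from a candidate pmf q_theta
rng = np.random.default_rng(0)
q_theta = np.array([0.1, 0.2, 0.4, 0.2, 0.1])       # c = 5 outcomes
N = 2000
sample = rng.choice(len(q_theta), size=N, p=q_theta)
p_emp = np.bincount(sample, minlength=len(q_theta)) / N

# preselection-score surface S over a grid of generators (alpha) and scalings (beta)
alphas = np.array([0.3, 0.5, 0.7, 1.5, 2.0, 2.5, 3.0])   # alpha = 1 excluded
betas = np.linspace(0.0, 0.9, 10)
surface = np.array([[2 * N * scaled_bregman(p_emp, q_theta, a, b) for b in betas]
                    for a in alphas])

T = chi2.ppf(0.95, df=len(q_theta) - 1)     # chi^2_{c-1} quantile as threshold
print("max of 2N * B over the (alpha, beta) grid:", surface.max())
print("95% chi^2_{c-1} threshold T:", T)
print("preselect this (F, Q_theta) candidate?", surface.max() <= T)
```

For α = 2 and β = 0 the statistic reduces to the classical Pearson chi-square statistic, which is a convenient sanity check for the implementation.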
one issue remains to be discussed for UMSPD: the choice of the threshold T 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 22 Universal model search UMSPD (5) exemplarily show how to quantify the above-mentioned preselection criterion “the 3D surface S should be smaller than a threshold T” by some sound asymptotic analysis for the above special choices φα(·) and wβ(·, ·) the cornerstone is the following limit theorem Theorem Let Qθ0 be a finite discrete distribution with c := |Y| ≥ 2 possible outcomes and strictly positive densities qθ0 (y) > 0 for all y ∈ Y. Then for each α > 0, α = 1 and each β ∈ [0, 1[ the random scaled Bregman power distance 2N · Bφα P£0 N , Qθ0 | (P£0 N )β · Q1−β θ0 =: 2N · B(α, β; £0, θ0; N) is asymptotically chi-squared distributed in the sense that 2N · B(α, β; £0, θ0; N) L −−−→ N→∞ χ2 c−1 . in terms of the corresponding χ2 c−1−quantiles, one can derive the threshold T which the 3D preselection-score surface S has to (partially) exceed in order to believe with appropriate level of confidence that the investigated model ((F£m+1 )m≥k, Qθ) is not good enough to be preselected. 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 23 Further Topics • can use scaled Bregman divergences for robust statistical inferences with “completely general asymptotic results” for other choices of φ(·) and w(·, ·) −→ Kißlinger & Stu. (2015b) • can use scaled Bregman divergences for change detection in data streams −→ Kißlinger & Stu. (2015c) • explicit formulae for Bφα(Pθ1 , Pθ2 |Pθ0 ) where Pθ1 , Pθ2 , Pθ0 stem from the same arbitrary exponential family, cf. Stu. & Vajda (2012), Kißlinger & Stu. (2013); including stochastic processes (Levy processes) • we can do Bayesian decision making with important processes • non-stationary stochastic differential equations • e.g. non-stationary branching processes −→ Kammerer & Stu. (2010) • e.g. inhomogeneous binomial diffusion approximations −→ Stu. & Lao (2012) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 24 Summary • introduced a new method for model search (model preselection, structure detection) in data streams/clouds: key technical tool: density-based probability distances/divergences with “scaling” • gives much flexibility for interdisciplinary situation-based applications (also with cost functions, utility, etc.) • gave a new parameter-free asymptotic distribution result for involved data-derived distances/divergences • outlined a corresponding information-geometric 3D computer-graphical selection procedure 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 25 Ali, M.S., Silvey, D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B-28,131-140 (1966) Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549–559 (1998) Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011) Billings, S.A.: Nonlinear System Identification. Wiley, Chichester (2013) Csiszar, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. A-8, 85–108 (1963) Kißlinger, A.-L., Stummer, W.: Some Decision Procedures Based on Scaled Bregman Distance Surfaces. In: F. Nielsen and F. Barbaresco (Eds.): GSI 2013, LNCS 8085, pp. 479–486. 
Springer, Berlin (2013) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 26 Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: F. Nielsen and F. Barbaresco (Eds.): GSI 2015, LNCS 9389, Springer, Berlin (2015a) Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman divergences. Preprint (2015b). Kißlinger, A.-L., Stummer, W.: A New Information-Geometric Method of Change Detection. Preprint (2015c). Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987) Nock, R., Piro, P., Nielsen, F., Ali, W.B.H., Barlaud, M.: Boosting k−NN for categorization of natural sciences. Int J. Comput. Vis. 100, 294 – 314 (2012) Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall, Boca Raton (2006) Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Transaction on Information Theory 49(7), 1860 – 1868 (2003) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 27 Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988) Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503 – 1050504 (2007) Stummer, W., Vajda, I.: On Bregman Distances and Divergences of Probability Measures. IEEE Transaction on Information Theory 58 (3), 1277–1288 (2012) 29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 28

Creative Commons Attribution-ShareAlike 4.0 International
In the context of sensor networks, gossip algorithms are a popular, well-established technique for achieving consensus when sensor data are encoded in linear spaces. Gossip algorithms also have several extensions to nonlinear data spaces. Most of these extensions deal with Riemannian manifolds and use Riemannian gradient descent. This paper, instead, studies gossip in a broader CAT(k) metric setting, encompassing, but not restricted to, several interesting cases of Riemannian manifolds. As it turns out, convergence can be guaranteed as soon as the data lie in a small enough ball of a mere CAT(k) metric space. We also study convergence speed in this setting and establish linear rates of convergence.
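As a point of reference for the linear-space case mentioned in the abstract, here is a minimal, illustrative sketch (not the authors' code) of random pairwise midpoint gossip on a hypothetical 4-agent cycle; the initial values are the ones used in the slides further below, and the consensus value is the mean of the initial data.

```python
import numpy as np

def gossip_linear(x0, edges, n_iter=2000, seed=1):
    # random pairwise midpoint gossip in a linear space: at each tick a random
    # edge {v, w} is drawn and both endpoints are replaced by their midpoint
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iter):
        v, w = edges[rng.integers(len(edges))]
        mid = 0.5 * (x[v] + x[w])
        x[v] = mid
        x[w] = mid
    return x

# 4 agents on a cycle graph, values in R^2 (same numbers as in the slides below)
x0 = np.array([[-1.0, -1.0], [1.0, -1.0], [-2.0, 1.0], [1.0, 2.0]])
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(gossip_linear(x0, edges))   # every row tends to the initial mean (-0.25, 0.25)
```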
 
Random Pairwise Gossip on CAT(k) Metric Spaces

Gossip in CAT(κ) metric spaces Anass Bellachehab J´er´emie Jakubowicz T´el´ecom SudParis, Institut Mines-T´el´ecom & CNRS UMR 5157 GSI 2015 Palaiseau October 28 1 / 21 Problem We consider a network of N agents such that: The network is represented by a connected, undirected graph G = (V , E), where V = {1, . . . , N} stands for the set of agents and E denotes the set of available communication links between agents. At any given time t an agent v stores stores data represented as an element xv (t) of a data space M. Xt = (x1(t), . . . , xN(t)) is the tuple of data values of the whole network at instant t. 2 / 21 Problem (cont’d) Each agent has its own Poisson clock that ticks with a common intensity λ (the clocks are identically made) independently of other clocks. When an agent clock ticks, the agent is able to perform some computations and wake up some neighboring agents. The goal is to take the system from an initial state X(0) to a consensus state; meaning a state of the form X∞ = (x∞, . . . , x∞) with: x∞ ∈ M. 3 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x0 =     −1 −1 1 −1 −2 1 1 2     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x0 =     −1 −1 1 −1 −2 1 1 2     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x1 =     0 −1 0 −1 −2 1 1 2     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x1 =     0 −1 0 −1 −2 1 1 2     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x1 =     0.5 0.5 0 −1 −2 1 0.5 0.5     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x1 =     0.5 0.5 0 −1 −2 1 0.5 0.5     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x2 =     0.5 0.5 −1 0 −1 0 0.5 0.5     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x∞ =     −0.25 0.25 −0.25 0.25 −0.25 0.25 −0.25 0.25     4 / 21 Random Pairwise Gossip (Xiao & Boyd’04) x∞ =     −0.25 0.25 −0.25 0.25 −0.25 0.25 −0.25 0.25     xn = I − 1 2 (δin − δjn )(δin − δjn )T xn−1 4 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 A natural extension in a metric setting 5 / 21 Outline 1. Motivation 2. State of the art 3. CAT(κ) spaces 4. Previous result for κ = 0 5. Why the κ > 0 case is more complex 6. Our result 6 / 21 Motivation In its Euclidean setting, Random Pairwise Midpoint cannot address several useful type of data: Sphere positions (Sphere) Line orientations (Projective space) Solid orientations (Rotations) Subspaces (Grassmanians) Phylogenetic Trees (Metric space) Cayley graphs (Metric space) Reconfigurable systems (Metric space) 7 / 21 State of the art Consensus optimization on manifolds : [Sarlette-Sepulchre’08],[Tron et al.’12],[Bonnabel’13] Synchronization on the circle : [Sarlette et al.’08] Synchronization on SO(3) : [Tron et al.’12] Our previous work: Distibuted pairwise gossip on CAT(0) spaces Caveat: In this work, we deal the problem of synchonization, i.e. attaining a consensus, whatever its value; contrarily to the Euclidean case where it is known that random pairwise midpoints converges to ¯x0. 8 / 21 CAT(κ) spaces Model spaces Consider a model surface Mκ with constant sectional curvature κ: κ < 0 corresponds to a hyperbolic space κ = 0 corresponds to a Euclidean space κ > 0 corresponds to a sphere Geodesics Assume M is a metric space equipped with metric d. 
A map γ : [0, l] → M such that: ∀0 ≤ t, t ≤ l, d γ(t), γ(t ) = |t − t | is called a geodesic in M; a = γ(0) and b = γ(l) are its endpoints. If there exists one and only one geodesic linking a to b, it is denoted [a, b]. 9 / 21 CAT(κ) spaces (cont’d) Triangles A triple of geodesics γ, γ and γ with respective endpoints a, b and c is called a triangle and is denoted (γ, γ , γ ) or (a, b, c) when there is no ambiguity. Comparison triangles When κ ≤ 0, given a triangle (γ, γ , γ ), there always exist a triangle (aκ, bκ, cκ) in Mκ such that d(a, b) = d(aκ, bκ), d(b, c) = d(bκ, cκ) and d(c, a) = d(cκ, aκ) with a = γ(0), b = γ (0) and c = γ (0). b a c l l l bκ aκ cκ l l l 10 / 21 CAT(κ) spaces (cont’d) CAT(κ) inequality A triangle (γ, γ , γ ) in a metric space M satisfies the CAT(κ) inequality if for any x ∈ [a, b] and y ∈ [a, c] one has: d(x, y) ≤ d(xκ, yκ) where xκ ∈ [aκ, bκ] is such that d(aκ, xκ) = d(a, x) and yκ ∈ [aκ, cκ] is such that d(aκ, yκ) = d(a, y). b a c x y d d ≤ dκ bκ aκ cκ xκ yκdκ A metric space is said CAT(κ) if every pair of points can be joined by a geodesic and every triangle with perimeter less than 2Dκ = 2π√ κ satisfy the CAT(κ) inequality. 11 / 21 Formal setting Assumptions 1. Time is discrete t = 0, 1, . . . 2. At each time each agent holds a “value” xt,v in a CAT(κ) metric space M 3. At each time t, an agent Vt randomly wakes up and wakes up a neighbor Wt, according to the probability distribution: P[{Vk, Wk} = {v, w}] = Pv,w > 0 if v ∼ w 0 otherwise Algorithm description xt,v = Midpoint(xt−1,Vt , xt−1,Wt ) if v ∈ {Vt, Wt} xt−1,v otherwise 12 / 21 Previous result The algorithm is sound Because geodesics exist and are unique in CAT(0) spaces. Convergence The algorithm converges to a consensus with probability 1, whatever the initial state x0. Rate of convergence Convergence occur at a linear rate: define σ2 (x) = v∼w d2 (xv , xw ) ; then, there exists a constant L < 0 such that Eσ2 (Xk) ≤ C0 exp(Lk) 13 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 What changes for the κ > 0 (the case of the sphere) 14 / 21 Our result Provided the diameter of the initial set of values is less than Dκ/2, The algorithm is sound Because geodesics exist and are unique using this restriction. Convergence The algorithm converges to a consensus with probability 1. 
Rate of convergence Convergence occur at a linear rate: define σ2 (x) = v∼w χκ (d(xv , xw )) ; with: χκ(x) = 1 − cos( √ κx) then, there exists a constant L ∈ (−1, 0) such that: Eσ2 (Xk) ≤ C0 exp(Lk) 15 / 21 Before iteration xt−1,u • xt−1,Vt • xt−1,Wt• 16 / 21 After iteration xt,u • xt,Vt xt,Wt • • • 16 / 21 Net balance xt−1,u • xt−1,Vt • xt−1,Wt•xt,Vt xt,Wt • 16 / 21 Sketch of proof (Net balance) Let us look at the increments: N(σ2 κ(Xt) − σ2 κ(Xt−1)) = −χκ(d(XVt (t − 1), XWt (t − 1))) + u∈V u=Vt ,u=Wt Tκ(Vt, Wt, u) with: Tκ(Vt, Wt, u) = 2χκ(d(Xu(t), Mt)) − χκ(d(Xu(t), XVt (t − 1))) −χκ(d(Xu(t), XWt (t − 1))) Using the inequality: χκ d p + q 2 , r ≤ χκ(d(p, r)) + χκ(d(q, r)) 2 17 / 21 Sketch of proof (Two propositions) We can prove the a first propostion: E[σ2 κ(Xk+1) − σ2 κ(Xk)] ≤ − 1 N E∆κ(Xk) with: ∆κ(x) = 1 2N v∼w {v,w}∈E Pv,w χκ(d(xv , xw )) Using graph connectedeness we prove a second proposition: Assume G = (V , E) is an undirected connected graph, there exists a constant CG ≥ 1 depending on the graph only such that: ∀x ∈ MN , 1 2 ∆κ(x) ≤ σ2 κ(x) ≤ CG ∆κ(x) 18 / 21 Sketch of proof (cont’d) The following lemma Assume an is a sequence of nonnegative numbers such that an+1 − an ≤ −βan with β ∈ (0, 1). Then, ∀n ≥ 0, an ≤ a0 exp(−βn) Combined with the two propositions, gives the desired result. Eσ2 (Xk) ≤ exp(Lk) 19 / 21 Simulation results Sphere 20 / 21 Simulation results Rotations 20 / 21 Summary We have proved that, when the data belong to complete CAT(κ) metric space, provided the initial values are close enough, the same algorithm makes sense and also converge linearly. We have checked that our results are consistent with simulations. 21 / 21
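A minimal sketch of the κ > 0 case discussed above: random pairwise midpoint gossip on the unit sphere (a CAT(1) model space), with initial values confined to a small cap so that the diameter assumption of the convergence result holds, and with the disagreement functional σ²_κ built from χ_1(d) = 1 − cos d, which on the unit sphere equals 1 − ⟨x_v, x_w⟩. The complete graph, the cap radius and the iteration count are illustrative assumptions, not the authors' setup.

```python
import numpy as np

def sphere_midpoint(p, q):
    # geodesic midpoint of two unit vectors on the sphere (assumes p != -q)
    m = p + q
    return m / np.linalg.norm(m)

def disagreement(x, edges):
    # sigma^2_kappa(x) = sum_{v~w} chi_kappa(d(x_v, x_w)) with kappa = 1 and
    # chi_1(d) = 1 - cos(d) = 1 - <x_v, x_w> on the unit sphere
    return sum(1.0 - float(np.dot(x[v], x[w])) for v, w in edges)

def gossip_sphere(x0, edges, n_iter=3000, seed=2):
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_iter):
        v, w = edges[rng.integers(len(edges))]
        m = sphere_midpoint(x[v], x[w])
        x[v] = m
        x[w] = m
    return x

# N agents on a complete graph (for simplicity), initial values in a small
# spherical cap around the north pole so the diameter assumption is met
rng = np.random.default_rng(0)
N = 12
theta = 0.2 * np.abs(rng.standard_normal(N))        # small polar angles
azim = 2 * np.pi * rng.random(N)
x0 = np.stack([np.sin(theta) * np.cos(azim),
               np.sin(theta) * np.sin(azim),
               np.cos(theta)], axis=1)
edges = [(i, j) for i in range(N) for j in range(i + 1, N)]

x_final = gossip_sphere(x0, edges)
print("disagreement before:", disagreement(x0, edges))
print("disagreement after :", disagreement(x_final, edges))
```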

Optimal Transport (chaired by Jean-François Marcotorchino, Alfred Galichon)

Creative Commons Attribution-ShareAlike 4.0 International
In this paper we relate the Equilibrium Assignment Problem (EAP), which underlies several economic models, to a system of nonlinear equations that we call the “nonlinear Bernstein-Schrödinger system”, which is well-known in the linear case, but whose nonlinear extension does not seem to have been studied. We apply this connection to derive an existence result for the EAP, and an efficient computational method.
 
The nonlinear Bernstein-Schrödinger equation in Economics

TOPICS IN EQUILIBRIUM TRANSPORTATION Alfred Galichon (NYU and Sciences Po) GSI, Ecole polytechnique, October 29, 2015 GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 1/ 22 THIS TALK This talk is based on the following two papers: AG, Scott Kominers and Simon Weber (2015a). Costly Concessions: An Empirical Framework for Matching with Imperfectly Transferable Utility. AG, Scott Kominers and Simon Weber (2015b). The Nonlinear Bernstein-Schr¨odinger Equation in Economics, GSI proceedings. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 2/ 22 THIS TALK Agenda: 1. Economic motivation 2. The mathematical problem 3. Computation 4. Estimation GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 3/ 22 THIS TALK Agenda: 1. Economic motivation 2. The mathematical problem 3. Computation 4. Estimation GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 3/ 22 THIS TALK Agenda: 1. Economic motivation 2. The mathematical problem 3. Computation 4. Estimation GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 3/ 22 THIS TALK Agenda: 1. Economic motivation 2. The mathematical problem 3. Computation 4. Estimation GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 3/ 22 Section 1 ECONOMIC MOTIVATION GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 4/ 22 MOTIVATION: A MODEL OF LABOUR MARKET Consider a very simple model of labour market. Assume that a population of workers is characterized by their type x ∈ X , where X = Rd for simplicity. There is a distribution P over the workers, which is assumed to sum to one. A population of firms is characterized by their types y ∈ Y (say Y = Rd ), and their distribution Q. It is assumed that there is the same total mass of workers and firms, so Q sums to one. Each worker must work for one firm; each firm must hire one worker. Let π (x, y) be the probability of observing a matched (x, y) pair. π should have marginal P and Q, which is denoted π ∈ M (P, Q) . GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 5/ 22 OPTIMALITY In the simplest case, the utility of a worker x working for a firm y at wage w (x, y) will be α (x, y) + w (x, y) while the corresponding profit of firm y is γ (x, y) − w (x, y) . In this case, the total surplus generated by a pair (x, y) is α (x, y) + w + γ (x, y) − w = α (x, y) + γ (x, y) =: Φ (x, y) which does not depend on w (no transfer frictions). A central planner may thus like to choose assignment π ∈ M (P, Q) so to max π∈M(P,Q) Φ (x, y) dπ (x, y) . But why would this be the equilibrium solution? GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 6/ 22 EQUILIBRIUM The equilibrium assignment is determined by an important quantity: the wages. Let w (x, y) be the wage of employee x working for firm of type y. Let the indirect surpluses of worker x and firm y be respectively u (x) = max y {α (x, y) + w (x, y)} v (y) = max x {γ (x, y) − w (x, y)} so that (π, w) is an equilibrium when u (x) ≥ α (x, y) + w (x, y) with equality if (x, y) ∈ Supp (π) v (y) ≥ γ (x, y) − w (x, y) with equality if (x, y) ∈ Supp (π) By summation, u (x) + v (y) ≥ Φ (x, y) with equality if (x, y) ∈ Supp (π) . GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 7/ 22 THE MONGE-KANTOROVICH THEOREM OF OPTIMAL TRANSPORTATION One can show that the equilibrium outcome (π, u, v) is such that π is solution to the primal Monge-Kantorovich Optimal Transportation problem max π∈M(P,Q) Φ (x, y) dπ (x, y) and (u, v) is solution to the dual OT problem min u,v u (x) dP (x) + v (y) dQ (y) s.t. 
u (x) + v (y) ≥ Φ (x, y) Feasibility+Complementary slackness yield the desired equilibrium conditions π ∈ M (P, Q) u (x) + v (y) ≥ Φ (x, y) (x, y) ∈ Supp (π) =⇒ u (x) + v (y) = Φ (x, y) “Second welfare theorem”, “invisible hand”, etc. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 8/ 22 EQUILIBRIUM VS. OPTIMALITY Is equilibrium always the solution to an optimization problem? It is not. This is why this talk is about “Equilibrium Transportation,” which contains, but is strictly more general than “Optimal Transportation”. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 9/ 22 EQUILIBRIUM VS. OPTIMALITY Is equilibrium always the solution to an optimization problem? It is not. This is why this talk is about “Equilibrium Transportation,” which contains, but is strictly more general than “Optimal Transportation”. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 9/ 22 IMPERFECTLY TRANSFERABLE UTILITY Consider the same setting as above, but instead of assuming that workers’ and firm’s payoffs are linear in surplus, assume u (x) = max y {Uxy (w (x, y))} v (y) = max x {Vxy (w (x, y))} where Uxy (w) is nondecreasing and continuous, and Vxy (w) is nonincreasing and continuous. Motivation: taxes, decreasing marginal returns, risk aversion, etc. Of course, Optimal Transportation case is recovered when Uxy (w) = αxy + w Vxy (w) = γxy − w. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 10/ 22 IMPERFECTLY TRANSFERABLE UTILITY For (u, v) ∈ R2, let Ψxy (u, v) = min {t ∈ R : ∃w, u − t ≤ Uxy (w) and v − t ≤ Vxy (w)} so that Ψ is nondecreasing in both variables and (u, v) = (Uxy (w) , Vxy (w)) for some w if and only if Ψxy (u, v) = 0. Optimal Transportation case is recovered when Ψxy (u, v) = (u + v − Φxy ) /2. As before, (π, w) is an equilibrium when u (x) ≥ Uxy (w (x, y)) with equality if (x, y) ∈ Supp (π) v (y) ≥ Vxy (w (x, y)) with equality if (x, y) ∈ Supp (π) We have therefore that (π, u, v) is an equilibrium when Ψxy (u (x) , v (y)) ≥ 0 with equality if (x, y) ∈ Supp (π) . GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 11/ 22 Section 2 THE MATHEMATICAL PROBLEM GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 12/ 22 EQUILIBRIUM TRANSPORTATION: DEFINITION We have therefore that (π, u, v) is an equilibrium outcome when    π ∈ M (P, Q) Ψxy (u (x) , v (y)) ≥ 0 (x, y) ∈ Supp (π) =⇒ Ψxy (u (x) , v (y)) = 0 . Problem: existence of an equilibrium outcome? This paper: yes in the discrete case (X and Y finite), via entropic regularization. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 13/ 22 REMARK 1: LINK WITH GALOIS CONNECTIONS As soon as Ψxy is strictly increasing in both variables, Ψxy (u, v) = 0 expresses as u = Gxy (v) and v = G−1 xy (u) where the generating functions Gxy and G−1 xy are decreasing and continuous functions. In this case, relations u (x) = max y∈Y Gxy (v (y)) and v (y) = max x∈X G−1 xy (u (x)) generalize the Legendre-Fenchel conjugacy. This pair of relations form a Galois connection; see Singer (1997) and Noeldeke and Samuelson (2015). GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 14/ 22 REMARK 2: TRUDINGER’S LOCAL THEORY OF PRESCRIBED JACOBIANS Assuming everything is smooth, and letting fP and fQ be the densities of P and Q we have under some conditions that the equilibrium transportation plan is given by y = T (x), where mass balance yields |det DT (x)| = f (x) g (T (x)) and optimality yieds ∂x G−1 xT(x) (u (x)) + ∂uG−1 xT(x) (u (x)) u (x) = 0 which thus inverts into T (x) = e (x, u (x) , u (x)) . Trudinger (2014) studies Monge-Ampere equations of the form |det De (., u, u)| = f g (e (., u, u)) . 
(more general than Optimal Transport where no dependence on u). GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 15/ 22 DISCRETE CASE Our work (GKW 2015a and b) focuses on the discrete case, when P and Q have finite support. Call px and qy the mass of x ∈ X and y ∈ Y respectively. In the discrete case, problem boils down to looking for (π, u, v) such that    πxy ≥ 0, ∑y πxy = px , ∑x πxy = qy Ψxy (ux , vy ) ≥ 0 πxy > 0 =⇒ Ψxy (ux , vy ) = 0 . GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 16/ 22 Section 3 COMPUTATION GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 17/ 22 ENTROPIC REGULARIZATION Take temperature parameter T > 0 and look for π under the form πxy = exp − Ψxy (ux , vy ) T Note that when T → 0, the limit of Ψxy (ux , vy ) is nonnegative, and the limit of πxy Ψxy (ux , vy ) is zero. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 18/ 22 THE NONLINEAR BERNSTEIN-SCHR ¨ODINGER EQUATION If πxy = exp (−Ψxy (ux , vy ) /T) , condition π ∈ M (P, Q) boils down to set of nonlinear equations in (u, v)    ∑y∈Y exp − Ψxy (ux ,vy ) T = px ∑x∈X exp − Ψxy (ux ,vy ) T = qy which we call the nonlinear Bernstein-Schr¨odinger equation. In the optimal transportation case, this becomes the classical B-S equation    ∑y∈Y exp Φxy −ux −vy 2T = px ∑x∈X exp Φxy −ux −vy 2T = qy GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 19/ 22 ALGORITHM Note that Fx : ux → ∑y∈Y exp − Ψxy (ux ,vy ) T is a decreasing and continuous function. Mild conditions on Ψ therefore ensure the existence of ux so that Fx (ux ) = px . Our algorithm is thus a nonlinear Jacobi algorithm: - Make an initial guess of v0 y - Determine the uk+1 x to fit the px margins, based on the vk y - Update the vk+1 y to fit the qy margins, based on the uk+1 x . - Repeat until vk+1 is close enough to vk. One can proof that if v0 y is high enough, then the vk y decrease to fixed point. Convergence is very fast in practice. GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 20/ 22 Section 4 STATISTICAL ESTIMATION GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 21/ 22 MAXIMUM LIKELIHOOD ESTIMATION In practice, one observes ˆπxy and would like to estimate Ψ. Assume that Ψ belongs to a parametric family Ψθ, so that πθ xy = exp −Ψθ xy uθ x , vθ y ∈ M (P, Q). The log-likelihood l (θ) associated to observation ˆπxy is l (θ) = ∑ xy ˆπxy log πθ xy = − ∑ xy ˆπxy Ψθ xy uθ x , vθ y and thus the maximum likelihood procedure consists in min θ ∑ xy ˆπxy Ψθ xy uθ x , vθ y . GALICHON EQUILIBRIUM TRANSPORTATION SLIDE 22/ 22
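A minimal sketch (not the authors' code) of the nonlinear Jacobi scheme described above, written for the OT specialization Ψ_xy(u, v) = (u + v − Φ_xy)/2 so that the resulting margins can be checked; a generic monotone root-finder stands in for the closed-form update, so any other continuous, nondecreasing Ψ could be swapped in. The surplus matrix, the temperature T and the bisection brackets are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(0)
nx, ny, T = 6, 5, 0.5                      # illustrative sizes and temperature
Phi = rng.random((nx, ny))                 # illustrative surplus matrix Phi_xy
p = np.full(nx, 1.0 / nx)                  # worker margins p_x
q = np.full(ny, 1.0 / ny)                  # firm margins q_y

def Psi(u, v, x, y):
    # OT specialization Psi_xy(u, v) = (u + v - Phi_xy) / 2; any continuous Psi
    # that is nondecreasing in (u, v) could be plugged in here instead
    return 0.5 * (u + v - Phi[x, y])

def solve_u(x, v):
    # F_x(u) = sum_y exp(-Psi_xy(u, v_y) / T) - p_x is decreasing in u: bisect it
    F = lambda u: sum(np.exp(-Psi(u, v[y], x, y) / T) for y in range(ny)) - p[x]
    return brentq(F, -50.0, 50.0)

def solve_v(y, u):
    G = lambda w: sum(np.exp(-Psi(u[x], w, x, y) / T) for x in range(nx)) - q[y]
    return brentq(G, -50.0, 50.0)

v = np.full(ny, 10.0)                      # start from a "high enough" v^0
for _ in range(200):                       # nonlinear Jacobi sweeps
    u = np.array([solve_u(x, v) for x in range(nx)])
    v = np.array([solve_v(y, u) for y in range(ny)])

pi = np.exp(-(u[:, None] + v[None, :] - Phi) / (2.0 * T))
print("row-margin error:", np.abs(pi.sum(axis=1) - p).max())
print("col-margin error:", np.abs(pi.sum(axis=0) - q).max())
```

In this linear special case the sweep reduces to the classical iterative fitting for the Bernstein-Schrödinger system; replacing Ψ by a genuinely nonlinear specification only changes the two inner one-dimensional root-finding steps.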

Creative Commons Attribution-ShareAlike 4.0 International
This note presents a short review of the Schrödinger problem and of the first steps that might lead to interesting consequences in terms of geometry. We stress the analogies between this entropy minimization problem and the renowned optimal transport problem, in search of a theory of lower-bounded curvature for metric spaces, including discrete graphs.
 
Some geometric consequences of the Schrödinger problem

. . . .. . . Some geometric aspects of the Schr¨odinger problem Christian L´eonard Universit´e Paris Ouest GSI’15 ´Ecole Polytechnique. October 28-30, 2015 Interpolations in P(X) X : Riemannian manifold (state space) P(X) : set of all probability measures on X µ0, µ1 ∈ P(X) interpolate between µ0 and µ1 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 0 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 1 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 0 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 0.25 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 0.5 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 0.75 Interpolations in P(X) Standard affine interpolation between µ0 and µ1 µaff t := (1 − t)µ0 + tµ1 ∈ P(X), 0 ≤ t ≤ 1 t = 1 Interpolations in P(X) . . . .. . . Affine interpolations require mass transference with infinite speed Interpolations in P(X) . . . .. . . Affine interpolations require mass transference with infinite speed Denial of the geometry of X We need interpolations built upon trans -portation, not tele -portation Interpolations in P(X) We seek interpolations of this type Interpolations in P(X) We seek interpolations of this type t = 0 Interpolations in P(X) We seek interpolations of this type t = 0.25 Interpolations in P(X) We seek interpolations of this type t = 0.5 Interpolations in P(X) We seek interpolations of this type t = 0.75 Interpolations in P(X) We seek interpolations of this type t = 1 Displacement interpolation µ0 µ1 Displacement interpolation x y µ0 µ1 y = T(x) Displacement interpolation µ0 µ1 geodesics Displacement interpolation µ0 µ1 geodesics Displacement interpolation Displacement interpolation x y γxy t Displacement interpolation Displacement interpolation Curvature geodesics and curvature are intimately linked several geodesics give information on the curvature Curvature geodesics and curvature are intimately linked several geodesics give information on the curvature δ(t) θ p . . . .. . . δ(t) = √ 2(1 − cos θ) t ( 1 − σp(S) cos2(θ/2) 6 t2 + O(t4 ) ) Displacement interpolation x y µ0 µ1 y = T(x) Displacement interpolation . Respect geometry .. . . .. . . we have already used geodesics how to choose y = T(x) such that interpolations encrypt curvature as best as possible? no shock Displacement interpolation . Respect geometry .. . . .. . . we have already used geodesics how to choose y = T(x) such that interpolations encrypt curvature as best as possible? no shock perform optimal transport . Monge’s problem .. . . .. . . ∫ X d2(x, T(x)) µ0(dx) → min; T : T#µ0 = µ1 d : Riemannian distance Lazy gas experiment t = 0 0 < t < 1 t = 1 Positive curvature Lazy gas experiment t = 0 0 < t < 1 t = 1 Negative curvature Curvature and displacement interpolations . Relative entropy .. . . .. . . H(p|r) := ∫ log(dp/dr) dp, p, r : probability measures . Convexity of the entropy along displacement interpolations .. . . .. . . 
The following assertions are equivalent Ric ≥ K along any [µ0, µ1]disp = (µt)0≤t≤1, d2 dt2 H(µt|vol) ≥ KW 2 2 (µ0, µ1) von Renesse-Sturm (04) W2 is the Wasserstein distance Curvature and displacement interpolations . Relative entropy .. . . .. . . H(p|r) := ∫ log(dp/dr) dp, p, r : probability measures . Convexity of the entropy along displacement interpolations .. . . .. . . The following assertions are equivalent Ric ≥ K along any [µ0, µ1]disp = (µt)0≤t≤1, d2 dt2 H(µt|vol) ≥ KW 2 2 (µ0, µ1) von Renesse-Sturm (04) W2 is the Wasserstein distance starting point of the Lott-Sturm-Villani theory Schr¨odinger’s thought experiment Consider a huge collection of non-interacting identical Brownian particles. Schr¨odinger’s thought experiment Consider a huge collection of non-interacting identical Brownian particles. If the density profile of the system at time t = 0 is approximately µ0 ∈ P(R3), you expect it to evolve along the heat flow: { νt = ν0et∆/2, 0 ≤ t ≤ 1 ν0 = µ0 where ∆ is the Laplace operator. Schr¨odinger’s thought experiment Consider a huge collection of non-interacting identical Brownian particles. If the density profile of the system at time t = 0 is approximately µ0 ∈ P(R3), you expect it to evolve along the heat flow: { νt = ν0et∆/2, 0 ≤ t ≤ 1 ν0 = µ0 where ∆ is the Laplace operator. Suppose that you observe the density profile of the system at time t = 1 to be approximately µ1 ∈ P(R3) with µ1 different from the expected ν1. Probability of this rare event ≃ exp(−CNAvogadro). Schr¨odinger’s thought experiment Consider a huge collection of non-interacting identical Brownian particles. If the density profile of the system at time t = 0 is approximately µ0 ∈ P(R3), you expect it to evolve along the heat flow: { νt = ν0et∆/2, 0 ≤ t ≤ 1 ν0 = µ0 where ∆ is the Laplace operator. Suppose that you observe the density profile of the system at time t = 1 to be approximately µ1 ∈ P(R3) with µ1 different from the expected ν1. Probability of this rare event ≃ exp(−CNAvogadro). . Schr¨odinger’s question (1931) .. . . .. . . Conditionally on this very rare event, what is the most likely path (µt)0≤t≤1 ∈ P(R3)[0,1] of the evolving profile of the particle system? Schr¨odinger problem X : compact Riemannian manifold Ω := {paths} ⊂ X[0,1] P ∈ P(Ω) and (Pt)0≤t≤1 ∈ P(X)[0,1] R ∈ P(Ω) : Wiener measure (Brownian motion) Schr¨odinger problem X : compact Riemannian manifold Ω := {paths} ⊂ X[0,1] P ∈ P(Ω) and (Pt)0≤t≤1 ∈ P(X)[0,1] R ∈ P(Ω) : Wiener measure (Brownian motion) . Schr¨odinger problem .. . . .. . . H(P|R) → min; P ∈ P(Ω) : P0 = µ0, P1 = µ1 (S) µ0, µ1 ∈ P(X) are the initial and final prescribed profiles Schr¨odinger problem X : compact Riemannian manifold Ω := {paths} ⊂ X[0,1] P ∈ P(Ω) and (Pt)0≤t≤1 ∈ P(X)[0,1] R ∈ P(Ω) : Wiener measure (Brownian motion) . Schr¨odinger problem .. . . .. . . H(P|R) → min; P ∈ P(Ω) : P0 = µ0, P1 = µ1 (S) µ0, µ1 ∈ P(X) are the initial and final prescribed profiles . Definition. R-entropic interpolation .. . . .. . . [µ0, µ1]R := (Pt)0≤t≤1 with P the unique solution of (S). It is the answer to Schr¨odinger’s question Lazy gas experiments Lazy gas experiment at zero temperature (Monge) Zero temperature Displacement interpolations Optimal transport Lazy gas experiment at positive temperature (Schr¨odinger) Positive temperature Entropic interpolations Minimal entropy Lazy gas experiments t = 0 0 < t < 1 t = 1 Negative curvature Zero temperature Lazy gas experiments t = 0 t = 1 Negative curvature Positive temperature Slowing down . . . .. . . 
To decrease temperature, slow down the particles of the heat bath Slowing down . . . .. . . To decrease temperature, slow down the particles of the heat bath . Slowed down reference measures .. . . .. . . (Wt)t≥0 : Brownian motion on the Riemannian manifold X R : law of (Wt)0≤t≤1 Rk : law of (Wt/k)0≤t≤1 k → ∞ Slowing down k = 1 : x y γxy Rxy t = 0 t = 1 Slowing down k = 1 : x y γxy Rxy t = 0 t = 1 k = 10 : x y Rk,xy Slowing down k = 1 : x y γxy Rxy t = 0 t = 1 k = 10 : x y Rk,xy k = ∞ : x y γxy Slowing down N → ∞, k = 1 : the whole particle system performs a rare event to travel from µ0 to µ1 cooperative behavior Gibbs conditioning principle (thermodynamical limit: N → ∞) Slowing down N → ∞, k = 1 : the whole particle system performs a rare event to travel from µ0 to µ1 cooperative behavior Gibbs conditioning principle (thermodynamical limit: N → ∞) N = 1, k → ∞ : each individual particle faces a harder task and must travel along an approximate geodesic individual behavior large deviation principle (slowing down limit: k → ∞) Slowing down N → ∞, k = 1 : the whole particle system performs a rare event to travel from µ0 to µ1 cooperative behavior Gibbs conditioning principle (thermodynamical limit: N → ∞) N = 1, k → ∞ : each individual particle faces a harder task and must travel along an approximate geodesic individual behavior large deviation principle (slowing down limit: k → ∞) . Slowing down principle .. . . .. . . The slowed down sequence (Rk)k≥1 encodes some geometry Slowing down N → ∞, k = 1 : the whole particle system performs a rare event to travel from µ0 to µ1 cooperative behavior Gibbs conditioning principle (thermodynamical limit: N → ∞) N = 1, k → ∞ : each individual particle faces a harder task and must travel along an approximate geodesic individual behavior large deviation principle (slowing down limit: k → ∞) . Slowing down principle .. . . .. . . The slowed down sequence (Rk)k≥1 encodes some geometry N → ∞, k → ∞ : these two behaviors superpose Results . Results 1 .. . . .. . . displacement interpolations feel curvature entropic interpolations also feel curvature Results . Results 1 .. . . .. . . displacement interpolations feel curvature entropic interpolations also feel curvature . Results 2 .. . . .. . . entropic interpolations converge to displacement interpolations entropic interpolations regularize displacement interpolations Results . Results 1 .. . . .. . . displacement interpolations feel curvature entropic interpolations also feel curvature . Results 2 .. . . .. . . entropic interpolations converge to displacement interpolations entropic interpolations regularize displacement interpolations Γ-convergence Results . Results 3 .. . . .. . . The same kind of results hold in other settings ...1 discrete graphs ...2 Finsler manifolds ...3 interpolations with varying mass Results . Results 3 .. . . .. . . The same kind of results hold in other settings ...1 discrete graphs ...2 Finsler manifolds ...3 interpolations with varying mass ...1 graphs: random walk Results . Results 3 .. . . .. . . The same kind of results hold in other settings ...1 discrete graphs ...2 Finsler manifolds ...3 interpolations with varying mass ...1 graphs: random walk ...2 Finsler: jump process in a manifold, (work in progress) Results . Results 3 .. . . .. . . 
The same kind of results hold in other settings ...1 discrete graphs ...2 Finsler manifolds ...3 interpolations with varying mass ...1 graphs: random walk ...2 Finsler: jump process in a manifold, (work in progress) ...3 varying mass: branching process, (work in progress) Results . Results 4 .. . . .. . . Schr¨odinger’s problem is the analogue of Hamilton’s least action principle. It allows for dynamical theories of diffusion processes random walks on graphs Results . Results 4 .. . . .. . . Schr¨odinger’s problem is the analogue of Hamilton’s least action principle. It allows for dynamical theories of diffusion processes random walks on graphs stochastic Newton equation acceleration is related to curvature References Schr¨odinger (1931) Villani (big yellow book on optimal transport) Zambrini (stochastic deformation of classical mechanics in the diffusion setting) References Schr¨odinger (1931) Villani (big yellow book on optimal transport) Zambrini (stochastic deformation of classical mechanics in the diffusion setting) Conforti + L. (preprint) Mikami (PTRF ’04) L. (JFA ’12, AoP ’15) . . . .. . . Thank you for your attention
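As a complement to the slides, here is a minimal numerical sketch (under illustrative assumptions) of an entropic interpolation [μ0, μ1]^R on a 1D grid, taking a Gaussian transition kernel of variance ε(t − s) as the slowed-down reference and solving the Schrödinger system by a Fortet/IPFP-type fixed point; the bridge marginals are then assembled through the classical factorization ρ_t = φ_t φ̂_t, also recalled in the "Optimal mass transport over bridges" talk below. Grid size, ε and the two bump profiles are arbitrary choices; the printed mean positions illustrate that mass is transported, not teleported, from μ0 to μ1.

```python
import numpy as np

# 1D grid and Gaussian ("slowed-down Brownian") transition kernels, variance eps*(t - s)
n, eps = 200, 0.03
x = np.linspace(0.0, 1.0, n)

def kernel(s, t):
    var = eps * (t - s)
    return np.exp(-(x[:, None] - x[None, :])**2 / (2.0 * var))

def bump(center, width):
    w = np.exp(-(x - center)**2 / (2.0 * width**2))
    return w / w.sum()

mu0, mu1 = bump(0.2, 0.05), bump(0.8, 0.05)   # prescribed initial and final profiles

# Schroedinger system solved by a Fortet/IPFP-type fixed point on the end-to-end kernel:
# find a ~ phi_hat(., 0) and b ~ phi(., 1) with a * (K b) = mu0 and b * (K^T a) = mu1
K = kernel(0.0, 1.0)
a, b = np.ones(n), np.ones(n)
for _ in range(500):
    b = mu1 / (K.T @ a)
    a = mu0 / (K @ b)

def entropic_interpolation(t):
    # marginal of the bridge at time t: rho_t proportional to phi_t * phi_hat_t,
    # with phi_t = K_{t,1} b and phi_hat_t = K_{0,t}^T a (grid sums stand in for integrals)
    rho = (kernel(t, 1.0) @ b) * (kernel(0.0, t).T @ a)
    return rho / rho.sum()

print("endpoint fit:", np.abs(a * (K @ b) - mu0).max(), np.abs(b * (K.T @ a) - mu1).max())
for t in (0.25, 0.5, 0.75):
    rho_t = entropic_interpolation(t)
    print(f"t = {t:.2f}: mean position = {np.dot(x, rho_t):.3f}")
```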

Creative Commons Attribution-ShareAlike 4.0 International
This article leans on some previous results already presented in [10], based on Fréchet's works, Wilson's entropy and Minimal Trade models in connection with the MKP transportation problem (MKP stands for Monge-Kantorovich Problem). Using the duality between “independence” and “indetermination” structures, shown in this former paper, we are in a position to derive a novel approach to design a copula, suitable and efficient for anomaly detection in IT systems analysis.
 
Optimal Transport, Independence versus Indetermination duality, impact on a new Copula Design

Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Optimal Transport, Independance versus Indetermination duality, impact on a new Copula Design Benoit Huyot, Yves Mabiala Thales Communications and Security 29 October 2015 Benoit Huyot, Yves Mabiala 1 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba 1 Cybersecurity problem overview Current Intrusion Detection Systems Anomaly based IDS IDS as a classification problem 2 Properties of Copula Function Copula theory historic Sklar’s Theorem and Frechet’s Bounds Regularity properties on copula function 3 Copula theory used in anomalies detection applications Classification AUC with copula paradigm Experimental results Benoit Huyot, Yves Mabiala 2 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Current Intrusion Detection Systems Rule based approaches Suitable to detect previously known patterns Rules are easily understandable Easy addition of new rules But Unable to detect unknown patterns Benoit Huyot, Yves Mabiala 3 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Anomaly based IDS Anomaly based approaches Suitable to detect unknown patterns Time consuming to update model Alerts are difficult to understand through existing tools Too many false alerts But Our approach is an attempt to overcome these problems Benoit Huyot, Yves Mabiala 4 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Anomaly based IDS Anomaly detection as a classification problem Y is a binary random variable where Y = 0 if the event is abnormal Y = 1 else. p0 is the a priori attack probability define by p0 = P(Y ≤ 0) X represents the difference characteristics of the network event If X is a p-dimensional random vector, the cumulative distribution function will be denoted F(x) = P(X1 ≤ x1, ..., Xp ≤ xp) Benoit Huyot, Yves Mabiala 5 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba IDS as a classification problem Scoring function Scoring function is defined as P(Y = 0|X = x) By definition we have P(Y = 0|X = x) = P(Y = 0, X = x) P(X = x) Anomalies are identified thanks to the classical Bayes’s rule model Empirical estimation is difficult due to the ”Curse of Dimensionnality” Joint probabilities will be computed using copula theory to ease computations Benoit Huyot, Yves Mabiala 6 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Copula theory historic Introduction to Copula theory Originated by M.Fr´echet in 1951 Fr´echet, M. (1951): ”Sur les tableaux de corr´elations dont les marges sont donn´ees”, Annales de l’Universit´e de Lyon, Section A no 14, 53-77 A.Sklar gave a breakthrough in 1959 Sklar, A. (1959), ”Fonctions de r´epartition `a n dimensions et leurs marges”, Publ. Inst. Statist. Univ. 
Paris 8: 229-231 Benoit Huyot, Yves Mabiala 7 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Sklar’s Theorem and Frechet’s Bounds Main results on copula function Theorem (Sklar’s theorem) Given two continuous random variables X and Y in L1, with cumulative distribution functions written F and G. It exists an unique function C, called, copula such as: P(X ≤ x, Y ≤ y) = C(F(x), G(y)) Theorem (Fr´echet-Hoeffding’s Bounds) Given a copula function C, ∀(u, v) ∈ [0, 1]2 we have the following Fr´echet’s bounds: Max(u + v − 1, 0) ≤ C(u, v) ≤ Min(u, v) Benoit Huyot, Yves Mabiala 8 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Regularity properties on copula function 2-increasing property or Monge’s conditions B + D = C(u1, v2) D + C = C(v1, u2) A + B + C + D = C(v1, v2) D = C(u1, u2) A = (A + B + C + D) − (B + D) − (D + C) + D and A ≥ 0 ∀(u1, v1) as 0 ≤ u1 ≤ v1 ≤ 1 ∀(u2, v2) as 0 ≤ u2 ≤ v2 ≤ 1 C(v1, v2) − C(u1, v2) − C(v1, u2) + C(u1, u2) ≥ 0 Benoit Huyot, Yves Mabiala 9 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Regularity properties on copula function Copula is an Holderian function B + C + E = C(u2, v2) − C(u1, v1) A + C + E = C(u2, 1) − C(u1, 1) B + C + D = C(v2, 1) − C(v1, 1) B + C + E ≤ (B + C + D) + (A + C + E) We obtain a 1-Holderian condition for the Copula C: ∀(u1, v1, u2, v2) ∈ [0, 1]4 |C(u2, v2)−C(u1, v1)| ≤ |u2−u1|+|v2−v1| Benoit Huyot, Yves Mabiala 10 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Copula theory used in anomalies detection applications Only unfrequent events could have a score greater than 1 2 Looking for attack remains to looking for rare events Fr´echet’s Bounds gives us P(Y = 0|X) ≤ min(P(X), P(Y = 0)) P(X) and we get: P(Y = 0|X) ≥ 1 2 ⇒ P(X) ≤ 2.P(Y = 0) Benoit Huyot, Yves Mabiala 11 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Lower bound for anomalies detection It’s possible to show limit The ”lower tail dependance” is defined as: λL = Lim v→0 C(v, v) v λL ≤ Lim v→0 C(u, v) v Benoit Huyot, Yves Mabiala 12 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Variation of the score function We want to study to variation of v → C(u, v) v in [0, 2p0] 1 v2 v ∂C ∂v (u, v) − C(u, v) ≤ 0 ⇔ ∂C ∂v (u, v) ≤ C(u, v) v link to convexity ⇔ v ∂ ∂v logC(u, v) ≤ 1 link to Fisher’s information Benoit Huyot, Yves Mabiala 13 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Classification AUC with copula paradigm ROC curve and AUC Sensitivity: True Positive Rate, C(p0, s) p0 1-Specificity (anti-Specificity): False Positive Rate, s 1 − p0 (1 − C(p0, s)) AUC = 1 2p0(1 − p0) 1 − p2 0 − 1 0 (C(p0, s) − 1)2 ds In case of a bivariate random vector X we get: AUC = K1(p0)−K2(p0) 1 0 1 0 (C2(s1, s2) − 1)2 ∂2 ∂s1∂s2 C2(s1, s2)ds1ds2 Benoit Huyot, Yves Mabiala 14 Optimal transport problem In the Monge-Kantorovich problem we want to minimize following quantity: minh A 0 B 0 h(x, y) − 1 AB 2 Under constraints: 1 A 0 B 0 h(x, y) = 1 2 A 0 h(x, y) = g(y) 3 B 0 h(x, y) = f (x) The solution is given by: h∗ (x, y) = f (x) B + g(y) A − 1 AB The cumulative 
distribution function associated to the solution is: H∗ (x, y) = y F(x) B + x G(y) A − xy AB Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Classification AUC with copula paradigm Algorithm principle Benoit Huyot, Yves Mabiala 16 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Experimental results Experimental results Quantile level used for copula benchmark Quantile level 10−4 5.10−4 10−3 5.10−3 10−2 Optimal Transport Copula Detection rate 18.64% 73.86% 74.32% 74.82% 75.09% False alarms rate 23.15% 2.32% 4.38% 3.72% 4.71% Clayton Copula Detection rate 0.0% 0.0% 19.28% 71.73% 79.86% False alarms rate 0.0% 0.0% 0.63% 36.76% 34.20% Frechet’s upper bound Copula Detection rate 30.35% 31.39% 32.73% 36.93% 79.11% False alarms rate 41.26% 38.68% 31.89% 27.48% 27.95% Benoit Huyot, Yves Mabiala 17 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba Experimental results Thanks for your attention! Benoit Huyot, Yves Mabiala 18 Link to Fisher’s Information We will use the following equation: v C(u, v) ∂ ∂v C(u, v) = ∂ ∂v logC(u, v).v This condition is the statistical score The variance of this quantity gives the Fisher’s Information Sensitivity Sensitivity represents how many events are well assigned to anomalies Sensitivity : P( ˆY = 0|Y = 0) ˆY = 0 when F(X) ≤ s for a given threshold s ˆY = 0 when X ∈ F−1([0; s]) Sensitivity: P(X ∈ F−1([0; s])|p0) Sensitivity Sensitivity appears so as : P( ˆY = 0|Y = 0) = P(Y = 0, ˆY = 0) P(Y = 0) = P(Y = 0, X ≤ F−1 X (s)) P(Y = 0) = C(p0, s) p0 Specificity/Antispecificity Antispecificity represents how many misclassifications are given by the algorithm Specificity : P( ˆY = 1|Y = 1) ˆY = 1 when F(X) ≥ s for a given threshold s ˆY = 1 when X ∈ F−1([s; 1]) Specificity: P(X ∈ F−1([s; 1])|p0) Antispecificity Antispecificity appears using survival copula function as: 1 − P( ˆY = 1|Y = 1) = P( ˆY = 0|Y > 0) = P( ˆY = 0) P(Y > 0) P(Y > 0| ˆY = 0) = s 1 − p0 (1 − C(p0, s)) Area under ROC Curve (AUC) AUC = 1 0 PD(PF )dPF Using an integration by substitution we obtain: AUC = 1 0 PD(s). ∂PF (s) ∂s ds Sensitivity: PD(s) = C(p0, s) p0 Antispecificity PF (s) = s 1 − p0 (1 − C(p0, s)) AUC = 1 p0(1 − p0) 1 0 C(p0, s) − C(p0, s)2 − sC(p0, s)C (p0, s) ds AUC simplification AUC = 1 p0(1 − p0) 1 0 C(p0, s) − C(p0, s)2 − sC(p0, s)C (p0, s) ds An integration by parts give us: A3 = − sC2(p0, s) 2 1 0 + 1 2 1 0 C(p0, s)2 ds = − p2 0 2 + 1 2 1 0 C(p0, s)2 ds AUC = 1 p0(1 − p0) 1 0 C(p0, s) − 1 2 C(p0, s)2 ds − p0 2(1 − p0) Using this simplification X − 1 2 X2 = − 1 2 X2 − 2X + 1 + 1 2 it comes: AUC = 1 2p0(1 − p0) 1 − p2 0 − 1 0 (C(p0, s) − 1)2 ds AUC in a bivariate case Using the Frechet-Hoeffding’s upper bounds and the lower tail dependence we get: 1 0 (λLs − 1)2 ds ≤ 1 0 (C(p0, s) − 1)2 ds ≤ 1 0 (min(p0, s) − 1)2 ds It comes : K + λ2 L 1 0 (s − 1)2 ds ≤ 1 0 (C(p0, s) − 1)2 ds ≤ 1 0 (s − 1)2 ds If X is a bivariate random vector: 1 0 (s − 1)2 ds = 1 0 1 0 (C2(s1, s2) − 1)2 ∂2 ∂s1∂s2 C2(s1, s2)ds1ds2
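A small numerical check (not the authors' code) that simply evaluates the AUC expression stated in the slides for three reference copulas, namely the two Fréchet-Hoeffding bounds and the independence copula; the value of p0 is an illustrative choice. For small p0 the Fréchet upper bound behaves like a near-perfect detector and the independence copula like a random one, consistent with the role the bounds play in the detection argument above.

```python
import numpy as np

def auc_from_copula(C, p0, n_grid=20001):
    # numerically evaluates the AUC expression from the slides:
    #   AUC = [1 - p0^2 - int_0^1 (C(p0, s) - 1)^2 ds] / (2 p0 (1 - p0))
    s = np.linspace(0.0, 1.0, n_grid)
    f = (C(p0, s) - 1.0)**2
    integral = np.sum(0.5 * (f[:-1] + f[1:]) * np.diff(s))   # trapezoidal rule
    return (1.0 - p0**2 - integral) / (2.0 * p0 * (1.0 - p0))

independence = lambda u, v: u * v
frechet_upper = lambda u, v: np.minimum(u, v)
frechet_lower = lambda u, v: np.maximum(u + v - 1.0, 0.0)

p0 = 1e-3   # illustrative a priori attack probability (attacks are rare events)
for name, C in [("independence", independence),
                ("Frechet upper bound", frechet_upper),
                ("Frechet lower bound", frechet_lower)]:
    print(f"{name:20s} AUC = {auc_from_copula(C, p0):.4f}")
```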

Creative Commons Attribution-ShareAlike 4.0 International
We present an overview of our recent work on implementable solutions to the Schrödinger bridge problem and their potential application to optimal transport and various generalizations.
 
Optimal mass transport over bridges

Optimal Mass Transport over Bridges Michele Pavon Department of Mathematics University of Padova, Italy GSI’15, Paris, October 29, 2015 Joint work with Yongxin Chen, Tryphon Georgiou, Department of Electrical and Computer Engineering, University of Minnesota A Venetian Schr¨odinger bridge Dynamic version of OMT “Fluid-dynamic” version of OMT (Benamou and Brenier (2000)): inf (ρ,v) Rn 1 0 1 2 v(x, t) 2ρ(x, t)dtdx, (1a) ∂ρ ∂t + · (vρ) = 0, (1b) ρ(x, 0) = ρ0(x), ρ(y, 1) = ρ1(y). (1c) Proposition 1 Let ρ∗(x, t) with t ∈ [0, 1] and x ∈ Rn, satisfy ∂ρ∗ ∂t + · ( ψρ∗) = 0, ρ∗(x, 0) = ρ0(x), where ψ is the (viscosity) solution of the Hamilton-Jacobi equation ∂ψ ∂t + 1 2 ψ 2 = 0 for some boundary condition ψ(x, 1) = ψ1(x). If ρ∗(x, 1) = ρ1(x), then the pair (ρ∗, v∗) with v∗(x, t) = ψ(x, t) is optimal for (1). Schr¨odinger’s Bridges • Cloud of N independent Brownian particles; • empirical distr. ρ0(x)dx and ρ1(y)dy at t = 0 and t = 1, resp. • ρ0 and ρ1 not compatible with transition mechanism ρ1(y) = 1 0 p(t0, x, t1, y)ρ0(x)dx, where p(s, y, t, x) = [2π(t − s)]−n 2 exp − |x − y|2 2(t − s) , s < t Particles have been transported in an unlikely way (N large). Schr¨odinger(1931): Of the many unlikely ways in which this could have happened, which one is the most likely? Schr¨odinger’s Bridges (cont’d) Schr¨odinger: solution (bridge from ρ0 to ρ1 over Brownian motion), has at each time a density ρ that factors as ρ(x, t) = ϕ(x, t) ˆϕ(x, t), where ϕ and ˆϕ solve Schr¨odinger’s system ϕ(x, t) = p(t, x, 1, y)ϕ(y, 1)dy, ϕ(x, 0) ˆϕ(x, 0) = ρ0(x) ˆϕ(x, t) = p(0, y, t, x) ˆϕ(y, 0)dy, ϕ(x, 1) ˆϕ(x, 1) = ρ1(x). F¨ollmer 1988: This is a problem of large deviations of the empirical distribution on path space connected through Sanov’s theorem to a maximum entropy problem. Existence and uniqueness for Schr¨odinger’s system: Fortet 1940, Beurling 1960, Jamison 1974/75, F¨ollmer 1988. Schr¨odinger’s Bridges as a control problem The maximum entropy formulation of the Schr¨odinger bridge prob- lem (SBP) with “prior” P is Minimize H(Q, P ) = EQ log dQ dP over D(ρ0, ρ1), where D be the family of distributions on Ω := C([0, 1], Rn) that are equivalent to stationary Wiener measure W = Wx dx. It can be turned, thanks to Girsanov’s theorem, into a stochastic control problem (Blaqui`ere, Dai Pra, M.P.-Wakolbinger, Filliger- Hongler-Streit,...) with fluid dynamic counterpart (P = W ) inf (ρ,v) Rn 1 0 1 2 v(x, t) 2ρ(x, t)dtdx, ∂ρ ∂t + · (vρ) − 2 ∆ρ = 0, ρ(x, 0) = ρ0(x), ρ(y, 1) = ρ1(y). Alternative time-symmetric fluid-dynamic for- mulation of SBP When prior is W stationary Wiener measure with variance inf (ρ,v) Rn 1 0 1 2 v(x, t) 2 + 8 log ρ(x, t) 2 ρ(x, t)dtdx, ∂ρ ∂t + · (vρ) = 0, ρ(0, x) = ρ0(x), ρ(1, y) = ρ1(y). With respect to Benamou and Brenier problem, extra term given by a Fisher information functional integrated over time. Answers at once question posed by Eric Carlen in 2006 investigating connections between OMT and Nelson’s stochastic mechanics. Schr¨odinger’s Bridges and OMT - Can we use SBP as a regular approximation of OMT? Yes, Mikami 2004, Mikami-Thieullen 2006,2008, L´eonard 2012. - Is this useful to compute solution to OMT? Problems: 1. Solution to control formulation of SBP not given in implementable form! 2 Control formulation of SBP only for non degenerate diffusions with control and noise entering through same channel! (excludes most engineering applications); 3 No steady-state theory; 4 No OMT problem with nontrivial prior! Gauss-Markov processes Problem 1. 
Find a control u minimizing J(u) := E 1 0 u(t) · u(t) dt , among those which achieve the transfer dXt = A(t)Xtdt + B(t)u(t)dt + B1(t)dWt, X0 ∼ N (0, Σ0), X1 ∼ N (0, Σ1). Engineering applications: Swarms of robots, shape bulk magne- tization distribution in NMR spectroscopy and imaging, industrial process control, ... If pair (A, B) is controllable (for constant A and B amounts to B, AB, ..., An−1B having full row rank), prob- lem always feasible (highly nontrivial, control may be “handicapped” with respect to the effect of noise). Gauss-Markov processes (cont’d) Problem 2. Find u = −Kx which minimizes Jpower(u) := E{u · u} and such that dx(t) = (A − BK)x(t)dt + B1dw(t) has ρ(x) = (2π)−n/2 det(Σ)−1/2 exp − 1 2 x Σ−1x as invariant probability density. Problem may not have a solution (not all values for Σ can be maintained by state feedback). Previous contributions: Beghi (1996,1997), Grigoriadis- Skelton (1997), Brockett (2007, 2012), Vladimirov-Petersen (2010, 2015) Gauss-Markov processes (cont’d) Sufficient conditions for optimality in terms of: - a system of two matrix Riccati equations (Lyapunov equations if B = B1) in the finite horizon case ˙Π = −A Π − ΠA + ΠBB Π ˙H = −A H − HA − HBB H + (Π + H) BB − B1B1 (Π + H) . Σ−1 0 = Π(0) + H(0) Σ−1 T = Π(T ) + H(T ). Gauss-Markov processes (cont’d) - in terms of algebraic conditions for the stationary case. rank AΣ + ΣA + B1B1 B B 0 = rank 0 B B 0 . Optimal controls may be computed via semidefinite programming in both cases. - Y. Chen, T.T. Georgiou and M. Pavon, Optimal steering of a linear stochastic system to a final probability distribution, Part I, Aug. 2014, arXiv:1408.2222v1, IEEE Trans. Aut. Control, to appear. - Y. Chen, T.T. Georgiou and M. Pavon, Optimal steering of a linear stochastic system to a final probability distribution, Part II, Oct. 2014, arXiv:1410.3447v1, IEEE Trans. Aut. Control, to appear. Cooling Two problems: • Efficient asymptotic steering of a system of stochastic oscillators to desired steady state ¯ρ; • Efficient steering of the system from initial condition ρ0 to ¯ρ at finite time t = 1. In both cases get solution for general system of nonlinear stochastic oscillators by extending Schr¨odinger bridges theory. - Y. Chen, T.T. Georgiou and M. Pavon, Fast cooling for a system of stochastic oscillators, arXiv:1411.1323v2, J. Math. Phys. Nov. 2015. OMT with “prior” inf (ρ,v) Rn 1 0 1 2 v(x, t) − vp(x, t) 2ρ(x, t)dtdx, (2a) ∂ρ ∂t + · (vρ) = 0, (2b) ρ(x, 0) = ρ0(x), ρ(y, 1) = ρ1(y). (2c) Proposition 2 Let ρ∗(x, t) with t ∈ [0, 1] and x ∈ Rn, satisfy ∂ρ∗ ∂t + · [(vp + ψ)ρ∗] = 0, ρ∗(x, 0) = ρ0(x), where ψ is the (viscosity) solution of the Hamilton-Jacobi equation ∂ψ ∂t + vp · ψ + 1 2 ψ 2 = 0 for boundary cond. ψ(x, 1) = ψ1(x). If ρ∗(x, 1) = ρ1(x), then the pair (ρ∗, v∗) with v∗(x, t) = vp(x, t) + ψ(x, t) is optimal for (2). OMT with “prior” (cont’d) Problem still in classical OMT framework inf π∈Π(µ,ν) Rn×Rn c(x, y)dπ(x, y), with c(x, y) = inf x∈Xxy 1 0 L(t, x(t), ˙x(t))dt, L(t, x, ˙x) = ˙x − vp(x, t) 2. Many results in OMT only for c(x, y) = c(x−y) strictly convex orig- inating from a Lagrangian L(t, x, ˙x) = c( ˙x). We are also interested in inf (ρ,u) Rn 1 0 1 2 u(x, t) 2ρ(x, t)dtdx, (3a) ∂ρ ∂t + · ((vp(x, t) + B(t)u(x, t))ρ) = 0, (3b) ρ(x, 0) = ρ0(x), ρ(y, 1) = ρ1(y). (3c) OMT and SBP Mikami (2004), L´eonard (2012) show, when prior is W , that as the diffusion coefficient tends to zero, OMT is Γ-limit of SBP. Hence infima converge and minimizers converge. 
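The slides above note that the covariance steering problem is feasible whenever the pair (A, B) is controllable, which for constant matrices amounts to [B, AB, ..., A^{n-1}B] having full row rank. As an editorial illustration only, here is a minimal numpy sketch of that Kalman rank test; the double-integrator example and all function names are illustrative and not taken from the talk.

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^{n-1}B] for constant system matrices A (n x n) and B (n x m)."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def is_controllable(A, B, tol=1e-9):
    """Kalman rank test: (A, B) is controllable iff the controllability matrix has rank n."""
    return np.linalg.matrix_rank(controllability_matrix(A, B), tol=tol) == A.shape[0]

# Toy example: a double integrator driven through the velocity channel only.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
print(is_controllable(A, B))  # True, so the state statistics can in principle be steered
```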
In Gaussian case, we show directly convergence of solution to Hamilton-Jacobi-Bellman equation to solution of Hamilton-Jacobi equation also in case with prior. - Y. Chen, T.T. Georgiou and M. Pavon, On the relation between optimal transport and Schr¨odinger bridges: A stochastic control viewpoint, Dec. 2014, arXiv:1412.4430v1, J. Opt. Th. Appl., DOI: 10.1007/s10957-015-0803-z. - Y. Chen, T.T. Georgiou and M. Pavon, Optimal transport over a linear dynamical system, Feb. 2015, arXiv:1502.01265v1. OMT and SBP: Example Smoluchowski model for highly overdamped planar Brownian mo- tion in a force field is, in a strong sense, high-friction limit of full Ornstein-Uhlenbeck model in phase space dXt = − V (Xt)dt + √ dWt, − V (x) = Ax, A = −3 0 0 −3 , m0 = 5 5 , and Σ0 = 1 0 0 1 m1 = −5 −5 , and Σ1 = 1 0 0 1 Transparent tube represent the “3σ region” (x − mt)Σ−1 t (x − mt) ≤ 9. OMT and SBP: Example (cont’d) Interpolation based on Schr¨odinger bridge with = 9 Interpolation based on Schr¨odinger bridge with = 4 Interpolation based on Schr¨odinger bridge with = 0.01 Interpolation based on optimal transport with prior OMT and SBP in general How can we effectively compute the solution of the SBP in the general non Gaussian case? In T. T. Georgiou and M. Pavon, Positive contraction mappings for classical and quantum Schr¨odinger systems, May 2014, arXiv:1405.6650v2, J. Math. Phys., 56, 033301, March 2015. efficient iterative techniques to solve the Schr¨odinger system for Markov chains and Kraus maps of statistical quantum mechanics based on the Garrett Birkhoff (1957)-Bushell(1973) theorem. Application to general OMT - Y. Chen, T. Georgiou, and M. Pavon, Entropic and displacement interpolation: a computational approach using the Hilbert metric, June 2015, arXiv:1506.04255v1, submitted for publication. Applications to interpolation of 2D images to get 3D model. [t = 0] [t = 1] MRI slices at two different points [t = 0.2] [t = 0.4] [t = 0.6] [t = 0.8] Interpolation with = 0.01 THANK YOU FOR YOUR ATTENTION! References • T. T. Georgiou and M. Pavon, Positive contraction mappings for classical and quantum Schr¨odinger systems, arXiv:1405.6650v2, J. Math. Phys., 56, 033301, March 2015. • Y. Chen and T.T. Georgiou, Stochastic bridges of linear systems, preprint, arxiv: 1407.3421, IEEE Trans. Aut. Control, to appear. • Y. Chen, T.T. Georgiou and M. Pavon, Optimal steering of a linear stochastic system to a final probability distribution, Aug. 2014, arXiv:1408.2222v1, IEEE Trans. Aut. Control, to appear. • Y. Chen, T. Georgiou and M. Pavon, Optimal steering of inertial particles diffusing anisotropically with losses, arXiv 1410.1605v1, Oct. 7, 2014, Amer- ican Control Conf. 2015. • Y. Chen, T.T. Georgiou and M. Pavon, Optimal steering of a linear stochastic system to a final probability distribution, part II, Oct. 2014, arXiv:1410.3447v1, IEEE Trans. Aut. Control, to appear. • Y. Chen, T.T. Georgiou and M. Pavon, Fast cooling for a system of stochas- tic oscillators, Nov. 2014, arXiv:1411.1323v2, J. Math. Phys., Nov. 2015. • Y. Chen, T.T. Georgiou and M. Pavon, On the relation between optimal transport and Schr¨odinger bridges: A stochastic control viewpoint, Dec. 2014, arXiv:1412.4430v1, JOTA, to appear. • Y. Chen, T.T. Georgiou and M. Pavon, Optimal transport over a linear dynamical system, Feb. 2015, arXiv:1502.01265v1. References (cont’d) • Y. Chen, T.T. Georgiou and M. Pavon, Optimal mass transport over bridges, arXiv 1503.00215v1, Feb. 28, 2015, GSI’15 Conf.. • Y. Chen, T. Georgiou and M. 
Pavon, Steering state statistics with output feedback, arXiv 1504.00874v1, April 3, 2015, Proc. CDC 2015 (to appear). • Y. Chen, T.T. Georgiou and M. Pavon, Optimal control of the state statistics for a linear stochastic system, arXiv 1503.04885v1, March 17, 2015, Proc. CDC 2015 (to appear). • Y. Chen, T.T. Georgiou and M. Pavon, Entropic and displacement inter- polation: a computational approach using the Hilbert metric, June 2015, arXiv:1506.04255v1, submitted for publication. Markovian prior P with Nelson’s current velocity field vP (x, t) Minimize(ρ,v) t1 t0 RN 1 2σ2 v(x, t) − vP (x, t) 2 + 2 8 log ρ ρP (x, t) 2 ρ(x, t)dxdt, ∂ρ ∂t + · (vρ) = 0, ρt0 = ρ0, ρt1 = ρ1. Comparison with OMT with prior: There is here an extra term in the action functional which has the form of a relative Fisher information of ρt with respect to the prior one-time density ρP t (Dirichlet form) integrated over time. Decomposition of relative entropy H(Q, P ) = EQ log dQ dP = EQ log dµQ dµP (x(0), x(1)) + EQ   log dQ x(1) x(0) dP x(1) x(0) (x)    = log dµQ dµP dµQ + log dQ y x dP y x dQy xµQ(dx, dy). Thus, problem reduces to minimizing log dµQ dµP dµQ subject to the (linear) constraints µQ(dx × Rn) = ρ0(x)dx, µQ(Rn × dy) = ρ1(y)dy.
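The references above (Georgiou and Pavon, J. Math. Phys. 2015) describe iterative schemes for the Schrödinger system of Markov chains based on Hilbert-metric contraction. The following is a minimal numpy sketch of the classical Fortet/Sinkhorn-type fixed-point iteration for a discrete prior kernel; the grid, kernel width and marginals are purely illustrative, and the function name is hypothetical.

```python
import numpy as np

def schroedinger_bridge(P, rho0, rho1, iters=500):
    """Fortet/Sinkhorn-type iteration for a discrete Schroedinger system.

    P    : (n, n) positive prior transition kernel (rows sum to 1).
    rho0 : initial marginal, rho1 : final marginal (both sum to 1).
    Returns (approximately, after convergence) the coupling Pi with marginals
    rho0, rho1 that is closest to the prior in relative entropy."""
    phi1 = np.ones_like(rho1)           # terminal potential phi
    for _ in range(iters):
        phi0_hat = rho0 / (P @ phi1)    # enforce the initial marginal: phi0_hat * (P phi1) = rho0
        phi1 = rho1 / (P.T @ phi0_hat)  # enforce the final marginal:  phi1 * (P^T phi0_hat) = rho1
    return phi0_hat[:, None] * P * phi1[None, :]

# Small example: Gaussian-like prior kernel on a 1-D grid, two bump marginals.
n = 50
x = np.linspace(0.0, 1.0, n)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 0.05))
P = K / K.sum(axis=1, keepdims=True)
rho0 = np.exp(-(x - 0.2) ** 2 / 0.005); rho0 /= rho0.sum()
rho1 = np.exp(-(x - 0.8) ** 2 / 0.005); rho1 /= rho1.sum()
Pi = schroedinger_bridge(P, rho0, rho1)
print(np.abs(Pi.sum(axis=1) - rho0).max(), np.abs(Pi.sum(axis=0) - rho1).max())
```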

Information Geometry in Image Analysis (chaired by Yannick Berthoumieu, Geert Verdoolaege)

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
The current paper introduces new prior distributions on the zero-mean multivariate Gaussian model, with the aim of applying them to the classification of populations of covariance matrices. These new prior distributions are based entirely on the Riemannian geometry of the multivariate Gaussian model. More precisely, the proposed Riemannian Gaussian distribution has two parameters, the centre of mass Ȳ and the dispersion parameter σ. Its density with respect to the Riemannian volume element is proportional to exp(−d²(Y, Ȳ)/2σ²), where d(Y, Ȳ) is Rao's Riemannian distance. We derive its maximum likelihood estimators and propose an experiment on the VisTex database for the classification of texture images.
 
Texture classification using Rao's distance on the space of covariance matrices

Geometric Science of Information 2015: Non-supervised classification in the space of SPD matrices. Salem Said, Lionel Bombrun, Yannick Berthoumieu. Laboratoire IMS, CNRS UMR 5218, Université de Bordeaux, 29 October 2015.

Context of our work. Our project: statistical learning in the space of SPD matrices. Our team: 3 members of the IMS laboratory + 2 post-docs (Hatem Hajri, Paolo Zanini). Target applications: remote sensing, radar signal processing, neuroscience (BCI). Our partners: IMB (Marc Arnaudon + PhD student), Gipsa-lab, Ecole des Mines. Our recent work: http://arxiv.org/abs/1507.01760, "Riemannian Gaussian distributions on the space of SPD matrices" (in review, IEEE IT). Some of our problems: given a population of SPD matrices (any size or structure), non-supervised learning of its class structure and semi-parametric learning of its density. Please look up our paper on arXiv :-)

Geometric tools. Statistical manifold: Θ = SPD, Toeplitz, Block-Toeplitz, etc., matrices. Hessian (Fisher) metric: ds²(θ) = Hess Φ(dθ, dθ), with Φ the model entropy; Θ then becomes a Riemannian homogeneous space of negative curvature. Example, 2 × 2 correlation matrices ("baby Toeplitz"): Θ = { (1 θ; θ* 1), |θ| < 1 }, Φ(θ) = −log(1 − |θ|²), hence ds²(θ) = |dθ|² / (1 − |θ|²)², the Poincaré disc model. Why do we use this? Suitable mathematical properties, relation to entropy or "information", and it often leads to excellent performance (first place in the IEEE BCI challenge).

Contribution I: introduction of Riemannian Gaussian distributions. A statistical model of a class/cluster [Pennec 2006]: p(θ | θ̄, σ) = Z(σ)⁻¹ exp( −d²(θ, θ̄) / 2σ² ), where d(θ, θ̄) is the Riemannian distance and the expression of Z(σ) was unknown in the literature. Computing Z(σ): Z(σ) = ∫_Θ exp( −d²(θ, θ̄) / 2σ² ) dv(θ), with d²(θ, θ̄) = tr[ (log θ⁻¹θ̄)² ] and dv(θ) = det(θ)^(−(m+1)/2) ∏ᵢ …
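For concreteness, a minimal numpy sketch of the squared Rao distance d²(θ, θ̄) = tr[(log θ⁻¹θ̄)²] used in the Riemannian Gaussian density above, together with the corresponding un-normalised log-density; computing the normalising factor Z(σ) is the non-trivial contribution of the paper and is not reproduced here. The 2 × 2 matrices are illustrative.

```python
import numpy as np

def rao_distance(A, B):
    """Rao / affine-invariant distance between SPD matrices:
    d^2(A, B) = tr[(log A^{-1} B)^2] = sum_i log^2(lambda_i),
    with lambda_i the eigenvalues of A^{-1} B."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def riemannian_gaussian_logdensity(Y, Y_bar, sigma):
    """Un-normalised log-density of the Riemannian Gaussian model above:
    -d^2(Y, Y_bar) / (2 sigma^2); the constant log Z(sigma) is omitted."""
    return -rao_distance(Y, Y_bar) ** 2 / (2.0 * sigma ** 2)

A = np.array([[2.0, 0.3], [0.3, 1.0]])   # illustrative 2x2 SPD matrices
B = np.array([[1.0, 0.0], [0.0, 1.5]])
print(rao_distance(A, B), riemannian_gaussian_logdensity(A, B, sigma=0.5))
```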

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We present a new texture discrimination method for textured color images in the wavelet domain. In each wavelet subband, the correlation between the color bands is modeled by a multivariate generalized Gaussian distribution with fixed shape parameter (Gaussian, Laplacian). On the corresponding Riemannian manifold, the shape of texture clusters is characterized by means of principal geodesic analysis, specifically by the principal geodesic along which the cluster exhibits its largest variance. Then, the similarity of a texture to a class is defined in terms of the Rao geodesic distance on the manifold from the texture’s distribution to its projection on the principal geodesic of that class. This similarity measure is used in a classification scheme, referred to as principal geodesic classification (PGC). It is shown to perform significantly better than several other classifiers.
 
Color Texture Discrimination using the Principal Geodesic Distance on a Multivariate Generalized Gaussian Manifold

FACULTY OF ENGINEERING AND ARCHITECTURE Color Texture Discrimination using the Principal Geodesic Distance on a Multivariate Generalized Gaussian Manifold Geert Verdoolaege1,2 and Aqsa Shabbir1,3 1Department of Applied Physics, Ghent University, Ghent, Belgium 2Laboratory for Plasma Physics, Royal Military Academy (LPP–ERM/KMS), Brussels, Belgium 3Max-Planck-Institut für Plasmaphysik, D-85748 Garching, Germany Geometric Science of Information Paris, October 28–30, 2015 Overview 1 Color texture 2 Geometry of wavelet distributions 3 Principal geodesic classification 4 Classification experiments 5 Conclusions 2 Overview 1 Color texture 2 Geometry of wavelet distributions 3 Principal geodesic classification 4 Classification experiments 5 Conclusions 3 VisTex database 128 × 128 subimages extracted from RGB images from 40 classes (textures) 4 CUReT database 200 × 200 RGB images from 61 classes with varying illumination and viewpoint 5 Texture modeling Structure at various scales Stochasticity Correlations between colors, neighboring pixels, etc. ⇒ Multivariate wavelet distributions 6 Overview 1 Color texture 2 Geometry of wavelet distributions 3 Principal geodesic classification 4 Classification experiments 5 Conclusions 7 Generalized Gaussian distributions Univariate: generalized Gaussian distribution (zero mean): p(x|α, β) = β 2αΓ(1/β) exp − |x| α β m-variate multivariate generalized Gaussian (MGGD, zero-mean): p(x|Σ, β) = Γ m 2 π m 2 Γ m 2β 2 m 2β β |Σ| 1 2 exp − 1 2 x Σ−1 x β Shape parameter β = 1: Gaussian; β = 1/2: Laplace (heavy tails) 8 MGGD geometry: coordinate system (Σ1, β1) → (Σ2, β2): find K such that K Σ1K = Im, K Σ2K ≡ Φ2 ≡ diag(λ1 2, . . . , λp 2), λi 2 eigenvalues of Σ−1 1 Σ2 In fact, ∀ Σ(t), t ∈ [0, 1]: K Σ(t)K ≡ Φ(t) ≡ diag(λ1 2, . . . , λp 2), λi 2 eigenvalues of Σ−1 1 Σ(t) ri(t) ≡ ln[λi(t)] M. Berkane et al., J. Multivar. Anal., 63, 35–46, 1997 G. Verdoolaege and P. Scheunders, J. Math. Imaging Vis., 43, 180–193, 2012 9 MGGD geometry: Fisher information metric gββ(β) = 1 β2 1 + m 2β 2 Ψ1 m 2β + m β ln(2) + Ψ m 2β + m 2β [ln(2)]2 + Ψ 1 + m 2β ln(4) + Ψ 1 + m 2β + Ψ1 1 + m 2β gβi(β) = − 1 2β 1 + ln(2) + Ψ 1 + m 2β gii(β) = 3bh − 1 4 bh ≡ 1 4 m + 2β m + 2 gij(β) = bh − 1 4 , i = j 10 MGGD geometry: geodesics and exponential map Geodesic equations for fixed β: ri (t) ≡ ln(λi 2) t Geodesic distance: GD(Σ1, Σ2) =   3bh − 1 4 i (ri 2)2 + 2 bh − 1 4 i
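A minimal numpy/scipy sketch of the geodesic distance between two MGGDs sharing the same shape parameter β, following the coordinates and metric entries quoted on the slides (r_i = log of the eigenvalues of Σ₁⁻¹Σ₂, b_h = (m + 2β)/(4(m + 2))). The cross term of the distance is truncated in the extracted slide text; the sketch assumes the form 2(b_h − 1/4) Σ_{i<j} r_i r_j reported in the cited JMIV 2012 paper, and the test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import eigh

def mggd_geodesic_distance(Sigma1, Sigma2, beta, m):
    """Geodesic distance between two zero-mean MGGDs with common shape parameter beta.
    Uses r_i = log of the generalized eigenvalues of (Sigma1, Sigma2) and the metric
    g_ii = 3 b_h - 1/4, g_ij = b_h - 1/4 (i != j), b_h = (m + 2 beta) / (4 (m + 2))."""
    lam = eigh(Sigma2, Sigma1, eigvals_only=True)   # eigenvalues of Sigma1^{-1} Sigma2
    r = np.log(lam)
    bh = (m + 2.0 * beta) / (4.0 * (m + 2.0))
    diag_term = (3.0 * bh - 0.25) * np.sum(r ** 2)
    cross_term = 2.0 * (bh - 0.25) * sum(r[i] * r[j]
                                         for i in range(len(r)) for j in range(i + 1, len(r)))
    return float(np.sqrt(diag_term + cross_term))

# Illustrative 3-band (m = 3) scatter matrices, Gaussian case beta = 1.
S1 = np.diag([1.0, 2.0, 0.5])
S2 = np.diag([1.5, 1.0, 0.7])
print(mggd_geodesic_distance(S1, S2, beta=1.0, m=3))
```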

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Practical estimation of mixture models may be problematic when a large number of observations is involved: in such cases, online versions of Expectation-Maximization may be preferred, avoiding the need to store all the observations before running the algorithm. We introduce a new online method suited to the case where the number of observations is large and many mixture models need to be learned from different sets of points. Inspired by dictionary methods, our algorithm begins with a training step which is used to build a dictionary of components. The next step, which can be done online, amounts to populating the weights of the components given each arriving observation. The dictionary of components pays off most when many mixtures are learned with the same dictionary, maximizing the return on investment of the training step. We evaluate the proposed method on an artificial dataset built from random Gaussian mixture models.
 
Bag-of-components: an online algorithm for batch learning of mixture models

Information Geometry for mixtures Co-Mixture Models Bag of components Bag-of-components: an online algorithm for batch learning of mixture models Olivier Schwander Frank Nielsen Université Pierre et Marie Curie, Paris, France École polytechnique, Palaiseau, France October 29, 2015 1 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Exponential families Definition p(x; λ) = pF (x; θ) = exp ( t(x)|θ − F(θ) + k(x)) λ source parameter t(x) sufficient statistic θ natural parameter F(θ) log-normalizer k(x) carrier measure F is a stricly convex and differentiable function ·|· is a scalar product 2 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Multiple parameterizations: dual parameter spaces Legendre Transform (F, Θ) ↔ (F , H) θ ∈ Θ Natural Parameters η ∈ H Expectation Parameters θ = F (η) η = F(θ) Source Parameters (not unique) λ1 ∈ Λ1, λ2 ∈ Λ2, . . . , λn ∈ Λn Multiple source parameterizations Two canonical parameterizations 3 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Bregman divergences Definition and properties BF (x y) = F(x) − F(y) − x − y, F(y) F is a stricly convex and differentiable function No symmetry! Contains a lot of common divergences Squared Euclidean, Mahalanobis, Kullback-Leibler, Itakura-Saito. . . 4 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Bregman centroids Left-sided centroid min c i ωi BF (c xi ) Right-sided centroid min c i ωi BF (xi c) Closed-form cL = F∗ i ωi F(xi ) cR = i ωi xi 5 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Link with exponential families [Banerjee 2005] Bijection with exponential families log pF (x|θ) = −BF∗ (t(x) η) + F∗ (t(x)) + k(x) Kullback-Leibler between exponential families between members of the same exponential family KL(pF (x, θ1), pF (x, θ2)) = BF (θ2 θ1) = BF (η1 η2) Kullback-Leibler centroids In closed-form through the Bregman divergence 6 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Maximum likelihood estimator A Bregman centroid ˆη = arg max η i log pF (xi , η) = arg min η i BF∗ (t(xi ) η) −F∗ (t(xi )) − k(xi ) does not depend on η = arg min η i BF∗ (t(xi ) η) = i t(xi ) And ˆθ = F (ˆη) 7 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Mixtures of exponential families m(x; ω, θ) = 1≤i≤k ωi pF (x; θi ) Fixed Family of the components PF Number of components k (model selection techniques to choose) Parameters Weights i ωi = 1 Component parameters θi Learning a mixture Input: observations x1, . . . 
, xN Output: ωi and θi 8 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Exponential families Bregman divergences Mixture models Bregman Soft Clustering: EM for exponential families [Banerjee 2005] E-step p(i, j) = ωjpF (xi , θj) m(xi ) M-step ηj = arg max η i p(i, j) log pF (xi , θj) = arg min η i p(i, j)   BF∗ (t(xi ) η) −F∗ (t(xi )) − k(xi ) does not depend on η    = i p(i, j) u p(u, j) t(xu) 9 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Motivation Algorithms Applications Joint estimation of mixture models Exploit shared information between multiple pointsets to improve quality to improve speed Inspiration Dictionary methods Transfer learning Efficient algorithms Building Comparing 10 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Motivation Algorithms Applications Co-Mixtures Sharing components of all the mixtures m1(x|ω(1) , η) = k i=1 ω (1) i pF (x| ηj) . . . mS(x|ω(S) , η) = k i=1 ω (S) i pF (x| ηj) Same η1 . . . ηk everywhere Different weights ω(l) 11 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Motivation Algorithms Applications co-Expectation-Maximization Maximize the mean of the likelihoods on each mixtures E-step A posterior matrix for each dataset p(l) (i, j) = ω (l) j pF (xi , θj) m(x (l) i |ω(l), η) M-step Maximization on each dataset η (l) j = i p(i, j) u p(l)(u, j) t(x(l) u ) Aggregation ηj = 1 S S l=1 η (l) j 12 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Motivation Algorithms Applications Variational approximation of Kullback-Leibler [Hershey Olsen 2007] KLVariationnal(m1, m2) = K i=1 ω (1) i log j ω (1) j e−KL(pF (·; θi ) pF (·; θj )) j ω (2) j e−KL(pF (·; θi ) pF (·; θj )) With shared parameters Precompute Dij = e−KL(pF (·| ηi ),pF (·| ηj )) Fast version KLvar(m1 m2) = i ω (1) i log j ω (1) j e−Dij j ω (2) j e−Dij 13 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Motivation Algorithms Applications co-Segmentation Segmentation from 5D RGBxy mixtures Original EM Co-EM 14 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Motivation Algorithms Applications Transfer learning Increase the quality of one particular mixture of interest First image: only 1% of the points Two other images: full set of points Not enough points for EM 15 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Algorithm Experiments Bag of Components Training step Comix on some training set Keep the parameters Costly but offline D = {θ1, . . . 
, θK } Online learning of mixtures For a new pointset For each observation arriving: arg max θ∈D pF (xj, θ) or arg min θ∈D BF (t(xj), θ) 16 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Algorithm Experiments Nearest neighbor search Naive version Linear search O(number of samples × number of components) Same order of magnitude as one step of EM Improvement Computational Bregman Geometry to speed-up the search Bregman Ball Trees Hierarchical clustering Approximate nearest neighbor 17 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Algorithm Experiments Image segmentation Segmentation on a random subset of the pixels 100% 10% 1% EM BoC 18 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Algorithm Experiments Computation times Training 100% 10% 1% 0 20 40 60 80 100 120 Training EM BoC 19 / 20 Information Geometry for mixtures Co-Mixture Models Bag of components Algorithm Experiments Summary Comix Mixtures with shared components Compact description of a lot of mixtures Fast KL approximations Dictionary-like methods Bag of Components Online method Predictable time (no iteration) Works with only a few points Fast 20 / 20

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Stochastic watershed is an image segmentation technique based on mathematical morphology which produces a probability density function of image contours. The estimated probabilities depend mainly on local distances between pixels. This paper introduces a variant of stochastic watershed in which the probabilities of contours are computed from a Gaussian model of image regions. In this framework, the basic ingredient is the distance between pairs of regions, hence a distance between normal distributions. Several statistical distances between normal distributions are therefore compared, namely the Bhattacharyya distance, the Hellinger metric and the Wasserstein metric.
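A minimal numpy sketch of two of the region distances compared in the paper, the Bhattacharyya distance and the 2-Wasserstein distance between multivariate normal distributions; the closed forms are standard, and the example region statistics (mean colour and covariance of two regions) are made up for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

def bhattacharyya_gaussians(m1, S1, m2, S2):
    """Bhattacharyya distance between N(m1, S1) and N(m2, S2)."""
    S = 0.5 * (S1 + S2)
    dm = m1 - m2
    term1 = 0.125 * dm @ np.linalg.solve(S, dm)
    term2 = 0.5 * np.log(np.linalg.det(S) / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

def wasserstein2_gaussians(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    S2_half = sqrtm(S2)
    cross = np.real(sqrtm(S2_half @ S1 @ S2_half))
    bures = np.trace(S1 + S2 - 2.0 * cross)
    return np.sqrt(np.sum((m1 - m2) ** 2) + max(bures, 0.0))

m1, S1 = np.array([0.2, 0.5, 0.1]), 0.01 * np.eye(3)   # illustrative region statistics
m2, S2 = np.array([0.3, 0.4, 0.2]), 0.02 * np.eye(3)
print(bhattacharyya_gaussians(m1, S1, m2, S2), wasserstein2_gaussians(m1, S1, m2, S2))
```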
 
Statistical Gaussian Model of Image Regions in Stochastic Watershed Segmentation

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
A technique of spatial-spectral quantization of hyperspectral images is introduced, whereby a quantized hyperspectral image is summarized by K spectra which represent the spatial and spectral structures of the image. The proposed technique is based on α-connected components on a region adjacency graph. The main ingredient is a dissimilarity metric. In order to choose the metric that best fits the hyperspectral data manifold, different probabilistic dissimilarity measures are compared.
 
Quantization of hyperspectral image manifold using probabilistic distances

Optimal Transport and applications in Imagery/Statistics (chaired by Bertrand Maury, Jérémie Bigot)

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Optimal transport (OT) is a major statistical tool to measure similarity between features or to match and average features. However, OT requires some relaxation and regularization to be robust to outliers. With relaxed methods, one feature can be matched to several others, so significant interpolation between different features arises. This is not an issue for comparison purposes, but it causes strong and unwanted smoothing in transfer applications. We therefore introduce a new regularized method, based on a non-convex formulation, that minimizes transport dispersion by enforcing a one-to-one matching of features. The interest of the approach is demonstrated for color transfer.
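As a sketch of the discrete Monge-Kantorovich problem recalled in the slides below (a linear cost under marginal constraints), here is a small linear-programming formulation using scipy. It reproduces the 2 × 3 example worked out in the talk, for which the optimal cost is 6; as the slides also stress, a generic LP solver is intractable at the scale of real image histograms.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(mu, nu, C):
    """Solve min <P, C> over couplings P of mu and nu with a generic LP solver.
    Fine for tiny histograms, intractable for large ones."""
    M, N = C.shape
    A_eq = np.zeros((M + N, M * N))
    for i in range(M):                      # row-sum constraints: sum_j P_ij = mu_i
        A_eq[i, i * N:(i + 1) * N] = 1.0
    for j in range(N):                      # column-sum constraints: sum_i P_ij = nu_j
        A_eq[M + j, j::N] = 1.0
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.x.reshape(M, N), res.fun

# The 2 x 3 example from the slides: mu = (1/3, 2/3), nu = (1/3, 1/6, 1/2), squared costs.
mu = np.array([1 / 3, 2 / 3])
nu = np.array([1 / 3, 1 / 6, 1 / 2])
C = np.array([[2.0, 1.0, 5.0],
              [6.0, 5.0, 1.0]]) ** 2
P, cost = discrete_ot(mu, nu, C)
print(P)
print(cost)  # 6.0, matching the optimal plan shown in the slides
```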
 
Non-convex relaxation of optimal transport for color transfer between images

Introduction 1 / 30 Adaptive color transfer with relaxed optimal transport Julien Rabin1 , Sira Ferradans2 and Nicolas Papadakis3 1 GREYC, University of Caen, 2 Data group, ENS, 3 CNRS, Institut de Mathématiques de Bordeaux Conference on Geometric Science of Information J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Introduction 2 / 30 Optimal transport on histograms Monge-Kantorovitch (MK) discrete mass transportation problem: Map µ0 onto µ1 while minimizing the total transport cost ������������� The two histograms must have the same mass. Optimal transport cost is called the Wasserstein distance (Earth Mover’s Distance) Optimal transport map is the application mapping µ0 onto µ1 J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Introduction 3 / 30 Applications in Image Processing and Computer Vision Optimal transport as a framework to define statistical-based tools Applications to many imaging and computer vision problems: • Robust dissimilarity measure (Optimal transport cost): Image retrieval [Rubner et al., 2000] [Pele and Werman, 2009] SIFT matching [Pele and Werman, 2008] [Rabin et al., 2009] 3D shape recognition, Feature detection [Tomasi] Object segmentation [Ni et al., 2009] [Swoboda and Schnorr, 2013] • Tool for matching/interpolation (Optimal transport map): Non-rigid shape matching, image registration [Angenent et al., 2004] Texture synthesis and mixing [Ferradans et al., 2013] Histogram specification and averaging [Delon, 2004] Color transfer [Pitié et al., 2007], [Rabin et al., 2011b] Not to mention other applications (physics, economy, etc). J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Introduction 4 / 30 Color transfer Target image (µ) Source image (ν) Optimal transport of µ onto ν Target image after color transfer Limitations: • Mass conservation artifacts • Irregularity of optimal transport map J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Introduction 5 / 30 Outline Outline: Part I. Computation of optimal transport between histograms Part II. Optimal transport relaxation and regularization Application to color transfer J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 6 / 30 Part I Wasserstein distance between histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 7 / 30 Formulation for clouds of points Definition: L2 -Wasserstein Distance Given two clouds of points X, Y ⊂ Rd×N of N elements in Rd with equal masses 1 N , the quadratic Wasserstein distance is defined as W2(X, Y)2 = min σ∈ΣN 1 N N i=1 Xi − Yσ(i) 2 (1) where ΣN is the set of all permutations of N elements. J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 7 / 30 Formulation for clouds of points Definition: L2 -Wasserstein Distance Given two clouds of points X, Y ⊂ Rd×N of N elements in Rd with equal masses 1 N , the quadratic Wasserstein distance is defined as W2(X, Y)2 = min σ∈ΣN 1 N N i=1 Xi − Yσ(i) 2 (1) where ΣN is the set of all permutations of N elements. ⇔ Optimal Assignment problem, can be computed using standard sorting algorithms when d = 1 J. Rabin, S. Ferradans, N. 
Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 8 / 30 Exact solution in unidimensional case (d = 1) for histograms Histograms may be seen as clouds of points with non-uniform masses, so that µ(x) = M i=1 mi δXi (x), s.t. i mi = 1, mi ≥ 0 ∀i Computing the Lp -Wasserstein distance for one-dimensional histograms is still simple for p ≥ 1. Optimal transport cost writes [Villani, 2003] Wp(µ, ν) = H−1 µ − H−1 ν p = 1 0 H−1 µ (t) − H−1 ν (t) p dt 1 p where Hµ(t) = t −∞ dµ = Xi t mi is the cumulative distribution function of µ and H−1 µ (t) = inf {s \ Rµ(s) t} its pseudo-inverse. Time complexity: O(N) operations if bins are already sorted. J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 9 / 30 Exact solution in unidimensional case (d = 1) for histograms Can not be extended to higher dimensions as the cumulative function Hµ : x ∈ Rd → Hµ(x) ∈ R is not invertible for d > 1 J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 10 / 30 Exact solution in general case (d>1) Transport cost between normalized histograms µ and ν, where µ = M i=1 mi δXi , ν = N j=1 nj δYj , mi , nj ≥ 0 and i mi = j nj = 1. • mi , nj are the masses at locations Xi , Yj It can be recasted as a linear programming problem: linear cost + linear constraints W2(µ, ν)2 = min P∈Pµ,ν    P , C = i,j Pi,j Xi − Yj 2    = min A·p=b pT c • C is the fixed cost assignment matrix between histograms bins: Ci,j = d k=1 ||Xk i − Yk j ||2 • Pµ,ν is the set of non negative matrices P with marginals µ and ν, ie P(µ, ν) =    P ∈ RM×N , Pi,j 0, i,j Pi,j = 1, j Pi,j = mi , i Pi,j = nj    J. Rabin, S. Ferradans, N. 
Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 11 / 30 Illustration in unidimensional case (d = 1) for histograms Two histograms µ = {1 3 , 2 3 } and ν = {1 3 , 1 6 , 1 2 } Example: µi is the production at plant i and νj is the storage capacity of storehouse j Matrix C defines the transport cost from i to j: C11 = 22 C21 = 62 C12 = 12 C22 = 52 C13 = 52 C23 = 12 The set of admissible matrices P is µ1 = 1/3 µ2 = 2/3 P11 P21 ν1 = 1/3 P12 P22 ν2 = 1/6 P13 P23 ν3 = 1/2 Pij is the mass that is transported from µi to νj . J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 11 / 30 Illustration in unidimensional case (d = 1) for histograms Two histograms µ = {1 3 , 2 3 } and ν = {1 3 , 1 6 , 1 2 } Example: µi is the production at plant i and νj is the storage capacity of storehouse j Matrix C defines the transport cost from i to j: C11 = 22 C21 = 62 C12 = 12 C22 = 52 C13 = 52 C23 = 12 The set of admissible matrices P is µ1 = 1/3 µ2 = 2/3 1/9 2/9 ν1 = 1/3 1/18 1/9 ν2 = 1/6 1/6 1/3 ν3 = 1/2 Pij is the mass that is transported from µi to νj . The transport cost is W(µ, ν) = ij Pij Cij = 15 J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 11 / 30 Illustration in unidimensional case (d = 1) for histograms Two histograms µ = {1 3 , 2 3 } and ν = {1 3 , 1 6 , 1 2 } Example: µi is the production at plant i and νj is the storage capacity of storehouse j Matrix C defines the transport cost from i to j: C11 = 22 C21 = 62 C12 = 12 C22 = 52 C13 = 52 C23 = 12 The set of admissible matrices P is µ1 = 1/3 µ2 = 2/3 1/3 0 ν1 = 1/3 0 1/6 ν2 = 1/6 0 1/2 ν3 = 1/2 Pij is the mass that is transported from µi to νj . The transport cost is W(µ, ν) = ij Pij Cij = 6 J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 12 / 30 Optimal transport solution illustration in 1D Histograms µ and ν (on uniform grid Ω) Optimal flow P J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 12 / 30 Optimal transport solution illustration in 1D Histograms µ and ν (on uniform grid Ω) Optimal flow P Remark: Masses can be splitted by transport J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 12 / 30 Optimal transport solution illustration in 1D Histograms µ and ν (on uniform grid Ω) Optimal flow P Remark: Masses can be splitted by transport J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Optimal transport framework 13 / 30 Optimal transport solution with linear programming method Discrete mass transportation problem for histograms can be solved with standard linear programming algorithms (simplex, interior point methods). Dedicated algorithms are more efficient for optimal assignment problem (e.g Hungarian and Auction algorithms in O(N3 )) Computation can be (slightly) accelerated when using other costs than L2 (e.g. L1 [Ling and Okada, 2007], Truncated L1 [Pele and Werman, 2008]) Advantages Complexity does not depend on feature dimension d Limitation Intractable for signal processing applications where N 103 (considering time complexity & memory limitation) J. Rabin, S. Ferradans, N. 
Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 14 / 30 Part II Relaxation and regularization J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 15 / 30 Problem Statement Histogram specification exhibits strong limitations of optimal transport when dealing with image processing: • Color artifacts due to the exact specification (histograms can have very different shapes) • Irregularities: Transport map is not consistent in the color domain It does not take into account spatial information Histogram equalization + Filtering Proposed solution • Relax mass conservation constraint • Promote regular transport flows (color consistency) • Include spatial information (spatial consistency) J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 16 / 30 Constraint Relaxation Idea 1: Relaxation of mass conservation constraints [Ferradans et al., 2013] We consider the transport cost between normalized histograms µ and ν, µ(x) = M i=1 mi δXi (x), s.t. i mi = 1, mi ≥ 0 ∀i Relaxed Formulation : P ∈ arg min P∈Pκ(µ,ν)    P, C = 1 i N,1 j M Pi,j Ci,j    • with Ci,j = Xi − Yj 2 , where Xi ∈ Ω ⊂ Rd is bin centroid of µ for index i; • with new (linear) constraints: P(µ, ν) =    Pi,j 0, i,j Pi,j = 1, j Pi,j = mi , i Pi,j = nj    J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 16 / 30 Constraint Relaxation Idea 1: Relaxation of mass conservation constraints [Ferradans et al., 2013] We consider the transport cost between normalized histograms µ and ν, µ(x) = M i=1 mi δXi (x), s.t. i mi = 1, mi ≥ 0 ∀i Relaxed Formulation : P ∈ arg min P∈Pκ(µ,ν)    P, C = 1 i N,1 j M Pi,j Ci,j    • with Ci,j = Xi − Yj 2 , where Xi ∈ Ω ⊂ Rd is bin centroid of µ for index i; • with new (linear) constraints: Pκ(µ, ν) =    Pi,j 0, i,j Pi,j = 1, j Pi,j = mi , κnj ≤ i Pi,j ≤ Knj    where capacity parameters are such that κ ≤ 1 ≤ K: hard to tune J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 17 / 30 Proposed relaxed histogram matching Idea 2: Use capacity variables as unknowns {P , κ } ∈ arg min P∈Pκ(µ,ν) κ∈RN ,κ≥0, κ, n =1 P, C + ρ||κ − 1||1 where Pκ(µ, ν) =    Pi,j 0, i,j Pi,j = 1, j Pi,j = mi , i Pi,j = κj nj    ⇒ Still a linear program J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 18 / 30 Illustration of relaxed transport Optimal transport J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 18 / 30 Illustration of relaxed transport Relaxed optimal transport J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 19 / 30 Relaxed color transfer: comparison with raw OT Target Raw OT Relaxed OT Source No color or spatial regularization J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 20 / 30 Proposed relaxed and regularized histogram matching Idea 3: Add regularization prior {P , κ } ∈ arg min P∈Pκ(µ,ν) κ∈RN ,κ≥0, κ, n =1 P, C + ρ||κ − 1||1 + λR(P). 
where Pκ(µ, ν) =    Pi,j 0, i,j Pi,j = 1, j Pi,j = mi , i Pi,j = κj nj    and R(P) models some regularity priors ⇒ Still a linear program J. Rabin, S. Ferradans, N. Papadakis Adaptive color transfer with relaxed optimal transport Relaxation Regularization Conclusion 21 / 30 Regularity of transport map • Global regularization: Defining the regularity of the flow matrix is a NP-hard problem • Average transport map Instead, we use the Posterior mean to estimate a one-to-one transfer function T between µ and ν T(Xi )

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We introduce the generalized Pareto distributions as a statistical model to describe thresholded edge-magnitude image filter results. Compared with the more common Weibull or generalized extreme value distributions, these distributions have at least two important advantages: the use of a high threshold ensures that only the most important edge points enter the statistical analysis, and the estimation is computationally more efficient since a much smaller number of data points has to be processed. The generalized Pareto distributions with a common threshold of zero form a two-dimensional Riemannian manifold with the metric given by the Fisher information matrix. We compute the Fisher matrix for shape parameters greater than -0.5 and show that the determinant of its inverse is the product of a polynomial in the shape parameter and the squared scale parameter. We apply this result by using the determinant as a sharpness function in an autofocus algorithm. We test the method on a large database of microscopy images with given ground-truth focus results. For the vast majority of the focus sequences the results lie in the correct focal range. Cases where the algorithm fails are specimens with too few objects and sequences where contributions from different layers result in a multi-modal sharpness curve. Using the geometry of the manifold of generalized Pareto distributions, more efficient autofocus algorithms can be constructed, but these optimizations are not included here.
 
Generalized Pareto Distributions, Image Statistics and Autofocusing in Automated Microscopy

Generalized Pareto Distributions, Image Statistics and Autofocusing in Automated Microscopy (Reiner Lenz)

Microscopy: 34 slices changing focus along the optical axis (focal sequence, first/next/final 4×16 images, and total focus).

Observations:
• Auto-focus is easy
• It is independent of image content (what is in the image)
• It is independent of the imaging method (how the image is produced)
• It is fast ("real-time")
• It is local (which part of the image is in focus)
• It is obviously useful in applications (microscopy, cameras, …)
• It is useful in understanding low-level vision processes
• It illustrates the relation between scene statistics and vision

Processing pipeline / techniques: filtering, thresholding, critical points, group representations, extreme value statistics, information geometry.

Filtering with representations of dihedral groups: most images are defined on square grids; the symmetry group of a square grid is the dihedral group D(4), consisting of 8 elements (4 rotations and 4 rotation+reflections). For a 5×5 array, choose six filter pairs, resulting in a 6×2 vector at each pixel, for example
Fx = (−1 1; −1 1), Fy = (−1 −1; 1 1).
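A rough Python sketch of the pipeline outlined above, assuming the 2×2 filter pair Fx, Fy shown on the slide: compute edge magnitudes, keep exceedances over a high threshold, and fit a generalized Pareto distribution. The sharpness functional built from the Fisher-matrix determinant is only hinted at, since its exact polynomial is not recoverable from the extracted text; the threshold quantile and the random test image are illustrative placeholders.

```python
import numpy as np
from scipy.signal import convolve2d
from scipy.stats import genpareto

def edge_magnitude(img):
    """Gradient magnitude from the 2x2 filter pair on the slide (a D(4) filter pair)."""
    Fx = np.array([[-1.0, 1.0], [-1.0, 1.0]])
    Fy = np.array([[-1.0, -1.0], [1.0, 1.0]])
    gx = convolve2d(img, Fx, mode="valid")
    gy = convolve2d(img, Fy, mode="valid")
    return np.hypot(gx, gy)

def gpd_edge_fit(img, quantile=0.95):
    """Fit a generalized Pareto distribution to edge magnitudes above a high threshold
    and return (shape, scale). A sharpness score would then be built from the fitted
    (shape, scale) point via the Fisher-matrix determinant; that step is omitted here."""
    mag = edge_magnitude(img).ravel()
    u = np.quantile(mag, quantile)
    exceedances = mag[mag > u] - u               # thresholded data, shifted to threshold 0
    shape, _, scale = genpareto.fit(exceedances, floc=0.0)
    return shape, scale

rng = np.random.default_rng(0)
img = rng.random((128, 128))                     # placeholder for a microscopy slice
print(gpd_edge_fit(img))
```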

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We study barycenters in the Wasserstein space Pp(E) of a locally compact geodesic space (E, d). In this framework, we define the barycenter of a measure ℙ on Pp(E) as its Fréchet mean. The paper establishes its existence and states its consistency with respect to ℙ. This extends previous results on ℝ^d, which required conditions on ℙ or on the sequence converging to ℙ for consistency.
 
Barycenter in Wasserstein space: existence and consistency

Barycenter in Wasserstein spaces: existence and consistency Thibaut Le Gouic and Jean-Michel Loubes* Institut de Math´ematiques de Marseille ´Ecole Centrale Marseille Institut Math´ematique de Toulouse* October 29th 2015 1 / 23 Barycenter in Wasserstein spaces Barycenter The barycenter of a set {xi }1≤i≤J of Rd for J points endowed with weights (λi )1≤i≤J is defined as 1≤i≤J λi xi . It is characterized by being the minimizer of x → 1≤i≤J λi x − xi 2 . 2 / 23 Barycenter in Wasserstein spaces Barycenter The barycenter of a set {xi }1≤i≤J of Rd for J points endowed with weights (λi )1≤i≤J is defined as 1≤i≤J λi xi . It is characterized by being the minimizer of x → 1≤i≤J λi x − xi 2 . Replace (Rd , . ) by a metric space (E, d), and minimize x → 1≤i≤J λi d(x, xi )2 . 2 / 23 Barycenter in Wasserstein spaces Barycenter Likewise, given a random variable/vector of law µ on Rd , its expectation EX is characterized by being the minimizer of x → E X − x 2 . 3 / 23 Barycenter in Wasserstein spaces Barycenter Likewise, given a random variable/vector of law µ on Rd , its expectation EX is characterized by being the minimizer of x → E X − x 2 . → extension to a metric space (it summarizes the information staying in a geodesic space) 3 / 23 Barycenter in Wasserstein spaces Barycenter Definition (p-barycenter) Given a probability measure µ on a geodesic space (E, d), the set arg min x ∈ E; d(x, y)p dµ(y) , is called the set of p-barycenters of µ. 4 / 23 Barycenter in Wasserstein spaces Barycenter Definition (p-barycenter) Given a probability measure µ on a geodesic space (E, d), the set arg min x ∈ E; d(x, y)p dµ(y) , is called the set of p-barycenters of µ. Existence ? 4 / 23 1 Geodesic space 2 Wasserstein space 3 Applications 5 / 23 Barycenter in Wasserstein spaces Geodesic space Definition (Geodesic space) A complete metric space (E, d) is said to be geodesic if for all x, y ∈ E, there exists z ∈ E such that 1 2 d(x, y) = d(x, z) = d(z, y). 6 / 23 Barycenter in Wasserstein spaces Geodesic space Definition (Geodesic space) A complete metric space (E, d) is said to be geodesic if for all x, y ∈ E, there exists z ∈ E such that 1 2 d(x, y) = d(x, z) = d(z, y). Include many spaces (vectorial normed spaces, compact manifolds, ...), 6 / 23 Barycenter in Wasserstein spaces Geodesic space Proposition (Existence) The p-barycenter of any probability measure on a locally compact geodesic space, with finite moments of order p, exists. 7 / 23 Barycenter in Wasserstein spaces Geodesic space Proposition (Existence) The p-barycenter of any probability measure on a locally compact geodesic space, with finite moments of order p, exists. Not unique e.g. the sphere Non positively curved space → unique barycenter, 1-Lipschitz on 2-Wasserstein space. 7 / 23 1 Geodesic space 2 Wasserstein space 3 Applications 8 / 23 Barycenter in Wasserstein spaces Wasserstein metric Definition (Wasserstein metric) Let µ and ν be two probability measures on a metric space (E, d) and p ≥ 1. The p-Wasserstein distance between µ and ν is defined as W p p (µ, ν) = inf π∈Γ(µ,ν) dE (x, y)p dπ(x, y), where Γ(µ, ν) is the set of all probability measures on E × E with marginals µ and ν. 9 / 23 Barycenter in Wasserstein spaces Wasserstein metric Definition (Wasserstein metric) Let µ and ν be two probability measures on a metric space (E, d) and p ≥ 1. The p-Wasserstein distance between µ and ν is defined as W p p (µ, ν) = inf π∈Γ(µ,ν) dE (x, y)p dπ(x, y), where Γ(µ, ν) is the set of all probability measures on E × E with marginals µ and ν. 
Defined for any measure for which moments of order p are finite : Ed(X, x0)p < ∞ (denote this set Pp(E)), It is a metric on Pp(E) ; (Pp(E), Wp) is called the Wasserstein space, The topology of this metric is the weak convergence topology and convergence of moments of order p. 9 / 23 Barycenter in Wasserstein spaces Wasserstein metric The Wasserstein space of a complete geodesic space is a complete geodesic space. (Pp(E), Wp) is locally compact ⇔ (E, d) is compact. (E, d) ⊂ (Pp(E), Wp) isometrically. Existence of the barycenter on (Pp(E), Wp) ? 10 / 23 Barycenter in Wasserstein spaces Measurable barycenter application Definition (Measurable barycenter application) Let (E, d) be a geodesic space. (E, d) is said to admit measurable barycenter applications if for any J ≥ 1 and any weights (λj )1≤j≤J, there exists a measurable application T from EJ to E such that for all (x1, ..., xJ) ∈ EJ, min x∈E J j=1 λj d(x, xj )p = J j=1 λj d(T(x1, ..., xJ), xj )p . 11 / 23 Barycenter in Wasserstein spaces Measurable barycenter application Definition (Measurable barycenter application) Let (E, d) be a geodesic space. (E, d) is said to admit measurable barycenter applications if for any J ≥ 1 and any weights (λj )1≤j≤J, there exists a measurable application T from EJ to E such that for all (x1, ..., xJ) ∈ EJ, min x∈E J j=1 λj d(x, xj )p = J j=1 λj d(T(x1, ..., xJ), xj )p . Locally compact geodesic spaces admit measurable barycenter applications. 11 / 23 Barycenter in Wasserstein spaces Existence of barycenter Theorem (Existence of barycenter) Let (E, d) be a geodesic space that admits measurable barycenter applications. Then any probability measure P on (Pp(E), Wp) has a barycenter. 12 / 23 Barycenter in Wasserstein spaces Existence of barycenter Theorem (Existence of barycenter) Let (E, d) be a geodesic space that admits measurable barycenter applications. Then any probability measure P on (Pp(E), Wp) has a barycenter. Barycenter is not unique e.g. : E = Rd with P = 1 2δµ1 + 1 2δµ2 , µ1 = 1 2δ(−1,−1) + 1 2δ(1,1) and µ2 = 1 2δ(1,−1) + δ(−1,1) 12 / 23 Barycenter in Wasserstein spaces Existence of barycenter Theorem (Existence of barycenter) Let (E, d) be a geodesic space that admits measurable barycenter applications. Then any probability measure P on (Pp(E), Wp) has a barycenter. Barycenter is not unique e.g. : E = Rd with P = 1 2δµ1 + 1 2δµ2 , µ1 = 1 2δ(−1,−1) + 1 2δ(1,1) and µ2 = 1 2δ(1,−1) + δ(−1,1) Consistency of the barycenter ? 12 / 23 Barycenter in Wasserstein spaces 3 steps for existence 1 Multimarginal problem 2 Weak consistency 3 Approximation by finitely supported measures 13 / 23 Barycenter in Wasserstein spaces Push forward Definition (Push forward) Given a measure ν on E and an measurable application T : E → (F, F), the push forward of ν by T is given by T#ν(A) = ν T−1 (A) , ∀A ∈ F. Probabilist version : X is a r.v. on (Ω, A, P), then PX = X#P. 14 / 23 Barycenter in Wasserstein spaces Multimarginal problem Theorem (Barycenter and multi-marginal problem [Agueh and Carlier, 2011]) Let (E, d) be a complete separable geodesic space, p ≥ 1 and J ∈ N∗. Given (µi )1≤i≤J ∈ Pp(E)J and weights (λi )1≤i≤J, there exists a measure γ ∈ Γ(µ1, ..., µJ) minimizing ˆγ → inf x∈E 1≤i≤J λi d(xi , x)p dˆγ(x1, ..., xJ). If (E, d) admits a measurable barycenter application T : EJ → E then the measure ν = T#γ is a barycenter of (µi )1≤i≤J If T is unique, ν is of the form ν = T#γ. 
15 / 23 Barycenter in Wasserstein spaces Weak consistency Theorem (Weak consistency of the barycenter) Let (E, d) be a geodesic space that admits measurable barycenter. Take (Pj )j≥1 ⊂ Pp(E) converging to P ∈ Pp(E). Take any barycenter µj of Pj . Then the sequence (µj )j≥1 is (weakly) tight and any limit point is a barycenter of P. 16 / 23 Barycenter in Wasserstein spaces Approximation by finitely supported measure Proposition (Approximation by finitely supported measure) For any measure P on Pp(E) there exists a sequence of finitely supported measures (Pj )j≥1 ⊂ Pp(E) such that Wp(Pj , P) → 0 as j → ∞. 17 / 23 Barycenter in Wasserstein spaces 3 steps for existence 1 Multimarginal problem 2 Weak consistency 3 Approximation by finitely supported measures 18 / 23 Barycenter in Wasserstein spaces 3 steps for existence 1 Multimarginal problem → existence of barycenter for P finitely supported. 2 Weak consistency 3 Approximation by finitely supported measures 18 / 23 Barycenter in Wasserstein spaces 3 steps for existence 1 Multimarginal problem → existence of barycenter for P finitely supported. 2 Weak consistency → existence of barycenter for probabilities that can be approximated by measures with barycenters. 3 Approximation by finitely supported measures 18 / 23 Barycenter in Wasserstein spaces 3 steps for existence 1 Multimarginal problem → existence of barycenter for P finitely supported. 2 Weak consistency → existence of barycenter for probabilities that can be approximated by measures with barycenters. 3 Approximation by finitely supported measures → any probability can be approximated by a finitely supported probability measure. 18 / 23 Barycenter in Wasserstein spaces Consistency of the barycenter Theorem (Consistency of the barycenter) Let (E, d) be a geodesic space that admits measurable barycenter. Take (Pj )j≥1 ⊂ Pp(E) and P ∈ Pp(E). Take any barycenter µj of Pj . Then the sequence (µj )j≥1 is totally bounded in (Pp(E), Wp) and any limit point is a barycenter of P. 19 / 23 Barycenter in Wasserstein spaces Consistency of the barycenter Theorem (Consistency of the barycenter) Let (E, d) be a geodesic space that admits measurable barycenter. Take (Pj )j≥1 ⊂ Pp(E) and P ∈ Pp(E). Take any barycenter µj of Pj . Then the sequence (µj )j≥1 is totally bounded in (Pp(E), Wp) and any limit point is a barycenter of P. Imply continuity of barycenter when barycenter are unique. No rate of convergence (barycenter Lipschitz on (E, d) Lipschitz on Pp(E)). Imply compactness of the set of barycenters. 19 / 23 1 Geodesic space 2 Wasserstein space 3 Applications 20 / 23 Barycenter in Wasserstein spaces Statistical application : improvement of measures accuracy Take (µn i )1≤j≤J → µj when n → ∞ and weights (λj )1≤j≤J. Set µn B the barycenter of (µn i )1≤j≤J. Then, as n → ∞, µn B → µB. 21 / 23 Barycenter in Wasserstein spaces Statistical application : improvement of measures accuracy Take (µn i )1≤j≤J → µj when n → ∞ and weights (λj )1≤j≤J. Set µn B the barycenter of (µn i )1≤j≤J. Then, as n → ∞, µn B → µB. Texture mixing [Rabin et al., 2011] 21 / 23 Barycenter in Wasserstein spaces Statistical application : growing number of measures Take (µn)n≥1 such that 1 n n i=1 µi → P. Set µn B the barycenter of 1 n n i=1 δµi . Then, as n → ∞, µn B → µB 22 / 23 Barycenter in Wasserstein spaces Statistical application : growing number of measures Take (µn)n≥1 such that 1 n n i=1 µi → P. Set µn B the barycenter of 1 n n i=1 δµi . 
Then, as n → ∞, µn B → µB Average of template deformation [Bigot and Klein, 2012],[Agull´o-Antol´ın et al., 2015] 22 / 23 Agueh, M. and Carlier, G. (2011). Barycenters in the wasserstein space. SIAM Journal on Mathematical Analysis, 43(2) :904–924. Agull´o-Antol´ın, M., Cuesta-Albertos, J. A., Lescornel, H., and Loubes, J.-M. (2015). A parametric registration model for warped distributions with Wasserstein’s distance. J. Multivariate Anal., 135 :117–130. Bigot, J. and Klein, T. (2012). Consistent estimation of a population barycenter in the Wasserstein space. ArXiv e-prints. Rabin, J., Peyr´e, G., Delon, J., and Bernot, M. (2011). Wasserstein Barycenter and its Application to Texture Mixing. SSVM’11, pages 435–446. 23 / 23
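As a small editorial illustration of the statistical applications above: in dimension one the W2 barycenter is explicit, since it is obtained by averaging the quantile functions of the input measures. A minimal numpy sketch under that assumption follows; the paper itself treats general locally compact geodesic spaces, where no such closed form is available, and the grid size and the two Gaussian samples here are illustrative.

```python
import numpy as np

def wasserstein_barycenter_1d(samples, weights=None, grid_size=200):
    """Barycenter of one-dimensional measures in (P2(R), W2): average the quantile
    functions. `samples` is a list of 1-D arrays; each is reduced to its empirical
    quantile function on a common grid, and the barycenter quantile function is
    their weighted mean, returned sampled on that grid."""
    J = len(samples)
    weights = np.full(J, 1.0 / J) if weights is None else np.asarray(weights)
    t = (np.arange(grid_size) + 0.5) / grid_size
    quantiles = np.stack([np.quantile(np.asarray(s), t) for s in samples])
    return weights @ quantiles

rng = np.random.default_rng(1)
samples = [rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 0.5, 500)]
q_bar = wasserstein_barycenter_1d(samples)
print(q_bar.mean())   # the barycenter mean is the weighted mean of the input means (about 0.5)
```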

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Univariate L-moments are expressed as projections of the quantile function onto an orthogonal basis of univariate polynomials. We present multivariate versions of L-moments expressed as collections of orthogonal projections of a multivariate quantile function onto a basis of multivariate polynomials. We propose to consider quantile functions defined as transports from the uniform distribution on [0, 1]^d onto the distribution of interest and present some properties of the resulting L-moments. The properties of estimated L-moments are illustrated for heavy-tailed distributions.
 
Multivariate L-moments based on transports

Multivariate L-Moments Based on Transports Alexis Decurninge Huawei Technologies Geometric Science of Information October 29th, 2015 Outline 1 L-moments Definition of L-moments 2 Quantiles and multivariate L-moments Definitions and properties Rosenblatt quantiles and L-moments Monotone quantiles and L-moments Estimation of L-moments Numerical applications Definition of L-moments L-moments of a distribution : if X1,...,Xr are real random variables with common cumulative distribution function F λr = 1 r r−1 k=0 (−1)k r − 1 k E[Xr−k:r ] with X1:r ≤ X2:r ≤ ... ≤ Xr:r : order statistics λ1 = E[X] : localization λ2 = E[X2:2 − X1:2] : dispersion τ3 = λ3 λ2 = E[X3:3−2X2:3+X1:3] E[X2:2−X1:2] : asymmetry τ4 = λ4 λ2 = E[X4:4−3X3:4+3X2:4−X1:4] E[X2:2−X1:2] : kurtosis Existence if |x|dF(x) < ∞ Characterization of L-moments L-moments are projections of the quantile function on an orthogonal basis λr = 1 0 F−1 (t)Lr (t)dt F−1 generalized inverse of F F−1 (t) = inf {x ∈ R such that F(x) ≥ t} Lr Legendre polynomial (orthogonal basis in L2([0, 1])) Lr (t) = r k=0 (−1)k r k 2 tr−k (1 − t)k L-moments completely characterize a distribution F−1 (t) = ∞ r=1 (2r + 1)λr Lr (t) Definition of L-moments (discrete distributions) L-moments for a multinomial distribution of support x1 ≤ x2 ≤ ... ≤ xn and weights π1, ..., πn ( n i=1 πi = 1) λr = n i=1 w (r) i xi = n i=1 Kr i a=1 πa − Kr i−1 a=1 πa xi with Kr the respective primitive of Lr : Kr = Lr Empirical L-moments U-statistics : mean of all subsequences of size r without replacement ˆλr = 1 n r 1≤i1<··· 1, many multivariate quantiles has been proposed Quantiles coming from depth functions (Tukey, Zuo and Serfling) Spatial Quantiles (Chaudhuri) Generalized quantile processes (Einmahl and Mason) Quantiles as quadratic optimal transports (Galichon and Henry) Multivariate quantiles We define a quantile related to a probability measure ν as a transport from the uniform measure unif on [0; 1]d into ν. Definition Let U and X are random variables with respective measure µ and ν. T is a transport from µ into ν if T(U) = X (we note T#µ = ν). Example of transport families Optimal/monotone transports Rosenblatt transports Moser transports ... Multivariate L-moments X r.v. of interest with related measure ν such that E[ X ] < ∞. Definition Q : [0; 1]d → Rd a transport from unif in [0; 1]d into ν. L-moment λα of multi-index α = (i1, ..., id ) associated to Q : λα := [0;1]d Q(t1, ..., td )Lα(t1, ..., td )dt1...dtd ∈ Rd . with Lα(t1, ..., td ) = d k=1 Lik (tk ). ⇒ Definition compatible with the univariate case : the univariate quantile is a transport from the uniform measure on [0; 1] into the measure of interest (F−1(U) d = X) Multivariate L-moments L-moment of degree 1 λ1(= λ1,1,...,1) = [0;1]d Q(t1, ..., td )dt1...dtd = E[X]. L-moments of degree 2 can be regrouped in a matrix Λ2 = [0;1]d Qi (t1, ..., td )(2tj − 1)dt1...dtd 1≤i,j≤d . with Q(t1, ..., td ) =    Q1(t1, ..., td ) ... Qd (t1, ..., td )    Multivariate L-moments : characterization Proposition Assume that two quantiles Q and Q have same multivariate L-moments (λα)α∈Nd ∗ then Q = Q . Moreover Q(t1, ..., td ) = (i1,...,id )∈Nd ∗ d k=1 (2ik + 1) L(i1,...,id )(t1, ..., td )λ(i1,...,id ) A one-to-one correspondence between quantiles and random vectors is sufficient to guarantee the characteriation of a distribution by its L-moments Monotone transport Proposition Let µ, ν be two probability measures on Rd , such that µ does not give mass to "small sets". 
Then, there is exactly one measurable map T such that T#µ = ν and T = ϕ for some convex function ϕ. These transports, gradient of convex functions, are called monotone transports by analogy with the univariate case If defined, the transport is solution to the quadratic optimal transport ϕ∗ = arg inf T:T#µ=ν Rd u − T(u) 2 dµ(u) Example : monotone quantile for a random vector with independent marginals X = (X1, ..., Xd ) random vector with independent marginals. The monotone quantile of X is the collection of its marginals quantiles Q(t1, ..., td ) =    Q1(t1) ... Qd (td )    =    φ1(t1) ... φd (td )    Indeed, if φ(t1, ..., td ) = φ1(t1) + · · · + φd (td ) φ = Q The associated L-moments are then    λ1,...,1 = E[X] λ1...1,r,1,...,1 = (0, . . . , 0, λr (Xi ), 0, . . . , 0)T λα = 0 otherwise Monotone transport from the standard Gaussian distribution QN the monotone distribution from unif onto the standard Gaussian distribution N(0, Id ) defined by QN (t1, .., td ) =    N−1(t1) ... N−1(td )    T0 the monotone transport from the standard Gaussian distribution from ν (rotation equivariant) ([0; 1]d , du) QN → (Rd , dN) T0 → (Rd , dν) ⇒ Q = T0 ◦ QN is then a quantile. Monotone transport from the standard Gaussian distribution : Gaussian distribution with a random covariance For x ∈ Rd , A positive symmetric matrix ϕ(x) = m.x + 1 2 xT Ax T0(x) = ϕ(x) = m + Ax ⇒ T0(Nd (0, Id )) d = Nd (m, AAT ). The L-moments of a Gaussian with mean m and covariance AAT are : λα = m if α = (1, ..., 1) Aλα(Nd (0, Id )) otherwise In particular, the L-moments of degree 2 : Λ2 = (λ2,1...,1 . . . λ1,...,1,2) = 1 √ π A. Monotone transport from the standard Gaussian distribution : quasi-elliptic distribution For x ∈ Rd , u convex ϕ(x) = m.x + 1 2u(xT Ax) T0(x) = m + u (xT Ax)Ax. The L-moments of this distribution are then λα = m if α = (1, ..., 1) A Rd u (xT Ax)Lα(N(x))xdN(x) otherwise Si A = Id , T0(X) follows a spherical distribution Monotone transport from the standard Gaussian distribution : quasi-elliptic distribution Figure: Samples with T0(x) = − Ax xT Ax and A = Id (left) or A = 1 0.8 0.8 1 (right) Estimation : general case x1, ..., xn ∈ Rd an iid sample issued from a same r.v. X with measure ν of quantile Q. Empirical measure : νn = n i=1 δxi Estimation of Q : Qn corresponding transport from unif onto νn Empirical L-moment ˆλα = [0;1]d Qn(t)Lα(t)dt Estimation of a monotone transport Monotone transport of an absolutely continuous measure µ (of support Ω) onto the discrete measure νn Power diagrams of (x1, w1), ..., (xn, wn) 1≤i≤n u ∈ Ω s.t. u − xi 2 + wi ≤ u − xj 2 + wj ∀j = i Piecewise linear functions (PL) for any u ∈ Ω, φh(u) = max 1≤i≤n {u.xi + hi } . Areas of gradient of a PL function = power diagrams with weights wi = xi 2 + 2hi Wi (h) = {u ∈ Ω s.t. φh(u) = xi }. Estimation of a monotone transport Gradient of PL functions ⇒ Monotone transport Theorem φh is a monotone transport from µ onto νn for some h = h∗ , unique up to constant (b, ..., b), verifying h∗ = arg min h∈Rn Ω φh(u)dµ − 1 n n i=1 hi . 
For any 1 ≤ i ≤ n Wi (h∗ ) dµ(x) = 1 n Estimation of a monotone transport : Newton descent Computation of h∗ : minimization of E(h) = Ω φh(u)dµ − 1 n n i=1 hi Gradient descent : while | E(ht)| > η ht+1 = ht − γ( 2 E(ht))−1 E(ht) t ← t + 1 end while However : delicate Hessian computation Estimation of a monotone transport : sample in [0; 1]2 Voronoi cells of the sample Optimal power diagram Figure: Monotone transport for a sample of size 10 onto the uniform distribution on [0; 1]2 Estimation of a monotone transport : Gaussian sample Voronoi cells of the sample Optimal power diagram Figure: Monotone transport for a sample of size 100 onto the standard Gaussian Estimation of a monotone transport : consistency T transport from µ onto ν Tn transport from µ onto νn Theorem If ν verifies x dν(x) < +∞, T − Tn 1 = Rd T(x) − Tn(x) dµ(x) a.s. → 0. Q, Qn monotone quantiles having µ as a reference measure Theorem For α ∈ Nd ∗ . ˆλα = [0;1]d Qn(u)Lα(u)du a.s. → λα = [0;1]d Q(u)Lα(u)du Numerical applications We simulate a linear combination of independent vectors in R2 X = P σ1Z1 σ2Z2 with P a rotation matrix P = 1 √ 2 −1 1 1 1 Z1, Z2 are drawn from a symmetrical Weibull distribution Wν of scale parameter equal to 1 and shape parameter ν = 0.5. Numerical applications We perform N = 100 estimations of the second L-moment matrix Λ2 and the covariance matrix Σ for a sample of size n = 30 or 100. The mean of the different estimates The median of the different estimates The coefficient of variation of the estimates ˆθ1, ..., ˆθN (for an arbitrary parameter θ) CV = N i=1 ˆθi − 1 N N i=1 ˆθi 2 1/2 1 N N i=1 ˆθi n = 30 n = 100 Parameter True Value Mean Median CV Mean Median CV Λ2,11 0.38 0.28 0.27 0.30 0.38 0.37 0.18 Λ2,12 0.19 0.14 0.13 0.65 0.20 0.20 0.33 Σ11 0.69 0.70 0.48 1.23 0.69 0.59 0.55 Σ12 0.55 0.55 0.29 1.62 0.55 0.47 0.67 Thank you for your attention !

Probability Density Estimation (chaired by Jesús Angulo, S. Said)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
The two main techniques of probability density estimation on symmetric spaces are reviewed in the hyperbolic case. For computational reasons we focus on kernel density estimation, and we provide the expression of the Pelletier estimator on the hyperbolic space. The method is applied to the density estimation of reflection coefficients derived from radar observations.
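The following is a minimal sketch of the Pelletier estimator specialized to the Poincaré disk, assuming the observations are given as complex numbers in the open unit disk; the Epanechnikov kernel and the bandwidth value are illustration choices, not the settings used for the radar data.

```python
import numpy as np

def hyperbolic_distance(z, w):
    """Geodesic distance in the Poincare disk model (|z|, |w| < 1)."""
    num = 2 * np.abs(z - w) ** 2
    den = (1 - np.abs(z) ** 2) * (1 - np.abs(w) ** 2)
    return np.arccosh(1 + num / den)

def pelletier_kde(x, samples, r=0.3):
    """Pelletier kernel density estimate at a point x of the Poincare disk.

    f_hat(x) = (1/n) sum_i (1/r^2) * (1/theta_i(x)) * K(d(x, x_i)/r),
    where in the 2-d hyperbolic space the volume-change factor is
    theta = sinh(d)/d and K is a compactly supported kernel normalised
    so that its integral over R^2 equals one (Epanechnikov here).
    """
    def K(u):  # 2-d Epanechnikov kernel with unit integral over the plane
        return np.where(u < 1.0, (2.0 / np.pi) * (1.0 - u ** 2), 0.0)

    d = np.array([hyperbolic_distance(x, xi) for xi in samples])
    theta = np.where(d > 0, np.sinh(d) / np.maximum(d, 1e-12), 1.0)
    return np.mean(K(d / r) / theta) / r ** 2

# toy usage: estimate the density at the origin from a few points of the disk
pts = np.array([0.1 + 0.2j, -0.3 + 0.1j, 0.05 - 0.4j])
print(pelletier_kde(0.0 + 0.0j, pts))
```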
 
Probability density estimation on the hyperbolic space applied to radar processing

Probability density estimation on the hyperbolic space applied to radar processing October 28, 2015 Emmanuel Chevalliera, Frédéric Barbarescob, Jesús Anguloa a CMM-Centre de Morphologie Mathématique, MINES ParisTech; France b Thales Air Systems, Surface Radar Domain, Technical Directorate, Advanced Developments Department, 91470 Limours, France emmanuel.chevallier@mines-paristech.fr 1/20 Probability density estimation on the hyperbolic space Three techniques of non-parametric probability density estimation: histograms kernels orthogonal series The Hyperbolic space of dimension 2 Histograms, kernels and orthogonal series in the hyperbolic space Density estimation of radar data in the Poincaré disk 2/20 Probability density estimation on the hyperbolic space Three techniques of non-parametric probability density estimation Histograms: partition of the space into a set of bins counting the number of samples per bins 3/20 Probability density estimation on the hyperbolic space Kernels: a kernel is placed over each sample the density is evaluated by summing the kernels 4/20 Probability density estimation on the hyperbolic space Orthogonal series: the true density f is studied through the estimation of the scalar products between f and an orthonormal basis of real functions. Let f be the true density f , g = f gdµ let {ei } is a orthogonal Hilbert basis of real functions f = ∞ i=−∞ f , ei ei , since fI , ei = fI ei dµ = E (ei (I)) ≈ 1 n n j=1 ei (I(pj )) we can estimate f by: f ≈ N i=−N  1 n n j=1 ei (I(pj ))   ei = ˆf . 5/20 Probability density estimation on the hyperbolic space Homogeneity and isotropy consideration non homogeneous bins non istropic bins Absence of prior on f : the estimation should be as homogeneous and isotropic as possible. → choice of bins, kernels or orthogonal basis 6/20 Probability density estimation on the hyperbolic space Remark on homogeneity and isotropy Figure: Random variable X ∈ Circle. The underlying space is not homogeneous and not isotropic, the density estimation can not consider every points and directions in an equivalent way. 7/20 Probability density estimation on the hyperbolic space The 2 dimensional hyperbolic space and the Poincaré disk The only space of constant negative sectional curvature The Poincaré disk is a model of hyperbolic geometry ds2 D = 4 dx2 + dy2 (1 − x2 − y2)2 Homogeneous and isotropic 8/20 Probability density estimation on the hyperbolic space Density estimation in the hyperbolic space: histograms A good tilling: homogeneous and isotropic There are many polygonal tilings: There is no homotetic transformations for all λ ∈ R Problem: not always possible to scale the tiling to the studied density 9/20 Probability density estimation on the hyperbolic space Density estimation in the hyperbolic space: orthogonal series Standard choice of basis: eigenfunctions of the Laplacian operator ∆ In Rn: (ei ) = Fourier basis → characteristic function density estimator. 
f , [a, b] → R, f = ∞ i=−∞ f , ei ei , f , R → R, f = ∞ ω=−∞ f , eω eωdω, Compact case: estimation of a sum Non compact case: estimation of an integral 10/20 Probability density estimation on the hyperbolic space Density estimation in the hyperbolic space: orthogonal series On the Poincaré disk D, solutions of ∆f = λf are known for f , D → R but not for f , D ⊂ D → R with D compact Computational problem: the estimation involves an integral, even for bounded support functions 11/20 Probability density estimation on the hyperbolic space Kernel density estimation on Riemannian manifolds K : R+ → R+ such that: i) Rd K(||x||)dx = 1, ii) Rd xK(||x||)dx = 0, iii) K(x > 1) = 0, sup(K(x)) = K(0). Euclidean kernel estimator: ˆfk = 1 k i 1 rd K ||x, xi || r Riemannian case: K ||x − xi || r → K d(x − xi ) r 12/20 Probability density estimation on the hyperbolic space Figure: Volume change θxi induced by the exponential map θx : volume change (TM, Lebesgue) expx −→ (M, vol) Kernel density estimator proposed by Pelletier: ˆfk = 1 k i 1 rd 1 θxi (x) K d(x, xi ) r 13/20 Probability density estimation on the hyperbolic space θx in the hyperbolic space θx can easily be computed in hyperbolic geometry. Polar coordinates at p ∈ D: at p ∈ D, if the geodesic of angle α of length r leads to q, (r, α) ↔ q In polar coordinates: ds2 = dr2 + sinh(r)2 dα2 thus dvolpolar = sinh(r)drdα and θp((r, θ)) = sinh(r) r 14/20 Probability density estimation on the hyperbolic space Density estimation in the hyperbolic space: kernels Kernel density estimator: ˆfk = 1 k i 1 rd d(x, xi ) sinh(d(x, xi )) K d(x, xi ) r Formulation as a convolution Fourier−Helgason ←→ 0rthogonal series Reasonable computational cost 15/20 Probability density estimation on the hyperbolic space Radar data Succession of input vector z = (z0, .., zn−1) ∈ Cn z: background or target? Assumptions: z = (z0, .., zn−1) is a centered Gaussian process. Centered → dened by its covariance Rn = E[ZZ∗ ] =       r0 r1 . rn−1 r1 r0 r1 . rn−2 . . . r1 rn−1 . r1 r0       Rn ∈ T n: Toeplitz (additional stationary assumption) and SPD matrix 16/20 Probability density estimation on the hyperbolic space Auto regressive model Auto regressive model of order k: ˆzl = − k j=1 ak j zl−j k-th reection coecient : µk = ak k Dieomorphism ϕ: ϕ : T n → R∗ + × Dn−1 , Rn → (P0, µ1, · · · , µn−1) (z0, ..., zn−1) ↔ (P0, µ1, · · · , µn−1) 17/20 Probability density estimation on the hyperbolic space Geometry on T n ϕ : T n → R∗ + × Dn−1 , Rn → (P0, µ1, · · · , µn−1) metric on T n: product metric on R∗ + × Dn−1 Multiple acquisitions of an identical background: distribution of the µk? Potential use: identication of a non-background objects 18/20 Probability density estimation on the hyperbolic space Application of density estimation to radar data µ1, N = 0.007 µ2, N = 1.61 µ3, N = 14.86 µ1, N = 0.18 µ2, N = 2.13 µ3, N = 4.81 Figure: First row: ground, second row: Rain 19/20 Probability density estimation on the hyperbolic space Conclusion The density estimation on the hyperbolic space is not a fundamentally dicult problem Easiest solution: kernels Future works: computation of the volume change in kernels for Riemannian manifolds deepen the application for radar signals Thank you for your attention 20/20 Probability density estimation on the hyperbolic space

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
We address here the problem of perceptual colour histograms. The Riemannian structure of perceptual distances is measured through standard sets of ellipses, such as the MacAdam ellipses. We propose an approach based on local Euclidean approximations that makes it possible to take the Riemannian structure of perceptual distances into account without adding computational complexity to the construction of the histogram.
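A minimal sketch of the modified kernel estimator is given below. It assumes that each sample colour c_i (in Lab coordinates) comes with a positive matrix M_i describing the ellipse interpolated at c_i, so that ||v||_{c_i}^2 = v^T M_i v; the data, the matrices and the bandwidth are placeholders, and kernel normalisation constants (which depend on the local metric) are omitted for brevity.

```python
import numpy as np

def modified_kde(x, colors, metrics, r=5.0):
    """Kernel density estimate at the Lab colour x.

    The Euclidean norm ||x - c_i||_Lab of the standard estimator is replaced
    by the locally adapted norm ||x - c_i||_{c_i} induced by the ellipse
    interpolated at c_i, i.e. sqrt((x - c_i)^T M_i (x - c_i)).
    Normalisation constants are left out here.
    """
    def K(u):  # Epanechnikov profile; any compactly supported kernel works
        return np.where(u < 1.0, 1.0 - u ** 2, 0.0)

    vals = []
    for c, M in zip(colors, metrics):
        v = x - c
        dist = np.sqrt(v @ M @ v)       # local perceptual distance at c
        vals.append(K(dist / r))
    return np.sum(vals) / (len(colors) * r ** 2)

# placeholder data: two colours in the (a, b) plane with local metric matrices
colors = [np.array([10.0, 5.0]), np.array([-3.0, 8.0])]
metrics = [np.eye(2), np.array([[2.0, 0.3], [0.3, 0.5]])]
print(modified_kde(np.array([2.0, 4.0]), colors, metrics))
```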
 
Histograms of images valued in the manifold of colours endowed with perceptual metrics

Color Histograms using the perceptual metric October 28, 2015 Emmanuel Chevalliera, Ivar Farupb, Jesús Anguloa a CMM-Centre de Morphologie Mathématique, MINES ParisTech; France b Gjovik University College; France emmanuel.chevallier@mines-paristech.fr 1/16 Color Histograms using the perceptual metric Plan of the presentation Formalization of the notion of image histogram Perceptual metric and Macadam ellipses Density estimation in the space of colors 2/16 Color Histograms using the perceptual metric Image histogram : formalization I : Ω → V p → I(p) Ω: support space of pixels: rectangle/parallelepiped. V: the value space (Ω, µΩ), (V , µV ), µΩ and µV are induced by the choosen geometries on Ω and V . Transport of µΩ on V : I∗(µΩ) Image histogram: estimation of f = dI∗(µΩ) dµV 3/16 Color Histograms using the perceptual metric pixels: p ∈ Ω, uniformly distributed with respect to µΩ {I(p), p a pixel }: set of independent draws of the "random variable" I Estimation of f = dI∗(µΩ) dµV from {I(p), p a pixel }: → standard problem of probability density estimation 4/16 Color Histograms using the perceptual metric Perceptual color histograms I : Ω → (M = colors, gperceptual ) p → I(p) Assumption: the perceptual distances between colors is induced by a Riemannian metric The manifold of colors was one of the rst example of Riemannian manifold, suggested by Riemann 5/16 Color Histograms using the perceptual metric Macadam ellipses: just noticeable dierences Chromaticity diagram (constant luminance): Ellipses: elementary unit balls → local L2 metric 6/16 Color Histograms using the perceptual metric Lab space The Euclidean metric of the Lab parametrization is supposed to be more perceptual than other parametrizations Figure: Macadam ellipses in the ab plan However, the ellipses are clearly not balls 7/16 Color Histograms using the perceptual metric Modiction of the density estimator Density → local notion. No need of knowing long geodesics Small distances → local approximation by an Euclidean metric Notations: dR: Perceptual metric ||.||Lab: Canonical Euclidean metric of Lab ||.||c: Euclidean metric on Lab induced by the ellipse at c Small distances around c: ||.||c is "better" than ||.||Lab 8/16 Color Histograms using the perceptual metric Modiction of the density estimator Standard kernel estimator: ˆf (x) = 1 k pi ∈{pixels} 1 r2 K ||x − I(pi )||Lab r Possible modication K ||x − I(pi )||Lab r → K ||x − I(pi )||I(pi ) r where ||.||I(pi ) is an Euclidean distance dened by the interpolated ellipse at I(pi ). 9/16 Color Histograms using the perceptual metric Generally, at c a color: limx→c ||x − c||c dR(x, c) = 1 = limx→c ||x − c||Lab dR(x, c) Thus, ∃A > 0 such that, ∀R > 0, ∃x ∈ BLab(c, R), A < ||x − c|| dR(x, c) − 1 . while ∃Rc = Rc,A such that, ∀x ∈ BLab(c, Rc), ||x − c||c dR(x, c) − 1 < A. hence supBLab(c,Rc ) ||x − c||c dR(x, c) − 1 < A < supBLab(c,Rc ) ||x − c|| dR(x, c) − 1 . 10/16 Color Histograms using the perceptual metric When the scaling factor r is small enough: r ≤ Rc and Bc(c, r) ⊂ BLab(c, Rc) x ∈ B(c, Rc), K ||x−c||c r better than K ||x−c||Lab r . x /∈ B(c, Rc), K ||x−c||c r = K ||x−c||Lab r = 0 11/16 Color Histograms using the perceptual metric Interpolation of a set of local metric: a deep question... What is a good interpolation? Interpolating a function: minimizing variation with respect to a metric. Interpolating a metric? No intrinsic method: depends on a choice of parametrization. 
Subject of the next study 12/16 Color Histograms using the perceptual metric Barycentric interpolation in the Lab space 13/16 Color Histograms using the perceptual metric Volume change (a) (b) Figure: (a): color photography (b): Zoom of the density change adapted to colours present in the photography 14/16 Color Histograms using the perceptual metric experimental results (a) (b) (c) Figure: The canonical Euclidean metric of the ab projective plane in (a), the canonical metric followed by a division by the local density of the perceptual metric in (b) and the modied kernel formula in (c). 15/16 Color Histograms using the perceptual metric Conclusion A simple observation which improve the consistency of the histogram without requiring additional computational costs Future works will focus on: The interpolation of the ellipses The construction of the geodesics and their applications Thank you for your attention 16/16 Color Histograms using the perceptual metric

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
Air traffic management (ATM) aims at providing companies with safe and ideally optimal aircraft trajectory planning. Air traffic controllers act on flight paths in such a way that no pair of aircraft comes closer than the regulatory separation norm. With the increase of traffic, the system is expected to reach its limits in the near future: a paradigm change in ATM is planned with the introduction of trajectory-based operations. This paper investigates a means of producing realistic air routes from the output of an automated trajectory design tool. For that purpose, an entropy associated with a system of curves is defined and a way of iteratively minimizing it is presented. The network produced is suitable for use in a semi-automated ATM system with a human in the loop.
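To make the criterion concrete, here is a minimal sketch of the curve-system density of equation (2) and the entropy of equation (3) for discretised trajectories; the grid, the kernel radius and the toy flight paths are illustration choices, and the continuous integrals are replaced by sums over curve segments (whose lengths play the role of the speed term, so the density stays invariant under re-parametrisation).

```python
import numpy as np

def curve_system_entropy(curves, grid, r=0.1):
    """Entropy of the spatial density induced by a system of planar curves.

    Each curve is an (m, 2) array of points; segment lengths approximate the
    |gamma'(t)| dt term, and the kernel is an Epanechnikov bump of radius r
    normalised on the plane.
    """
    def K(u):
        return np.where(u < 1.0, (2.0 / (np.pi * r ** 2)) * (1.0 - u ** 2), 0.0)

    density = np.zeros(len(grid))
    for c in curves:
        mids = 0.5 * (c[1:] + c[:-1])                       # segment midpoints
        lens = np.linalg.norm(np.diff(c, axis=0), axis=1)   # segment lengths
        for p, l in zip(mids, lens):
            density += K(np.linalg.norm(grid - p, axis=1) / r) * l
    density /= density.sum()                                # discrete normalisation
    nz = density > 0
    return -np.sum(density[nz] * np.log(density[nz]))

# toy usage: two crossing straight "flight paths" on a coarse grid over [0,1]^2
g = np.stack(np.meshgrid(np.linspace(0, 1, 30), np.linspace(0, 1, 30)), -1).reshape(-1, 2)
c1 = np.stack([np.linspace(0, 1, 50), np.linspace(0, 1, 50)], axis=1)
c2 = np.stack([np.linspace(0, 1, 50), np.linspace(1, 0, 50)], axis=1)
print(curve_system_entropy([c1, c2], g))
```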
 
Entropy minimizing curves with application to automated flight path design

Entropy minimizing curves Application to automated ight path design S. Puechmorel ENAC 29th October 2015 Problem Statement Flight path planning • Trac is expected to double by 2050; • In future systems, trajectories will be negotiated and optimized well before the ights start; • But humans will be in the loop : generated ight plans must comply with operational constraints; Muti-agent systems • A promising approach to address the planning problem; • Does not end up with a human friendly trac! • Idea : start with the proposed solution and rebuild a route network from it. A curve optimization problem An entropy criterion • Route networks and currently made of straight segments connecting beacons; • May be viewed as a maximally concentrated spatial density distribution; • Minimizing the entropy with such a density will intuitively yield a ight path system close to what is expected. Problem modeling Density associated with a curve system • A classical measure : counting the number of aircraft in each bin of a spatial grid and averaging over time; • Suers from a severe aw : aircraft with low velocity will over-contribute; • May be corrected by enforcing invariance under re-parametrization of curves; • Combined with a non-parametric kernel estimate to yield : ˜d : x → N i=1 1 0 K ( x − γi (t) ) γi (t) dt N i=1 Ω 1 0 K ( x − γi (t) ) γi (t) dtdx (1) Problem modeling II The entropy criterion • Kernel K is normalized over the domain Ω so as to have a unit integral; • Density is directly related to lengths li , i = 1. . . n of curves γi , i = 1. . . N : ˜d : x → N i=1 1 0 K ( x − γi (t) ) γi (t) dt N i=1 li (2) • Associated entropy is : E(γ1, . . . , γN) = − Ω ˜d(x) log ˜d(x) dx (3) Optimal curve displacement eld Entropy variation • ˜d has integral 1 over the domain Ω ; • It implies that : − ∂ ∂γj E(γ1, . . . , γN)( ) = Ω ∂ ˜d(x) ∂γj ( ) log ˜d(x) dx (4) where is an admissible variation of curve γi . • The denominator in the expression of ˜d has derivative : [0,1] γj (t) γj (t) , (t) dt = − [0,1] γj (t) γj (t) N , dt (5) Optimal curve displacement eld Entropy variation • The numerator of ˜d has derivative : [0,1] γj (t) − x γj (t) − x N , K ( γj (t) − x ) γj (t) dt (6) − [0,1] γj (t) γj (t) N , K ( γj (t) − x ) dt (7) Optimal curve displacement eld II Normal move • Final expression yield a displacement eld normal to the curve : Ω γj (t) − x γj (t) − x N K ( γj (t) − x ) log ˜d(x)dx γj (t) (8) − Ω K ( γj (t) − x ) log ˜d(x))dx γj (t) γj (t) N (9) + Ω ˜d(x) log( ˜d(x))dx γj (t) γj (t) N n i=1 li (10) Implementation A gradient algorithm • The move is based on a tangent vector in the tangent space to Imm([0, 1], R3)/Di+ ([0, 1) ; • It is not directly implementable on a computer; • A simple, landmark based approach with evenly spaced points was used; • A compactly supported kernel (epanechnikov) was selected : it allows the computation of density ˜d on GPUs as a texture operation that is very fast. A output from the multi-agent system Integration in the complete system • Route building from initially conicting trajectories : Figure  Initial ight plans and nal ones Conclusion and future work An integrated algorithm • Entropy minimizer is now a part of the overall route design system; • Only a simple post-processing is necessary to output a usable airways network; • The complete algorithm is being ported to GPU. 
Future work: take the headings into account
• The behavior is not completely satisfactory when routes are converging in opposite directions;
• An improved version will make use of the entropy of a distribution on a Lie group (publication in progress).

Creative Commons Attribution-ShareAlike 4.0 International
We introduce a novel kernel density estimator for a large class of symmetric spaces and prove a minimax rate of convergence as fast as the minimax rate on Euclidean space. The rate is proven without any compactness assumptions on the space or Hölder-class assumptions on the densities. A main tool used in proving the convergence rate is the Helgason-Fourier transform, a generalization of the Fourier transform for semisimple Lie groups modulo maximal compact subgroups. The paper obtains a simplified formula in the special case where the symmetric space is the 2-dimensional hyperboloid.
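For orientation, the Euclidean baseline that the G-kernel density estimator generalizes can be written as smoothing the empirical measure by translations; a minimal sketch (the Gaussian profile and the bandwidth are illustration choices) follows. On a symmetric space, the translation x - X_i is replaced by the action of a symmetry group element, and the analysis uses the Helgason-Fourier transform instead of the ordinary Fourier transform.

```python
import numpy as np

def euclidean_kde(x, samples, h=0.3):
    """Ordinary KDE on R: f_hat(x) = (1/(n*h)) * sum_i K((x - X_i)/h)
    with a Gaussian profile K, i.e. the empirical measure smoothed by
    the group of translations."""
    K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return np.mean(K((x - samples) / h)) / h

rng = np.random.default_rng(1)
data = rng.normal(size=1000)
print(euclidean_kde(0.0, data))
```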
 
Kernel Density Estimation on Symmetric Spaces

Dena Marie Asta Department of Statistics Ohio State University Supported by NSF grant DMS-1418124 and NSF Graduate Research Fellowship under grant DGE-1252522. Kernel Density Estimation on Symmetric Spaces 2 Geometric Methods for Statistical Analysis q  Classical statistics assumes data is unrestricted on Euclidean space q  Exploiting the geometry of the data leads to faster and more accurate tools ¯X = 1 n nX i=1 Xi var[X] = E[X2 ] E[X]2 implicit geometry in non-Euclidean data explicit geometry in networks Motivation: Non-Euclidean Data 3 Normal Distributions sphere Diffusion Tensor Imaging Material Stress, Gravitational Lensing Directional Headings 3x3 symmetric positive definite matrices 3x3 symmetric positive definite matrices hyperboloid 4 Nonparametric Methods: Non-Euclidean Data q  Classical non-parametric estimators assume Euclidean structure q  Sometimes the given data has other geometric structure to exploit. kernel density estimator kernel regression conditional density estimator Motivation: Non-Euclidean Distances 5 Normal Distributions sphere Diffusion Tensor Imaging Euclidean distances are often not the right notion of distance between data points. Material Stress, Gravitational Lensing Directional Headings 3x3 symmetric positive definite matrices 3x3 symmetric positive definite matrices hyperboloid Motivation: Non-Euclidean Distances 6 sphere Euclidean distances are often not the right notion of distance between data points. Directional Headings Distance between directional headings should be shortest path-length. Motivation: Non-Euclidean Distances 7 Normal Distributions Euclidean distances are often not the right notion of distance between data points. hyperboloid mean standarddeviation An isometric representation of the hyperboloid is the Poincare Half-Plane. Each point in either model represents a normal distribution. Distance is the Fisher Distance, which is similar to KL-Divergence. Motivation: Non-Euclidean Distances 8 Euclidean distance not the right distance à Euclidean volume not the right volume We want to minimize risk for density estimation on a (Riemmanian) manifold. estimator based on n samples true density manifold volume measure based on intrinsic distanceEf Z M (f ˆfn)2 dµ Existing Estimators 9 ˆfh (X1,...,Xn)(x) = 1 nh nX i=1 K ✓ x Xi h ◆ optimal rate of convergence1 (s=smoothness parameter, d=dimension) O(n-2s/(2s+d)) division by h undefined for general M subtraction undefined for general M Euclidean KDE Exploiting Geometry: Symmetries 10 q  symmetries = geometry q  symmetries make the smoothing of data (convolution by a kernel) tractable q  translations in Euclidean space are specific examples of symmetries q  other spaces call for other symmetries 11 Exploiting symmetries to convolve Kernel density estimation is about convolving a kernel with the data. More general spaces, depending on their geometry, we will require symmetries other than translations… ˆfh (X1,...,Xn) = Kh ⇤ empirical(X1, . . . , Xn) (g ⇤ f)(x) = Z Rn g(t)f(x t) dt density on the space of translations on Rn density on Rn (g ⇤ f)(x) = Z Rn g(t)f(x t) dt = Z Rn g(Tt)f(T 1 t (x)) dt 12 Exploiting symmetries to convolve Tv(w) = v + w Identify t with Tt and interpret g as a density on the space of Tt’s. Kernel density estimation is about convolving a kernel with the data. More general spaces, depending on their geometry, we will require symmetries other than translations… density on Rn density on the space of translations on Rn ˆfh (X1,...,Xn) = Kh ⇤ empirical(X1, . . . 
, Xn) (g ⇤ f)(x) = Z G g(T)f(T 1 (x)) dT 13 X is a symmetric space, a space having a suitable space G of symmetries. space of symmetries on X Generalized kernel density estimation involves convolving a generalized kernel with the data. density on X density on the space G Exploiting symmetries to convolve ˆfh (X1,...,Xn) = Kh ⇤ empirical(X1, . . . , Xn) (“empirical density”) G-Kernel Density Estimator: general form density on group of symmetries G “empirical density” on symmetric space Xbandwidth and cutoff parameters sample observations We can use harmonic analysis on symmetric spaces to define and analyze this estimator. 1Asta, D., 2014.
  ˆfh,C (X1,...,Xn) = Kh ⇤ empirical(X1, . . . , Xn) 15 Harmonic Analysis on Symmetric Spaces 1Terras, A., 1985.
  H : L2(X) ⌧ L2(· · · ) : H 1 The (Helgason-)Fourier Transform sends convolutions to products. Helgason-Fourier Transform: for symmetric space X, an isometry frequency space depends on the geometry of X F : L2(R) ⌧ L2(R) : F 1 Fourier Transform: an isometry 16 Generalization: G-Kernel Density Estimator 1Asta, D., 2014.
  q  The true density is sufficiently smooth (in Sobolev ball). q  The kernel transforms nicely with the space of data q  The kernel is sufficiently smooth assumptions on kernel and true density: G-Kernel Density Estimator 17 THEOREM: G-KDE achieves the same minimax rate on symmetric spaces as the ordinary KDE achieves on Rd.1 1Asta, D., 2014.
  O(n-2s/(2s+d)) optimal rate of convergence1 (s=Sobolev smoothness parameter, d=dimension) ˆfh,C (X1,...,Xn) = H 1 [ (X1,...,Xn)H[Kh]IC] 18 Kernels on Symmetries Symmetric Positive Definite (nxn) Matrices SPDn: Kernels are densities on space G=GLn of nxn invertible matrices. Hyperboloid H2: Kernels are densities on space G=SL2 of 2x2 invertible matrices having determinant 1. Each SL2-matrix M determines an isometry (distance-preserving function): M : H2 ⇠= H2 M (x) = M11x + M12 M21x + M22 Each GLn-matrix M determines an isometry (distance-preserving function): M (X) = MT XMM : SPDn ⇠= SPDn example of kernel K (hyperbolic version of the gaussian): solution to the heat equation on SL2: 19 Kernels on Symmetries Hyperboloid H2: Kernels are densities on space G=SL2 of 2x2 invertible matrices having determinant 1. Each SL2-matrix M determines an isometry (distance-preserving function): M : H2 ⇠= H2 M (x) = M11x + M12 M21x + M22 samples from K (points in SL2) represented in H2=SL2/SO2 H[Kh](s, k✓) / eh2 ¯s2 h¯s 20 Recap: G-KDE 1Asta, D., 2014.
  Exploiting the geometric structure of the data type: q  Tractable data smoothing = convolving a kernel on a space of symmetries q  Harmonic analysis on symmetric spaces allows us to prove minimax rate q  Symmetric spaces are general enough to include: Normal Distributions Diffusion Tensor Imaging Material Stress, Gravitational Lensing Directional Headings

Keynote speech Tudor Ratiu (chaired by Xavier Pennec)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics. Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena for Hamiltonian systems is based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity.
 
Symmetry methods in geometric mechanics

Dimension reduction on Riemannian manifolds (chaired by Xavier Pennec, Alain Trouvé)

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
This paper presents derivations of evolution equations for the family of paths that are used in the Diffusion PCA framework for approximating the data likelihood. The paths, formally interpreted as most probable paths, generalize geodesics by extremizing an energy functional on the space of differentiable curves on a manifold with connection. We discuss how the paths arise as projections of geodesics for a (non-bracket-generating) sub-Riemannian metric on the frame bundle. Evolution equations in coordinates for both the metric and cometric formulations of the sub-Riemannian geometry are derived. We furthermore show how rank-deficient metrics can be mixed with an underlying Riemannian metric, and we use the construction to show how the evolution equations can be implemented on finite-dimensional LDDMM landmark manifolds.
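As a generic, coordinate-level illustration of the Hamiltonian form of such evolution equations (dy/dt = G(y) xi, dxi_k/dt = -1/2 xi^T dG/dy_k xi), here is a minimal sketch that integrates them for a user-supplied cometric field. The Euler scheme, the finite-difference derivatives and the toy cometric are illustration choices; the actual frame-bundle cometric of the paper is considerably more involved.

```python
import numpy as np

def hamiltonian_geodesic(G, y0, xi0, steps=1000, dt=1e-3, eps=1e-6):
    """Integrate dy/dt = G(y) xi and dxi_k/dt = -1/2 xi^T (dG/dy_k) xi
    for a cometric field G(y) (symmetric matrix), using a plain Euler
    scheme and central finite differences for the derivatives of G."""
    y, xi = np.array(y0, float), np.array(xi0, float)
    traj = [y.copy()]
    for _ in range(steps):
        Gy = G(y)
        dxi = np.empty_like(xi)
        for k in range(len(y)):
            e = np.zeros_like(y)
            e[k] = eps
            dGk = (G(y + e) - G(y - e)) / (2 * eps)
            dxi[k] = -0.5 * xi @ dGk @ xi
        y = y + dt * Gy @ xi
        xi = xi + dt * dxi
        traj.append(y.copy())
    return np.array(traj)

# toy usage: a constant Euclidean cometric (identity) gives straight lines
path = hamiltonian_geodesic(lambda y: np.eye(2), [0.0, 0.0], [1.0, 0.5], steps=100)
print(path[-1])
```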
 
Evolution Equations with Anisotropic Distributions and Diffusion PCA

Faculty of Science Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations GSI 2015, Paris, France Stefan Sommer Department of Computer Science, University of Copenhagen October 29, 2015 Slide 1/21 Intrinsic Statistics in Geometric Spaces Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 2/21 Statistics on Manifolds • Frech´et mean: argminx∈M 1 N ∑N i=1 d(x,yi)2 • PGA (Fletcher et al., ’04); GPCA (Huckeman et al., ’10); HCA (Sommer, ’13); PNS (Jung et al., ’12); BS (Pennec, ’15) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 3/21 PGA GPCA HCA Infinitesimally defined Distributions; MLE • aim: construct a family NM(µ,Σ) of anisotropic Gaussian-like distributions; fit by MLE/MAP • in Rn , Gaussian distributions are transition distributions of diffusion processes dXt = dWt • on (M,g), Brownian motion is transition distribution of stochastic process (Eells-Elworthy-Malliavin construction), or solution to heat diffusion equation ∂ ∂t p(t,x) = 1 2 ∆p(t,x) • infinitesimal dXt vs. global pt (x;y) ∝ e− x−y 2 Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 4/21 MLE of Diffusion Processes • Eells-Elworthy-Malliavin construction gives map Diff : FM → Dens(M) • Diff(FM) = NM ⊂ Dens(M): the set of (normalized) transition densities from FM diffusions • γ = Diff(x,Xα) = pγγ0, the log-likelihood lnL(x,Xα) = lnL(γ) = N ∑ i=1 lnpγ(yi) • Estimated Template: argmax(x,Xα)∈FM lnL(x,Xα) • MLE of data yi under the assumption y ∼ γ ∈ NM • Diffusion PCA (Sommer ’14): argmax lnL(x,Xα +εI) generalizing Probabilistic PCA (Tipping, Bishop, ’99; Zhang, Fletcher ’13) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 5/21 Most Probable Paths to Samples • Euclidean: • density pt (x;y) ∝ e−(x−y)T Σ−1 (x−y) • transition density of diffusion processes with stationary generator • x −y most probable path from y to x • Manifolds: • which distributions correspond to anisotropic Gaussian distributions N(x,Σ)? • what is the most probable path from y to x? 
Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 6/21 Anisotropic Diffusions and Holonomy • driftless diffusion SDE in Rn , stationary generator: dXt = σdWt , σ ∈ Mn×d • diffusion field σ, infinitesimal generator σσT • curvature: stationary field/generator cannot be defined due to holonomy Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 7/21 Stochastic Development: Eells-Elworthy-Malliavin Construction • Xt : Rn valued Brownian motion (driving process) • Ut : FM valued (sub-elliptic) diffusion • Yt : M valued stochastic process (target process) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 8/21 The Frame Bundle • the manifold and frames (bases) for the tangent spaces TpM • F(M) consists of pairs u = (x,Xα), x ∈ M, Xα frame for Tx M • curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 9/21 Driving process, FM valued process and Target process • Hi, i = 1...,n horizontal vector fields on F(M): Hi(u) = π−1 ∗ (ui) • SDE in Rn (driving): dXt = IdndBt , X0 = 0 • SDE in FM: dUt = Hi(Ut )◦dXi t , U0 = (x0,Xα) , Xα ∈ GL(Rn ,Tx0M) • Process on M (target): Yt = πFM(Ut ) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 10/21 Ut: Frame Bundle Diffusion Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 11/21 Estimated Templates Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 12/21 MLE template Most Probable Paths • in Rn , straight lines are most probable for stationary diffusion processes • Onsager-Machlup functional, σt curve on M: L(σt ) = − 1 2 σ (t) 2 g + 1 12 R(σ(t)) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 13/21 Most Probable Paths • in Rn , straight lines are most probable for stationary diffusion processes • Onsager-Machlup functional, σt curve on M: L(σt ) = − 1 2 σ (t) 2 g + 1 12 R(σ(t)) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 13/21 Most Probable Paths • in Rn , straight lines are most probable for stationary diffusion processes • Onsager-Machlup functional, σt curve on M: L(σt ) = − 1 2 σ (t) 2 g + 1 12 R(σ(t)) • MPP for target process Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 13/21 Most Probable Paths • in Rn , straight lines are most probable for stationary diffusion processes • Onsager-Machlup functional, σt 
curve on M: L(σt ) = − 1 2 σ (t) 2 g + 1 12 R(σ(t)) • MPP for driving process Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 13/21 R=0 Definition (MPPs for Driving Process) Let Xt be the driving process for the diffusion Yt and x ∈ M, i.e. Yt = π(φ(Xt )). Then σ is a most probable path for the driving process if it satisfies σ = argminc∈H(Rd ),φ(c)(1)=x 1 0 −L(ct )dt Proposition Let Yα be a frame for Ty M, and let Yt = π(φ(y,Yα)(Xt )), i.e. Yt is the development of Xt starting at (y,Yα). Then MPPs for the driving process Xt maps to geodesics of a lifted sub-Riemannian metric on FM: w, ˜w FM = X−1 α π∗w,X−1 α π∗ ˜w Rn . • isotropic case, MPPs for drv. process maps to geodesics • if −lnL(x,Xα) ≈ c + 1 N ∑N i=1 p(MPP(x,yi )). Then Frech´et mean ≈ MLE, isotropic case Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 14/21 MPPs on S2 increasing anisotropy −→ (a) cov. diag(1,1) (b) cov. diag(2,.5) (c) cov. diag(4,.25) Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 15/21 Sub-Riemannian Geometry on FM • Xα : Rn → Tx M gives inner-product v,w Xα = X−1 α v,X−1 α w Rn • optimal control problem with nonholonomic constraints xt = arg min ct ,c0=x,c1=y 1 0 ˙ct 2 Xα,t dt • let ˜v, ˜w HFM = X−1 α,t π∗(˜v),X−1 α,t π∗(˜w) Rn on H(xt ,Xα,t )FM. This defines a sub-Riemannian metric G on TFM and equivalent problem (xt ,Xα,t ) = arg min (ct ,Cα,t ),c0=x,c1=y 1 0 (˙ct , ˙Cα,t ) 2 HFMdt with horizontality constraint (˙ct , ˙Cα,t ) ∈ H(ct ,Cα,t )FM Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 16/21 MPP Evolution Equations • sub-Riemannian Hamilton-Jacobi equations ˙yk t = Gkj (yt )ξt,j , ˙ξt,k = − 1 2 ∂Gpq ∂yk ξt,pξt,q • in coordinates (xi ) for M, Xi α for Xα, and W encoding the inner product Wkl = δαβXk αXl β: ˙xi = Wij ξj −Wih Γ jβ h ξjβ , ˙Xi α = −Γiα h Whj ξj +Γiα k Wkh Γ jβ h ξjβ ˙ξi = Whl Γ kδ l,i ξhξkδ − 1 2 Γ hγ k,iWkh Γ kδ h +Γ hγ k Wkh Γ kδ h,i ξhγξkδ ˙ξiα = Γ hγ k,iα Wkh Γ kδ h ξhγξkδ − Whl ,iα Γ kδ l +Whl Γ kδ l,iα ξhξkδ − 1 2 Whk ,iα ξhξk +Γ hγ k Wkh ,iα Γ kδ h ξhγξkδ Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 17/21 Landmark LDDMM • Christoffel symbols (Michelli et al. ’08) Γk ij = 1 2 gir gkl grs ,l −gsl grk ,l −grl gks ,l gsj • mix of transported frame and cometric: Fd M bundle of rank d linear maps Rd → Tx M, ξ,˜ξ ∈ T∗Fd M, cometric gFd M +λgR: ξ,˜ξ = δαβ (ξ|π−1 ∗ Xα)(˜ξ|π−1 ∗ Xβ)+λ ξ,˜ξ gR • the whole frame need not be transported Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 18/21 LDDMM Landmark MPPs Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 19/21 + horz. var. isotropic + vert. var. 
Statistical Manifold: Geometry of Γ • Densities Dens(M) = {γ ∈ Ωn (M) : M γ = 1,γ > 0} • Fisher-Rao metric: GFR γ (α,β) = M α γ β γ γ • Γ finite dim. subset of Dens(M) Diff : FM → Dens(M) • naturally defined on bundle of symmetric positive T0 2 tensors Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 20/21 Summary • infinitesimal definition of anisotropic normal distributions NM(µ,Σ) on M • diffusion map Diff : FM → Dens(M) from Eells-Elworthy-Malliavin construction, stoch. develop. • MLE of template / covariance (in FM) • MPPs for driving processes generalize geodesics being sub-Riemannian geodesics 1 Sommer: Diffusion Processes and PCA on Manifolds, Oberwolfach extended abstract (Asymptotic Statistics on Stratified Spaces), 2014. 2 Sommer: Anisotropic Distributions on Manifolds: Template Estimation and Most Probable Paths, Information Processing in Medical Imaging (IPMI) 2015. 3 Sommer: Evolution Equations with Anisotropic Distributions and Diffusion PCA, Geometric Science of Information (GSI) 2015. 4 Svane, Sommer: Similarities, SDEs, and Most Probable Paths, SIMBAD15 extended abstract. 5 Sommer, Svane: Holonomy, Curvature, and Anisotropic Diffusions, MOTR15 extended abstract. Stefan Sommer (sommer@diku.dk) (Department of Computer Science, University of Copenhagen) — Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations Slide 21/21

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
This paper addresses the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. Current methods like Principal Geodesic Analysis (PGA) and Geodesic PCA (GPCA) minimize the distance to a “geodesic subspace”. This allows building sequences of nested subspaces that are consistent with a forward component analysis approach. However, these methods cannot be adapted to a backward analysis, and they are not symmetric in the parametrization of the subspaces. We propose in this paper a new and more general family of subspaces in manifolds: barycentric subspaces are implicitly defined as the locus of points which are weighted means of k + 1 reference points. Depending on the generalization of the mean that we use, we obtain the Fréchet/Karcher barycentric subspaces (FBS/KBS) or the affine span (with exponential barycenter). This definition restores the full symmetry between all parameters of the subspaces, contrary to geodesic subspaces, which intrinsically privilege one point. We show that this definition locally defines a submanifold of dimension k and that it generalizes geodesic subspaces in some sense. Like PGA, barycentric subspaces allow the construction of a forward nested sequence of subspaces which contains the Fréchet mean. However, the definition also allows the construction of a backward nested sequence which may not contain the mean. As the definition relies on points and does not explicitly refer to tangent vectors, it can be extended to non-Riemannian geodesic spaces. For instance, principal subspaces may naturally span several strata in stratified spaces, which is not the case with more classical generalizations of PCA.
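To make the notion of weighted mean underlying barycentric subspaces concrete, here is a minimal sketch of a weighted Fréchet/Karcher mean on the 2-sphere computed by Riemannian gradient descent. It assumes non-negative weights (the affine span of the paper also allows more general ones), and the reference points are placeholders; sweeping the weights then traces out a portion of the barycentric subspace spanned by the reference points.

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere: tangent vector at p pointing towards q."""
    w = q - np.dot(p, q) * p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros_like(p) if nw < 1e-12 else theta * w / nw

def sphere_exp(p, v):
    """Exp map on the unit sphere."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def weighted_karcher_mean(points, weights, iters=100):
    """Minimise sum_i w_i d(x, x_i)^2 on S^2 by iterating
    x <- Exp_x( sum_i w_i Log_x(x_i) ) with normalised weights."""
    x = points[0] / np.linalg.norm(points[0])
    w = np.asarray(weights, float) / np.sum(weights)
    for _ in range(iters):
        v = sum(wi * sphere_log(x, pi) for wi, pi in zip(w, points))
        x = sphere_exp(x, v)
    return x

# three reference points (k + 1 = 3) and one choice of barycentric weights
refs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
print(weighted_karcher_mean(refs, [0.5, 0.3, 0.2]))
```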
 
Barycentric Subspaces and Affine Spans in Manifolds

Barycentric Subspaces and Affine Spans in Manifolds GSI 30-10-2015 Xavier Pennec Asclepios team, INRIA Sophia-Antipolis – Mediterranée, France and Côte d’Azur University (UCA) Statistical Analysis of Geometric Features Computational Anatomy deals with noisy Geometric Measures  Tensors, covariance matrices  Curves, tracts  Surfaces, shapes  Images  Deformations Data live on non-Euclidean manifolds X. Pennec - GSI 2015 2  Manifold dimension reduction  When embedding structure is already manifold (e.g. Riemannian): Not manifold learning (LLE, Isomap,…) but submanifold learning Low dimensional subspace approximation? X. Pennec - GSI 2015 3 Manifold of cerebral ventricles Etyngier, Keriven, Segonne 2007. Manifold of brain images S. Gerber et al, Medical Image analysis, 2009. X. Pennec - GSI 2015 4 Barycentric Subspaces and Affine Spans in Manifolds PCA in manifolds: tPCA / PGA / GPCA / HCA Affine span and barycentric subspaces Conclusion 5 Bases of Algorithms in Riemannian Manifolds Reformulate algorithms with Expx and Logx Vector -> Bi-point (no more equivalence classes) Exponential map (Normal coordinate system):  Expx = geodesic shooting parameterized by the initial tangent  Logx = development of the manifold in the tangent space along geodesics  Geodesics = straight lines with Euclidean distance  Local  global domain: star-shaped, limited by the cut-locus  Covers all the manifold if geodesically complete 6 Statistical tools: Moments Frechet / Karcher mean minimize the variance 

Creative Commons Attribution-ShareAlike 4.0 International
Watch the video
We present a novel method that adaptively deforms a polysphere (a product of spheres) into a single high-dimensional sphere, which then allows for principal nested spheres (PNS) analysis. Applying our method to skeletal representations of simulated bodies, as well as to data from real human hippocampi, yields promising results with respect to dimension reduction. Specifically, in comparison to composite PNS (CPNS), our method of principal nested deformed spheres (PNDS) captures the essential modes of variation with lower-dimensional representations.
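Here is a minimal sketch of the deformation step for two unit spheres, following the coordinate formulas shown in the talk: the last embedded coordinate of the second sphere scales all coordinates of the first one, so the image lies on a single unit sphere of the combined dimension. Handling different radii would additionally require the rescaling and final renormalisation described in the talk; the sample points below are placeholders.

```python
import numpy as np

def deform_two_spheres(x1, x2):
    """Map a point (x1, x2) of S^{d1} x S^{d2} (given as embedded unit vectors
    in R^{d1+1} and R^{d2+1}) to a single unit sphere S^{d1+d2+1}:
    y = (x2_1, ..., x2_{d2}, x2_{d2+1} * x1), so that
    |y|^2 = (1 - x2_{d2+1}^2) + x2_{d2+1}^2 * |x1|^2 = 1."""
    return np.concatenate([x2[:-1], x2[-1] * x1])

x1 = np.array([0.0, 0.6, 0.8])            # a point on S^2
x2 = np.array([0.36, 0.48, 0.8])          # another point on S^2
y = deform_two_spheres(x1, x2)
print(y, np.linalg.norm(y))               # norm 1.0: the image lies on S^5
```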
 
Dimension Reduction on Polyspheres with Application to Skeletal Representations

Introduction Deformation Skeletal Representations Conclusion Dimension Reduction on Polyspheres with Application to Skeletal Representations joint work with Stephan Huckemann and Sungkyu Jung Benjamin Eltzner University of Göttingen conference on Geometric Science of Information, 2015-10-30 Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Dimension Reduction on Manifolds PCA relies on linearity. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Dimension Reduction on Manifolds PCA relies on linearity. Tangent space approaches ignore geometry and periodic topology. Intrinsic approaches rely on manifold geometry. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Dimension Reduction on Manifolds PCA relies on linearity. Tangent space approaches ignore geometry and periodic topology. Intrinsic approaches rely on manifold geometry. Two classes: Forward methods: Submanifold dimension d = 1, 2, 3, . . . Needs “good” geodesics and a construction scheme. Backward methods: d = D − 1, D − 2, D − 3, . . . Needs rich (parametric) set of submanifolds. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Polysphere Dimension Reduction Almost all geodesics of PD = Sd1 r1 × · · · × SdK rK are dense in (S1 )K . Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Polysphere Dimension Reduction Almost all geodesics of PD = Sd1 r1 × · · · × SdK rK are dense in (S1 )K . Low symmetry isom(PD ) = SO(d1 + 1) × · · · × SO(dK + 1), no generic rich set of submanifolds. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Deformation for Unit Spheres Dimension reduction methods exist for spheres: GPCA1 , HPCA2 , PNS3 Recursively deform polysphere to sphere f : PD → SD . Squared line elements of two unit spheres: ds2 1 = d1 k=1   k−1 j=1 sin2 φ1,j   dφ2 1,k, ds2 2 = d2 k=1   k−1 j=1 sin2 φ2,j   dφ2 2,k Deformation: ds2 = ds2 2 + d2 j=1 sin2 φ2,j ds2 1 1 S. Huckemann and H. Ziezold. Advances in Applied Probability 2.38 (2006), pp. 299–319. 2 S. Sommer. Geometric Science of Information. Vol. 8085. Lecture Notes in Computer Science. 2013, pp. 76–83. 3 S. Jung, I. L. Dryden, and J. S. Marron. Biometrika 99.3 (2012), pp. 551–568. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Deformation for Unit Spheres Dimension reduction methods exist for spheres: GPCA1 , HPCA2 , PNS3 Recursively deform polysphere to sphere f : PD → SD . Squared line elements of two unit spheres: ds2 1 = d1 k=1   k−1 j=1 sin2 φ1,j   dφ2 1,k, ds2 2 = d2 k=1   k−1 j=1 sin2 φ2,j   dφ2 2,k Deformation: ds2 = ds2 2 + d2 j=1 sin2 φ2,j ds2 1 Degrees of freedom: Rotation and ordering of spheres. 1 S. Huckemann and H. Ziezold. 
Advances in Applied Probability 2.38 (2006), pp. 299–319. 2 S. Sommer. Geometric Science of Information. Vol. 8085. Lecture Notes in Computer Science. 2013, pp. 76–83. 3 S. Jung, I. L. Dryden, and J. S. Marron. Biometrika 99.3 (2012), pp. 551–568. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Fixing Degrees of Freedom Rotation: Embed Sdi ri into Rdi+1 . Determine Fréchet mean ˆµi and use rotation along a geodesic to move it to positive xi,di+1-direction (north pole). Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Fixing Degrees of Freedom Rotation: Embed Sdi ri into Rdi+1 . Determine Fréchet mean ˆµi and use rotation along a geodesic to move it to positive xi,di+1-direction (north pole). Ordering: Data spread: si = N n=1 d2 (ψi,n, ˆµi) Choose permutation p such that sp−1(1) is maximal and sp−1(K) is minimal. Minimizes distortion due to factors sin2 φj, i. e. deviation from polysphere geometry. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Mapping Data Points Embedding Sdi 1 ⊂ Rdi+1 we get ∀1 ≤ j ≤ d2 : yj = x2,j, ∀1 ≤ k ≤ d1 + 1 : yd2+k = x2,d1+1x1,j Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Mapping Data Points Embedding Sdi 1 ⊂ Rdi+1 we get ∀1 ≤ j ≤ d2 : yj = x2,j, ∀1 ≤ k ≤ d1 + 1 : yd2+k = x2,d1+1x1,j For different radii, rescale ∀1 ≤ j ≤ d1 + 1 : x1,j → ˜x1,j = R1x1,j, ∀i > 1 ∀1 ≤ j ≤ di : xi,j → ˜xi,j = Rixi,j and use ˜x in definition of y coordinates. This yields an ellipsoid x ∈ Rd2+d1+1 d2 k=1 R−2 2 x2 2,k + d1+1 k=1 R−2 1 (x2,d2+1x1,k)2 = 1 . Normalize all y-vectors to length R := K j=1 Rj 1 K as final step. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Illustration for Different Radii 1. Map from blue polysphere to green ellipsoid. 2. Map to red sphere. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion A Brief Review of Principal Nested Spheres (PNS) PNS determines a sequence SK ⊃ SK−1 ⊃ · · · ⊃ S2 ⊃ S1 ⊃ {µ}. Recursively fit small subsphere Sd−1 ⊂ Sd minimizing sum of squared geodesic projection distances. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion A Brief Review of Principal Nested Spheres (PNS) PNS determines a sequence SK ⊃ SK−1 ⊃ · · · ⊃ S2 ⊃ S1 ⊃ {µ}. Recursively fit small subsphere Sd−1 ⊂ Sd minimizing sum of squared geodesic projection distances. At every projection, save signed projection distance (residuals). Parameter space dimension for Sd−1 ⊂ Sd is p = d + 1, compared to linear PCA where for Rd−1 ⊂ Rd it is p = d. 
Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Skeletal Representation (s-rep) Parameter Space S-rep consists of 1. A two-dimensional mesh of m × n skeletal points. 2. Spokes from mesh points to the surface. Image from: J. Schulz et al. Journal of Computational and Graphical Statistics 24.2 (2015), p. 539 Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Skeletal Representation (s-rep) Parameter Space S-rep consists of 1. A two-dimensional mesh of m × n skeletal points. 2. Spokes from mesh points to the surface. Parameters: Size of centered mesh, spoke lengths, normalized mesh-points, spoke directions: Q = R+ × RK + × S3mn−1 × S2 K Polysphere deformation on S3mn−1 × S2 K yields Q = S5mn+2m+2n−5 Image from: J. Schulz et al. Journal of Computational and Graphical Statistics 24.2 (2015), p. 539 Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Dimension Reduction for Real S-reps PNDS: Deform polysphere to sphere and apply PNS. CPNS: PNS on spheres individually and linear PCA on joint residuals. 0 10 20 30 40 50 Dimension 0 20 40 60 80 100 Variances[%] PNDS CPNS Figure : PNDS vs. CPNS: residual variances for s-reps of 51 hippocampi5 . 5 S. M. Pizer et al. Ed. by M. Breuß, Bruckstein, and Maragos. Springer, Berlin, 2013, pp. 93–115. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Dimension Reduction for Simulated S-reps −100 −50 0 50 100 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 component 3 variance = 0.64% −100 −50 0 50 100 −100 −50 0 50 100 components 3 and 2 −100 −50 0 50 100 −100 −50 0 50 100 components 3 and 1 −100 −50 0 50 100 −100 −50 0 50 100 components 2 and 3 −100 −50 0 50 100 0.0 0.2 0.4 0.6 0.8 1.0 1.2 component 2 variance = 5.95% −100 −50 0 50 100 −100 −50 0 50 100 components 2 and 1 −100 −50 0 50 100 −100 −50 0 50 100 components 1 and 3 −100 −50 0 50 100 −100 −50 0 50 100 components 1 and 2 −100 −50 0 50 100 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 component 1 variance = 92.02% −100 −50 0 50 100 0.0 0.2 0.4 0.6 0.8 1.0 1.2 component 3 variance = 2.17% −100 −50 0 50 100 −100 −50 0 50 100 components 3 and 2 −100 −50 0 50 100 −100 −50 0 50 100 components 3 and 1 −100 −50 0 50 100 −100 −50 0 50 100 components 2 and 3 −100 −50 0 50 100 0.0 0.1 0.2 0.3 0.4 0.5 component 2 variance = 32.10% −100 −50 0 50 100 −100 −50 0 50 100 components 2 and 1 −100 −50 0 50 100 −100 −50 0 50 100 components 1 and 3 −100 −50 0 50 100 −100 −50 0 50 100 components 1 and 2 −100 −50 0 50 100 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 component 1 variance = 62.73% Figure : PNDS vs. CPNS for simulated twisted ellipsoids: scatter plots of residual signed distances for the first three components. 
Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Reflection on Parameter Space Dimension −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0 Figure : Simulated twisted ellipsoid data projected to the second component (a small two-sphere) in PNDS with first component (a small circle) inside. Parameter space dimensions: PNS on SD : p = 1 2 D(D + 3) − 1. PCA on RD : p = 1 2 D(D + 1). Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations Introduction Deformation Skeletal Representations Conclusion Conclusion We propose a deformation procedure mapping data on a polysphere to sphere. The construction aims at minimizing geometric distortion. We achieve lower dimensional representations than CPNS. The success of our method is rooted in the higher parameter space dimension. Benjamin Eltzner University of Göttingen Dimension Reduction on Polyspheres with Application to Skeletal Representations

Creative Commons Attribution-ShareAlike 4.0 International
See the video
This paper studies the affine-invariant Riemannian distance on the Riemann-Hilbert manifold of positive definite operators on a separable Hilbert space. This is the generalization of the Riemannian manifold of symmetric, positive definite matrices to the infinite-dimensional setting. In particular, in the case of covariance operators in a Reproducing Kernel Hilbert Space (RKHS), we provide a closed-form formula for this distance, expressed via the corresponding Gram matrices.
 
Affine-invariant Riemannian Distance Between Infinite-dimensional Covariance Operators

Affine-invariant Riemannian distance between infinite-dimensional covariance operators
Hà Quang Minh, Istituto Italiano di Tecnologia, Italy

From finite to infinite dimensions

Outline
1. Review of the finite-dimensional setting: affine-invariant Riemannian metric on the manifold of symmetric positive definite matrices.
2. Infinite-dimensional generalization: Riemann-Hilbert manifold of positive definite unitized Hilbert-Schmidt operators.
3. Affine-invariant Riemannian distance between Reproducing Kernel Hilbert Space (RKHS) covariance operators.

Positive definite matrices
$\mathrm{Sym}^{++}(n)$ = symmetric, positive definite n × n matrices. They have been studied extensively mathematically and have numerous practical applications:
• Brain imaging (Arsigny et al. 2005, Dryden et al. 2009, Qiu et al. 2015)
• Computer vision: object detection (Tuzel et al. 2008, Tosato et al. 2013), image retrieval (Cherian et al. 2013), visual recognition (Jayasumana et al. 2015)
• Radar signal processing: Barbaresco (2013), Formont et al. (2013)
• Machine learning: kernel learning (Kulis et al. 2009)

Differentiable manifold viewpoint
Tangent space: $T_P(\mathrm{Sym}^{++}(n)) \cong \mathrm{Sym}(n)$, the vector space of symmetric matrices.
Affine-invariant Riemannian metric on $T_P(\mathrm{Sym}^{++}(n))$:
$\langle A, B \rangle_P = \langle P^{-1/2} A P^{-1/2},\, P^{-1/2} B P^{-1/2} \rangle_F = \mathrm{tr}[P^{-1} A P^{-1} B]$
with the Frobenius inner product $\langle A, B \rangle_F = \mathrm{tr}(A^T B)$.

Affine invariance:
$\langle C A C^T,\, C B C^T \rangle_{C P C^T} = \langle A, B \rangle_P$ for any matrix $C \in GL(n)$.
(Siegel 1943, Mostow 1955, Pennec et al. 2006, Bhatia 2007, Moakher and Zéraï 2011, Bini and Iannazzo 2013)

The manifold is geodesically complete, with nonpositive curvature. Geodesic joining $P, Q \in \mathrm{Sym}^{++}(n)$:
$\gamma_{PQ}(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}$
The exponential map $\mathrm{Exp}_P : T_P(\mathrm{Sym}^{++}(n)) \to \mathrm{Sym}^{++}(n)$,
$\mathrm{Exp}_P(V) = P^{1/2} \exp(P^{-1/2} V P^{-1/2}) P^{1/2}$,
is defined on all of $T_P(\mathrm{Sym}^{++}(n))$.

Riemannian distance:
$d_{\mathrm{aiE}}(A, B) = \| \log(A^{-1/2} B A^{-1/2}) \|_F$
where $\log(A)$ is the principal logarithm of $A$.
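For intuition, here is a minimal numerical sketch (generic NumPy, not the speaker's code) of the finite-dimensional distance $d_{\mathrm{aiE}}$ above; the infinite-dimensional/RKHS case discussed in the talk requires the unitized Hilbert-Schmidt construction and Gram-matrix formulas, which are not reproduced here:

```python
# Affine-invariant distance between SPD matrices (finite-dimensional case only):
#   d_aiE(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F
import numpy as np

def _spd_apply(S, f):
    # Apply a scalar function to a symmetric positive definite matrix
    # via its eigendecomposition.
    w, V = np.linalg.eigh(S)
    return V @ np.diag(f(w)) @ V.T

def affine_invariant_distance(A, B):
    A_inv_sqrt = _spd_apply(A, lambda w: 1.0 / np.sqrt(w))   # A^{-1/2}
    M = A_inv_sqrt @ B @ A_inv_sqrt                          # SPD, so log(M) is real
    return np.linalg.norm(_spd_apply(M, np.log), "fro")

# Example with two random SPD matrices; the distance is symmetric in A and B.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)
print(affine_invariant_distance(A, B), affine_invariant_distance(B, A))
```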

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We develop a generic framework to build large deformations from a combination of base modules. These modules constitute a dynamical dictionary to describe transformations. The method, built on a coherent sub-Riemannian framework, defines a metric on modular deformations and characterises optimal deformations as geodesics for this metric. We will present a generic way to build local affine transformations as deformation modules, and display examples.
 
A sub-Riemannian modular approach for diffeomorphic deformations

A sub-Riemannian modular approach for diffeomorphic deformations
GSI 2015 — Barbara Gris
Advisors: Alain Trouvé (CMLA) and Stanley Durrleman (ICM)
gris@cmla.ens-cachan.fr — October 30, 2015

Outline
1. Introduction
2. Deformation modules: definition and first examples; modular large deformations; combining deformation modules
3. Numerical results

Introduction
"Is it possible to mechanize human intuitive understanding of biological pictures that typically exhibit a lot of variability but also possess characteristic structure?"
Ulf Grenander, Hands: A Pattern Theoretic Study of Biological Shapes, 1991

There is structure in the data and structure in the deformations. Deformations are built as flows of vector fields,
$\dot{\varphi}_t = v_t \circ \varphi_t, \qquad \varphi_{t=0} = \mathrm{Id},$
and the question is which type of vector fields to use.

Previous works — locally affine deformations
Poly-affine [C. Seiler, X. Pennec, and M. Reyes. Capturing the multiscale anatomical shape variability with polyaffine transformation trees. Medical Image Analysis, 2012]:
$v(x) = \sum_i w_i(x)\, A_i(x)$
(a small numerical sketch follows after this overview). Here the deformation structure does not evolve with the flow.

Previous works — shape spaces (S. Arguillère)
Shape spaces [S. Arguillère. Géométrie sous-riemannienne en dimension infinie et applications à l'analyse mathématique des formes. PhD thesis, 2014]: the deformation structure is imposed by the shapes and the action of vector fields. Related previous works:
• LDDMM [M. I. Miller, L. Younes, and A. Trouvé. Diffeomorphometry and geodesic positioning systems for human anatomy, 2014]
• Higher-order momenta [S. Sommer, M. Nielsen, F. Lauze, and X. Pennec. Higher-order momentum distributions and locally affine LDDMM registration. SIAM Journal on Imaging Sciences, 2013]
• Sparse LDDMM [S. Durrleman, M. Prastawa, G. Gerig, and S. Joshi. Optimal data-driven sparse parameterization of diffeomorphisms for population analysis. In Information Processing in Medical Imaging, pages 123–134. Springer, 2011]
Here the deformation structure evolves with the flow, but there is no control on the deformation structure.

Previous works — constraints
Diffeons [L. Younes. Constrained diffeomorphic shape evolution. Foundations of Computational Mathematics, 2012]
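To make the poly-affine formula concrete, here is a minimal sketch (not from the talk); the Gaussian weights, the anchor points, and the particular affine maps are illustrative assumptions:

```python
# Sketch of a poly-affine velocity field v(x) = sum_i w_i(x) A_i(x),
# with affine maps A_i(x) = M_i x + t_i and, as an assumption of this sketch,
# normalized Gaussian weights w_i centered at anchor points.
import numpy as np

def polyaffine_velocity(x, anchors, mats, trans, sigma=1.0):
    # x: (d,) point; anchors: (k, d); mats: (k, d, d); trans: (k, d)
    d2 = np.sum((anchors - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum()                               # partition-of-unity weights
    affine = mats @ x + trans                  # (k, d): A_i(x) for each region i
    return (w[:, None] * affine).sum(axis=0)   # weighted blend of the affine maps

# Example: blend a rotational displacement field (R - I)x, weighted near (-1, 0),
# with a constant translation, weighted near (1, 0).
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
anchors = np.array([[-1.0, 0.0], [1.0, 0.0]])
mats = np.stack([R - np.eye(2), np.zeros((2, 2))])
trans = np.array([[0.0, 0.0], [0.5, 0.0]])
print(polyaffine_velocity(np.array([0.2, 0.1]), anchors, mats, trans))
```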
Our model: deformation modules
Purpose:
• incorporate constraints into the deformation model;
• merge different constraints into a complex one.

Deformation modules — definition and first examples
A deformation module:
• contains a space of shapes;
• can generate vector fields that are of a particular type (this is the deformation structure) and depend on the state of the shape, so that the deformation structure evolves with the flow.

Example: local translation of scale σ (with an example of generated vector field). A deformation module is a tuple
$M = (\mathcal{O}, H, V, \zeta, \xi, c)$
where $\mathcal{O}$ is a shape space in the sense of S. Arguillère, $\zeta$ generates a vector field of $V$ from a shape in $\mathcal{O}$ and a control in $H$, $\xi$ gives the action of vector fields on shapes, $c$ is a cost, and there exists $C > 0$ such that for all $(o, h) \in \mathcal{O} \times H$:
$|\zeta(o, h)|^2_V \leq C\, c(o, h).$

Further examples of generated vector fields: local scaling of scale σ (built from points $z_1, z_2, z_3$ and directions $d_1, d_2, d_3$), local rotation of scale σ, and local translation of scale σ with fixed direction.

Modular large deformations
For a module $M = (\mathcal{O}, H, V, \zeta, \xi, c)$, the studied trajectories are $t \mapsto (o_t, h_t) \in \mathcal{O} \times H$ such that
$\dot{o}_t = \xi_{o_t}(v_t)$ where $v_t = \zeta_{o_t}(h_t) \in \zeta_{o_t}(H).$
→ Solutions of $\dot{\varphi}^v_t = v_t \circ \varphi^v_t$, $\varphi^v_{t=0} = \mathrm{Id}$, exist.
→ $\varphi^v$ is a modular large deformation (an example is shown in the slides; see also the sketch after this talk's summary).

Combining deformation modules
Features:
• If $c^i_{o_i}(h_i) = |\zeta^i_{o_i}(h_i)|^2_{V_i}$, then $c_o(h) = \sum_i |\zeta^i_{o_i}(h_i)|^2_{V_i} = |\sum_i \zeta^i_{o_i}(h_i)|^2_V$.
• Geometrical descriptors are transported by the global vector field.
• Coherent mathematical framework: any modules can be combined.

Numerical results — matching problem
Minimize
$\int_0^1 c_{o_t}(h_t)\, dt \; + \; g\bigl(\varphi^v_{t=1} \cdot f_{\mathrm{source}},\, f_{\mathrm{target}}\bigr), \qquad v = \zeta_o(h),$
where $g$ is a data-attachment term [N. Charon and A. Trouvé. The varifold representation of non-oriented shapes for diffeomorphic registration, 2013].

Conclusion
We have presented a coherent mathematical framework to build modular large deformations, and we showed how constraints can easily be incorporated into a deformation model and how different constraints can be merged into a global one.

"Is it possible to mechanize human intuitive understanding of biological pictures that typically exhibit a lot of variability but also possess characteristic structure?"
Ulf Grenander, Hands: A Pattern Theoretic Study of Biological Shapes, 1991

Thank you for your attention!
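As an illustration only (the framework in the talk is richer), here is a minimal sketch of integrating the flow $\dot{\varphi}_t = v_t \circ \varphi_t$ when $v_t$ is generated by a single local-translation module of scale σ: the geometric descriptor is a control point $z_t$ carrying a control vector $h$, the generated field is a Gaussian bump $K_\sigma(\cdot, z_t)\, h$, and the descriptor is itself transported by the flow. The Gaussian kernel, the constant-in-time control, and the explicit Euler stepping are assumptions of this sketch.

```python
# Sketch: flow of a "local translation of scale sigma" deformation module.
# v_t(x) = K_sigma(x, z_t) * h, with z_t transported by the flow, so the
# deformation structure evolves along the trajectory.
import numpy as np

def gaussian_kernel(x, z, sigma):
    return np.exp(-np.sum((x - z) ** 2, axis=-1, keepdims=True) / (2 * sigma ** 2))

def modular_flow(points, z0, h, sigma=0.5, n_steps=20):
    # points: (n, d) shape to deform; z0: (d,) initial descriptor; h: (d,) control.
    pts, z = points.astype(float), z0.astype(float)
    dt = 1.0 / n_steps
    for _ in range(n_steps):                      # explicit Euler on phi_dot = v o phi
        pts = pts + dt * gaussian_kernel(pts, z, sigma) * h
        z = z + dt * gaussian_kernel(z, z, sigma).item() * h   # descriptor follows the flow
    return pts, z

# Example: push a small grid of points with a rightward local translation.
grid = np.array([[x, y] for x in np.linspace(-1, 1, 5) for y in np.linspace(-1, 1, 5)])
deformed, z_final = modular_flow(grid, z0=np.array([0.0, 0.0]), h=np.array([1.0, 0.0]))
print(deformed[:3], z_final)
```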

Optimization on Manifold (chaired by Pierre-Antoine Absil, Rodolphe Sepulchre)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
The Riemannian trust-region algorithm (RTR) is designed to optimize differentiable cost functions on Riemannian manifolds. It proceeds by iteratively optimizing local models of the cost function. When these models are exact up to second order, RTR boasts a quadratic convergence rate to critical points. In practice, building such models requires computing the Riemannian Hessian, which may be challenging. A simple idea to alleviate this difficulty is to approximate the Hessian using finite differences of the gradient. Unfortunately, this is a nonlinear approximation, which breaks the known convergence results for RTR. We propose RTR-FD: a modification of RTR which retains global convergence when the Hessian is approximated using finite differences. Importantly, RTR-FD reduces gracefully to RTR if a linear approximation is used. This algorithm is available in the Manopt toolbox.
 
Riemannian trust regions with finite-difference Hessian approximations are globally convergent

Ditch the Hessian Hassle with Riemannian Trust Regions
Nicolas Boumal, Inria & ENS Paris
Geometric Science of Information, GSI 2015 — Oct. 30, 2015, Paris

The goal is to optimize a smooth function on a smooth manifold. The trust-region method is like Newton's method with a safeguard.
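The finite-difference idea from the abstract can be sketched in the Euclidean case (the Riemannian version additionally needs a retraction and is what RTR-FD analyses; the step normalization below is an assumption of this sketch, not a description of the Manopt implementation):

```python
# Euclidean sketch: approximate the Hessian-vector product by a finite
# difference of the gradient,
#   Hess f(x)[v]  ~=  ||v|| * (grad f(x + t * v/||v||) - grad f(x)) / t .
# Because of the normalization, the map v -> approximation is not linear in
# general, which is why plain RTR convergence theory no longer applies directly.
import numpy as np

def hess_vec_fd(grad, x, v, t=1e-6):
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(v)
    u = v / nv
    return nv * (grad(x + t * u) - grad(x)) / t

# Check on f(x) = 0.5 x^T A x, whose exact Hessian-vector product is A v.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
x, v = np.array([1.0, -1.0]), np.array([0.5, 2.0])
print(hess_vec_fd(grad, x, v), A @ v)
```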