GSI2015
About
LIX Colloquium 2015 conference

Provide an overview of the most recent state of the art

Exchange mathematical information/knowledge/expertise in the area

Identify research areas/applications for future collaboration

Identify academic & industry labs expertise for further collaboration
This conference will be an interdisciplinary event, unifying skills from Geometry, Probability and Information Theory. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.
Authors will be invited to submit a paper to a Special Issue "Differential Geometrical Theory of Statistics" of the journal Entropy, an international and interdisciplinary open-access journal of entropy and information studies published monthly online by MDPI.
Provisional Topics of Special Sessions:

Manifold/Topology Learning

Riemannian Geometry in Manifold Learning

Optimal Transport theory and applications in Imagery/Statistics

Shape Space & Diffeomorphic mappings

Geometry of distributed optimization

Random Geometry/Homology

Hessian Information Geometry

Topology and Information

Information Geometry Optimization

Divergence Geometry

Optimization on Manifolds

Lie Groups and Geometric Mechanics/Thermodynamics

Quantum Information Geometry

Infinite Dimensional Shape Spaces

Geometry on Graphs

Bayesian and Information geometry for inverse problems

Geometry of Time Series and Linear Dynamical Systems

Geometric structure of Audio Processing

Lie groups in Structural Biology

Computational Information Geometry
Committees
Secretary
 Valérie Alidor  SEE, France https://www.see.asso.fr
Webmaster
 Jean Vieille  SyntropicFactory http://www.syntropicfactory.com
Program chairs
 Frédéric Barbaresco  Thales, France http://www.thalesgroup.com
 Frank Nielsen  Ecole Polytechnique, France http://www.lix.polytechnique.fr/~nielsen/
Scientific committee
 Pierre-Antoine Absil  University of Louvain, Belgium http://sites.uclouvain.be/absil/
 Bijan Afsari  Johns Hopkins University, USA http://www.cis.jhu.edu/~bijan/
 Stéphanie Allassonnière  Ecole Polytechnique, France https://sites.google.com/site/stephanieallassonniere/home
 Shun-ichi Amari  RIKEN, Japan http://www.brain.riken.jp/labs/mns/amari/homeE.html
 Jesus Angulo  Mines ParisTech, France http://cmm.ensmp.fr/~angulo/
 Jean-Philippe Anker  Université d'Orléans, France http://www.univorleans.fr/mapmo/membres/anker/
 Sylvain Arguillère  Johns Hopkins University, USA http://www.cis.jhu.edu/~arguille/
 Marc Arnaudon  Université de Bordeaux, France http://www.math.ubordeaux1.fr/~marnaudo/
 Dena Asta  Carnegie Mellon University, USA http://www.stat.cmu.edu/~dasta/
 Michael Aupetit  Qatar Computing Research Institute, Qatar http://michael.aupetit.free.fr/
 Roger Balian  Academy of Sciences, France https://en.wikipedia.org/wiki/Roger_Balian
 Barbara Trivellato  Politecnico di Torino, Italy http://calvino.polito.it/~trivellato/
 Frédéric Barbaresco  Thales, France http://www.thalesgroup.com
 Michèle Basseville  IRISA, France http://people.irisa.fr/Michele.Basseville/
 Pierre Baudot  Max Planck Institute for Mathematics in the Sciences http://www.mis.mpg.de/jjost/members/pierrebaudot.html
 Martin Bauer  University of Vienna, Austria http://mat.univie.ac.at/~bauerm/Home_Page_of_Martin_Bauer/Home.html
 Roman Belavkin  Middlesex University, UK http://www.eis.mdx.ac.uk/staffpages/rvb/
 Daniel Bennequin  Paris Diderot University http://webusers.imjprg.fr/~daniel.bennequin/
 Jérémy Bensadon  LRI, France https://www.lri.fr/~bensadon/
 Jean-François Bercher  ESIEE, France http://perso.esiee.fr/~bercherj/
 Yannick Berthoumieu  IMS Université de Bordeaux, France https://sites.google.com/site/berthoumieuims/
 Jérémie Bigot  Université de Bordeaux, France https://sites.google.com/site/webpagejbigot/
 Michael Blum  IMAG, France http://membrestimc.imag.fr/Michael.Blum/
 Lionel Bombrun  IMS, Université de Bordeaux, France https://www.imsbordeaux.fr/fr/annuaire/4158bombrunlionel
 Silvère Bonnabel  Mines ParisTech http://www.silverebonnabel.com/
 Ugo Boscain  Ecole polytechnique, France http://www.cmapx.polytechnique.fr/~boscain/
 Nicolas Boumal  Inria & ENS Paris, France http://perso.uclouvain.be/nicolas.boumal/
 Charles Bouveyron  University Paris Descartes, France http://w3.mi.parisdescartes.fr/~cbouveyr/
 Michel Boyom  Université de Montpellier, France http://www.i3m.univmontp2.fr/
 Michel Broniatowski  University of Pierre and Marie Curie, France http://www.lsta.upmc.fr/Broniatowski/
 Martins Bruveris  Brunel University London, UK http://www.brunel.ac.uk/~mastmmb/
 Olivier Cappé  Telecom Paris, France http://perso.telecomparistech.fr/~cappe/
 Charles Cavalcante  Federal University of Ceará, Brazil http://www.ppgeti.ufc.br/charles/
 Antonin Chambolle  Ecole Polytechnique, France http://www.cmap.polytechnique.fr/~antonin/
 Frédéric Chazal  INRIA, France http://geometrica.saclay.inria.fr/team/Fred.Chazal/
 Emmanuel Chevallier  Mines ParisTech, France http://cmm.ensmp.fr/~chevallier/
 Sylvain Chevallier  IUT de Vélizy, France https://sites.google.com/site/sylvchev/
 Arshia Cont  Ircam, France http://repmus.ircam.fr/arshiacont
 Benjamin Couéraud  LAREMA Université d'Angers, France
 Philippe Cuvillier  Ircam, France http://repmus.ircam.fr/cuvillier
 Laurent Decreusefond  Telecom ParisTech, France http://www.infres.enst.fr/~decreuse/
 Alexis Decurninge  Huawei Technologies, Paris, France http://www.huawei.com/en/
 Michel Deza  Ecole Normale Supérieure Paris, CNRS, France http://www.liga.ens.fr/~deza/
 Stanley Durrleman  INRIA, France https://who.rocq.inria.fr/Stanley.Durrleman/index.html
 Patrizio Frosini  Università di Bologna, Italy http://www.dm.unibo.it/~frosini/
 Alfred Galichon  New York University, USA http://alfredgalichon.com/
 Jean-Paul Gauthier  University of Toulon, France http://www.lsis.org/gauthierjp/
 Alexis Glaunès  Mines ParisTech, France http://www.mi.parisdescartes.fr/~glaunes/
 Pierre-Yves Gousenbourger  Ecole Polytechnique de Louvain, Belgium http://www.uclouvain.be/pierreyves.gousenbourger
 Piotr Graczyk  University of Angers, France math.univangers.fr
 Peter Grunwald  CWI, Amsterdam, The Netherlands http://homepages.cwi.nl/~pdg/
 Nikolaus Hansen  INRIA, France www.lri.fr
 K V Harsha  Indian Institute of Space Science & Technology, India http://www.iist.ac.in/departments/
 Susan Holmes  Stanford University, USA http://statweb.stanford.edu/~susan/
 Wen Huang  University of Louvain, Belgium
 Stephan Huckemann  Institut für Mathematische Stochastik, Göttingen, Germany http://www.stochastik.math.unigoettingen.de/index.php?id=huckemann
 Shiro Ikeda  ISM, Japan http://www.ism.ac.jp/~shiro/
 Alexander Ivanov  Lomonosov Moscow State University, Russia  Imperial College, UK http://www.imperial.ac.uk/people/a.ivanov
 Jérémie Jakubowicz  Institut Mines Telecom, France http://wwwpublic.itsudparis.eu/~jakubowi/
 Martin Kleinsteuber  Technische Universität München, Germany http://www.professoren.tum.de/en/kleinsteubermartin/
 Ryszard Kostecki  Perimeter Institute for Theoretical Physics, Canada http://www.fuw.edu.pl/~kostecki/
 Hong Van Le  Mathematical Institute of ASCR, Czech Republic http://users.math.cas.cz/~hvle/
 Nicolas Le Bihan  Université de Grenoble, CNRS, France  University of Melbourne, Australia http://www.gipsalab.grenobleinp.fr/~nicolas.lebihan/
 Christian Léonard  Ecole Polytechnique, France http://www.cmap.polytechnique.fr/~leonard/
 Hervé Lombaert  INRIA, France http://step.polymtl.ca/~rv101/
 Jean-Michel Loubes  Toulouse University, France http://perso.math.univtoulouse.fr/loubes/
 Luigi Malagò  Shinshu University, Japan http://malago.di.unimi.it/
 Jonathan Manton  The University of Melbourne http://people.eng.unimelb.edu.au/jmanton/
 Matilde Marcolli  Caltech, USA http://www.its.caltech.edu/~matilde/
 Jean-François Marcotorchino  Thales, France https://www.thalesgroup.com/
 Charles-Michel Marle  Université Pierre et Marie Curie, France http://charlesmichel.marle.pagespersoorange.fr/
 Juliette Mattioli  THALES, France https://www.thalesgroup.com/en
 Bertrand Maury  Université Paris Sud, France http://www.math.upsud.fr/~maury/
 Quentin Mérigot  Université Paris-Dauphine / CNRS, France http://quentin.mrgt.fr/
 Fernand Meyer  Mines ParisTech, France
 Klas Modin  Chalmers University of Technology, Göteborg, Sweden https://klasmodin.wordpress.com/
 Ali Mohammad-Djafari  Supelec, CNRS, France http://djafari.free.fr/
 Guido Montufar  Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany http://personalhomepages.mis.mpg.de/montufar/
 Subrahamanian Moosath  Indian Institute of Space Science and Technology, India http://www.iist.ac.in
 Eric Moulines  Telecom ParisTech, France http://perso.telecomparistech.fr/~moulines/
 Jan Naudts  Universiteit Antwerpen, Belgium https://www.uantwerpen.be/en/staff/jannaudts/mywebsite/
 Frank Nielsen  Ecole Polytechnique, France http://www.lix.polytechnique.fr/~nielsen/
 Richard Nock  Université des Antilles et de la Guyane, France  NICTA, Australia http://www.univag.fr/rnock/index.html
 Yann Ollivier  Université Paris Sud, France http://www.yannollivier.org/
 Jean-Philippe Ovarlez  ONERA & SONDRA Lab, France http://www.jeanphilippeovarlez.com
 Bruno Pelletier  University of Rennes, France http://pelletierb.perso.math.cnrs.fr/
 Xavier Pennec  INRIA, France http://wwwsop.inria.fr/members/Xavier.Pennec/
 Michel Petitjean  Université Paris Diderot, CNRS, France http://petitjeanmichel.free.fr/itoweb.petitjean.html
 Gabriel Peyre  Université Paris Dauphine, CNRS, France http://gpeyre.github.io/
 Giovanni Pistone  Collegio Carlo Alberto, Moncalieri, Italy http://www.giannidiorestino.it/
 Julien Rabin  Université de Caen, France https://sites.google.com/site/rabinjulien/
 Tudor Ratiu  Ecole Polytechnique Federale de Lausanne, Switzerland http://cag.epfl.ch/page39504en.html
 Johannes Rauh  Leibniz Universität Hannover, Germany http://www2.iag.unihannover.de/~jrauh/index.php
 Olivier Rioul  Telecom ParisTech, France http://perso.telecomparistech.fr/~rioul/
 Said Salem  Université de Bordeaux, France https://www.imsbordeaux.fr/fr/annuaire/4069saidsalem
 Alessandro Sarti  Ecole des hautes études en sciences sociales, France http://cams.ehess.fr/document.php?id=1194
 Gery de Saxcé  Université des Sciences et des Technologies de Lille, France http://www.univlille1.fr/
 Olivier Schwander  Ecole Polytechnique, France http://www.lix.polytechnique.fr/~schwander/en/
 Rodolphe Sepulchre  Cambridge University, Department of Engineering, UK http://wwwcontrol.eng.cam.ac.uk/Main/RodolpheSepulchre
 Hichem Snoussi  Université de Technologie de Troyes, France http://h.snoussi.free.fr/
 Anuj Srivastava  Florida State University, USA http://stat.fsu.edu/~anuj/
 Udo von Toussaint  Max-Planck-Institut für Plasmaphysik, Garching, Germany http://home.rzg.mpg.de/~udt/
 Emmanuel Trelat  UPMC, France https://www.ljll.math.upmc.fr/trelat/
 Alain Trouvé  ENS Cachan, France http://atrouve.perso.math.cnrs.fr/
 Corinne Vachier  Université Paris Est Créteil, France www.upec.fr
 Claude Vallée  Poitiers University, France http://www.univpoitiers.fr/
 Geert Verdoolaege  Ghent University, Belgium http://www.ugent.be/ea/appliedphysics/en/research/fusion/personal_pages.htm/verdoolaege.htm
 Jean-Philippe Vert  Mines ParisTech, France http://cbio.ensmp.fr/~jvert/
 François-Xavier Vialard  Ceremade, Paris, France https://www.ceremade.dauphine.fr/~vialard/
 Rui Vigelis  Universidade Federal do Ceará, Brazil
 Stephan Weis  Unicamp, Brazil http://www.stephanweis.info
 Laurent Younes  Johns Hopkins University, USA www.cis.jhu.edu
 Jun Zhang  University of Michigan, Ann Arbor, USA http://www.lsa.umich.edu/psych/junz/
Sponsors and Organizers
Links
Documents
Opening Session (chaired by Frédéric Barbaresco)
Keynote speech by Matilde Marcolli (chaired by Daniel Bennequin)
From Geometry and Physics to Computational Linguistics
Matilde Marcolli, Geometric Science of Information, Paris, October 2015

Based on:
1. Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
2. Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
3. Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
4. Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation (coming soon to an arXiv near you)

What is Linguistics?
• Linguistics is the scientific study of language. What is Language? (langage, lenguaje, ...) What is a Language? (langue, lengua, ...) Similar to "What is Life?" or "What is an organism?" in biology.
• Natural language, as opposed to artificial (formal, programming, ...) languages.
• The point of view we will focus on: Language is a kind of Structure. It can be approached mathematically and computationally, like many other kinds of structures; the main purpose of mathematics is the understanding of structures.

Questions:
• How are different languages related? What does it mean that they come in families?
• How do languages evolve in time? (Phylogenetics, Historical Linguistics, Etymology)
• How does the process of language acquisition work? (Neuroscience)
• Semiotic viewpoint (mathematical theory of communication)
• Discrete versus continuum (probabilistic methods versus discrete structures)
• Descriptive or predictive? To be predictive, a science needs good mathematical models.

A language exists at many different levels of structure. An analogy: physics looks very different at different scales: General Relativity and Cosmology (~10^10 m), Classical Physics (~1 m), Quantum Physics (~10^-10 m), Quantum Gravity (~10^-35 m). Despite dreams of a Unified Theory, we deal with different mathematical models for different levels of structure. Similarly, we view language at different "scales": units of sound (phonology), words (morphology), sentences (syntax), global meaning (semantics). We expect to be dealing with different mathematical structures and different models at these various levels. Main level I will focus on: Syntax.

The linguistics view of syntax kind of looks like this... (figure: Alexander Calder, Mobile, 1960)

Modern Syntactic Theory:
• grammaticality: judgement on whether a sentence is well formed (grammatical) in a given language; I-language gives people the capacity to decide on grammaticality
• generative grammar: produce a set of rules that correctly predict grammaticality of sentences
• universal grammar: the ability to learn grammar is built into the human brain, e.g. properties like the distinction between nouns and verbs are universal ... is universal grammar a falsifiable theory?

Principles and Parameters (Government and Binding) (Chomsky, 1981)
• principles: general rules of grammar
• parameters: binary variables (on/off switches) that distinguish languages in terms of syntactic structures
• Example of parameter: head-directionality (head-initial versus head-final). English is head-initial, Japanese is head-final. (VP = verb phrase, TP = tense phrase, DP = determiner phrase)
...but not always so clear-cut: German can use both structures, "auf seine Kinder stolze Vater" (head-final) or "er ist stolz auf seine Kinder" (head-initial). (AP = adjective phrase, PP = prepositional phrase)
• Corpora-based statistical analysis of head-directionality (Haitao Liu, 2010): a continuum between head-initial and head-final

Examples of parameters: head-directionality, subject-side, pro-drop, null-subject.
Problems:
• interdependencies between parameters
• diachronic changes of parameters in language evolution

Dependent parameters:
• null-subject parameter: can drop the subject. Example: among Latin languages, Italian and Spanish have null-subject (+), French does not (-): it rains, piove, llueve, il pleut
• pro-drop parameter: can drop pronouns in sentences
• pro-drop controls null-subject
How many independent parameters? Geometry of the space of syntactic parameters?

Persistent Topology of Syntax
• Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
Databases of syntactic parameters of world languages:
1. Syntactic Structures of World Languages (SSWL) http://sswl.railsplayground.net/
2. TerraLing http://www.terraling.com/
3. World Atlas of Language Structures (WALS) http://wals.info/
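Languages in these databases are naturally encoded as binary vectors of parameter values, and the distance-based analyses that follow start from pairwise comparisons of such vectors. A minimal sketch, with an invented three-language, five-parameter table (the names and values are illustrative, not SSWL data):

```python
# toy binary syntactic-parameter vectors (invented values, one row per language)
params = {
    "lang_A": [1, 0, 1, 1, 0],
    "lang_B": [1, 0, 1, 0, 0],
    "lang_C": [0, 1, 0, 0, 1],
}

def hamming(u, v):
    """Number of syntactic parameters on which two languages disagree."""
    return sum(a != b for a, b in zip(u, v))

# pairwise distance matrix: the input for clustering / persistent topology
names = sorted(params)
D = [[hamming(params[a], params[b]) for b in names] for a in names]
for a, row in zip(names, D):
    print(a, row)
```

Such a matrix is exactly what the clustering and persistent-homology computations described below consume.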
Persistent topology of data sets: how data cluster around topological shapes at different scales.

Vietoris-Rips complexes
• a set X = {x_α} of points in Euclidean space E^N, with distance d(x, y) = ||x - y|| = (Σ_{j=1}^N (x_j - y_j)^2)^{1/2}
• the Vietoris-Rips complex R(X, ε) at scale ε over a field K: R_n(X, ε) is the K-vector space spanned by all unordered (n+1)-tuples of points {x_{α0}, x_{α1}, ..., x_{αn}} in X where all pairs have distances d(x_{αi}, x_{αj}) ≤ ε
• inclusion maps R(X, ε1) ↪ R(X, ε2) for ε1 < ε2 induce maps in homology by functoriality, H_n(X, ε1) → H_n(X, ε2); barcode diagrams record the births and deaths of persistent generators

Persistent topology of syntactic parameters
• Data: 252 languages from SSWL with 115 parameters
• considering all world languages together gives too much noise in the persistent topology: subdivide by language families
• Principal Component Analysis: reduce the dimensionality of the data
• compute the Vietoris-Rips complex and the barcode diagrams
Persistent H_0: clustering of the data into components (language subfamilies). Persistent H_1: clustering of the data along closed curves (circles): linguistic meaning?

Sources of persistent H_1:
• a "Hopf bifurcation" type phenomenon
• two different branches of a tree closing up in a loop
Two different types of phenomena of historical linguistic development within a language family.

Persistent topology of Indo-European languages: two persistent generators of H_0 (Indo-Iranian, European) and one persistent generator of H_1.
Persistent topology of Niger-Congo languages: three persistent components of H_0 (Mande, Atlantic-Congo, Kordofanian) and no persistent H_1.

The origin of the persistent H_1 of the Indo-European languages? Naive guess: the Anglo-Norman bridge... but that bridge is lexical, not syntactic. Answer: no, it is not the Anglo-Norman bridge (the persistent topology of the Germanic+Latin languages alone shows no loop). It is all because of Ancient Greek: the loop disappears in the persistent topology with the Hellenic (and Indo-Iranic) branch removed.

Syntactic parameters as dynamical variables
• Example: word order: SOV, SVO, VSO, VOS, OVS, OSV, with a very uneven distribution across world languages
• Word order distribution: a neuroscience explanation? D. Kemmerer, The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca's area, Language and Linguistics Compass, 6 (2012) N.1, 50-66.
• Internal reasons for a diachronic switch? F. Antinucci, A. Duranti, L. Gebert, Relative clause structure, relative clause perception, and the change from SOV to SVO, Cognition, Vol. 7 (1979) N.2, 145-176.

Changes over time in word order
• Ancient Greek: switched from Homeric to Classical. A. Taylor, The change from SOV to SVO in Ancient Greek, Language Variation and Change, 6 (1994) 1-37.
• Sanskrit: different word orders allowed, but the prevalent one in Vedic Sanskrit is SOV (switched at least twice by influence of Dravidian languages). F. J. Staal, Word Order in Sanskrit and Universal Grammar, Springer, 1967.
• English: switched from Old English (transitional between SOV and SVO) to Middle English (SVO). J. McLaughlin, Old English Syntax: a handbook, Walter de Gruyter, 1983.
Syntactic parameters are dynamical in language evolution.

Spin Glass Models of Syntax
• Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
- focus on linguistic change caused by language interactions
- think of syntactic parameters as spin variables
- spin interaction tends to align (ferromagnet)
- strength of interaction proportional to bilingualism (MediaLab)
- role of a temperature parameter: probabilistic interpretation of parameters
- not all parameters are independent: entailment relations
- Metropolis-Hastings algorithm: simulate evolution

The Ising model of spin systems on a graph G
• configurations of spins s : V(G) → {±1}
• magnetic field B and correlation strength J: Hamiltonian
  H(s) = -J Σ_{e ∈ E(G): ∂(e)={v,v'}} s_v s_{v'} - B Σ_{v ∈ V(G)} s_v
• the first term measures the degree of alignment of nearby spins; the second term measures the alignment of the spins with the direction of the magnetic field

Equilibrium probability distribution
• partition function: Z_G(β) = Σ_{s: V(G) → {±1}} exp(-β H(s))
• probability distribution on the configuration space (Gibbs measure): P_{G,β}(s) = e^{-β H(s)} / Z_G(β)
• low-energy states weigh most; at low temperature (large β) the ground state dominates, at higher temperature (small β) higher-energy states also contribute

Average spin magnetization
  M_G(β) = (1/#V(G)) Σ_{s: V(G) → {±1}} Σ_{v ∈ V(G)} s_v P(s)
• free energy F_G(β, B) = log Z_G(β, B), with
  M_G(β) = (1/#V(G)) (1/β) ∂F_G(β, B)/∂B |_{B=0}

The Ising model on a 2-dimensional lattice
• there exists a critical temperature T = T_c where a phase transition occurs
• for T > T_c the equilibrium state has m(T) = 0 (computed with respect to the equilibrium Gibbs measure P_{G,β}): demagnetization, on average as many up as down spins
• for T < T_c, m(T) > 0: spontaneous magnetization
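On a small graph, the Gibbs measure and the magnetization just described can be computed by brute-force enumeration of all 2^#V(G) spin configurations. A sketch (the 4-cycle graph and the values of J, B and β are illustrative choices, not from the talk):

```python
from itertools import product
import math

def gibbs_magnetization(edges, n, J=1.0, B=0.0, beta=1.0):
    """Average magnetization M = (1/n) <sum_v s_v> under the Gibbs measure
    P(s) = exp(-beta*H(s))/Z, with H(s) = -J sum_edges s_v s_v' - B sum_v s_v."""
    Z = 0.0
    m_sum = 0.0
    for s in product((-1, 1), repeat=n):          # all 2^n configurations
        H = -J * sum(s[u] * s[v] for u, v in edges) - B * sum(s)
        w = math.exp(-beta * H)                   # Boltzmann weight
        Z += w
        m_sum += w * sum(s) / n
    return m_sum / Z

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
# with no field, +s and -s configurations are equally likely, so M = 0;
# a small field breaks the symmetry, and larger beta (lower temperature)
# pushes the magnetization closer to +1
print(gibbs_magnetization(cycle4, 4, B=0.0))
print(gibbs_magnetization(cycle4, 4, B=0.2, beta=2.0))
```

For the language graphs of the talk the state space is far too large for enumeration, which is why the Metropolis-Hastings simulation discussed next is used instead.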
Syntactic parameters and Ising/Potts models
• characterize a set of n = 2^N languages L_i by binary strings of N syntactic parameters (Ising model), or by ternary strings (Potts model) if parameters take values ±1 when set and 0 when not defined in a given language
• a system of n interacting languages = a graph G with n = #V(G); languages L_i = vertices of the graph (e.g. the language that occupies a certain geographic area); languages that interact with each other = edges E(G) (geographical proximity, or a high volume of exchange for other reasons)

(figure: graph of language interaction (detail) from the Global Language Network of the MIT MediaLab, with interaction strengths J_e on edges based on the number of book translations, or Wikipedia edits)

• with only one syntactic parameter, this would be an Ising model on the graph G: configurations s : V(G) → {±1} set the parameter at all the locations on the graph, with variable interaction energies along the edges (some pairs of languages interact more than others); Hamiltonian
  H(s) = -Σ_{e ∈ E(G): ∂(e)={v,v'}} Σ_{i=1}^N J_e s_{v,i} s_{v',i}
• with N parameters, configurations s = (s_1, ..., s_N) : V(G) → {±1}^N
• if all N parameters were independent, this would amount to N non-interacting copies of an Ising model on the same graph G (or N independent choices of an initial state in an Ising model on G)

Metropolis-Hastings
• detailed balance condition P(s) P(s → s') = P(s') P(s' → s) for the probabilities of transitioning between states (Markov process)
• transition probabilities P(s → s') = π_A(s → s') · π(s → s'), with π(s → s') the conditional probability of proposing state s' given state s, and π_A(s → s') the conditional probability of accepting it
• Metropolis-Hastings choice of the acceptance distribution (Gibbs), satisfying detailed balance:
  π_A(s → s') = 1 if H(s') - H(s) ≤ 0, and exp(-β (H(s') - H(s))) if H(s') - H(s) > 0
• selection probabilities π(s → s'): single-spin-flip dynamics
• ergodicity of the Markov process ⇒ a unique stationary distribution

Example: single-parameter dynamics for the Subject-Verb parameter. Initial configuration: most languages in SSWL have +1 for Subject-Verb; use interaction energies from the MediaLab data. Equilibrium: at low temperature all spins align to +1; at high temperature they fluctuate. Temperature models fluctuations in bilingual users between different structures ("code-switching" in Linguistics).

Entailment relations among parameters
• Example: {p1, p2} = {Strong Deixis, Strong Anaphoricity} on {ℓ1, ℓ2, ℓ3, ℓ4} = {English, Welsh, Russian, Bulgarian}:
  p1: ℓ1 = +1, ℓ2 = -1, ℓ3 = +1, ℓ4 = +1
  p2: ℓ1 = +1, ℓ2 = 0, ℓ3 = +1, ℓ4 = -1

Modeling entailment
• variables: S_{ℓ,p1} = exp(πi X_{ℓ,p1}) ∈ {±1}, S_{ℓ,p2} ∈ {±1, 0}, and Y_{ℓ,p2} = |S_{ℓ,p2}| ∈ {0, 1}
• Hamiltonian H = H_E + H_V, with
  H_E = H_{p1} + H_{p2} = -Σ_{ℓ,ℓ' ∈ languages} J_{ℓℓ'} (δ_{S_{ℓ,p1}, S_{ℓ',p1}} + δ_{S_{ℓ,p2}, S_{ℓ',p2}})
  H_V = Σ_ℓ H_{V,ℓ} = Σ_ℓ J_ℓ δ_{X_{ℓ,p1}, Y_{ℓ,p2}}, with J_ℓ > 0 (antiferromagnetic)
• two parameters: the temperature, as before, and the coupling energy of the entailment
• if p1 is frozen and only p2 evolves: a Potts model with an external magnetic field

Acceptance probabilities for the three-state variable:
  π_A(s → s ± 1 (mod 3)) = 1 if ΔH ≤ 0, and exp(-β ΔH) if ΔH > 0,
  where ΔH := min{H(s + 1 (mod 3)), H(s - 1 (mod 3))} - H(s)

Equilibrium configurations (p1, p2) at high/low temperature (HT/LT) and high/low entailment energy (HE/LE):
  HT/HE: ℓ1 = (+1, 0), ℓ2 = (+1, -1), ℓ3 = (-1, 0), ℓ4 = (+1, +1)
  HT/LE: ℓ1 = (+1, -1), ℓ2 = (-1, -1), ℓ3 = (-1, +1), ℓ4 = (-1, -1)
  LT/HE: ℓ1 = ℓ2 = ℓ3 = ℓ4 = (+1, +1)
  LT/LE: ℓ1 = (+1, -1), ℓ2 = (+1, -1), ℓ3 = (-1, 0), ℓ4 = (-1, 0)

(figure: average value of the spins p1 (left) and p2 (right) in the low entailment energy case)

Syntactic parameters in Kanerva networks
• Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
- addresses two issues: the relative prevalence of different syntactic parameters, and their "degree of recoverability" (as a sign of underlying relations between parameters)
- if the information about one parameter is corrupted in the data of a group of languages, can it be recovered from the data of the other parameters?
- answer: different parameters have different degrees of recoverability
- used 21 parameters and 165 languages from the SSWL database

Kanerva networks (sparse distributed memories)
• P. Kanerva, Sparse Distributed Memory, MIT Press, 1988.
• the field F_2 = {0, 1} and the vector space F_2^N, with N large
• a uniform random sample of 2^k hard locations, with 2^k << 2^N
• the median Hamming distance between hard locations; Hamming spheres of radius slightly larger than the median value (access spheres)
• writing to the network: to store a datum X ∈ F_2^N, each hard location in the access sphere of X gets its i-th coordinate counter (initialized at zero) incremented or decremented depending on the i-th entry of X
• reading at a location: the i-th entry is determined by majority rule over the i-th entries of all stored data in hard locations within the access sphere
Kanerva networks are good at reconstructing corrupted data.

Procedure
• 165 data points (languages) stored in a Kanerva network in F_2^21 (choice of 21 parameters)
• corrupting one parameter at a time: analyze recoverability
• the language bit-string with a single corrupted bit is used as read location, and the resulting bit-string is compared to the original (Hamming distance)
• the resulting average Hamming distance is used as a score of recoverability (lowest = most easily recoverable parameter)

Parameters and frequencies:
01 Subject-Verb (0.64957267)
02 Verb-Subject (0.31623933)
03 Verb-Object (0.61538464)
04 Object-Verb (0.32478634)
05 Subject-Verb-Object (0.56837606)
06 Subject-Object-Verb (0.30769232)
07 Verb-Subject-Object (0.1923077)
08 Verb-Object-Subject (0.15811966)
09 Object-Subject-Verb (0.12393162)
10 Object-Verb-Subject (0.10683761)
11 Adposition-Noun-Phrase (0.58974361)
12 Noun-Phrase-Adposition (0.2905983)
13 Adjective-Noun (0.41025642)
14 Noun-Adjective (0.52564102)
15 Numeral-Noun (0.48290598)
16 Noun-Numeral (0.38034189)
17 Demonstrative-Noun (0.47435898)
18 Noun-Demonstrative (0.38461539)
19 Possessor-Noun (0.38034189)
20 Noun-Possessor (0.49145299)
A01 Attributive-Adjective-Agreement (0.46581197)

(figure: overall effect related to the relative prevalence of a parameter)
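The write and read rules of a sparse distributed memory take only a few lines: each hard location keeps one signed counter per coordinate, writing updates the counters of the hard locations inside the access sphere, and reading takes a majority vote over them. A small sketch with N = 21 bits as in the experiment, but with an invented number of hard locations and an access radius chosen by hand (not the median-based radius of the talk):

```python
import random

random.seed(0)
N, K = 21, 200            # word length; number of hard locations (illustrative)
RADIUS = 9                # access-sphere radius (arbitrary choice here)

hard = [[random.randint(0, 1) for _ in range(N)] for _ in range(K)]
counters = [[0] * N for _ in range(K)]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def write(word):
    # every hard location within the access sphere accumulates the word
    for h, c in zip(hard, counters):
        if hamming(h, word) <= RADIUS:
            for i, bit in enumerate(word):
                c[i] += 1 if bit else -1

def read(addr):
    # majority rule over the counters of hard locations near the address
    totals = [0] * N
    for h, c in zip(hard, counters):
        if hamming(h, addr) <= RADIUS:
            for i in range(N):
                totals[i] += c[i]
    return [1 if t > 0 else 0 for t in totals]

word = [random.randint(0, 1) for _ in range(N)]
write(word)
corrupted = word[:]
corrupted[0] ^= 1                 # flip one "parameter" bit
print(read(corrupted) == word)    # reading at the corrupted address recovers the word
```

With a single stored word the corrupted bit is recovered exactly; the experiment in the talk stores all 165 language strings, so recovery becomes a statistical question.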
MarcolliGeometry, Physics, LinguisticsMore reﬁned e↵ect after normalizing for prelavence (syntacticdependencies)Matilde MarcolliGeometry, Physics, Linguistics• Overall e↵ect relating recoverability in a Kanerva Network toprevalence of a certain parameter among languages (depends onlyon frequencies: see in random data with assigned frequencies)• Additional e↵ects (that deviate from random case) which detectpossible dependencies among syntactic parameters: increasedrecoverability beyond what e↵ect based on frequency• Possible neuroscience implications? Kanerva Networks as modelsof human memory (parameter prevalence linked to neurosciencemodels)• More reﬁned data if divided by language families?Matilde MarcolliGeometry, Physics, LinguisticsPhylogenetic Linguistics (WORK IN PROGRESS)• Constructing family trees for languages(sometimes possibly graphs with loops)• Main information about subgrouping: shared innovationa speciﬁc change with respect to other languages in the family thatonly happens in a certain subset of languages Example: among Mayan languages: Huastecan branchcharacterized by initial w becoming voiceless before a vowel and tsbecoming t, q becoming k, ... 
• The Quichean branch is characterized by the velar nasal becoming a velar fricative, and the prepalatal affricate becoming palato-alveolar, ...
• Known result by traditional Historical Linguistics methods: the Mayan language tree.

Computational Methods for Phylogenetic Linguistics
• Peter Forster, Colin Renfrew, Phylogenetic methods and the prehistory of languages, McDonald Institute Monographs, 2006
• Several computational methods for constructing phylogenetic trees are available from mathematical and computational biology
• Phylogeny Programs: http://evolution.genetics.washington.edu/phylip/software.html
• Standardized lexical databases: Swadesh lists (100 words, or 207 words)
• Use Swadesh lists of languages in a given family to look for cognates:
  - without additional etymological information (keep false positives)
  - with additional etymological information (remove false positives)
• Two further choices about loan words: remove loan words, or keep loan words
• Keeping loan words produces graphs that are not trees
• Without loan words it should produce trees, but small loops still appear due to ambiguities (different possible trees matching the same data)
... more precisely: coding of lexical data ...

Coding of lexical data
• After compiling lists of cognate words for pairs of languages within a given family (with/without lexical information and loan words)
• Produce a binary string S(L1, L2) = (s1, ..., sN) for each pair of languages L1, L2, with si = 0 at the i-th word of the lexical list of N words if cognates for that meaning exist in the two languages and si = 1 if not (important to pay attention to synonyms)
• Lexical Hamming distance between two languages:
  d(L1, L2) = #{i ∈ {1, ..., N} : si = 1},
  which counts words in the list that do not have cognates in L1 and L2.

Distance-matrix method of phylogenetic inference
• After producing a measure of "genetic distance": the Hamming metric dH(La, Lb)
• Hierarchical data clustering: collecting objects in clusters according to their distance
• Simplest method of tree construction: neighbor joining
  (1) Create a (leaf) vertex for each index a (ranging over languages in the given family).
  (2) Given the distance matrix D = (Dab), with distances between each pair Dab = dH(La, Lb), construct a new matrix Q = (Qab) with
      Qab = (n − 2) Dab − Σk Dak − Σk Dbk;
      this matrix Q decides the first pairs of vertices to join.
  (3) Identify the entries Qab with lowest values: join each such pair (a, b) of leaf vertices to a newly created vertex vab.
  (4) Set the distances to the new vertex by
      d(a, vab) = (1/2) Dab + (Σk Dak − Σk Dbk) / (2(n − 2)),
      d(b, vab) = Dab − d(a, vab).
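The lexical Hamming distance and the neighbor-joining criterion above can be sketched in a few lines. A minimal illustration, assuming a symmetric distance matrix D with zero diagonal (the function names are ours, not from the talk):

```python
import numpy as np

def lexical_hamming(s):
    """d(L1, L2): number of meanings in the Swadesh list with no
    shared cognate, i.e. the number of entries s_i = 1."""
    return int(np.sum(np.asarray(s)))

def nj_q_matrix(D):
    """Neighbor-joining criterion Q_ab = (n-2) D_ab - sum_k D_ak - sum_k D_bk.
    The off-diagonal pair (a, b) minimising Q_ab is joined first."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    row = D.sum(axis=1)
    Q = (n - 2) * D - row[:, None] - row[None, :]
    np.fill_diagonal(Q, 0.0)  # the criterion is only evaluated for a != b
    return Q
```

On the textbook 4-taxon example D = [[0,5,9,9],[5,0,10,10],[9,10,0,8],[9,10,8,0]], the pairs (0,1) and (2,3) attain the minimal value Q = −38 and are joined first.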
Random Geometry/Homology (chaired by Laurent Decreusefond/Frédéric Chazal)
Let m be a random tessellation in R^d, d ≥ 1, observed in the window W_ρ = ρ^{1/d}[0, 1]^d, ρ > 0, and let f be a geometrical characteristic. We investigate the asymptotic behaviour of the maximum of f(C) over all cells C ∈ m with nucleus in W_ρ as ρ goes to infinity. When the normalized maximum converges, we show that its asymptotic distribution depends on the so-called extremal index. Two examples of extremal indices are provided, for Poisson-Voronoi and Poisson-Delaunay tessellations.

The extremal index for a random tessellation
Nicolas Chenavier, Université Littoral Côte d'Opale, October 28, 2015

Plan: 1. Random tessellations — 2. Main problem — 3. Extremal index

Random tessellations
Definition. A (convex) random tessellation m in R^d is a partition of the Euclidean space into random polytopes (called cells). We only consider the particular cases where m is a Poisson-Voronoi tessellation or a Poisson-Delaunay tessellation.

Poisson-Voronoi tessellation
• X: Poisson point process in R^d;
• ∀x ∈ X, C_X(x) := {y ∈ R^d : |y − x| ≤ |y − x′|, ∀x′ ∈ X} (Voronoi cell with nucleus x);
• m_PVT := {C_X(x), x ∈ X}, the Poisson-Voronoi tessellation;
• ∀C_X(x) ∈ m_PVT, we let z(C_X(x)) := x.
[Figure: Poisson-Voronoi tessellation.]

Poisson-Delaunay tessellation
• X: Poisson point process in R^d;
• ∀x, x′ ∈ X, x and x′ define an edge if C_X(x) ∩ C_X(x′) ≠ ∅;
• m_PDT: the Poisson-Delaunay tessellation;
• ∀C ∈ m_PDT, we let z(C) be the circumcenter of C.
[Figure: Poisson-Delaunay tessellation.]

Typical cell
Definition. Let m be a stationary random tessellation. The typical cell of m is a random polytope C in R^d whose distribution is given as follows: for each bounded translation-invariant function g: {polytopes} → R,
  E[g(C)] := (1 / E[N(B)]) · E[ Σ_{C′∈m, z(C′)∈B} g(C′) ],
where B ⊂ R^d is any Borel subset with finite and non-empty volume, and N(B) is the number of cells with nucleus in B.

Main problem
Framework: m = m_PVT or m_PDT; W_ρ := [0, ρ]^d, with ρ > 0; g: {polytopes} → R a geometrical characteristic.
Aim: asymptotic behaviour, as ρ → ∞, of
  M_{g,ρ} := max_{C∈m, z(C)∈W_ρ} g(C).
[Figure: Voronoi cell maximizing the area in the square.]

Objective and applications
Objective: find a_{g,ρ} > 0, b_{g,ρ} ∈ R such that P(M_{g,ρ} ≤ a_{g,ρ} t + b_{g,ρ}) converges, as ρ → ∞, for each t ∈ R.
Applications: regularity of the tessellation; discrimination of point processes and tessellations; Poisson-Voronoi approximation.
[Figure: Poisson-Voronoi approximation.]

Asymptotics under a local correlation condition
Notation: let v_ρ := a_{g,ρ} t + b_{g,ρ} be a threshold such that
  ρ^d · P(g(C) > v_ρ) → τ as ρ → ∞, for some τ := τ(t) ≥ 0.
Local Correlation Condition (LCC):
  (ρ^d / (log ρ)^d) · E[ Σ_{(C1,C2)∈m², C1≠C2, z(C1),z(C2)∈[0,log ρ]^d} 1_{g(C1)>v_ρ, g(C2)>v_ρ} ] → 0 as ρ → ∞.
Theorem. Under (LCC), we have P(M_{g,ρ} ≤ v_ρ) → e^{−τ} as ρ → ∞.

Definition of the extremal index
Proposition. Assume that for all τ ≥ 0 there exists a threshold v_ρ^{(τ)}, depending on ρ, such that ρ^d · P(g(C) > v_ρ^{(τ)}) → τ as ρ → ∞. Then there exists θ ∈ [0, 1] such that, for all τ ≥ 0,
  lim_{ρ→∞} P(M_{g,ρ} ≤ v_ρ^{(τ)}) = e^{−θτ},
provided that the limit exists.
Definition. Following Leadbetter, we say that θ ∈ [0, 1] is the extremal index if, for each τ ≥ 0,
  ρ^d · P(g(C) > v_ρ^{(τ)}) → τ  and  lim_{ρ→∞} P(M_{g,ρ} ≤ v_ρ^{(τ)}) = e^{−θτ}.

Example 1: minimum of the inradius for a Poisson-Voronoi tessellation
• m := m_PVT, the Poisson-Voronoi tessellation;
• g(C) := r(C), the inradius of a cell C := C_X(x) with x ∈ X, i.e. r(C) := max{r ∈ R+ : B(x, r) ⊂ C_X(x)};
• r_{min,PVT}(ρ) := min_{x∈X∩W_ρ} r(C_X(x)).
Extremal index: θ = 1/2 for each d ≥ 1.
[Figure: typical Poisson-Voronoi cell with a small inradius.]

Example 2: maximum of the circumradius for a Poisson-Delaunay tessellation
• m := m_PDT, the Poisson-Delaunay tessellation;
• g(C) := R(C), the circumradius of a cell C, i.e. R(C) := min{r ∈ R+ : B(x, r) ⊃ C};
• R_{max,PDT}(ρ) := max_{C∈m_PDT : z(C)∈W_ρ} R(C).
Extremal index: θ = 1, 1/2, 35/128 for d = 1, 2, 3.
[Figure: typical Poisson-Delaunay cell with a large circumradius.]

Work in progress
Joint work with C. Robert (ISFA, Lyon 1):
• new characterization of the extremal index (not based on the classical block and run estimators appearing in classical Extreme Value Theory);
• simulation and estimation of the extremal index and the cluster-size distribution (for Poisson-Voronoi and Poisson-Delaunay tessellations).
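In Example 1, the statistic is easy to compute: the inradius of the Voronoi cell with nucleus x equals half the distance from x to its nearest neighbour in X (the closest bisecting hyperplane), so r_min is half the minimal inter-point distance. A minimal sketch (the function name is ours); drawing a Poisson number of uniform points in [0, ρ]^d gives the Poisson-Voronoi case:

```python
import numpy as np

def min_voronoi_inradius(points):
    """r_min over the sample: the inradius of the Voronoi cell with
    nucleus x is half the distance from x to its nearest neighbour,
    so the minimal inradius is half the minimal inter-point distance."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    d2 = np.einsum('ijk,ijk->ij', diff, diff)  # squared pairwise distances
    np.fill_diagonal(d2, np.inf)               # exclude x itself
    return 0.5 * float(np.sqrt(d2.min()))
```

For example, `min_voronoi_inradius([[0, 0], [1, 0], [5, 5]])` returns 0.5, since the closest pair is at distance 1.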
A model of two-type (or two-color) interacting random balls is introduced. Each colored random set is a union of random balls, and the interaction relies on the volume of the intersection between the two random sets. This model is motivated by the detection and quantification of colocalization between two proteins. Simulation and inference are discussed. Since the individual balls cannot all be identified (e.g. a ball may contain another one), standard methods of inference such as likelihood or pseudo-likelihood are not available, and we apply the Takacs-Fiksel method with a specific choice of test functions.

A two-color interacting random balls model for colocalization analysis of proteins
Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, Nantes, and INRIA Rennes, Serpico team
Joint work with C. Kervrann (INRIA Rennes, Serpico team). GSI'15, 28-30 October 2015.

Introduction: some data
Vesicular trafficking analysis and colocalization quantification by TIRF microscopy (1 px = 100 nanometers) [SERPICO team, INRIA]: Langerin proteins (left) and Rab11 GTPase proteins (right).
Is there colocalization? ⇔ Are there spatial dependencies between the two types of proteins?
Image preprocessing: segmentation, then thresholding of Gaussian weights, then superposition.

The problem of colocalization can be described as follows. We observe two binary images in a domain Ω:
• first image (green): realization of a random set Γ1 ∩ Ω;
• second image (red): realization of a random set Γ2 ∩ Ω.
→ Are there dependencies between Γ1 and Γ2? If so, can we quantify/model this dependency?

Outline: 1. A testing procedure — 2. A model for colocalization — 3. Estimation problem

Testing procedure
For a generic point o ∈ R^d, let
  p1 = P(o ∈ Γ1),  p2 = P(o ∈ Γ2),  p12 = P(o ∈ Γ1 ∩ Γ2).
If Γ1 and Γ2 are independent, then p12 = p1 p2. A natural measure of departure from independence is p̂12 − p̂1 p̂2, where
  p̂1 = |Ω|^{-1} Σ_{x∈Ω} 1_{Γ1}(x),  p̂2 = |Ω|^{-1} Σ_{x∈Ω} 1_{Γ2}(x),  p̂12 = |Ω|^{-1} Σ_{x∈Ω} 1_{Γ1∩Γ2}(x).

Assume Γ1 and Γ2 are m-dependent stationary random sets. If Γ1 is independent of Γ2, then, as Ω tends to infinity,
  T := |Ω| (p̂12 − p̂1 p̂2) / sqrt( Σ_{x∈Ω} Σ_{y∈Ω} Ĉ1(x − y) Ĉ2(x − y) ) → N(0, 1),
where Ĉ1 and Ĉ2 are the empirical covariance functions of Γ1 ∩ Ω and Γ2 ∩ Ω respectively. Hence, to test the null hypothesis of independence between Γ1 and Γ2:
  p-value = 2(1 − Φ(|T|)),
where Φ is the c.d.f. of the standard normal distribution.

Some simulations (Γ1 and Γ2 unions of random balls)
• Independent case (each color ∼ Poisson): 4 p-values < 0.05 over 100 realizations.
• Dependent case (see the model described next): 100 p-values < 0.05 over 100 realizations.
• Independent case, larger radii: 5 p-values < 0.05 over 100 realizations.
• Dependent case, larger radii and "small" dependence: 97 p-values < 0.05 over 100 realizations.
Real data: depending on the preprocessing, T = 9.9 (p-value = 0) or T = 17 (p-value = 0).

A model for colocalization
We view each set Γ1 and Γ2 as a union of random balls and model the superposition of the two images, i.e. Γ1 ∪ Γ2.
The reference model is a two-type (two-color) Boolean model with equiprobable marks, where the radii follow some distribution µ on [R_min, R_max].
Notation:
• (ξ, R)_i: ball centered at ξ with radius R and color i ∈ {1, 2}, viewed as a marked point, marked by R and i;
• x_i: collection of all marked points with color i, hence Γi = ∪_{(ξ,R)_i ∈ x_i} (ξ, R)_i;
• x = x1 ∪ x2: collection of all marked points.

The model. We consider a density, on any bounded domain Ω, with respect to the reference model:
  f(x) ∝ z1^{n1} z2^{n2} e^{θ |Γ1 ∩ Γ2|},
where n1 is the number of green balls and n2 the number of red balls. This density depends on three parameters:
• z1: rules the mean number of green balls;
• z2: rules the mean number of red balls;
• θ: interaction parameter.
If θ > 0: attraction (colocalization) between Γ1 and Γ2. If θ = 0: back to the reference model, up to the intensities (independence between Γ1 and Γ2).
Simulation: realizations can be generated by a standard birth-death Metropolis-Hastings algorithm.

Estimation problem
Aim: assume that the law µ of the radii is known. Given a realization of Γ1 ∪ Γ2 on Ω, estimate z1, z2 and θ in
  f(x) = (1 / c(z1, z2, θ)) z1^{n1} z2^{n2} e^{θ |Γ1 ∩ Γ2|},
where c(z1, z2, θ) is the normalizing constant.
Issue: the numbers of balls n1 and n2 are not observed, so likelihood or pseudo-likelihood based inference is not feasible.

An equilibrium equation
Consider, for any non-negative function h,
  C(z1, z2, θ; h) = S(h) − z1 I1(θ; h) − z2 I2(θ; h),
where
  S(h) = Σ_{(ξ,R)∈x, ξ∈Ω} h((ξ, R), x \ (ξ, R))
and, for i = 1, 2,
  Ii(θ; h) = ∫_{R_min}^{R_max} ∫_Ω h((ξ, R)_i, x) λ((ξ, R)_i, x) / (2 z_i) dξ µ(dR),
with λ the conditional (Papangelou) intensity of the model, so that Ii depends only on θ.
Denoting by z1*, z2* and θ* the true unknown values of the parameters, we know from the Georgii-Nguyen-Zessin equation that, for any h,
  E[C(z1*, z2*, θ*; h)] = 0.

Takacs-Fiksel estimator
Given K test functions (h_k)_{1≤k≤K}, the Takacs-Fiksel estimator is defined by
  (ẑ1, ẑ2, θ̂) := argmin_{z1,z2,θ} Σ_{k=1}^K C(z1, z2, θ; h_k)².   (1)
Consistency and asymptotic normality are studied in Coeurjolly et al. 2012.
To be able to compute (1), we must find test functions h_k such that S(h_k) is computable. How many? At least K = 3, because there are 3 parameters to estimate.

A first possibility:
  h1((ξ, R)_i, x) = Length( S(ξ, R) ∩ (Γ1)^c ) 1_{i=1},
where S(ξ, R) is the sphere {y : |y − ξ| = R}. Then
  S(h1) = P(Γ1)  (the perimeter of Γ1),
and the Takacs-Fiksel contrast function C(z1, z2, θ; h1) is computable.
Similarly:
• h2((ξ, R)_i, x) = Length( S(ξ, R) ∩ (Γ2)^c ) 1_{i=2}, for which S(h2) = P(Γ2);
• h3((ξ, R)_i, x) = Length( S(ξ, R) ∩ (Γ1 ∪ Γ2)^c ), for which S(h3) = P(Γ1 ∪ Γ2).

[Figure: histograms of θ̂ with test functions h1, h2, h3 over 100 realizations, for θ = 0.2 (small radii) and θ = 0.05 (large radii).]

Real data
We assume the law of the radii is uniform on [R_min, R_max] (each image is embedded in [0, 250] × [0, 280]):
• R_min = 0.5, R_max = 2.5: θ̂ = 0.45;
• R_min = 0.5, R_max = 10: θ̂ = 0.03.

Conclusion
The testing procedure: allows to detect colocalization between two binary images; is easy and fast to implement; does not depend too much on the image preprocessing.
The model for colocalization: relies on geometric features (area of intersection); can be fitted by the Takacs-Fiksel method; allows to compare the degree of colocalization θ between two pairs of images if the laws of radii are similar.
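The empirical quantities entering the testing procedure are simple pixel averages over the two binary images. A minimal sketch computing p̂1, p̂2, p̂12 and the raw departure p̂12 − p̂1 p̂2 (the function name is ours; the full statistic T additionally divides by the covariance-based normalization, which is omitted here):

```python
import numpy as np

def colocalization_departure(img1, img2):
    """Empirical estimates of p1 = P(o in Gamma1), p2, p12 over the
    pixel grid Omega, and the departure p12_hat - p1_hat * p2_hat.
    (The full test statistic T rescales this departure using the
    empirical covariance functions of the two sets.)"""
    g1 = np.asarray(img1, dtype=bool)
    g2 = np.asarray(img2, dtype=bool)
    p1 = g1.mean()                 # fraction of pixels in Gamma1
    p2 = g2.mean()                 # fraction of pixels in Gamma2
    p12 = (g1 & g2).mean()         # fraction of pixels in the overlap
    return float(p1), float(p2), float(p12 - p1 * p2)
```

For two images whose foregrounds overlap on exactly a quarter of the pixels with p1 = p2 = 1/2, the departure is 0, as expected under independence.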
The characteristic independence property of Poisson point processes gives an intuitive way to explain why a sequence of point processes becoming less and less repulsive can converge to a Poisson point process. The aim of this paper is to show this convergence for sequences built by superposing, thinning or rescaling determinantal processes. We use Papangelou intensities and Stein’s method to prove this result with a topology based on total variation distance.

Asymptotics of some Point Processes Transformations
Aurélien Vasseur, Télécom ParisTech
2nd conference on Geometric Science of Information, École Polytechnique, Paris-Saclay, October 28, 2015

Motivation: mobile network in Paris
[Figure: on the left, positions of all base stations in Paris; on the right, locations of base stations for one frequency band.]

Table of contents
I. Generalities on point processes: correlation function, Papangelou intensity and repulsiveness; determinantal point processes.
II. Kantorovich-Rubinstein distance: convergence defined by dKR; dKR(PPP, Φ) ≤ "nice" upper bound.
III. Applications to transformations of point processes: superposition, thinning, rescaling.

I. Generalities on point processes

Framework
• Y: a locally compact metric space;
• µ: a diffuse and locally finite reference measure on Y;
• N_Y: the space of configurations on Y;
• N_Y^f: the space of finite configurations on Y.

Correlation function ρ of a point process Φ:
  E[ Σ_{α⊂Φ, α∈N_Y^f} f(α) ] = Σ_{k=0}^{+∞} (1/k!) ∫_{Y^k} f({x1, ..., xk}) ρ({x1, ..., xk}) µ(dx1) ... µ(dxk);
ρ(α) ≈ probability of finding a point in at least each point of α.

Papangelou intensity c of a point process Φ:
  E[ Σ_{x∈Φ} f(x, Φ \ {x}) ] = ∫_Y E[c(x, Φ) f(x, Φ)] µ(dx);
c(x, ξ) ≈ conditional probability of finding a point at x given ξ.

Properties
• Intensity measure: A ∈ F_Y ↦ ∫_A ρ({x}) µ(dx);
• ρ({x}) = E[c(x, Φ)];
• if Φ is finite, then P(|Φ| = 1) = ( ∫_Y c(x, ∅) µ(dx) ) P(|Φ| = 0).

Poisson point process: for Φ a PPP with intensity M(dy) = m(y) dy,
• correlation function: ρ(α) = Π_{x∈α} m(x);
• Papangelou intensity: c(x, ξ) = m(x).

Repulsive point process
Definition. A point process is repulsive if φ ⊂ ξ ⇒ c(x, ξ) ≤ c(x, φ); it is weakly repulsive if c(x, ξ) ≤ c(x, ∅).

Determinantal point process
Definition. A determinantal point process DPP(K, µ) satisfies
  ρ({x1, ..., xk}) = det( K(xi, xj), 1 ≤ i, j ≤ k ).
Proposition. The Papangelou intensity of DPP(K, µ) is
  c(x0, {x1, ..., xk}) = det( J(xi, xj), 0 ≤ i, j ≤ k ) / det( J(xi, xj), 1 ≤ i, j ≤ k ),
where J = (I − K)^{-1} K.

Ginibre point process on B(0, R):
  K(x, y) = (1/π) e^{−(|x|² + |y|²)/2} e^{x ȳ} 1_{x∈B(0,R)} 1_{y∈B(0,R)}.
β-Ginibre point process on B(0, R):
  K_β(x, y) = (1/π) e^{−(|x|² + |y|²)/(2β)} e^{x ȳ / β} 1_{x∈B(0,R)} 1_{y∈B(0,R)}.
[Figure: realizations of β-Ginibre point processes.]

II. Kantorovich-Rubinstein distance
Total variation distance:
  dTV(ν1, ν2) := sup_{A∈F_Y, ν1(A),ν2(A)<∞} |ν1(A) − ν2(A)|.
F: N_Y → R is 1-Lipschitz (F ∈ Lip1) if |F(φ1) − F(φ2)| ≤ dTV(φ1, φ2) for all φ1, φ2 ∈ N_Y.
Kantorovich-Rubinstein distance:
  dKR(P1, P2) = sup_{F∈Lip1} | ∫_{N_Y} F(φ) P1(dφ) − ∫_{N_Y} F(φ) P2(dφ) |.
Convergence in K.R. distance ⇒ (strictly) convergence in law.

Upper bound theorem (L. Decreusefond, A. Vasseur). Let Φ be a finite point process on Y and ζ_M a PPP with finite control measure M(dy) = m(y) µ(dy). Then
  dKR(P_Φ, P_{ζ_M}) ≤ ∫_Y ∫_{N_Y} |m(y) − c(y, φ)| P_Φ(dφ) µ(dy).

III. Applications

Superposition of weakly repulsive point processes
• Φ_{n,1}, ..., Φ_{n,n}: n independent, finite and weakly repulsive point processes on Y;
• Φn := Σ_{i=1}^n Φ_{n,i};
• Rn := ∫_Y | Σ_{i=1}^n ρ_{n,i}(x) − m(x) | µ(dx);
• ζ_M: a PPP with control measure M(dx) = m(x) µ(dx).
Proposition (LD, AV):
  dKR(P_{Φn}, P_{ζ_M}) ≤ Rn + max_{1≤i≤n} ∫_Y ρ_{n,i}(x) µ(dx).

Consequence
Corollary (LD, AV). Let f be a pdf on [0, 1] such that f(0+) := lim_{x→0+} f(x) ∈ R, let Λ be a compact subset of R+, let X1, ..., Xn be i.i.d. with pdf f_n = (1/n) f(·/n), and let Φn = {X1, ..., Xn} ∩ Λ. Then
  dKR(Φn, ζ) ≤ ∫_Λ | (1/n) f(x/n) − f(0+) | dx + (1/n) ∫_Λ (1/n) f(x/n) dx,
where ζ is the PPP with intensity f(0+) reduced to Λ.

β-Ginibre point processes
Proposition (LD, AV). Let Φn be the βn-Ginibre process reduced to a compact set Λ and ζ the PPP with intensity 1/π on Λ. Then
  dKR(P_{Φn}, P_ζ) ≤ C βn.

Kallenberg's theorem (O. Kallenberg). Let Φn be a finite point process on Y, pn: Y → [0, 1) converging uniformly to 0, Φ′n the pn-thinning of Φn, and γ_M a Cox process. Then
  Φ′n → γ_M in law ⇔ pn Φn → M in law.

Polish distance
• (fn): a sequence in the space of real continuous functions with compact support generating F_Y;
• d*(ν1, ν2) = Σ_{n≥1} (1/2^n) Ψ( |ν1(fn) − ν2(fn)| ), with Ψ(x) = x/(1+x);
• d*KR: the Kantorovich-Rubinstein distance associated to the distance d*.

Thinned point processes
Proposition (LD, AV). Let Φn be a finite point process on Y, pn: Y → [0, 1), Φ′n the pn-thinning of Φn, and γ_M a Cox process. Then
  d*KR(P_{Φ′n}, P_{γ_M}) ≤ 2 E[ Σ_{x∈Φn} pn(x)² ] + d*KR(P_M, P_{pn Φn}).

References
• L. Decreusefond and A. Vasseur, Asymptotics of superposition of point processes, 2015.
• H.-O. Georgii and H. J. Yoo, Conditional intensity and Gibbsianness of determinantal point processes, J. Statist. Phys. (118), January 2004.
• J.-S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins, and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015.
• A. F. Karr, Point Processes and their Statistical Inference, Ann. Probab. 15 (1987), no. 3, 1226-1227.
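The determinant-ratio formula for the Papangelou intensity of a determinantal process can be evaluated directly once J = (I − K)^{-1} K is restricted to the points {x0, x1, ..., xk}. A small numerical sketch (the function names and the example matrix are ours); it also illustrates weak repulsiveness, c(x0, ξ) ≤ c(x0, ∅) = J(x0, x0):

```python
import numpy as np

def interaction_operator(K):
    """J = (I - K)^{-1} K, assuming the spectrum of K lies in [0, 1)."""
    K = np.asarray(K, dtype=float)
    return np.linalg.solve(np.eye(K.shape[0]) - K, K)

def papangelou_dpp(J_pts):
    """Papangelou intensity c(x0, {x1, ..., xk}) from the matrix
    (J(xi, xj))_{0 <= i, j <= k}: ratio of the full determinant to
    the determinant of the submatrix indexed by 1..k."""
    J_pts = np.asarray(J_pts, dtype=float)
    if J_pts.shape[0] == 1:
        return float(J_pts[0, 0])  # c(x0, emptyset) = J(x0, x0)
    return float(np.linalg.det(J_pts) / np.linalg.det(J_pts[1:, 1:]))
```

For instance, with J restricted to two points equal to [[2, 1], [1, 3]], the intensity is 5/3, which is indeed below c(x0, ∅) = 2.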
Random polytopes have constituted some of the central objects of stochastic geometry for more than 150 years. They are in general generated as convex hulls of a random set of points in Euclidean space. The study of such models requires ingredients coming from both convex geometry and probability theory. In the last decades, the study has focused on their asymptotic properties, in particular expectation and variance estimates. In several joint works with Tomasz Schreiber and J. E. Yukich, we have investigated the scaling limit of several models (uniform model in the unit ball, uniform model in a smooth convex body, Gaussian model) and have deduced from it limiting variances for several geometric characteristics, including the number of k-dimensional faces and the volume. In this paper, we survey the most recent advances on these questions, and we emphasize the particular cases of random polytopes in the unit ball and Gaussian polytopes.

Asymptotic properties of random polytopes
Pierre Calka
2nd conference on Geometric Science of Information, École Polytechnique, Paris-Saclay, 28 October 2015
Joint work with Joseph Yukich (Lehigh University, USA) & Tomasz Schreiber (Toruń University, Poland)

Outline
1. Random polytopes: an overview
2. Main results: variance asymptotics
3. Sketch of proof: Gaussian case

Random polytopes: an overview

Uniform polytopes, binomial model
• K: convex body of R^d;
• (X_k, k ∈ N*): independent and uniformly distributed in K;
• K_n := Conv(X1, ..., Xn), n ≥ 1.
[Figures: K_50, K_100, K_500 for K a ball and K a square.]

Uniform polytopes, Poissonian model
• K: convex body of R^d;
• P_λ, λ > 0: Poisson point process of intensity measure λ dx;
• K_λ := Conv(P_λ ∩ K).

Gaussian polytopes
• Φ_d(x) := (2π)^{−d/2} e^{−|x|²/2}, x ∈ R^d, d ≥ 2;
• binomial model: (X_k, k ∈ N*) independent with density Φ_d, K_n := Conv(X1, ..., Xn);
• Poissonian model: P_λ, λ > 0, Poisson point process of intensity measure λ Φ_d(x) dx, K_λ := Conv(P_λ).
[Figures: Gaussian polytopes K_50, K_100, K_500 and their asymptotically spherical shape.]

Asymptotic spherical shape of the Gaussian polytope
Geffroy (1961): d_H(K_n, B(0, √(2 log n))) → 0 a.s. as n → ∞.

Expectation asymptotics
Considered functionals: f_k(·) := number of k-dimensional faces, 0 ≤ k ≤ d; Vol(·) := volume.
B. Efron's relation (1965): E f_0(K_n) = n (1 − E Vol(K_{n−1}) / Vol(K)).
• Uniform polytope, K smooth:
  E[f_k(K_λ)] ∼ c_{d,k} ( ∫_{∂K} κ_s^{1/(d+1)} ds ) λ^{(d−1)/(d+1)} as λ → ∞,
  where κ_s is the Gaussian curvature of ∂K.
• Uniform polytope, K a polytope:
  E[f_k(K_λ)] ∼ c′_{d,k} F(K) log^{d−1}(λ) as λ → ∞, where F(K) is the number of flags of K.
• Gaussian polytope:
  E[f_k(K_λ)] ∼ c″_{d,k} log^{(d−1)/2}(λ) as λ → ∞.
A. Rényi & R. Sulanke (1963), H. Raynaud (1970), R. Schneider & J. Wieacker (1978), F. Affentranger & R. Schneider (1992).

Main results: variance asymptotics

Uniform model, K smooth
K: convex body of R^d with volume 1 and a C³ boundary; κ: Gaussian curvature of ∂K.
  lim_{λ→∞} λ^{−(d−1)/(d+1)} Var[f_k(K_λ)] = c_{k,d} ∫_{∂K} κ(z)^{1/(d+1)} dz,
  lim_{λ→∞} λ^{(d+3)/(d+1)} Var[Vol(K_λ)] = c′_d ∫_{∂K} κ(z)^{1/(d+1)} dz
(c_{k,d}, c′_d explicit positive constants).
M. Reitzner (2005): Var[f_k(K_λ)] = Θ(λ^{(d−1)/(d+1)}).

Uniform model, K a polytope
K: simple polytope of R^d with volume 1, i.e. each vertex of K is included in exactly d facets.
  lim_{λ→∞} log^{−(d−1)}(λ) Var[f_k(K_λ)] = c_{d,k} f_0(K),
  lim_{λ→∞} λ² log^{−(d−1)}(λ) Var[Vol(K_λ)] = c′_{d,k} f_0(K)
(c_{d,k}, c′_{d,k} explicit positive constants).
I. Bárány & M. Reitzner (2010): Var[f_k(K_λ)] = Θ(log^{d−1}(λ)).

Gaussian model
  lim_{λ→∞} log^{−(d−1)/2}(λ) Var[f_k(K_λ)] = c_{k,d},
  lim_{λ→∞} log^{−(k−(d+3)/2)}(λ) Var[V_k(K_λ)] = c′_{k,d}
(c_{k,d}, c′_{k,d} explicit positive constants), and
  E Vol(K_λ) / Vol(B(0, √(2 log λ))) = 1 − d log(log λ) / (4 log λ) + O(1/log λ).
D. Hug & M. Reitzner (2005), I. Bárány & V. Vu (2007): Var[f_k(K_λ)] = Θ(log^{(d−1)/2}(λ)).

Sketch of proof: Gaussian case

Calculation of the expectation of f_k(K_λ)
1. Decomposition: E[f_k(K_λ)] = E[ Σ_{x∈P_λ} ξ(x, P_λ) ], where
   ξ(x, P_λ) := (1/(k+1)) #{k-faces containing x} if x is extreme, and 0 if not.
2. Mecke-Slivnyak formula: E[f_k(K_λ)] = λ ∫ E[ξ(x, P_λ ∪ {x})] Φ_d(x) dx.
3. Limit of the expectation of one score.

Calculation of the variance of f_k(K_λ)
Var[f_k(K_λ)]
  = E[ Σ_{x∈P_λ} ξ²(x, P_λ) + Σ_{x≠y∈P_λ} ξ(x, P_λ) ξ(y, P_λ) ] − (E[f_k(K_λ)])²
  = λ ∫ E[ξ²(x, P_λ ∪ {x})] Φ_d(x) dx
    + λ² ∫∫ E[ξ(x, P_λ ∪ {x, y}) ξ(y, P_λ ∪ {x, y})] Φ_d(x) Φ_d(y) dx dy
    − λ² ∫∫ E[ξ(x, P_λ ∪ {x})] E[ξ(y, P_λ ∪ {y})] Φ_d(x) Φ_d(y) dx dy
  = λ ∫ E[ξ²(x, P_λ ∪ {x})] Φ_d(x) dx
    + λ² ∫∫ "Cov"(ξ(x, P_λ ∪ {x}), ξ(y, P_λ ∪ {y})) Φ_d(x) Φ_d(y) dx dy.

Scaling transform
Question: limits of E[ξ(x, P_λ)] and "Cov"(ξ(x, P_λ), ξ(y, P_λ))?
Answer: definition of limit scores in a new space.
• Critical radius: R_λ := √(2 log λ − log(2 · (2π)^d · log λ)).
• Scaling transform T^λ: R^d \ {0} → R^{d−1} × R,
    x ↦ ( R_λ exp_{d−1}^{−1}(x/|x|), R_λ² (1 − |x|/R_λ) ),
  where exp_{d−1}: R^{d−1} ≃ T_{u0} S^{d−1} → S^{d−1} is the exponential map at u0 ∈ S^{d−1}.
• Image of a score: ξ^{(λ)}(T^λ(x), T^λ(P_λ)) := ξ(x, P_λ).
• Convergence of P_λ: T^λ(P_λ) → P in distribution, where P is a Poisson point process in R^{d−1} × R of intensity measure e^h dv dh.

Action of the scaling transform
With Π↑ := {(v, h) ∈ R^{d−1} × R : h ≥ |v|²/2} and Π↓ := {(v, h) ∈ R^{d−1} × R : h ≤ −|v|²/2}:
• half-space ↔ translate of Π↓;
• sphere containing O ↔ translate of ∂Π↑;
• convexity ↔ parabolic convexity;
• extreme point ↔ (x + Π↑) not fully covered;
• k-face of K_λ ↔ parabolic k-face;
• R_λ Vol ↔ Vol.

Limiting picture
• Ψ := ∪_{x∈P} (x + Π↑); in red: image of the balls of diameter [0, x] where x is extreme.
• Φ := ∪_{x∈R^{d−1}×R : (x+Π↓)∩P=∅} (x + Π↓); in green: image of the boundary of the convex hull K_λ.
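The functional f_0(K_n) (number of vertices) discussed in the overview can be computed directly in the planar case. A self-contained sketch using Andrew's monotone chain (the helper names are ours), which one can use with standard Gaussian samples to observe the slow logarithmic growth of E f_0(K_n):

```python
def convex_hull_vertices(points):
    """Andrew's monotone chain: vertices of the convex hull of a planar
    point set, in counter-clockwise order (collinear points dropped)."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def build(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = build(pts)
    upper = build(reversed(pts))
    return lower[:-1] + upper[:-1]  # last point of each chain repeats

def f0(points):
    """f_0(K_n): number of vertices of the convex hull of the sample."""
    return len(convex_hull_vertices(points))
```

For the unit square with an extra interior point, the hull has f_0 = 4 vertices; for three non-collinear points, f_0 = 3.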
Asymmetric information distances are used to define asymmetric norms and quasi-metrics on the statistical manifold and its dual space of random variables. The quasi-metric topology generated by the Kullback-Leibler (KL) divergence is considered as the main example, and some of its topological properties are investigated.
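The starting point of the paper, that the KL divergence is asymmetric and therefore generates a quasi-metric rather than a metric, is easy to see numerically. A minimal sketch on finite distributions (the paper works on statistical manifolds; this is just the finite case, written with the common convention D[p, q] = Σ p ln(p/q)):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D[p, q] = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.4, 0.1]
q = [0.1, 0.3, 0.6]

d_pq, d_qp = kl(p, q), kl(q, p)
print(d_pq, d_qp)   # two different values: D[p, q] != D[q, p]

# Additivity over independent products, since ln : (R+, x) -> (R, +):
# D[p1 (x) p2, q1 (x) q2] = D[p1, q1] + D[p2, q2]
p2, q2 = [0.7, 0.3], [0.2, 0.8]
prod_p = [a * b for a in p for b in p2]
prod_q = [a * b for a in q for b in q2]
assert abs(kl(prod_p, prod_q) - (kl(p, q) + kl(p2, q2))) < 1e-12
```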

Asymmetric Topologies on Statistical Manifolds
Roman V. Belavkin, School of Science and Technology, Middlesex University, London NW4 4BT, UK. GSI2015, October 28, 2015.

Outline: Sources and Consequences of Asymmetry; Method: Symmetric Sandwich; Results.

Sources and Consequences of Asymmetry

Asymmetric information distances:
- Kullback-Leibler divergence: D[p, q] = E_q{ln(p/q)}.
- Additivity over products: D[p_1 ⊗ p_2, q_1 ⊗ q_2] = D[p_1, q_1] + D[p_2, q_2], because ln : (R_+, ×) → (R, +).
- Asymmetry of the KL-divergence: D[p, q] ≠ D[q, p], and D[q + (p − q), q] ≠ D[q − (p − q), q].
- An asymmetric norm from the divergence: ‖p − q‖ = inf{α^{−1} > 0 : D[q + α(p − q), q] ≤ 1} = sup_x {E_{p−q}{x} : E_q{e^x − 1 − x} ≤ 1}.

Functional analysis in asymmetric spaces:
Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)). Every topological space with a countable base is quasi-pseudometrizable.
- An asymmetric seminormed space can be T_0, but not T_1 (and hence not Hausdorff T_2).
- Dual quasi-metrics ρ(x, y) and ρ^{−1}(x, y) = ρ(y, x) induce two different topologies.
- There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy.
- This gives 14 notions of completeness (with respect to ρ or ρ^{−1}).
- Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness.
- An asymmetric seminormed space may fail to be a topological vector space, because y ↦ αy can be discontinuous (Borodin, 2001).
- Practically all other results have to be reconsidered (e.g. the Baire category theorem, Alaoglu-Bourbaki, etc.) (Cobzas, 2013).

Random variables as the source of asymmetry:
- Polar set: M° := {x : ⟨x, y⟩ ≤ 1, ∀ y ∈ M}.
- Minkowski functional: μ_{M°}(x) = inf{α > 0 : x/α ∈ M°} = s_M(x) = sup{⟨x, y⟩ : y ∈ M}, the support function of M.
- Here M = {u : D[(1 + u)z, z] ≤ 1} with D = ⟨(1 + u) ln(1 + u) − u, z⟩, and M° is compared with {x : D*[x, 0] ≤ 1}, where D*[x, 0] = ⟨e^x − 1 − x, z⟩.

Example (St. Petersburg lottery). x = 2^n, q = 2^{−n}, n ∈ N.
- E_q{x} = Σ_{n=1}^∞ (2^n/2^n) → ∞, but E_p{x} < ∞ for all biased p = 2^{−(1+α)n}, α > 0.
- 2^n ∉ dom E_q{e^x}, while −2^n ∈ dom E_q{e^x}; hence 0 ∉ Int(dom E_q{e^x}).

Example (Error minimization). Minimize x = ½‖a − b‖²_2 subject to D_KL[w, q ⊗ p] ≤ λ, a, b ∈ R^n.
- E_w{x} < ∞, minimized at w ∝ e^{−βx} q ⊗ p; maximization of x has no solution.
- ½‖a − b‖²_2 ∉ dom E_{q⊗p}{e^x}, while −½‖a − b‖²_2 ∈ dom E_{q⊗p}{e^x}; hence 0 ∉ Int(dom E_{q⊗p}{e^x}).

Method: Symmetric Sandwich
- s[−A ∩ A] ≤ s_A ≤ s[−A ∪ A] and μ_{co[−A° ∪ A°]} ≤ μ_{A°} ≤ μ[−A° ∩ A°].
- s[−A ∩ A] = s(−A) ∧_co s_A = inf{s_A(z) + s_A(z − y) : z ∈ Y}, and s[−A ∪ A] = s(−A) ∨ s_A.
- In particular, μ(−M°) ∧_co μ_{M°} ≤ μ_{M°} ≤ μ(−M°) ∨ μ_{M°} and μ(−M) ∧_co μ_M ≤ μ_M ≤ μ(−M) ∨ μ_M.

Lower and upper Luxemburg (Orlicz) norms, for φ*(x) = e^x − 1 − x and φ(u) = (1 + u) ln(1 + u) − u [Figure: graphs of φ* and φ on [−2, 2]]:
- φ*_+(x) = φ*(x) ∉ Δ_2 and φ_+(u) = φ(u) ∈ Δ_2; φ*_−(x) = φ*(−x) ∈ Δ_2 and φ_−(u) = φ(−u) ∉ Δ_2.
- ‖x‖*_φ = μ{x : ⟨φ*(x), z⟩ ≤ 1} and ‖u‖_φ = μ{u : ⟨φ(u), z⟩ ≤ 1}.
Proposition. ‖·‖*_{φ+} and ‖·‖*_{φ−} are Luxemburg norms with ‖x‖*_{φ−} ≤ ‖x‖*_φ ≤ ‖x‖*_{φ+}; ‖·‖_{φ+} and ‖·‖_{φ−} are Luxemburg norms with ‖u‖_{φ+} ≤ ‖u‖_φ ≤ ‖u‖_{φ−}.

Results

KL induces a Hausdorff (T_2) asymmetric topology.
Theorem. (Y, ‖·‖_φ) (resp. (X, ‖·‖*_φ)) is Hausdorff.
Proof. ‖u‖_{φ+} ≤ ‖u‖_φ (resp. ‖x‖*_{φ−} ≤ ‖x‖*_φ) implies that (Y, ‖·‖_φ) (resp. (X, ‖·‖*_φ)) is finer than the normed space (Y, ‖·‖_{φ+}) (resp. (X, ‖·‖*_{φ−})).

Separable subspaces.
Theorem. (Y, ‖·‖_{φ+}) (resp. (X, ‖·‖*_{φ−})) is a separable Orlicz subspace of (Y, ‖·‖_φ) (resp. (X, ‖·‖*_φ)).
Proof. φ_+(u) = (1 + u) ln(1 + u) − u ∈ Δ_2 (resp. φ*_−(x) = e^{−x} − 1 + x ∈ Δ_2). Note that φ_− ∉ Δ_2 and φ*_+ ∉ Δ_2.

Completeness.
Theorem. (Y, ‖·‖_φ) (resp. (X, ‖·‖*_φ)) is:
1. bicomplete: every ρ^s-Cauchy sequence converges, y_n → y in ρ^s;
2. ρ-sequentially complete: every ρ^s-Cauchy sequence converges, y_n → y in ρ;
3. right K-sequentially complete: every right K-Cauchy sequence converges, y_n → y in ρ.
Proof. ρ^s(y, z) = ‖z − y‖_φ ∨ ‖y − z‖_φ ≤ ‖y − z‖_{φ−}, where (Y, ‖·‖_{φ−}) is Banach. Then use theorems of Reilly et al. (1982) and Chen et al. (2007).

Summary and further questions:
- Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined.
- We have proved that topologies induced by the KL-divergence are Hausdorff; are bicomplete, ρ-sequentially complete and right K-sequentially complete; and contain a separable Orlicz subspace.
- Open questions: total boundedness, compactness? Other asymmetric information distances (e.g. Rényi divergence).

References:
Borodin, P. A. (2001). The Banach-Mazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3-4), 298-305.
Chen, S.-A., Li, W., Zou, D., & Chen, S.-B. (2007). Fixed point theorems in quasi-metric spaces. In Machine Learning and Cybernetics, 2007 International Conference on (Vol. 5, pp. 2499-2504). IEEE.
Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkhäuser.
Fletcher, P., & Lindgren, W. F. (1982). Quasi-uniform spaces (Vol. 77). New York: Marcel Dekker.
Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasi-pseudo-metric spaces. Monatshefte für Mathematik, 93, 127-140.
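The St. Petersburg example in the talk above can be checked directly. A small numerical sketch (the truncation levels and the bias α = 1 are arbitrary choices for illustration, and the biased measure is normalized to be a probability):

```python
import math

# Payoffs x_n = 2^n with probabilities q_n = 2^(-n): every term of
# E_q{x} equals 1, so the partial sums grow without bound.
partial = sum((2 ** n) * (2.0 ** -n) for n in range(1, 51))
print(partial)   # 50.0 : E_q{x} diverges linearly in the truncation level

# Under a biased measure p_n proportional to 2^(-(1+alpha) n),
# the expectation is finite.
alpha = 1.0
ns = range(1, 200)
weights = [2.0 ** (-(1 + alpha) * n) for n in ns]
z = sum(weights)
e_p = sum((2 ** n) * w for n, w in zip(ns, weights)) / z   # finite (about 3)
print(e_p)

# And e^x is not q-integrable: the terms 2^(-n) exp(2^n) explode,
# which is the slide's point that 2^n is not in dom E_q{e^x}.
terms = [math.exp(2 ** n) * 2.0 ** -n for n in range(1, 8)]
assert terms == sorted(terms)   # strictly growing terms: the series diverges
```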
Computational Information Geometry (chaired by Frank Nielsen, Paul Marriott)
We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald – equivalently, the Pearson χ² and score – statistics are unworkable in this context, but that the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.
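The statistics being compared can be written down in a few lines. A sketch (not the authors' code) in the sparse regime the abstract describes, which also checks the algebraic identity between the two standard forms of the Wald/Pearson statistic:

```python
import math
import random

def wald(counts, pi, N):
    """Pearson chi^2 / Wald form: W = sum_i (pi_i - n_i/N)^2 / pi_i."""
    return sum((p - n / N) ** 2 / p for n, p in zip(counts, pi))

def wald_alt(counts, pi, N):
    """Equivalent form: W = (1/N^2) sum_i n_i^2 / pi_i - 1."""
    return sum(n * n / p for n, p in zip(counts, pi)) / N ** 2 - 1.0

def deviance_half(counts, pi, N):
    """D/2 = sum over cells with n_i > 0 of n_i log(n_i / (N pi_i))."""
    return sum(n * math.log(n / (N * p)) for n, p in zip(counts, pi) if n > 0)

# Sparse regime: N << k, exponentially decreasing cell probabilities
# (the talk's example uses N = 50, k = 200).
random.seed(1)
N, k = 50, 200
w = [2.0 ** -i for i in range(k + 1)]
pi = [x / sum(w) for x in w]
counts = [0] * (k + 1)
for c in random.choices(range(k + 1), weights=pi, k=N):
    counts[c] += 1

W, W2 = wald(counts, pi, N), wald_alt(counts, pi, N)
D2 = deviance_half(counts, pi, N)
assert abs(W - W2) < 1e-6 * max(1.0, abs(W))   # the two Wald forms agree
print(W, 2 * D2)
```

The deviance D/2 is N times a KL divergence between the empirical and null distributions, so it is always non-negative.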

IntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryGeometry of GoodnessofFit Testing in High Dimensional LowSample Size ModellingR. Sabolová1 , P. Marriott2 , G. Van Bever1 & F. Critchley1 .1The Open University (EPSRC grant EP/L010429/1), United Kingdom2 University of Waterloo, CanadaGSI 2015, October 28th 2015Radka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryKey pointsIn CIG, the multinomial model∆k =(π0 , . . . , πk ) : πi ≥ 0,πi = 1iprovides a universal model.12goodnessofﬁt testing in large sparse extended multinomial contextsCressieRead power divergence λfamily  equivalent to Amari’s αfamilyasymptotic properties of two test statistics: Pearson’s χ2 test and deviancesimulation study for other statistics within power divergence family3kasymptotics instead of N asymptoticsRadka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryOutline1Introduction2Pearson’s χ2 versus the deviance3Other test statistics from power divergence family4SummaryRadka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryBig dataStatistical Theory and Methods for Complex, HighDimensional Data programme,Isaac Newton Institute (2008):. . . the practical environment has changed dramatically over the last twentyyears, with the spectacular evolution of computing facilities and theemergence of applications in which the number of experimental units isrelatively small but the underlying dimension is massive. . . . 
Areas ofapplication include image analysis, microarray analysis, ﬁnance, documentclassiﬁcation, astronomy and atmospheric science.continuous data  High dimensional low sample size data (HDLSS)discrete datadatabasesimage analysisSparsity (N << k) changes everything!Radka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryImage analysis  exampleFigure: m1 = 10, m2 = 10Dimension of a state space: k = 2m1 m2 − 1Radka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummarySparsity changes everythingS. Fienberg, A. Rinaldo (2012): Maximum Likelihood Estimation in LogLinearModelsDespite the widespread usage of these [loglinear] models, the applicabilityand statistical properties of loglinear models under sparse settings are stillvery poorly understood. As a result, even though highdimensionalsparse contingency tables constitute a type of data that is common inpractice, their analysis remains exceptionally difﬁcult.Radka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryOutline1Introduction2Pearson’s χ2 versus the deviance3Other test statistics from power divergence family4SummaryRadka SabolováGeometry of GOF Testing in HDLSS ModellingIntroductionPearson’s χ2 versus the devianceOther test statistics from power divergence familySummaryExtended multinomial distributionLetn = (ni ) ∼ Mult(N, (πi )), i = 0, 1, . . . 
Radka Sabolová: Geometry of GOF Testing in HDLSS Modelling
(Outline: Introduction; Pearson's χ² versus the deviance; Other test statistics from the power divergence family; Summary)

Setting: multinomial counts n = (n_0, …, n_k) on k + 1 cells with probabilities π_i ≥ 0, i = 0, …, k. Goodness-of-fit test H_0: π = π*.

Pearson's χ² test (Wald, score statistic):

  W := Σ_{i=0}^k (π*_i − n_i/N)² / π*_i = (1/N²) Σ_{i=0}^k n_i²/π*_i − 1.

Rule of thumb (for accuracy of the χ² asymptotic approximation): N π_i ≥ 5.

Performance of Pearson's χ² test on the boundary: example

[Figure: (a) null distribution; (b) sample of the Wald statistic; N = 50, k = 200, exponentially decreasing π_i.]

Performance of Pearson's χ² test on the boundary: theory

Theorem. For k > 1 and N ≥ 6, the first three moments of W are

  E(W) = k/N,
  var(W) = [π^(−1) − (k + 1)² + 2k(N − 1)] / N³,

and E[{W − E(W)}³] is given by

  [π^(−2) − (k + 1)³ − (3k + 25 − 22N){π^(−1) − (k + 1)²} + g(k, N)] / N⁵,

where g(k, N) = 4(N − 1)k(k + 2N − 5) > 0 and π^(a) := Σ_i π_i^a. In particular, for fixed k and N, as π_min → 0,

  var(W) → ∞ and γ(W) → +∞,

where γ(W) := E[{W − E(W)}³] / {var(W)}^{3/2}.

The deviance statistic

Define the deviance D via

  D/2 = Σ_{0≤i≤k: n_i>0} n_i {log(n_i/N) − log(π_i)}
      = Σ_{0≤i≤k: n_i>0} n_i log(n_i/N) + Σ_{0≤i≤k: n_i>0} n_i log(1/π_i)
      = Σ_{0≤i≤k: n_i>0} n_i log(n_i/μ_i),

where μ_i := E(n_i) = N π_i.

Distribution of the deviance

- Let {n*_i, i = 0, …, k} be mutually independent, with n*_i ~ Po(μ_i); then N* := Σ_{i=0}^k n*_i ~ Po(N) and (n_i) = (n*_i | N* = N) ~ Mult(N, π_i).
- Define S* := (N*, D*/2), where D*/2 := Σ_{i=0}^k n*_i log(n*_i/μ_i).
- Define ν, τ and ρ via

    E(S*) = (N, ν),  ν = Σ_{i=0}^k E(n*_i log{n*_i/μ_i}),
    cov(S*) = [[N, ρτ√N], [ρτ√N, τ²]],  ρτ√N = Σ_{i=0}^k C_i,  τ² = Σ_{i=0}^k V_i,

  where C_i := Cov(n*_i, n*_i log(n*_i/μ_i)) and V_i := Var(n*_i log(n*_i/μ_i)).
- Then, under equicontinuity,

    D/2 converges in distribution to N_1(ν, τ²(1 − ρ²)) as k → ∞.

Uniformity near the boundary

[Figure: stability of the sampling distributions of Pearson's χ² and of the deviance statistic; N = 50, k = 200, exponentially decreasing π_i.]

Asymptotic approximations

The normal approximation can be improved by a χ² approximation with a correction for skewness, or by the symmetrised deviance statistic.

[Figure: quality of the k-asymptotic approximations near the boundary: QQ plots for the normal approximation, the χ² approximation and the symmetrised deviance.]

Uniformity and higher moments

Does the k-asymptotic approximation hold uniformly across the simplex? Rewrite the deviance as

  D*/2 = Σ_{0≤i≤k: n*_i>0} n*_i log(n*_i/μ_i) = Γ* + Δ*,

where Γ* := Σ_i α_i n*_i and Δ* := …
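The two statistics above are straightforward to compute from a vector of cell counts. A minimal sketch, assuming NumPy (the function names are ours, not from the talk):

```python
import numpy as np

def pearson_w(n, pi_star):
    """Pearson chi-square (Wald) statistic W = sum_i (pi*_i - n_i/N)^2 / pi*_i."""
    n = np.asarray(n, dtype=float)
    pi_star = np.asarray(pi_star, dtype=float)
    N = n.sum()
    return float(np.sum((pi_star - n / N) ** 2 / pi_star))

def half_deviance(n, pi_star):
    """Half deviance D/2 = sum over cells with n_i > 0 of n_i log(n_i / mu_i),
    where mu_i = N pi*_i."""
    n = np.asarray(n, dtype=float)
    mu = n.sum() * np.asarray(pi_star, dtype=float)
    pos = n > 0  # the deviance sum runs only over non-empty cells
    return float(np.sum(n[pos] * np.log(n[pos] / mu[pos])))
```

For n = (7, 3) and π* = (1/2, 1/2) this gives W = 0.16, which also equals the second form (1/N²) Σ n_i²/π*_i − 1 = 1.16 − 1.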
Local mixture models give an inferentially tractable but still flexible alternative to general mixture models. Their parameter space naturally includes boundaries; near these the behaviour of the likelihood is not standard. This paper shows how convex and differential geometries help in characterising these boundaries. In particular the geometry of polytopes, ruled and developable surfaces is exploited to develop efficient inferential algorithms.

Computing Boundaries in Local Mixture Models
Vahed Maroufy and Paul Marriott
Department of Statistics and Actuarial Science, University of Waterloo
October 28, GSI 2015, Paris

Outline
1. Influence of boundaries on parameter inference
2. Local mixture models (LMM)
3. Parameter space and boundaries: hard boundaries and soft boundaries
4. Computing the boundaries for LMMs
5. Summary and future direction

Boundary influence

When a boundary exists:
- the MLE does not exist, so one must find the extended MLE, or
- the MLE exists but does not satisfy the regular properties.

Examples: the binomial distribution, logistic regression, contingency tables, log-linear and graphical models; see Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013).

Computing the boundary is a hard problem, Fukuda (2004). Many mathematical results exist in the literature: polytope approximation, Boroczky and Fodor (2008), Barvinok (2013); smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004).

Local mixture models

Definition (Marriott, 2002):

  g(x; μ, λ) = f(x; μ) + Σ_{j=2}^k λ_j f^(j)(x; μ),  λ ∈ Λ_μ ⊂ R^{k−1}.

Properties (Anaya-Izquierdo and Marriott, 2007):
- g is identifiable in all parameters, and the parametrization (μ, λ) is orthogonal at λ = 0;
- the log-likelihood of g is a concave function of λ at a fixed μ_0;
- Λ_μ is convex;
- LMMs approximate continuous mixture models ∫_M f(x; μ) dQ(μ) when the mixing is "small";
- the family of LMMs is richer than the family of mixtures.

Example: LMM of the normal model

f(x; μ) = φ(x; μ, σ²), with σ² known, and

  g(x; μ, λ) = φ(x; μ, σ²)(1 + Σ_{j=2}^k λ_j p_j(x)),  λ ∈ Λ_μ,

where p_j(x) is a polynomial of degree j. Why do we care about λ and Λ_μ? They are interpretable:

  μ_g^(2) = σ² + 2λ2,  μ_g^(3) = 6λ3,  μ_g^(4) = μ_φ^(4) + 12σ²λ2 + 24λ4,

and λ represents the mixing distribution Q via its moments in ∫_M f(x; μ) dQ(μ).

The costs for all these good properties and flexibility are:
- a hard boundary, coming from positivity (the boundary of Λ_μ);
- a soft boundary, coming from mixture behaviour.
We compute them here for two models, Poisson and normal, with k = 4 fixed.

Hard and soft boundaries

Hard boundary:

  Λ_μ = {λ | 1 + Σ_{j=2}^k λ_j q_j(x; μ) ≥ 0 for all x ∈ S}.

Λ_μ is an intersection of half-spaces, hence convex; the hard boundary is constructed by a set of (hyper)planes.

Soft boundary. Definition: for a density f(x; μ) with k finite moments, let

  M_k(f) := (E_f(X), E_f(X²), …, E_f(X^k)),

and for compact M define C = convhull{M_k(f) | μ ∈ M}. The boundary of C is called the soft boundary.

Computing the hard boundary: Poisson model

  Λ_μ = {λ | A2(x)λ2 + A3(x)λ3 + A4(x)λ4 + 1 ≥ 0 for all x ∈ Z+}.

[Figure: left, slice through λ2 = −0.1; right, slice through λ3 = 0.3.]

Theorem: for an LMM of a Poisson distribution and each μ, the space Λ_μ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope.

Computing the hard boundary: normal model

Let y = (x − μ)/σ; then

  Λ_μ = {λ | (y² − 1)λ2 + (y³ − 3y)λ3 + (y⁴ − 6y² + 3)λ4 + 1 ≥ 0 for all y ∈ R}.

We need more geometric tools to compute this boundary.

Ruled and developable surfaces

Definition. A ruled surface is Γ(x, γ) = α(x) + γ·β(x), x ∈ I ⊂ R; it is developable when β(x), α′(x) and β′(x) are coplanar for all x ∈ I.

Definition. The family of planes A = {λ ∈ R³ | a(x)·λ + d(x) = 0, x ∈ R}, each determined by an x ∈ R, is called a one-parameter infinite family of planes. Each element of the set

  {λ ∈ R³ | a(x)·λ + d(x) = 0, a′(x)·λ + d′(x) = 0, x ∈ R}

is called a characteristic line of the surface at x, and their union is called the envelope of the family. A characteristic line is the intersection of two consecutive planes; the envelope is a developable surface.

Boundaries for the normal LMM

Hard boundary: the surface

  (y² − 1)λ2 + (y³ − 3y)λ3 + (y⁴ − 6y² + 3)λ4 + 1 = 0,  y ∈ R.

[Figure: left, the hard boundary for the normal LMM (shaded) as a subset of a self-intersecting ruled surface (unshaded); right, slice through λ4 = 0.2.]

Soft boundary. Recall M_k(f) := (E_f(X), …, E_f(X^k)). For visualization purposes let k = 3 (μ ∈ M, σ fixed):

  M3(f) = (μ, μ² + σ², μ³ + 3μσ²),
  M3(g) = (μ, μ² + σ² + 2λ2, μ³ + 3μσ² + 6μλ2 + 6λ3).

[Figure: left, the 3D curve φ(μ); middle, the bounding ruled surface γ_a(μ, u); right, the convex subspace restricted to the soft boundary.]

Ruled surface parametrization: two boundary surfaces, each constructed by a curve and a set of lines attached to it,

  γ_a(μ, u) = φ(μ) + u L_a(μ),  γ_b(μ, u) = φ(μ) + u L_b(μ),

where, for M = [a, b] and φ(μ) = M3(f), L_a(μ) are the lines between φ(a) and φ(μ), and L_b(μ) are the lines between φ(μ) and φ(b).

Summary
- Understanding these boundaries is important if we want to exploit the nice statistical properties of LMMs.
- The boundaries described in this paper have both discrete aspects and smooth aspects.
- The two examples discussed represent the structure for almost all exponential family models.
- It is an interesting problem to design optimization algorithms on these boundaries for finding boundary maximizers of the likelihood.

References
Anaya-Izquierdo, K., Critchley, F., and Marriott, P. (2013). When are first order asymptotics adequate? A diagnostic. Stat, 3(1):17–22.
Anaya-Izquierdo, K. and Marriott, P. (2007). Local mixture models of exponential families. Bernoulli, 13:623–640.
Barvinok, A. (2013). Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, rnt078.
Batyrev, V. V. (1992). Toric varieties and smooth convex approximations of a polytope. RIMS Kokyuroku, 776:20.
Boroczky, K. and Fodor, F. (2008). Approximating 3-dimensional convex bodies by polytopes with a restricted number of edges. Contributions to Algebra and Geometry, 49(1):177–193.
Fukuda, K. (2004). From the zonotope construction to the Minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261–1272.
Geyer, C. J. (2009). Likelihood inference in exponential families and directions of recession. Electronic Journal of Statistics, 3:259–289.
Ghomi, M. (2001). Strictly convex submanifolds and hypersurfaces of positive curvature. Journal of Differential Geometry, 57(2):239–271.
Ghomi, M. (2004). Optimal smoothing for convex polytopes. Bulletin of the London Mathematical Society, 36(4):483–492.
Marriott, P. (2002). On the local geometry of mixture models. Biometrika, 89:77–93.
Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446–484.

Thank You
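For the normal LMM with k = 4, membership in Λ_μ can be checked numerically by testing the positivity condition on a grid of standardized values y. A minimal sketch, assuming NumPy; note that a finite grid only approximates the condition "for all y ∈ R", and the function name is ours:

```python
import numpy as np

def in_hard_boundary(lam2, lam3, lam4, y_grid=None):
    """Approximate check (on a finite grid) that lambda lies inside Lambda_mu for
    the normal LMM:  1 + lam2*(y^2-1) + lam3*(y^3-3y) + lam4*(y^4-6y^2+3) >= 0
    for all standardized y = (x - mu)/sigma."""
    if y_grid is None:
        y_grid = np.linspace(-10.0, 10.0, 20001)
    q2 = y_grid ** 2 - 1.0
    q3 = y_grid ** 3 - 3.0 * y_grid
    q4 = y_grid ** 4 - 6.0 * y_grid ** 2 + 3.0
    vals = 1.0 + lam2 * q2 + lam3 * q3 + lam4 * q4
    return bool(vals.min() >= 0.0)
```

Since the constraint is linear in λ for each fixed y, each grid point contributes one half-space, so this check is exactly the finite-polytope approximation of Λ_μ discussed in the slides.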
We generalize the O(dn/ϵ²)-time (1 + ϵ)-approximation algorithm for the smallest enclosing Euclidean ball [2,10] to point sets in hyperbolic geometry of arbitrary dimension. We guarantee an O(1/ϵ²) convergence time by using a closed-form formula to compute the geodesic α-midpoint between any two points. These results allow us to apply hyperbolic k-center clustering to statistical location-scale families or to multivariate spherical normal distributions, using their Fisher information matrix as the underlying Riemannian hyperbolic metric.

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry
Frank Nielsen (École Polytechnique; Sony Computer Science Laboratories, Inc.)
Gaëtan Hadjeres (Sony Computer Science Laboratories, Inc.)
Conference on Geometric Science of Information
© 2015 Frank Nielsen, Gaëtan Hadjeres

The minimum enclosing ball problem

Finding the minimum enclosing ball (or 1-center) of a finite point set P = {p_1, …, p_n} in a metric space (X, d_X(·,·)) consists in finding c ∈ X such that

  c = argmin_{c∈X} max_{p∈P} d_X(c, p).

[Figure: a finite point set P and its minimum enclosing ball MEB(P).]

The approximate minimum enclosing ball problem

In a Euclidean setting this problem is well-defined (uniqueness of the center c* and radius R* of the MEB) but computationally intractable in high dimensions. We fix ε > 0 and focus on the approximate minimum enclosing ball problem of finding an approximation c ∈ X of MEB(P) such that

  d_X(c, p) ≤ (1 + ε)R* for all p ∈ P.

Prior work

Approximate solutions in the Euclidean case are given by Badoiu and Clarkson's algorithm [Badoiu and Clarkson, 2008]: initialize the center c_1 ∈ P, then repeat 1/ε² times the update

  c_{i+1} = c_i + (f_i − c_i)/(i + 1),

where f_i ∈ P is the farthest point from c_i. How should we deal with point sets whose underlying geometry is not Euclidean?

This algorithm has been generalized to dually flat manifolds [Nock and Nielsen, 2005] and to Riemannian manifolds [Arnaudon and Nielsen, 2013]. Applying these results to hyperbolic geometry gives the existence and uniqueness of MEB(P), but gives no explicit bound on the number of iterations and assumes that we are able to precisely cut geodesics.

Our contribution

We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic α-midpoints, we obtain:
- an intrinsic (1 + ε)-approximation algorithm for the approximate minimum enclosing ball problem;
- an O(1/ε²) convergence-time guarantee;
- a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric.

The Poincaré ball model of d-dimensional hyperbolic geometry

The Poincaré ball model (B^d, ρ(·,·)) consists of the open unit ball B^d = {x ∈ R^d : ‖x‖ < 1} together with the hyperbolic distance

  ρ(p, q) = arcosh(1 + 2‖p − q‖² / ((1 − ‖p‖²)(1 − ‖q‖²))),  for all p, q ∈ B^d.

This distance induces a Riemannian structure on the metric space (B^d, ρ).

Geodesics: shortest paths between two points are exactly straight (Euclidean) lines passing through the origin, or circle arcs orthogonal to the unit sphere. [Figure: "straight" lines in the Poincaré ball model.]

Circles: circles in the Poincaré ball model look like Euclidean circles, but with a different center. [Figure: difference between the Euclidean MEB (in blue) and the hyperbolic MEB (in red) for a set of points in the hyperbolic Poincaré disk; the red cross is the hyperbolic center of the red circle, while the pink one is its Euclidean center.]

Translations:

  T_p(x) = ((1 − ‖p‖²)x + (‖x‖² + 2⟨x, p⟩ + 1)p) / (‖p‖²‖x‖² + 2⟨x, p⟩ + 1).

[Figure: tiling of the hyperbolic plane by squares.]

Closed-form formula for computing α-midpoints

A point m is the α-midpoint p #_α q of two points p, q, for α ∈ [0, 1], if m belongs to the geodesic joining p and q and satisfies ρ(p, m_α) = α ρ(p, q).

For the special case p = (0, …, 0), q = (x_q, 0, …, 0), we have p #_α q = (x_α, 0, …, 0) with

  x_α = (c_{α,q} − 1)/(c_{α,q} + 1),  where c_{α,q} := e^{α ρ(p,q)} = ((1 + x_q)/(1 − x_q))^α.

Noting that p #_α q = T_p(T_{−p}(p) #_α T_{−p}(q)) for all p, q ∈ B^d, we obtain a closed-form formula for computing p #_α q, computable in linear time O(d), and these transformations are exact.

(1 + ε)-approximation of a hyperbolic enclosing ball of fixed radius

For a fixed radius r > R*, we can find c ∈ B^d such that ρ(c, p) ≤ (1 + ε)r for all p ∈ P with:

Algorithm 1: (1 + ε)-approximation of EHB(P, r)
1: c_0 := p_1
2: t := 0
3: while ∃ p ∈ P such that p ∉ B(c_t, (1 + ε)r) do
4:   let p ∈ P be such a point
5:   α := (ρ(c_t, p) − r)/ρ(c_t, p)
6:   c_{t+1} := c_t #_α p
7:   t := t + 1
8: end while
9: return c_t

Idea of the proof: by the hyperbolic law of cosines,

  ch(ρ_t) ≥ ch(h) ch(ρ_{t+1}),  and  ch(ρ_1) ≥ ch(h) ≥ ch(εr).

[Figure: update of c_t.]

The EHB(P, r) algorithm is an O(1/ε²)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + ε)r in fewer than 4/ε² iterations. The error with respect to the true MEHB center c* satisfies

  ρ(c, c*) ≤ arcosh(ch((1 + ε)r)/ch(R*)).

(1 + ε + ε²/4)-approximation of MEHB(P)

In fact, as R* is unknown in general, the EHB algorithm returns, for any r: a (1 + ε)-approximation of EHB(P) if r ≥ R*; or the fact that r < R*, if the result obtained after more than 4/ε² iterations is not good enough. This suggests implementing a dichotomic search in order to compute an approximation of the minimal hyperbolic enclosing ball. We obtain a (1 + ε + ε²/4)-approximation of MEHB(P) in O((1/ε²) log(1/ε)) iterations.

Algorithm 2: (1 + ε)-approximation of MEHB(P)
1: c := p_1
2: r_max := ρ(c, P); r_min := r_max/2; t_max := +∞
3: r := r_max
4: repeat
5:   c_temp := Alg1(P, r, ε/2), interrupted if t > t_max in Alg1
6:   if the call of Alg1 has been interrupted then
7:     r_min := r
8:   else
9:     r_max := r; c := c_temp
10:  end if
11:  dr := (r_max − r_min)/2; r := r_min + dr;
     t_max := (log(ch((1 + ε/2)r)) − log(ch(r_min))) / log(ch(εr/2))
12: until 2 dr < r_min ε/2
13: return c

Experimental results

The number of iterations does not depend on d. [Figure: number of α-midpoint calculations as a function of ε, in logarithmic scale, for different values of d.]

The running time is approximately O(dn/ε²) (vertical translation in logarithmic scale). [Figure: execution time as a function of ε, in logarithmic scale, for different values of d.]

Applications

Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the following subfamilies:
- N(μ, σ²I_n) of n-variate normal distributions with scalar covariance matrix (I_n is the n × n identity matrix),
- N(μ, diag(σ_1², …, σ_n²)) of n-variate normal distributions with diagonal covariance matrix,
- N(μ_0, Σ) of d-variate normal distributions with fixed mean μ_0 and arbitrary positive definite covariance matrix Σ,
are statistical manifolds whose Fisher information metric is hyperbolic.

In particular, our results apply to the two-dimensional location-scale subfamily. [Figure: MEHB(D) of probability density functions (left) in the (μ, σ) upper half-plane (right); P = {A, B, C}.]

Openings

Plugging the EHB and MEHB algorithms in to compute cluster centers in the approximation algorithm of [Gonzalez, 1985], we obtain approximation algorithms for covering in hyperbolic spaces and for the k-center problem in O((kNd/ε²) log(1/ε)).

Algorithm 3: Gonzalez farthest-first traversal approximation algorithm
1: C_1 := P, i = 0
2: while i ≤ k do
3:   ∀j ≤ i, compute c_j := MEB(C_j)
4:   ∀j ≤ i, set f_j := argmax_{p∈P} ρ(p, c_j)
5:   find f ∈ {f_j} whose distance to its cluster center is maximal
6:   create a cluster C_i containing f
7:   add to C_i all points whose distance to f is smaller than the distance to their cluster center
8:   increment i
9: end while
10: return {C_i}_i

Coresets in hyperbolic geometry: the computation of the minimum enclosing hyperbolic ball does not necessarily involve all points p ∈ P. The MEHB obtained by the algorithm is an ε-coreset; a difference with the Euclidean setting, where coresets are of size at most 1/ε [Badoiu and Clarkson, 2008].

Thank you!

Bibliography
Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93–104.
Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Computational Geometry, 40(1):14–22.
Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293–306.
Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649–656. Springer.
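Algorithm 1 can be sketched directly from the closed-form distance, translation and α-midpoint formulas above. A minimal sketch, assuming NumPy; function names are ours, and this is an illustration under the slide formulas rather than the authors' implementation:

```python
import numpy as np

def rho(p, q):
    """Hyperbolic distance in the Poincare ball model."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    num = 2.0 * np.dot(p - q, p - q)
    den = (1.0 - np.dot(p, p)) * (1.0 - np.dot(q, q))
    return float(np.arccosh(1.0 + num / den))

def translate(p, x):
    """Hyperbolic translation T_p (slide formula), moving the origin to p."""
    p, x = np.asarray(p, float), np.asarray(x, float)
    xx, pp, xp = np.dot(x, x), np.dot(p, p), np.dot(x, p)
    return ((1.0 - pp) * x + (xx + 2.0 * xp + 1.0) * p) / (pp * xx + 2.0 * xp + 1.0)

def midpoint(p, q, alpha):
    """alpha-midpoint p #_alpha q, computed by translating p to the origin."""
    q0 = translate(-np.asarray(p, float), q)  # T_{-p}(q); note T_{-p}(p) = 0
    xq = np.linalg.norm(q0)
    if xq == 0.0:
        return np.asarray(p, float).copy()
    c = ((1.0 + xq) / (1.0 - xq)) ** alpha    # c_{alpha,q} = exp(alpha * rho(0, q0))
    xa = (c - 1.0) / (c + 1.0)
    return translate(p, (xa / xq) * q0)       # move back with T_p

def ehb(points, r, eps, max_iter=10000):
    """(1+eps)-approximation of an enclosing hyperbolic ball of fixed radius r
    (Algorithm 1); converges in at most 4/eps^2 iterations when r >= R*."""
    c = np.asarray(points[0], float)
    for _ in range(max_iter):
        dists = np.array([rho(c, p) for p in points])
        i = int(np.argmax(dists))
        if dists[i] <= (1.0 + eps) * r:
            return c
        alpha = (dists[i] - r) / dists[i]
        c = midpoint(c, points[i], alpha)
    return c
```

The α-midpoint follows the slide identity p #_α q = T_p(T_{−p}(p) #_α T_{−p}(q)): translate p to the origin, cut the straight-line geodesic there in closed form, and translate back.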
Brain-Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain-signal processing. Most of the state-of-the-art approaches deal with covariance matrices, and indeed Riemannian geometry has provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applicative contribution aims at assessing the impact of several distances on a real EEG dataset, as the invariances embedded in those distances influence the classification accuracy. Euclidean and Riemannian distances and means are compared both in terms of quality of results and of computational load.

From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification
Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al.
F'SATI, Tshwane University of Technology (South Africa)
LISV, Université de Versailles Saint-Quentin (France)
Mensia Technologies (France)
sylvain.chevallier@uvsq.fr
28 October 2015

Cerebral interfaces

Context: rehabilitation and disability compensation; out-of-the-lab solutions; open to a wider population.
Problem: intra-subject variabilities, requiring online methods and adaptive algorithms; inter-subject variabilities, requiring good generalization and fast convergence.
Opportunities: a new generation of BCI (Congedo & Barachant); growing interest in the EEG community; a large community with available datasets; challenging situations and problems.

Outline: Brain-Computer Interfaces; spatial covariance matrices for BCI; experimental assessment of distances.

Interaction based on brain activity

Brain-Computer Interfaces (BCI) enable non-muscular communication, with medical applications and possible applications for a wider population. Recording can be done at several scales (neuron, neuronal group, whole brain): LFP, ECoG, SEEG, EEG, MEG, fMRI, PET.

Interaction loop: 1 acquisition, 2 preprocessing, 3 translation, 4 user feedback.

Electroencephalography

Most BCIs rely on EEG, which is efficient for capturing brainwaves: lightweight systems, low cost, mature technologies, high temporal resolution, no trepanation.

Origins of EEG: local field potentials; the electric potential difference between dendrite and soma; Maxwell's equations under the quasi-static approximation; volume conduction effects; sensitivity to the conductivity of brain and skull, and to tissue anisotropies.

Experimental paradigms

Different brain signals for BCI:
- motor imagery: (de)synchronization in the premotor cortex;
- evoked responses: low-amplitude potentials induced by a stimulus.

Steady-State Visually Evoked Potentials (SSVEP): 8 electrodes in the occipital region; stimulation LEDs at 13 Hz, 17 Hz and 21 Hz; neural synchronization with the visual stimulation; no learning required, based on visual attention; strong induced activation.

BCI challenges

Limitations:
- data scarcity: a few sources are nonlinearly mixed on all electrodes;
- individual variabilities: effect of mental fatigue;
- inter-session variabilities: electrode impedances, localization of electrodes;
- inter-individual variabilities: state-of-the-art approaches fail with 20% of subjects.

Desired properties:
- online systems that continuously adapt to the user's variations;
- no calibration phase, whose non-negligible cognitive load raises fatigue;
- generic model classifiers and transfer learning, using data from one subject to enhance the results for another.

Spatial covariance matrices

Common approach, spatial filtering: efficient on clean datasets, but specific to each user and session (requiring user calibration), and its two-step training with feature selection brings an overfitting risk and the curse of dimensionality.

Working with covariance matrices: good generalization across subjects, fast convergence, existing online algorithms, efficient implementations.

Covariance matrices for EEG

- An EEG trial: X ∈ R^{C×N}, with C electrodes and N time samples.
- Assuming that X ~ N(0, Σ), the covariance matrices Σ belong to

    M_C = {Σ ∈ R^{C×C} : Σ = Σᵀ and xᵀΣx > 0 for all x ∈ R^C \ 0}.

- The mean of the set {Σ_i}_{i=1,…,I} is Σ̄ = argmin_{Σ∈M_C} Σ_{i=1}^I d^m(Σ_i, Σ).
- Each EEG class is represented by its mean, and classification is based on those means. How can we obtain a robust and efficient algorithm? (Congedo, 2013)

Minimum distance to Riemannian mean

A simple and robust classifier: compute the center Σ_E^(k) of each of the K classes, then assign a given unlabelled Σ̂ to the closest class,

  k* = argmin_k d(Σ̂, Σ_E^(k)).

[Figure: trajectories on the tangent space at the mean of all trials Σ̄_μ, for the resting, 13 Hz, 21 Hz and 17 Hz classes.]

Riemannian potato

Removing outliers and artifacts: reject any Σ_i that lies too far from the mean of all trials Σ̄, i.e. whenever

  z(d_i) = (d_i − μ_d)/σ_d > z_th,

where d_i = d(Σ_i, Σ̄), and μ_d and σ_d are the mean and standard deviation of the distances {d_i}. [Figure: raw matrices vs. Riemannian potato filtering.]

Covariance matrices for EEG-based BCI

Riemannian approaches in BCI achieve state-of-the-art results, performing like spatial filtering or sensor-space methods, while relying on simpler algorithms that are less error-prone and computationally efficient. What are the reasons for this success?
- Invariances embedded in the Riemannian distances: invariance to rescaling, normalization and whitening, and to electrode permutation or positioning.
- Equivalence to working in an optimal source space: spatial filtering is sensitive to outliers and user-specific, and the "sensors or sources" question disappears.
What are the most desirable invariances for EEG?

Considered distances and divergences

- Euclidean: d_E(Σ1, Σ2) = ‖Σ1 − Σ2‖_F.
- Log-Euclidean: d_LE(Σ1, Σ2) = ‖log(Σ1) − log(Σ2)‖_F (V. Arsigny et al., 2006, 2007).
- Affine-invariant: d_AI(Σ1, Σ2) = ‖log(Σ1⁻¹Σ2)‖_F (T. Fletcher & S. Joshi, 2004; M. Moakher, 2005).
- α-divergence, −1 < α < 1 (Z. Chebbi & M. Moakher, 2012):

    d_αD(Σ1, Σ2) = (4/(1 − α²)) log[ det((1−α)/2 Σ1 + (1+α)/2 Σ2) / (det(Σ1)^{(1−α)/2} det(Σ2)^{(1+α)/2}) ].

- Bhattacharyya (Z. Chebbi & M. Moakher, 2012):

    d_B(Σ1, Σ2) = { log[ det(½(Σ1 + Σ2)) / (det(Σ1) det(Σ2))^{1/2} ] }^{1/2}.

Experimental results

- Euclidean distances yield the lowest results, usually attributed to the invariance under inversion that is not guaranteed; they also display the swelling effect.
- Riemannian approaches outperform state-of-the-art methods (CCA+SVM).
- The α-divergence shows the best performance, but requires a costly optimisation to find the best α value.
- Bhattacharyya has the lowest computational cost and a good accuracy.

[Figure: accuracy (%) and CPU time (s) as functions of the α value.]

Conclusion

Working with covariance matrices in BCI achieves very good results; simple algorithms work well (MDM, Riemannian potato); there is a need for robust and online methods. Interesting applications for information geometry: many freely available datasets, several competitions, many open-source toolboxes for manipulating EEG. Several open questions remain: handling electrode misplacements and other artifacts; missing data and covariance matrices of lower rank; inter- and intra-individual variabilities.

Thank you!

(Closing slide, interaction loop again: the first BCI systems date back to the early '70s.)
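The log-Euclidean and affine-invariant distances from the table above, and the MDM rule built on them, can be sketched as follows. A minimal sketch, assuming NumPy and SciPy; for the affine-invariant distance we use the equivalent spectral form based on the generalized eigenvalues of the pencil (Σ2, Σ1), and the function names are ours:

```python
import numpy as np
from scipy.linalg import eigh

def spd_log(s):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    return (v * np.log(w)) @ v.T

def d_log_euclidean(s1, s2):
    """Log-Euclidean distance ||log(S1) - log(S2)||_F."""
    return float(np.linalg.norm(spd_log(s1) - spd_log(s2), "fro"))

def d_affine_invariant(s1, s2):
    """Affine-invariant distance sqrt(sum_i log^2 lambda_i), where the lambda_i
    are the generalized eigenvalues of S2 v = lambda S1 v."""
    lam = eigh(s2, s1, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def mdm_predict(trial_cov, class_means, dist=d_affine_invariant):
    """Minimum Distance to Mean: index of the closest class-mean covariance."""
    return int(np.argmin([dist(trial_cov, m) for m in class_means]))
```

Rescaling both covariance matrices, or whitening them by a common invertible transform, leaves d_AI unchanged; this is the invariance the slides credit for the robustness of Riemannian BCI pipelines.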
Group Theoretical Study on Geodesics for the EllipticalModelsHiroto InoueKyushu University, JapanOctober 28, 2015´GSI2015, Ecole Polytechnique, ParisSaclay, FranceHiroto Inoue (Kyushu Uni.)Group Theoretical Study on GeodesicsOctober 28, 20151 / 14Overview1Eriksen’s construction of geodesics on normal modelProblem2Reconsideration of Eriksen’s argumentEmbedding Nn → Sym+ (R)n+13Geodesic equation on Elliptical model4Future workHiroto Inoue (Kyushu Uni.)Group Theoretical Study on GeodesicsOctober 28, 20152 / 14Eriksen’s construction of geodesics on normal modelLet Sym+ (R) be the set of ndimensional positivedeﬁnite matrices.nThe normal model Nn = (M, ds 2 ) is a Riemannian manifold deﬁned byM = (µ, Σ) ∈ Rn × Sym+ (R) ,n1ds 2 = (tdµ)Σ−1 (dµ) + tr((Σ−1 dΣ)2 ).2The geodesic equation on Nn isµ − ΣΣ−1 µ = 0,¨ ˙˙(1)¨˙˙Σ + µtµ − ΣΣ−1 Σ = 0.˙ ˙The solution of this geodesic equation has been obtained by Eriksen.Hiroto Inoue (Kyushu Uni.)Group Theoretical Study on GeodesicsOctober 28, 20153 / 14Theorem ([Eriksen 1987])For any x ∈ Rn , B ∈ Symn (R), deﬁne a matrix exponential Λ(t) by∆ δ ΦB x0tγ := exp(−tA),Λ(t) = tδA := tx 0 −tx ∈ Mat2n+1 .tΦ γ Γ0 −x −B(2)−1 δ, ∆−1 ) is the geodesic on NThen, the curve (µ(t), Σ(t)) := (−∆nsatisﬁying the initial condition(µ(0), Σ(0)) = (0, In ),˙(µ(0), Σ(0)) = (x, B).˙(proof)We see that by the deﬁnition, (µ(t), Σ(t)) satisﬁes the geodesic equation.Hiroto Inoue (Kyushu Uni.)Group Theoretical Study on GeodesicsOctober 28, 20154 / 14Problem1Explain Eriksen’s theorem, to clarify the relation between thenormal model and symmetric spaces.2Extend Eriksen’s theorem to the elliptical model.Hiroto Inoue (Kyushu Uni.)Group Theoretical Study on GeodesicsOctober 28, 20155 / 14Reconsideration of Eriksen’s argumentSym+ (R)n+1Notice that the positivedeﬁnite symmetric matrices Sym+ (R) is an+1symmetric space byG /KSym+ (R)n+1gK →g · tg ,where G = GLn+1 (R), K = O(n + 1). 
This space $G/K$ carries the $G$-invariant Riemannian metric

$$ds^2 = \tfrac12 \operatorname{tr}\big((S^{-1}dS)^2\big).$$

Embedding $N_n \to \mathrm{Sym}^+_{n+1}(\mathbb{R})$

Put the affine subgroup

$$G_A := \left\{ \begin{pmatrix} P & \mu \\ 0 & 1 \end{pmatrix} \;\middle|\; P \in GL_n(\mathbb{R}),\ \mu \in \mathbb{R}^n \right\} \subset GL_{n+1}(\mathbb{R}),$$

and define a Riemannian submanifold as the orbit

$$G_A \cdot I_{n+1} = \{ g\,{}^t g \mid g \in G_A \} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}).$$

Theorem (cf. [Calvo, Oller 2002]). We have the isometry

$$N_n \xrightarrow{\ \sim\ } G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}),\qquad
(\Sigma, \mu) \mapsto \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}. \tag{3}$$

By using this embedding, we get a simpler expression of the metric and of the geodesic equation: in the coordinate $S = \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix}$, the metric $ds^2 = ({}^t d\mu)\Sigma^{-1}(d\mu) + \tfrac12\operatorname{tr}((\Sigma^{-1}d\Sigma)^2)$ becomes $ds^2 = \tfrac12\operatorname{tr}((S^{-1}dS)^2)$, and the geodesic equation (1) becomes the geodesic equation on $\mathrm{Sym}^+_{n+1}(\mathbb{R})$ with initial data $(S^{-1}\dot S)\big|_{(I_n, 0)} = (B, x)$.

Reconsideration of Eriksen's argument. Eriksen's construction can then be interpreted through the chain

$$A \in \{A : JAJ = -A\} \cap \mathrm{sym}_{2n+1}(\mathbb{R})
\;\xrightarrow{\exp}\;
\Lambda \in \{\Lambda : J\Lambda J = \Lambda^{-1}\} \cap \mathrm{Sym}^+_{2n+1}(\mathbb{R})
\;\longrightarrow\;
S := \begin{pmatrix} \Delta & \delta \\ {}^t\delta & \gamma \end{pmatrix}^{-1} \in N_n \cong G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}),$$

for a suitable involution matrix $J$. The differential equation $\Lambda^{-1}\dot\Lambda = -A$ is carried by this chain to the geodesic equation $(S^{-1}\dot S)\big|_{(I_n, 0)} = (B, x)$; the projection step is the essential point.

Geodesic equation on the elliptical model

Definition. Let us define a Riemannian manifold $E_n(\alpha) = (M, ds^2)$ by

$$M = \{(\mu, \Sigma) \in \mathbb{R}^n \times \mathrm{Sym}^+_n(\mathbb{R})\},\qquad
ds^2 = ({}^t d\mu)\Sigma^{-1}(d\mu) + \tfrac12\operatorname{tr}\big((\Sigma^{-1}d\Sigma)^2\big) + \tfrac{d_\alpha}{2}\big(\operatorname{tr}(\Sigma^{-1}d\Sigma)\big)^2, \tag{4}$$

where $d_\alpha = (n+1)\alpha^2 + 2\alpha$, $\alpha \in \mathbb{C}$.
Then $E_n(0) = N_n$. The geodesic equation on $E_n(\alpha)$ is

$$\ddot\mu - \dot\Sigma\Sigma^{-1}\dot\mu = 0,\qquad
\ddot\Sigma + \dot\mu\,{}^t\dot\mu - \dot\Sigma\Sigma^{-1}\dot\Sigma - \frac{d_\alpha}{n d_\alpha + 1}\,({}^t\dot\mu\,\Sigma^{-1}\dot\mu)\,\Sigma = 0. \tag{5}$$

This is equivalent to the geodesic equation on the elliptical model.

The manifold $E_n(\alpha)$ is also embedded into the positive-definite symmetric matrices $\mathrm{Sym}^+_{n+1}(\mathbb{R})$ (cf. [Calvo, Oller 2002]), and we get a simpler expression of the geodesic equation:

$$E_n(\alpha) \xrightarrow{\ \sim\ } G_A^{(\alpha)} \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}),\qquad
(\Sigma, \mu) \mapsto S = |\Sigma|^{\alpha} \begin{pmatrix} \Sigma + \mu\,{}^t\mu & \mu \\ {}^t\mu & 1 \end{pmatrix},\qquad |A| = \det A.$$

In this coordinate the metric (4) becomes $ds^2 = \tfrac12\operatorname{tr}((S^{-1}dS)^2)$, and the geodesic equation (5) becomes $(S^{-1}\dot S)\big|_{(I_n, 0)} = (C, x) - \alpha\,(\log S)^{\boldsymbol\cdot}\,\big|_{(I_n, 0)}$.

But, in general, we have not yet constructed a submanifold $N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$ whose projection is $E_n(\alpha)$: a differential equation $\Lambda^{-1}\dot\Lambda = -A$ on $N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$ should project to a curve $S(t) \in E_n(\alpha) \cong G_A^{(\alpha)} \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$ solving the geodesic equation above. The geodesic equation on the elliptical model has not been solved.

Future work
1. Extend Eriksen's theorem to elliptical models (ongoing).
2. Find an Eriksen-type theorem for general symmetric spaces $G/K$. Sketch of the problem: for a projection $p : G'/K' \to G/K$, find a geodesic submanifold $N \subset G'/K'$ such that $p|_N$ maps all geodesics to geodesics.

References
- Calvo, M., Oller, J.M.: A distance between elliptical distributions based in an embedding into the Siegel group. J. Comput. Appl. Math. 145, 319-334 (2002).
- Eriksen, P.S.: Geodesics connected with the Fisher metric on the multivariate normal manifold. In: Proceedings of the GST Workshop, Lancaster, pp. 225-229 (1987).
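Eriksen's theorem is directly computable: build the $(2n+1)\times(2n+1)$ matrix $A$ from the initial velocity $(x, B)$, exponentiate, and read off $(\mu(t), \Sigma(t)) = (-\Delta^{-1}\delta, \Delta^{-1})$. A minimal numerical sketch of the construction in equation (2), assuming NumPy/SciPy (the function name is ours):

```python
import numpy as np
from scipy.linalg import expm

def eriksen_geodesic(x, B, t):
    """Eriksen's geodesic on the normal model N_n through (mu, Sigma) = (0, I_n)
    with initial velocity (x, B), via Lambda(t) = exp(-t A), eq. (2)."""
    n = len(x)
    xc = x.reshape(n, 1)
    A = np.block([
        [B,                 xc,               np.zeros((n, n))],
        [xc.T,              np.zeros((1, 1)), -xc.T],
        [np.zeros((n, n)), -xc,               -B],
    ])
    Lam = expm(-t * A)
    Delta = Lam[:n, :n]          # upper-left n x n block of Lambda(t)
    delta = Lam[:n, n]           # next column of Lambda(t)
    Sigma = np.linalg.inv(Delta)
    mu = -Sigma @ delta
    return mu, Sigma
```

A finite-difference check confirms the initial conditions of the theorem: $\mu(0) = 0$, $\Sigma(0) = I_n$, $\dot\mu(0) = x$, $\dot\Sigma(0) = B$.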
We introduce a class of paths, or one-parameter models, connecting two arbitrary probability density functions (pdfs). The class is derived by employing the Kolmogorov-Nagumo average between the two pdfs. There is a variety of such path connectedness on the space of pdfs, since the Kolmogorov-Nagumo average is applicable to any convex and strictly increasing function. Information-geometric insight is provided for understanding probabilistic properties of statistical methods associated with the path connectedness. The one-parameter model is extended to a multidimensional model, on which statistical inference is characterized by sufficient statistics.

Path connectedness on a space of probability density functions
Osamu Komori¹, Shinto Eguchi²
¹ University of Fukui, Japan; ² The Institute of Statistical Mathematics, Japan
École Polytechnique, Paris-Saclay (France), October 28, 2015

Contents
1. Kolmogorov-Nagumo (KN) average
2. Parallel displacement $A_t^{(\varphi)}$ characterizing the $\varphi$-path
3. U-divergence and its associated geodesic

Setting
- $\mathcal{X}$: data space
- $P$: probability measure on $\mathcal{X}$
- $\mathcal{F}_P$: space of probability density functions associated with $P$

We consider a path connecting $f$ and $g$, where $f, g \in \mathcal{F}_P$, and investigate its properties from the viewpoint of information geometry.

Kolmogorov-Nagumo (KN) average

Let $\varphi : (0, \infty) \to \mathbb{R}$ be a monotonically increasing, concave, continuous function. Then for $f$ and $g$ in $\mathcal{F}_P$ the Kolmogorov-Nagumo (KN) average is

$$\varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x))\big) \quad \text{for } 0 \le t \le 1.$$

Remark 1. $\varphi^{-1}$ is monotonically increasing, convex and continuous on $(0, \infty)$.

$\varphi$-path

Based on the KN average, we consider the $\varphi$-path connecting $f$ and $g$ in $\mathcal{F}_P$:

$$f_t(x, \varphi) = \varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x)) - \kappa_t\big),$$

where $\kappa_t \le 0$ is a normalizing factor, with equality if $t = 0$ or $t = 1$.

Existence of $\kappa_t$

Theorem 1. There uniquely exists $\kappa_t$ such that

$$\int_{\mathcal{X}} \varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x)) - \kappa_t\big)\, dP(x) = 1.$$

Proof. From the convexity of $\varphi^{-1}$, we have

$$0 \le \int \varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x))\big)\, dP(x) \le \int \{(1-t)f(x) + t\,g(x)\}\, dP(x) \le 1.$$

And we observe that $\lim_{c \to \infty} \varphi^{-1}(c) = +\infty$ since $\varphi^{-1}$ is monotonically increasing. Hence the continuity of $\varphi^{-1}$ leads to the existence of $\kappa_t$ satisfying the equation above. ∎

[Figure: illustration of the $\varphi$-path.]

Examples of $\varphi$-path

Example 1.
1. $\varphi_0(x) = \log(x)$.
The $\varphi_0$-path is given by

$$f_t(x, \varphi_0) = \exp\big((1-t)\log f(x) + t \log g(x) - \kappa_t\big),\qquad
\kappa_t = \log \int \exp\big((1-t)\log f(x) + t \log g(x)\big)\, dP(x).$$

2. $\varphi_\eta(x) = \log(x + \eta)$ with $\eta \ge 0$. The $\varphi_\eta$-path is given by

$$f_t(x, \varphi_\eta) = \exp\big[(1-t)\log\{f(x) + \eta\} + t \log\{g(x) + \eta\} - \kappa_t\big] - \eta,\qquad
\kappa_t = \log\Big[\int \exp\big\{(1-t)\log\{f(x) + \eta\} + t \log\{g(x) + \eta\}\big\}\, dP(x) - \eta\Big].$$

3. $\varphi_\beta(x) = (x^\beta - 1)/\beta$ with $\beta \le 1$. The $\varphi_\beta$-path is given by

$$f_t(x, \varphi_\beta) = \big\{(1-t)f(x)^\beta + t\,g(x)^\beta - \kappa_t\big\}^{1/\beta},$$

where $\kappa_t$ does not have an explicit form.

Extended expectation

For a function $a(x) : \mathcal{X} \to \mathbb{R}$, we consider the extended expectation

$$E_f^{(\varphi)}\{a(X)\} = \frac{\displaystyle\int_{\mathcal{X}} \frac{a(x)}{\varphi'(f(x))}\, dP(x)}{\displaystyle\int_{\mathcal{X}} \frac{1}{\varphi'(f(x))}\, dP(x)},$$

where $\varphi : (0, \infty) \to \mathbb{R}$ is a generator function.

Remark 2. If $\varphi(t) = \log t$, then $E^{(\varphi)}$ reduces to the usual expectation.

Properties of the extended expectation:
1. $E_f^{(\varphi)}(c) = c$ for any constant $c$.
2. $E_f^{(\varphi)}\{c\,a(X)\} = c\, E_f^{(\varphi)}\{a(X)\}$ for any constant $c$.
3. $E_f^{(\varphi)}\{a(X) + b(X)\} = E_f^{(\varphi)}\{a(X)\} + E_f^{(\varphi)}\{b(X)\}$.
4. $E_f^{(\varphi)}\{a(X)^2\} \ge 0$, with equality if and only if $a(x) = 0$ for $P$-almost every $x$ in $\mathcal{X}$.

Remark 3. If we define $f^{(\varphi)}(x) = \dfrac{1/\varphi'(f(x))}{\int_{\mathcal{X}} 1/\varphi'(f(x))\, dP(x)}$, then $E_f^{(\varphi)}\{a(X)\} = E_{f^{(\varphi)}}\{a(X)\}$.

Tangent space of $\mathcal{F}_P$

Let $H_f$ be a Hilbert space with the inner product defined by $\langle a, b \rangle_f = E_f^{(\varphi)}\{a(X)\,b(X)\}$, and define the tangent space associated with the extended expectation as

$$T_f = \{a \in H_f : \langle a, 1 \rangle_f = 0\}.$$

For a statistical model $M = \{f_\theta(x)\}_{\theta \in \Theta}$ we have

$$E_{f_\theta}^{(\varphi)}\{\partial_i \varphi(f_\theta(X))\} = 0$$

for all $\theta \in \Theta$, where $\partial_i = \partial/\partial\theta_i$ with $\theta = (\theta_i)_{i=1,\ldots,p}$. Further,

$$E_{f_\theta}^{(\varphi)}\{\partial_i \partial_j \varphi(f_\theta(X))\}
= E_{f_\theta}^{(\varphi)}\Big\{ \frac{\varphi''(f_\theta(X))}{\varphi'(f_\theta(X))^2}\, \partial_i \varphi(f_\theta(X))\, \partial_j \varphi(f_\theta(X)) \Big\}.$$

Parallel displacement $A_t^{(\varphi)}$

Define $A_t^{(\varphi)}(x)$ in $T_{f_t}$ as the solution of the differential equation

$$\dot A_t^{(\varphi)}(x) - E_{f_t}^{(\varphi)}\Big\{ A_t^{(\varphi)}\, \frac{\varphi''(f_t)}{\varphi'(f_t)}\, \dot f_t \Big\} = 0,$$

where $f_t$ is a path connecting $f$ and $g$ such that $f_0 = f$ and $f_1 = g$, and $\dot A_t^{(\varphi)}(x)$ is the derivative of $A_t^{(\varphi)}(x)$ with respect to $t$.

Theorem 2. The geodesic curve $\{f_t\}_{0 \le t \le 1}$ defined by the parallel displacement $A_t^{(\varphi)}$ is the $\varphi$-path.

U-divergence

Assume that $U(s)$ is a convex and increasing function of a scalar $s$, and let $\xi(t) = \operatorname{argmax}_s \{st - U(s)\}$. Then we have the U-divergence

$$D_U(f, g) = \int \{U(\xi(g)) - f\,\xi(g)\}\, dP - \int \{U(\xi(f)) - f\,\xi(f)\}\, dP.$$

In fact, the U-divergence is the difference of the cross entropy $C_U(f, g)$ with the diagonal entropy $C_U(f, f)$, where

$$C_U(f, g) = \int \{U(\xi(g)) - f\,\xi(g)\}\, dP.$$

Connections based on U-divergence

For a manifold of finite dimension $M = \{f_\theta(x) : \theta \in \Theta\}$ and vector fields $X$ and $Y$ on $M$, the Riemannian metric is

$$G^{(U)}(X, Y)(f) = \int X f\; Y \xi(f)\, dP$$

for $f \in M$, and the linear connections $\nabla^{(U)}$ and $\nabla^{*(U)}$ are given by

$$G^{(U)}(\nabla^{(U)}_X Y, Z)(f) = \int XY f\; Z \xi(f)\, dP
\qquad\text{and}\qquad
G^{(U)}(\nabla^{*(U)}_X Y, Z)(f) = \int Z f\; XY \xi(f)\, dP.$$

See Eguchi (1992) for details.

Equivalence between the $\nabla^*$-geodesic and the $\xi$-path

Let $\nabla^{(U)}$ and $\nabla^{*(U)}$ be the linear connections associated with the U-divergence $D_U$, and let $C^{(\varphi)} = \{f_t(x, \varphi) : 0 \le t \le 1\}$ be the $\varphi$-path connecting $f$ and $g$ in $\mathcal{F}_P$. Then we have:

Theorem 3. A $\nabla^{(U)}$-geodesic curve connecting $f$ and $g$ is equal to $C^{(\mathrm{id})}$, where id denotes the identity function, while a $\nabla^{*(U)}$-geodesic curve connecting $f$ and $g$ is equal to $C^{(\xi)}$, where $\xi(t) = \operatorname{argmax}_s\{st - U(s)\}$.

Summary
1. We consider the $\varphi$-path based on the Kolmogorov-Nagumo average.
2. The relation between the U-divergence and the $\varphi$-path was investigated ($\varphi$ corresponds to $\xi$).
3. The idea of the $\varphi$-path can be applied to probability density estimation as well as to classification problems.
4. A divergence associated with the $\varphi$-path can be considered; a special case would be the Bhattacharyya divergence.
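On a finite sample space, the normalizing constant $\kappa_t$ of Theorem 1 can be computed by a one-dimensional search, since the total mass is continuous and decreasing in $\kappa_t$. A minimal sketch for discrete densities (function names are ours, not from the talk); for $\varphi_0 = \log$ the result must match the closed form of Example 1, i.e. the normalized geometric mean:

```python
import numpy as np

def phi_path(f, g, t, phi, phi_inv):
    """phi-path f_t = phi_inv((1-t) phi(f) + t phi(g) - kappa_t) between two
    discrete densities (probability vectors). kappa_t <= 0 is found by
    bisection so that f_t sums to 1; Theorem 1 guarantees it exists."""
    mix = (1.0 - t) * phi(f) + t * phi(g)
    mass = lambda k: phi_inv(mix - k).sum()   # decreasing in k
    lo = 0.0
    while mass(lo) < 1.0:                     # bracket the root: mass(lo) >= 1
        lo -= 1.0
    hi = 0.0                                  # mass(0) <= 1 by convexity of phi_inv
    for _ in range(80):                       # bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if mass(mid) >= 1.0:
            lo = mid
        else:
            hi = mid
    return phi_inv(mix - 0.5 * (lo + hi))
```

For $\varphi = \log$ this reproduces the normalized geometric mean of Example 1; other generators (e.g. $\varphi_\beta$) only change the phi/phi_inv pair passed in.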
Computational Information Geometry in mixture modelling
Germain Van Bever¹, R. Sabolová¹, F. Critchley¹ & P. Marriott²
¹ The Open University (EPSRC grant EP/L010429/1), United Kingdom
² University of Waterloo, Canada
GSI'15, 28-30 October 2015, Paris

Outline
1. Computational Information Geometry: Information Geometry; CIG
2. Mixture modelling: Introduction; Lindsay's convex geometry; (C)IG for mixture distributions

Generalities

The use of geometry in statistics has given birth to many different approaches. Traditionally, Information Geometry refers to the application of differential geometry to statistical theory and practice. The main ingredients of IG in exponential families (Amari, 1985) are:
1. the manifold of parameters $M$,
2. the Riemannian (Fisher information) metric $g$, and
3. the set of affine connections $\{-1, +1\}$ (the mixture and exponential connections).
These allow one to define notions of curvature, dimension reduction, information loss, and invariant higher-order expansions.
Two affine structures (maps on $M$) are used simultaneously:
−1: mixture affine geometry on probability measures: $\lambda f(x) + (1-\lambda) g(x)$.
+1: exponential affine geometry on probability measures: $C(\lambda)\, f(x)^\lambda g(x)^{(1-\lambda)}$.

Computational Information Geometry

This talk is about Computational Information Geometry (CIG; Critchley and Marriott, 2014).
1. In CIG, the multinomial model provides, modulo discretization, a universal model. It therefore moves from manifold-based systems to simplex-based geometries and allows for different supports in the extended simplex.
2. It provides a unifying framework for different geometries.
3. Tractability of the geometry allows for efficient algorithms in a computational framework.
It is inherently finite and discrete. The impact of discretization is studied. A working model will be a subset of the simplex.

Multinomial distributions

$X \sim \mathrm{Mult}(\pi_0, \ldots, \pi_k)$, $\pi = (\pi_0, \ldots, \pi_k) \in \mathrm{int}(\Delta^k)$, with

$$\Delta^k := \Big\{ \pi : \pi_i \ge 0,\ \sum_{i=0}^k \pi_i = 1 \Big\}.$$

In this case, $\pi^{(0)} = (\pi^1, \ldots, \pi^k)$ is the mean parameter, while $\eta = \log(\pi^{(0)}/\pi_0)$ is the natural parameter. Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005).

[Figure: mixed geodesics drawn both in the −1 (mean parameter) space and in the +1 (natural parameter) space.]

Restricting to the multinomial families

Under regular exponential families with compact support, the cost of discretization on the components of Information Geometry is bounded! The same holds true for the MLE and the log-likelihood function. The log-likelihood $\ell(x, \pi)$
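The two affine structures above are one-liners on the (finite, discrete) multinomial simplex, which is exactly the setting CIG works in. A minimal sketch (function names are ours):

```python
import numpy as np

def mixture_geodesic(f, g, lam):
    """-1 (mixture) affine structure: pointwise convex combination."""
    return lam * f + (1.0 - lam) * g

def exponential_geodesic(f, g, lam):
    """+1 (exponential) affine structure: normalized geometric mean;
    dividing by p.sum() plays the role of the constant C(lambda)."""
    p = f ** lam * g ** (1.0 - lam)
    return p / p.sum()
```

Both curves stay inside the simplex, and both recover $f$ and $g$ at the endpoints $\lambda = 1$ and $\lambda = 0$.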
Bayesian and Information Geometry for Inverse Problems (chaired by Ali Mohammad-Djafari, Olivier Schwander)
We review the manifold projection method for stochastic nonlinear filtering in a more general setting than in our previous paper in Geometric Science of Information 2013. We still use a Hilbert space structure on a space of probability densities to project the infinite-dimensional stochastic partial differential equation for the optimal filter onto a finite-dimensional exponential or mixture family, respectively, with two different metrics, the Hellinger distance and the L2 direct metric. This reduces the problem to finite-dimensional stochastic differential equations. In this paper we summarize a previous equivalence result between Assumed Density Filters (ADF) and Hellinger/exponential projection filters, and introduce a new equivalence between Galerkin-method-based filters and direct-metric/mixture projection filters. This result allows us to give a rigorous geometric interpretation to ADF and Galerkin filters. We also discuss the different finite-dimensional filters obtained when projecting the stochastic partial differential equation for either the normalized (Kushner-Stratonovich) or a specific unnormalized (Zakai) density of the optimal filter.

Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters
GSI 2015, Oct 28, 2015, Paris
Damiano Brigo, Dept. of Mathematics, Imperial College, London, www.damianobrigo.it
Joint work with John Armstrong, Dept. of Mathematics, King's College, London
Full paper to appear in MCSS; see also arXiv.org

Spaces of probability densities

Consider a parametric family of probability densities

$$S = \{p(\cdot, \theta),\ \theta \in \Theta \subset \mathbb{R}^m\},\qquad S^{1/2} = \{\sqrt{p(\cdot, \theta)},\ \theta \in \Theta \subset \mathbb{R}^m\}.$$

If $S$ (or $S^{1/2}$) is a subset of a function space having an $L^2$ structure (⇒ inner product, norm & metric), then we may ask whether

$$p(\cdot, \theta) \mapsto \theta \in \mathbb{R}^m \qquad (\text{resp. } \sqrt{p(\cdot, \theta)} \mapsto \theta)$$

is a chart of an $m$-dimensional manifold $S$ ($S^{1/2}$). The topology and differential structure in the chart is the $L^2$ structure, but there are two possibilities:

$$S:\quad d_2(p_1, p_2) = \|p_1 - p_2\| \quad (L^2 \text{ direct distance}),\quad p_{1,2} \in L^2,$$
$$S^{1/2}:\quad d_H(\sqrt{p_1}, \sqrt{p_2}) = \|\sqrt{p_1} - \sqrt{p_2}\| \quad (\text{Hellinger distance}),\quad p_{1,2} \in L^1,$$

where $\|\cdot\|$ is the norm of the Hilbert space $L^2$.

Tangent vectors, metrics and projection

If $\varphi : \theta \mapsto p(\cdot, \theta)$ (resp. $\theta \mapsto \sqrt{p(\cdot, \theta)}$) is the inverse of a chart, then

$$\Big\{ \frac{\partial \varphi(\cdot, \theta)}{\partial \theta_1}, \cdots, \frac{\partial \varphi(\cdot, \theta)}{\partial \theta_m} \Big\}$$

are linearly independent $L^2(\lambda)$ vectors that span the tangent space at $\theta$. The inner product of two basis elements is defined by the $L^2$ structure:

$$\Big\langle \frac{\partial p(\cdot,\theta)}{\partial \theta_i}, \frac{\partial p(\cdot,\theta)}{\partial \theta_j} \Big\rangle = \int \frac{\partial p(x,\theta)}{\partial \theta_i} \frac{\partial p(x,\theta)}{\partial \theta_j}\, dx =: \gamma_{ij}(\theta),$$
$$\Big\langle \frac{\partial \sqrt{p}}{\partial \theta_i}, \frac{\partial \sqrt{p}}{\partial \theta_j} \Big\rangle = \frac14 \int \frac{1}{p(x,\theta)} \frac{\partial p(x,\theta)}{\partial \theta_i} \frac{\partial p(x,\theta)}{\partial \theta_j}\, dx = \frac14\, g_{ij}(\theta).$$

$\gamma(\theta)$: direct $L^2$ matrix ($d_2$); $g(\theta)$: the famous Fisher-Rao matrix ($d_H$).

$d_2$ orthogonal projection:

$$\Pi^\gamma_\theta[v] = \sum_{i=1}^m \sum_{j=1}^m \gamma^{ij}(\theta)\, \Big\langle v, \frac{\partial p(\cdot,\theta)}{\partial \theta_j} \Big\rangle\, \frac{\partial p(\cdot,\theta)}{\partial \theta_i}$$

(the $d_H$ projection is analogous, inserting $\sqrt{\cdot}$ and replacing $\gamma$ with $g$).

Nonlinear filtering problem

The nonlinear filtering problem for diffusion signals:

$$dX_t = f_t(X_t)\, dt + \sigma_t(X_t)\, dW_t,\ X_0 \quad \text{(signal)} \tag{1}$$
$$dY_t = b_t(X_t)\, dt + dV_t,\ Y_0 = 0 \quad \text{(noisy observation)}$$

These are Itô SDEs. We use both Itô and Stratonovich (Str) SDEs. Str SDEs are necessary to deal with manifolds, since second-order Itô terms are not clear in terms of manifolds [16], although we are working on a direct projection of Itô equations with good optimality properties (John Armstrong).
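Both matrices can be evaluated by quadrature. For the Gaussian location family $p(x, \theta) = \mathcal{N}(\theta, 1)$ the Fisher-Rao matrix is $g(\theta) = 1$, while the direct-metric matrix is the constant $\gamma(\theta) = 1/(4\sqrt{\pi})$. A grid-based sketch (the example family is our choice, not from the talk):

```python
import numpy as np

# Gaussian location family p(x, theta) = N(theta, 1); scalar parameter theta.
theta = 0.7
x = np.linspace(theta - 10.0, theta + 10.0, 200001)
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - theta) ** 2) / np.sqrt(2.0 * np.pi)  # density
dp = (x - theta) * p                                        # d p / d theta

# Fisher-Rao matrix g(theta) = int (1/p) (dp/dtheta)^2 dx : equals 1 here
g = np.sum(dp ** 2 / p) * dx
# Direct L2 matrix gamma(theta) = int (dp/dtheta)^2 dx : equals 1/(4 sqrt(pi)) here
gamma = np.sum(dp ** 2) * dx
```

The two numbers differ because the direct metric weights the squared density derivative by 1 instead of $1/p$; for this family both are independent of $\theta$.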
The nonlinear filtering problem consists in finding the conditional probability distribution $\pi_t$ of the state $X_t$ given the observations up to time $t$, i.e. $\pi_t(dx) := P[X_t \in dx \mid \mathcal{Y}_t]$, where $\mathcal{Y}_t := \sigma(Y_s,\ 0 \le s \le t)$. Assume $\pi_t$ has a density $p_t$; then $p_t$ satisfies the Stratonovich SPDE

$$dp_t = \mathcal{L}^*_t p_t\, dt - \tfrac12\, p_t\,\big[|b_t|^2 - E_{p_t}\{|b_t|^2\}\big]\, dt + \sum_{k=1}^d p_t\,\big[b_t^k - E_{p_t}\{b_t^k\}\big] \circ dY_t^k,$$

with the forward operator

$$\mathcal{L}^*_t \phi = -\sum_{i=1}^n \frac{\partial}{\partial x_i}\big[f_t^i \phi\big] + \frac12 \sum_{i,j=1}^n \frac{\partial^2}{\partial x_i \partial x_j}\big[a_t^{ij} \phi\big].$$

This is an ∞-dimensional SPDE. Solutions for even toy systems like the cubic sensor ($f = 0$, $\sigma = 1$, $b = x^3$) do not belong to any finite-dimensional family $p(\cdot, \theta)$ [19]. We need finite-dimensional approximations. We can project the SPDE according to either the $L^2$ direct metric ($\gamma(\theta)$) or, by deriving the analogous equation for $\sqrt{p_t}$, according to the Hellinger metric ($g(\theta)$).

Projection transforms the SPDE into a finite-dimensional SDE for $\theta$ via the chain rule (hence Stratonovich calculus): $dp(\cdot, \theta_t) = \sum_{j=1}^m \frac{\partial p(\cdot,\theta)}{\partial \theta_j} \circ d\theta_j(t)$. With Itô calculus we would have terms $\frac{\partial^2 p(\cdot,\theta)}{\partial \theta_i \partial \theta_j}\, d\langle \theta_i, \theta_j \rangle$ (not tangent vectors).

Projection filter in the metrics $d_2$ ($L^2$) and $d_H$ (Fisher)

$$d\theta_t^i = \sum_{j=1}^m \gamma^{ij}(\theta_t) \int \mathcal{L}^*_t p(x, \theta_t)\, \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx\, dt
- \frac12 \sum_{j=1}^m \gamma^{ij}(\theta_t) \int |b_t(x)|^2\, \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx\, dt
+ \sum_{k=1}^d \sum_{j=1}^m \gamma^{ij}(\theta_t) \Big[\int b_t^k(x)\, \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx\Big] \circ dY_t^k,\quad \theta_0.$$

The above is the projected equation in the $d_2$ metric with $\Pi^\gamma$. Instead, using the Hellinger distance and the Fisher metric with projection $\Pi^g$:

$$d\theta_t^i = \sum_{j=1}^m g^{ij}(\theta_t) \int \frac{\mathcal{L}^*_t p(x, \theta_t)}{p(x, \theta_t)}\, \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx\, dt
- \frac12 \sum_{j=1}^m g^{ij}(\theta_t) \int |b_t(x)|^2\, \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx\, dt
+ \sum_{k=1}^d \sum_{j=1}^m g^{ij}(\theta_t) \Big[\int b_t^k(x)\, \frac{\partial p(x,\theta_t)}{\partial \theta_j}\, dx\Big] \circ dY_t^k,\quad \theta_0.$$

Choosing the family/manifold: Exponential

In past literature, in several papers in Bernoulli, IEEE Automatic Control, etc., B. Hanzon and Le Gland developed a theory for the projection filter using the Fisher metric $g$ and exponential families $p(x, \theta) := \exp[\theta^T c(x) - \psi(\theta)]$. This is a good combination:
- The tangent space has a simple structure: square roots do not complicate issues, thanks to the exponential structure.
- The Fisher matrix has a simple structure: $\partial^2_{\theta_i \theta_j} \psi(\theta) = g_{ij}(\theta)$.
- The structure of the projection $\Pi^g$ is simple for exponential families.
- A special exponential family with the observation function $b$ among the exponents $c(x)$ makes the filter correction step (the projection of the $dY$ term) exact.
- One can define both a local and a global filtering error through $d_H$.
- Alternative coordinates: the expectation parameters $\eta = E_\theta[c] = \partial_\theta \psi(\theta)$.
- The projection filter in $\eta$ coincides with a classical approximate filter: the assumed density filter (based on generalized "moment matching").

Mixture families

However, exponential families do not couple as well with the metric $\gamma(\theta)$. Is there some important family for which the metric $\gamma(\theta)$ is preferable to the classical Fisher metric $g(\theta)$, in that the metric, the tangent space and the filter equations are simpler?
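The identity $\partial^2_{\theta_i\theta_j}\psi(\theta) = g_{ij}(\theta)$ can be checked on a one-parameter example. Take the exponential distribution written as the exponential family $p(x, \theta) = \exp[\theta x - \psi(\theta)]$ on $[0, \infty)$ with $\theta < 0$ and $\psi(\theta) = -\log(-\theta)$, for which both sides equal $1/\theta^2$ (the example is our choice, not from the talk):

```python
import numpy as np

# One-parameter exponential family on [0, inf):
# p(x, theta) = exp(theta * x - psi(theta)), theta < 0, psi(theta) = -log(-theta)
# (this is the exponential distribution with rate -theta).
psi = lambda th: -np.log(-th)

theta = -1.5
h = 1e-4
# Hessian of psi by central differences: should equal 1 / theta^2
psi_dd = (psi(theta + h) - 2.0 * psi(theta) + psi(theta - h)) / h ** 2

# Fisher information g(theta) = Var_theta(c(X)) with c(x) = x, by quadrature
x = np.linspace(0.0, 60.0, 600001)
dx = x[1] - x[0]
p = np.exp(theta * x - psi(theta))
m1 = np.sum(x * p) * dx
m2 = np.sum(x ** 2 * p) * dx
fisher = m2 - m1 ** 2            # should also equal 1 / theta^2
```

This is the generic exponential-family fact that the Hessian of the cumulant function $\psi$ equals the covariance of the sufficient statistic $c(X)$, i.e. the Fisher matrix.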
Armstrong (ICL and KCL)SPDE Projection FiltersGSI 20158 / 37Choice of the familyMixture FamiliesMixture familiesHowever, exponential families do not couple as well with themetric γ(θ). Is there some important family for which the metricγ(θ) is preferable to the classical Fisher metric g(θ), in that themetric, the tangent space and the ﬁlter equations are simpler?The answer is afﬁrmative, and this is the mixture family.D. Brigo and J. Armstrong (ICL and KCL)SPDE Projection FiltersGSI 20158 / 37Choice of the familyMixture FamiliesMixture familiesHowever, exponential families do not couple as well with themetric γ(θ). Is there some important family for which the metricγ(θ) is preferable to the classical Fisher metric g(θ), in that themetric, the tangent space and the ﬁlter equations are simpler?The answer is afﬁrmative, and this is the mixture family.We deﬁne a simple mixture family as follows. Given m + 1 ﬁxedsquared integrable probability densities q = [q1 , q2 , . . . , qm+1 ]T , deﬁneˆθ(θ) := [θ1 , θ2 , . . . , θm , 1 − θ1 − θ2 − . . . − θm ]Tˆˆfor all θ ∈ Rm . We write θ instead of θ(θ). Mixture family (simplex):ˆS M (q) = {θ(θ)T q, θi ≥ 0 for all i, θ1 + · · · + θm < 1}D. Brigo and J. Armstrong (ICL and KCL)SPDE Projection FiltersGSI 20158 / 37Choice of the familyMixture FamiliesMixture familiesIf we consider the L2 / γ(θ) distance, the metric γ(θ) itself and therelated projection become very simple. Indeed,∂p(·, θ)= qi − qm+1 and γij (θ) =∂θi(qi (x) − qm (x))(qj (x) − qm (x))dx(NO inline numeric integr).D. Brigo and J. Armstrong (ICL and KCL)SPDE Projection FiltersGSI 20159 / 37Choice of the familyMixture FamiliesMixture familiesIf we consider the L2 / γ(θ) distance, the metric γ(θ) itself and therelated projection become very simple. Indeed,∂p(·, θ)= qi − qm+1 and γij (θ) =∂θi(qi (x) − qm (x))(qj (x) − qm (x))dx(NO inline numeric integr). The L2 metric does not depend on thespeciﬁc point θ of the manifold. 
The same holds for the tangent space at p(·, θ), which is given by

span{ q_1 − q_{m+1}, q_2 − q_{m+1}, ..., q_m − q_{m+1} }.

The L² projection also becomes particularly simple.

Mixture projection filter

Armstrong and Brigo (MCSS 2016 [3]) show that the mixture family together with the metric γ(θ) leads to a projection filter that is the same as approximate filtering via Galerkin methods [5]. See the full paper for the details. Summing up:

Metric \ Family                Exponential                            Basic mixture
Hellinger d_H, Fisher g(θ)     Good (∼ADF ≈ local moment matching)    Nothing special
Direct L² d_2, matrix γ(θ)     Nothing special                        Good (∼Galerkin)

However, despite the simplicity above, the mixture family has an important drawback: for all θ, the filter mean is constrained:

min_i (mean of q_i) ≤ mean of p(·, θ) ≤ max_i (mean of q_i).

As a consequence, we are going to enrich our family to a mixture where some of the parameters are also in the core densities q.
Specifically, we consider a mixture of Gaussian densities with the means and variances of each component not fixed. For example, a mixture of two Gaussians has 5 parameters:

θ p_{N(µ1, v1)}(x) + (1 − θ) p_{N(µ2, v2)}(x),   parameters: θ, µ1, v1, µ2, v2.

We now illustrate the Gaussian mixture projection filter (GMPF) in a fundamental example, the quadratic sensor:

dX_t = σ dW_t,
dY_t = X_t² dt + σ dV_t.

The measurements tell us nothing about the sign of X.
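A minimal Euler-Maruyama simulation of this signal/observation pair (step size, noise level and initial condition are illustrative choices):

```python
import numpy as np

# Sketch: Euler-Maruyama simulation of the quadratic sensor
#   dX_t = sigma dW_t,   dY_t = X_t^2 dt + sigma dV_t.
rng = np.random.default_rng(0)
sigma, dt, n_steps = 1.0, 1e-3, 10_000

x = np.empty(n_steps + 1); x[0] = 0.5
y = np.empty(n_steps + 1); y[0] = 0.0
for t in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt))
    dV = rng.normal(0.0, np.sqrt(dt))
    x[t + 1] = x[t] + sigma * dW                  # signal
    y[t + 1] = y[t] + x[t]**2 * dt + sigma * dV   # observation

# The observation drift depends on X only through X^2, so replacing the
# signal path x by -x would produce the same observation drift: the sign
# of X is not identifiable from Y.
assert x.shape == y.shape == (n_steps + 1,)
```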
Once it seems likely that the state has moved past the origin, the distribution will become nearly symmetrical, so we expect a bimodal distribution. We compare:

- the Gaussian mixture θ p_{N(µ1, v1)}(x) + (1 − θ) p_{N(µ2, v2)}(x) (red);
- the exponential family e^{θ1 x + θ2 x² + θ3 x³ + θ4 x⁴ − ψ(θ)} (pink);
- the EKF normal density (blue);
- the exact filter (green; finite-difference method on a grid of 1000 state points and 5000 time points).

[Figures: simulated filtering densities for the quadratic sensor at times 0 and 1, plotted for x in [−8, 8]: Projection, Exact, Extended Kalman, Exponential.]
[Figures: simulated filtering densities for the quadratic sensor at times 2 through 9, same comparison.]
[Figure: simulated filtering density for the quadratic sensor at time 10, same comparison.]

Comparing local approximation errors (L² residuals) ε_t:

ε_t² = ∫ (p_exact,t(x) − p_approx,t(x))² dx,

where p_approx,t(x) is one of three possible choices: the Gaussian mixture θ p_{N(µ1, v1)}(x) + (1 − θ) p_{N(µ2, v2)}(x) (red), the exponential family e^{θ1 x + θ2 x² + θ3 x³ + θ4 x⁴ − ψ(θ)} (blue), or the EKF normal density (green).

[Figure: L² residuals for the quadratic sensor over time 0-10: projection residual (L² norm), extended Kalman residual (L² norm), Hellinger projection residual (L² norm).]

Comparing local approximation errors (Prokhorov residuals) ε_t:

ε_t = inf{ ε > 0 : F_exact,t(x − ε) − ε ≤ F_approx,t(x) ≤ F_exact,t(x + ε) + ε for all x },

with F the cumulative distribution function of the corresponding density. The Lévy-Prokhorov metric works well with singular densities such as particles, where the L² metric is not ideal. Here we compare the Gaussian mixture (red), the exponential family (green), and the best approximation by three particles (blue).

[Figure: Lévy-Prokhorov residuals for the quadratic sensor over time 0-10: Prokhorov residual (L2NM), Prokhorov residual (HE), best possible residual (3 deltas).]
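The L² residual above is straightforward to evaluate on a grid. A sketch with a hypothetical bimodal "exact" density, comparing a two-component Gaussian mixture approximation against a single moment-matched Gaussian (EKF-like); all densities here are illustrative, not the filter outputs of the talk:

```python
import numpy as np

# Sketch: the L2 residual eps_t^2 = ∫ (p_exact - p_approx)^2 dx on a grid.
x = np.linspace(-8, 8, 2001)
dx = x[1] - x[0]
norm = lambda mu, v: np.exp(-(x - mu)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

# Hypothetical bimodal "exact" density vs two approximations.
p_exact = 0.5 * norm(-2, 0.5) + 0.5 * norm(2, 0.5)
p_ekf = norm(0, 4.5)  # single Gaussian matching mean 0 and variance 4.5
p_mix = 0.5 * norm(-1.9, 0.6) + 0.5 * norm(1.9, 0.6)  # mixture approximation

l2_residual = lambda p, q: np.sqrt(np.sum((p - q)**2) * dx)

# A bimodal mixture fits a bimodal density far better than one Gaussian.
assert l2_residual(p_exact, p_mix) < l2_residual(p_exact, p_ekf)
```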
Cubic sensors

[Figure: residuals for the cubic sensor over time 0-10: projection residual (L² norm), extended Kalman residual (L² norm), Hellinger projection residual (L² norm).]

- Qualitatively similar results, up to a stopping time.
- As one approaches the boundary, γ_ij becomes singular.
- The solution is to dynamically change the parameterization, and even the dimension of the manifold.
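The degeneracy of γ near the boundary can be seen numerically: in a simple mixture family whose building blocks nearly coincide, the metric becomes close to singular. A sketch (the particular densities are illustrative):

```python
import numpy as np

# Sketch: the metric gamma degenerates as two mixture components coincide,
# motivating a dynamic change of parameterization near the boundary.
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
norm = lambda mu, v: np.exp(-(x - mu)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

def gamma_cond(sep):
    # Simple mixture family on q1 = N(-sep, 1), q2 = N(+sep, 1), q3 = N(0, 4).
    q = [norm(-sep, 1.0), norm(sep, 1.0), norm(0, 4.0)]
    d = [q[0] - q[2], q[1] - q[2]]
    g = np.array([[np.sum(di * dj) * dx for dj in d] for di in d])
    return np.linalg.cond(g)

# As the two components merge (sep -> 0), gamma approaches singularity.
assert gamma_cond(0.01) > 100 * gamma_cond(2.0)
```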
Conclusions

- Approximate finite-dimensional filtering by rigorous projection on a chosen manifold of densities.
- The projection uses an overarching L² structure.
- Two different metrics: direct L² and Hellinger/Fisher (L² on √·).
- Fisher works well with exponential families: multimodality; the correction step is exact; simplicity of implementation; equivalence with assumed density filters ("moment matching").
- Direct L² works well with mixture families: even simpler filter equations, with no inline numerical integration; the basic version is equivalent to Galerkin methods; suited also for multimodality (quadratic sensor tests, L² global error); comparable with particle methods.
- Further investigation: convergence, more on optimality?
- Optimality: introducing new projections (forthcoming work by J. Armstrong).

Thanks

With thanks to the organizing committee. Thank you for your attention. Questions and comments are welcome.

References

[1] Aggrawal, J.: Sur l'information de Fisher. In: Théories de l'Information (J. Kampé de Fériet, ed.), Springer-Verlag, Berlin-New York, 1974, pp. 111-117.
[2] Amari, S.: Differential-Geometrical Methods in Statistics, Lecture Notes in Statistics, Springer-Verlag, Berlin, 1985.
[3] Armstrong, J., and Brigo, D. (2016). Nonlinear filtering via stochastic PDE projection on mixture manifolds in L² direct metric, Mathematics of Control, Signals and Systems, 2016, accepted.
[4] Beard, R., Kenney, J., Gunther, J., Lawton, J., and Stirling, W. (1999). Nonlinear projection filter based on Galerkin approximation. AIAA Journal of Guidance, Control and Dynamics, 22(2): 258-266.
[5] Beard, R., and Gunther, J. (1997).
Galerkin approximations of the Kushner equation in nonlinear estimation. Working paper, Brigham Young University.
[6] Barndorff-Nielsen, O.E. (1978). Information and Exponential Families. John Wiley and Sons, New York.
[7] Brigo, D.: Diffusion processes, manifolds of exponential densities, and nonlinear filtering. In: O.E. Barndorff-Nielsen and E.B. Vedel Jensen (eds.), Geometry in Present Day Science, World Scientific, 1999.
[8] Brigo, D.: On SDEs with marginal laws evolving in finite-dimensional exponential families, Statistics & Probability Letters, 2000, Vol. 49, pp. 127-134.
[9] Brigo, D. (2011). The direct L² geometric structure on a manifold of probability densities with applications to filtering. Available on arXiv.org and damianobrigo.it.
[10] Brigo, D., Hanzon, B., Le Gland, F.: A differential geometric approach to nonlinear filtering: the projection filter, IEEE Transactions on Automatic Control, 1998, Vol. 43, pp. 247-252.
[11] Brigo, D., Hanzon, B., Le Gland, F.: Approximate nonlinear filtering by projection on exponential manifolds of densities, Bernoulli, 1999, Vol. 5, pp. 495-534.
[12] Brigo, D.: Filtering by Projection on the Manifold of Exponential Densities, PhD thesis, Free University of Amsterdam, 1996.
[13] Brigo, D., and Pistone, G. (1996). Projecting the Fokker-Planck equation onto a finite-dimensional exponential family. Available at arXiv.org.
[14] Crisan, D., and Rozovskii, B. (eds.) (2011). The Oxford Handbook of Nonlinear Filtering, Oxford University Press.
[15] Davis, M.H.A., Marcus, S.I.: An introduction to nonlinear filtering. In: M. Hazewinkel, J.C. Willems (eds.), Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Reidel, Dordrecht, 1981, pp. 53-75.
[16] Elworthy, D. (1982). Stochastic Differential Equations on Manifolds. LMS Lecture Notes.
[17] Hanzon, B.: A differential-geometric approach to approximate nonlinear filtering. In: C.T.J. Dodson (ed.), Geometrization of Statistical Theory, pp. 219-223, ULMD Publications, University of Lancaster, 1987.
[18] Hanzon, B.: Identifiability, recursive identification and spaces of linear dynamical systems, CWI Tracts 63 and 64, CWI, Amsterdam, 1989.
[19] Hazewinkel, M., Marcus, S.I., and Sussmann, H.J.: Nonexistence of finite-dimensional filters for conditional statistics of the cubic sensor problem, Systems and Control Letters 3 (1983), pp. 331-340.
[20] Jacod, J., Shiryaev, A.N.: Limit Theorems for Stochastic Processes. Grundlehren der Mathematischen Wissenschaften, Vol. 288, Springer-Verlag, Berlin, 1987.
[21] Jazwinski, A.H.: Stochastic Processes and Filtering Theory, Academic Press, New York, 1970.
[22] Fujisaki, M., Kallianpur, G., and Kunita, H. (1972). Stochastic differential equations for the nonlinear filtering problem. Osaka Journal of Mathematics, Vol. 9, No. 1, pp. 19-40.
[23] Kenney, J., Stirling, W.: Nonlinear filtering of convex sets of probability distributions. Presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June - 2 July 1999.
[24] Khasminskii, R.Z. (1980). Stochastic Stability of Differential Equations. Alphen aan den Rijn.
[25] Liptser, R.S., Shiryayev, A.N.: Statistics of Random Processes I: General Theory, Springer-Verlag, Berlin, 1978.
[26] Murray, M., and Rice, J.: Differential Geometry and Statistics, Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993.
[27] Ocone, D., Pardoux, E.: A Lie algebraic criterion for non-existence of finite-dimensionally computable filters, Lecture Notes in Mathematics 1390, pp. 197-204, Springer-Verlag, 1989.
[28] Pistone, G., and Sempi, C. (1995). An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. The Annals of Statistics 23(5), 1995.
Clustering, classification and pattern recognition in a set of data are among the most important tasks in statistical research and in many applications. In this paper, we propose to use a mixture of Student-t distributions to model the data, via a hierarchical graphical model and the Bayesian framework, to carry out these tasks. The main advantages of this model are that it accounts for the uncertainties of variances and covariances, and that we can use Variational Bayesian Approximation (VBA) methods to obtain fast algorithms able to handle large data sets.

Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model

Ali Mohammad-Djafari, Laboratoire des Signaux et Systèmes (L2S), UMR 8506 CNRS - CentraleSupélec - Univ. Paris-Sud, SUPELEC, 91192 Gif-sur-Yvette, France.
http://lss.centralesupelec.fr - Email: djafari@lss.supelec.fr - http://djafari.free.fr - http://publicationslist.org/djafari
(GSI2015, October 28-30, 2015, Polytechnique, France.)

Contents
1. Mixture models
2. Different problems related to classification and clustering: training; supervised classification; semi-supervised classification; clustering or unsupervised classification
3. Mixture of Student-t
4. Variational Bayesian Approximation
5. VBA for mixture of Student-t
6. Conclusion

Mixture models

General mixture model:

p(x | a, Θ, K) = Σ_{k=1}^K a_k p_k(x | θ_k),   0 < a_k < 1,

with all components in the same family, p_k(x | θ_k) = p(x | θ_k) for all k; for example Gaussian, p(x | θ_k) = N(x | µ_k, Σ_k) with θ_k = (µ_k, Σ_k). The data is X = {x_n, n = 1, ..., N}, where each element x_n can be in one of the classes c_n, with a_k = p(c_n = k), a = {a_k, k = 1, ..., K}, Θ = {θ_k, k = 1, ..., K} and

p(X, c | a, Θ) = Π_{n=1}^N p(x_n, c_n | a, Θ).

Different problems

- Training: given a set of (training) data X and classes c, estimate the parameters a and Θ.
- Supervised classification: given a sample x_m and the parameters K, a and Θ, determine its class k* = arg max_k p(c_m = k | x_m, a, Θ, K).
- Semi-supervised classification (the proportions are not known): given a sample x_m and the parameters K and Θ, determine its class k* = arg max_k p(c_m = k | x_m, Θ, K).
- Clustering or unsupervised classification (the number of classes K is not known): given a set of data X, determine K and c.
Training

Given a set of (training) data X and classes c, estimate the parameters a and Θ.
- Maximum likelihood (ML): (â, Θ̂) = arg max_{(a, Θ)} p(X, c | a, Θ, K).
- Bayesian: assign priors p(a | K) and p(Θ | K) = Π_{k=1}^K p(θ_k), and write the joint posterior law

  p(a, Θ | X, c, K) = p(X, c | a, Θ, K) p(a | K) p(Θ | K) / p(X, c | K),

  where p(X, c | K) = ∫∫ p(X, c | a, Θ, K) p(a | K) p(Θ | K) da dΘ. Infer a and Θ either as the maximum a posteriori (MAP) or as the posterior mean (PM).

Supervised classification

Given a sample x_m and the parameters K, a and Θ, determine

p(c_m = k | x_m, a, Θ, K) = p(x_m, c_m = k | a, Θ, K) / p(x_m | a, Θ, K),

where p(x_m, c_m = k | a, Θ, K) = a_k p(x_m | θ_k) and p(x_m | a, Θ, K) = Σ_{k=1}^K a_k p(x_m | θ_k). Best class: k* = arg max_k p(c_m = k | x_m, a, Θ, K).

Semi-supervised classification

Given a sample x_m and the parameters K and Θ (not the proportions a), determine the probabilities

p(c_m = k | x_m, Θ, K) = p(x_m, c_m = k | Θ, K) / p(x_m | Θ, K),

where p(x_m, c_m = k | Θ, K) = ∫ p(x_m, c_m = k | a, Θ, K) p(a | K) da and p(x_m | Θ, K) = Σ_{k=1}^K p(x_m, c_m = k | Θ, K). Best class, for example the MAP solution: k* = arg max_k p(c_m = k | x_m, Θ, K).

Clustering or unsupervised classification

Given a set of data X, determine K and c. Determination of the number of classes:

p(K = L | X) = p(X | K = L) p(K = L) / p(X),   with   p(X) = Σ_{L=1}^{L0} p(K = L) p(X | K = L),

where L0 is the a priori maximum number of classes and

p(X | K = L) = ∫∫ Π_n Σ_{k=1}^L a_k p(x_n, c_n = k | θ_k) p(a | K) p(Θ | K) da dΘ.

When K and c are determined, we can also determine the characteristics a and Θ of those classes.
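The supervised rule above amounts to evaluating a_k p(x_m | θ_k) for each class and normalizing. A minimal sketch for a two-class, one-dimensional Gaussian model (the parameter values are illustrative):

```python
import numpy as np

# Sketch: supervised classification rule k* = argmax_k a_k p(x | theta_k)
# for a two-class 1-d Gaussian mixture (illustrative parameters).
def gauss(x, mu, v):
    return np.exp(-(x - mu)**2 / (2 * v)) / np.sqrt(2 * np.pi * v)

a = np.array([0.3, 0.7])       # class proportions a_k
mu = np.array([-1.0, 2.0])     # class means
v = np.array([1.0, 1.0])       # class variances

def classify(x):
    joint = a * gauss(x, mu, v)        # a_k p(x | theta_k)
    posterior = joint / joint.sum()    # p(c = k | x, a, Theta)
    return int(np.argmax(posterior)), posterior

k_star, post = classify(-1.5)
assert k_star == 0                     # a sample near -1 goes to class 0
assert abs(post.sum() - 1) < 1e-12     # posterior probabilities sum to 1
```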
Mixture of Student-t model

Student-t and its infinite Gaussian scale mixture (IGSM) representation:

T(x | ν, µ, Σ) = ∫_0^∞ N(x | µ, z⁻¹ Σ) G(z | ν/2, ν/2) dz,

where

N(x | µ, Σ) = |2πΣ|^{−1/2} exp[ −(1/2) (x − µ)' Σ⁻¹ (x − µ) ] = |2πΣ|^{−1/2} exp[ −(1/2) Tr((x − µ)(x − µ)' Σ⁻¹) ]

and

G(z | α, β) = (β^α / Γ(α)) z^{α−1} exp[−βz].

Mixture of Student-t:

p(x | {ν_k, a_k, µ_k, Σ_k, k = 1, ..., K}, K) = Σ_{k=1}^K a_k T(x | ν_k, µ_k, Σ_k).

Introducing z_{nk}, z_k = {z_{nk}, n = 1, ..., N}, Z = {z_{nk}}, c = {c_n, n = 1, ..., N}, θ_k = {ν_k, a_k, µ_k, Σ_k} and Θ = {θ_k, k = 1, ..., K}, and assigning the priors p(Θ) = Π_k p(θ_k), we can write

p(X, c, Z, Θ | K) = Π_n Π_k [ a_k N(x_n | µ_k, z_{nk}⁻¹ Σ_k) G(z_{nk} | ν_k/2, ν_k/2) ] p(θ_k).

The joint posterior law is

p(c, Z, Θ | X, K) = p(X, c, Z, Θ | K) / p(X | K).

The main task now is to propose approximations to it that can be used easily in all the above-mentioned classification and clustering tasks.

Variational Bayesian Approximation (VBA)

Main idea: propose an easily computable approximation q(c, Z, Θ) to p(c, Z, Θ | X, K). Criterion: KL(q : p). Interestingly, noting that p(c, Z, Θ | X, K) = p(X, c, Z, Θ | K) / p(X | K), we have

KL(q : p) = −F(q) + ln p(X | K),

where F(q) = ⟨ln [ p(X, c, Z, Θ | K) / q(c, Z, Θ) ]⟩_q is called the free energy of q, and we have the following properties:
- maximizing F(q) and minimizing KL(q : p) are equivalent, and since KL(q : p) ≥ 0, F(q) gives a lower bound on the log-evidence of the model, ln p(X | K);
- when the optimum q* is obtained, F(q*) can be used as a criterion for model selection.
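The IGSM identity above is easy to verify numerically in one dimension. A sketch using SciPy (parameter values are illustrative; note that SciPy's Gamma distribution is parameterized by shape and scale, so rate ν/2 becomes scale 2/ν):

```python
import numpy as np
from scipy import integrate, stats

# Numerical check of the infinite Gaussian scale mixture (IGSM) identity:
# T(x | nu, mu, sigma^2) = ∫ N(x | mu, sigma^2 / z) G(z | nu/2, nu/2) dz.
nu, mu, sigma = 4.0, 1.0, 2.0
x = 2.5

def integrand(z):
    normal = stats.norm.pdf(x, loc=mu, scale=sigma / np.sqrt(z))
    gamma = stats.gamma.pdf(z, a=nu / 2, scale=2 / nu)  # rate nu/2
    return normal * gamma

mixture, _ = integrate.quad(integrand, 0, np.inf)
student = stats.t.pdf(x, df=nu, loc=mu, scale=sigma)
assert abs(mixture - student) < 1e-6
```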
VBA: choosing the good families

Using KL(q : p) has the very interesting property that computing means under q gives the same values as under p (conservation of the means). Unfortunately, this is not the case for variances or other moments. If p is in the exponential family, then, choosing appropriate conjugate priors, the structure of q will be the same, and we can obtain appropriate fast optimization algorithms.

[Figure: hierarchical graphical model, with hyperparameters ξ0; γ0, Σ0; µ0, η0; k0 at the top, parameters α_k, β_k, µ_k, Σ_k and a in the middle, and latent z_{nk} and data x_n at the bottom.]

VBA for the mixture of Student-t

In our case, noting that

p(X, c, Z, Θ | K) = Π_n Π_k p(x_n, c_n, z_{nk} | a_k, µ_k, Σ_k, ν_k) · Π_k [ p(α_k) p(β_k) p(µ_k | Σ_k) p(Σ_k) ],

with p(x_n, c_n, z_{nk} | a_k, µ_k, Σ_k, ν_k) = N(x_n | µ_k, z_{nk}⁻¹ Σ_k) G(z_{nk} | α_k, β_k), is separable, on one side in [c, Z] and on the other side in the components of Θ, we propose to use

q(c, Z, Θ) = q(c, Z) q(Θ).

With this decomposition, the expression of the Kullback-Leibler divergence becomes

KL(q1(c, Z) q2(Θ) : p(c, Z, Θ | X, K)) = Σ_c ∫∫ q1(c, Z) q2(Θ) ln [ q1(c, Z) q2(Θ) / p(c, Z, Θ | X, K) ] dΘ dZ,

and the expression of the free energy becomes

F(q1(c, Z) q2(Θ)) = Σ_c ∫∫ q1(c, Z) q2(Θ) ln [ p(X, c, Z | Θ, K) p(Θ | K) / (q1(c, Z) q2(Θ)) ] dΘ dZ.

Proposed VBA for the mixture of Student-t priors model

Using a generalized Student-t, obtained by replacing G(z_{nk} | ν_k/2, ν_k/2) by G(z_{nk} | α_k, β_k), it is easier to propose conjugate priors for α_k, β_k than for ν_k:

p(x_n, c_n = k, z_{nk} | a_k, µ_k, Σ_k, α_k, β_k, K) = a_k N(x_n | µ_k, z_{nk}⁻¹ Σ_k) G(z_{nk} | α_k, β_k).

In the following, noting Θ = {(a_k, µ_k, Σ_k, α_k, β_k), k = 1, ..., K}, we propose to use the factorized prior laws

p(Θ) = p(a) Π_k [ p(α_k) p(β_k) p(µ_k | Σ_k) p(Σ_k) ]

with the following components:
- p(a) = D(a | k0), k0 = [k0, ..., k0]' = k0 1,
- p(α_k) = E(α_k | ζ0) = G(α_k | 1, ζ0),
- p(β_k) = E(β_k | ζ0) = G(β_k | 1, ζ0),
- p(µ_k | Σ_k) = N(µ_k | µ0 1, η0⁻¹ Σ_k),
- p(Σ_k) = IW(Σ_k | γ0, γ0 Σ0),

where

D(a | k) = ( Γ(Σ_l k_l) / Π_l Γ(k_l) ) Π_l a_l^{k_l − 1}

is the Dirichlet pdf, E(t | ζ0) = ζ0 exp[−ζ0 t] is the exponential pdf, G(t | a, b) = (b^a / Γ(a)) t^{a−1} exp[−bt] is the Gamma pdf, and

IW(Σ | γ, γΔ) = ( |(1/2)Δ|^{γ/2} / Γ_D(γ/2) ) |Σ|^{−(γ+D+1)/2} exp[ −(1/2) Tr(Δ Σ⁻¹) ]

is the inverse Wishart pdf. With these prior laws and the likelihood, the joint posterior law is

p(c, Z, Θ | X) = p(X, c, Z, Θ) / p(X).

Expressions of q

q(c, Z, Θ) = q(c, Z) q(Θ) = Π_n Π_k [ q(c_n = k | z_{nk}) q(z_{nk}) ] · Π_k [ q(α_k) q(β_k) q(µ_k | Σ_k) q(Σ_k) ] q(a),

with:
- q(a) = D(a | k̃), k̃ = [k̃_1, ..., k̃_K],
- q(α_k) = G(α_k | ζ̃_k, η̃_k),
- q(β_k) = G(β_k | ζ̃_k, η̃_k),
- q(µ_k | Σ_k) = N(µ_k | µ̃, η̃⁻¹ Σ_k),
- q(Σ_k) = IW(Σ_k | γ̃, γ̃ Σ̃).

With these choices, the free energy F(q(c, Z, Θ)) = ⟨ln [p(X, c, Z, Θ | K)/q(c, Z, Θ)]⟩_{q(c,Z,Θ)} decomposes as Σ_{k,n} F1_{kn} + Σ_k F2_k, where F1_{kn} = ⟨ln p(x_n, c_n, z_{nk}, θ_k)⟩ is evaluated under q(c_n = k | z_{nk}) q(z_{nk}) and F2_k is the corresponding expectation under q(θ_k).
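The factorized coordinate-ascent scheme (alternately updating one factor of q with the other held fixed) can be illustrated on a much simpler conjugate model than the Student-t mixture: a univariate Gaussian with unknown mean µ and precision τ, with factorization q(µ) q(τ). All hyperparameter values below are illustrative, and the update formulas are the standard ones for this toy model, not the paper's:

```python
import numpy as np

# Toy coordinate-ascent VBA: data x_i ~ N(mu, 1/tau) with priors
# mu | tau ~ N(mu0, 1/(lam0*tau)) and tau ~ Gamma(a0, b0), factorized
# posterior q(mu) q(tau). Illustrative model, not the Student-t mixture.
rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=500)       # true mu = 2, true tau = 1
N, xbar = x.size, x.mean()
mu0, lam0, a0, b0 = 0.0, 1.0, 1e-3, 1e-3

E_tau = 1.0                               # initialization
for _ in range(50):                       # alternate the two update steps
    # Update q(mu) = N(mu_N, 1/lam_N) with q(tau) fixed.
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N) with q(mu) fixed.
    a_N = a0 + (N + 1) / 2
    b_N = b0 + 0.5 * (np.sum((x - mu_N)**2) + N / lam_N
                      + lam0 * ((mu_N - mu0)**2 + 1 / lam_N))
    E_tau = a_N / b_N

assert abs(mu_N - 2.0) < 0.2              # posterior mean near the true mean
assert abs(E_tau - 1.0) < 0.3             # mean precision near the true value
```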
VBA algorithm steps

The updating expressions for the tilded parameters are obtained in three steps:
- E step: optimizing F with respect to q(c, Z) while keeping q(Θ) fixed, we obtain the expressions of q(c_n = k | z_{nk}) = ã_k and q(z_{nk}) = G(z_{nk} | α̃_k, β̃_k).
- M step: optimizing F with respect to q(Θ) while keeping q(c, Z) fixed, we obtain the expressions of q(a) = D(a | k̃) with k̃ = [k̃_1, ..., k̃_K], q(α_k) = G(α_k | ζ̃_k, η̃_k), q(β_k) = G(β_k | ζ̃_k, η̃_k), q(µ_k | Σ_k) = N(µ_k | µ̃, η̃⁻¹ Σ_k), and q(Σ_k) = IW(Σ_k | γ̃, γ̃ Σ̃), which gives the updating algorithm for the corresponding tilded parameters.
- F evaluation: after each E step and M step, we can evaluate the expression of F(q), which can be used as a stopping rule for the iterative algorithm. The final value of F(q) for each value of K, noted F_K, can be used as a criterion for model selection, i.e. the determination of the number of clusters.

Conclusions

- Clustering and classification of a set of data are among the most important tasks in statistical research, for many applications such as data mining in biology.
- Mixture models, and in particular mixtures of Gaussians, are classical models for these tasks.
- We proposed to use a mixture of generalized Student-t distributions to model the data, via a hierarchical graphical model.
- To obtain fast algorithms and be able to handle large data sets, we used conjugate priors everywhere it was possible.
- The proposed algorithm has been used for clustering, classification and discriminant analysis of some biological (cancer research related) data, but in this paper we only presented the main algorithm.
The textile plot, proposed by Kumasaka and Shibata (2008), is a method for data visualization. The method transforms a data matrix in order to draw a parallel coordinate plot. In this paper, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a geometrical viewpoint. It is shown that the textile set can be written as the union of two differentiable manifolds if the data matrices are restricted to be full-rank.

Geometric Properties of the Textile Plot

Tomonari Sei and Ushio Tanaka (University of Tokyo and Osaka Prefecture University), at École Polytechnique, Oct 28, 2015.

Introduction

- The textile plot, proposed by Kumasaka and Shibata (2008), is a method for data visualization.
- The method transforms a data matrix into another matrix, X ∈ R^{n×p} → Y ∈ R^{n×p}, in order to draw a parallel coordinate plot.
- The parallel coordinate plot is a standard 2-dimensional graphical tool for visualizing multivariate data at a glance.
- In this talk, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a differential geometrical point of view.
- It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are "generic".

Outline: 1. What is the textile plot? 2. Textile set 3. Main result 4. Other results 5. Summary

Textile plot: example (Kumasaka and Shibata, 2008)

[Figure: textile plot for the iris data (150 cases, 5 attributes: id, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species), showing the three species setosa, versicolor and virginica.]

- Each variate is transformed by a location-scale transformation.
- Categorical data is quantified.
- Missing data is admitted.
- The order of the axes can be maintained.

Let us recall the method of the textile plot. For simplicity, we assume no categorical variate and no missing value.

- Let X = (x_1, ..., x_p) ∈ R^{n×p} be the data matrix.
- Without loss of generality, assume the sample mean and sample variance of each x_j are 0 and 1, respectively.
- The data is transformed into Y = (y_1, ..., y_p), where y_j = a_j + b_j x_j, with a_j, b_j ∈ R, j = 1, ..., p.
- The coefficients a_j and b_j are determined by the following procedure.

The coefficients a = (a_j) and b = (b_j) are the solution of the following minimization problem:

minimize over a, b:   Σ_{t=1}^n Σ_{j=1}^p (y_{tj} − ȳ_{t·})²
subject to:   y_j = a_j + b_j x_j   and   Σ_{j=1}^p ||y_j||² = 1.

Intuition: make each case's polyline as horizontal as possible. Solution: a = 0, and b is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of X.

Example (n = 100, p = 4): X ∈ R^{100×4}, with each row ∼ N(0, Σ), where

Σ = [  1.0  −0.6   0.5   0.1 ]
    [ −0.6   1.0  −0.6  −0.2 ]
    [  0.5  −0.6   1.0   0.0 ]
    [  0.1  −0.2   0.0   1.0 ].

[Figures: (a) parallel coordinate plot of the raw data X; (b) the textile plot Y.]

Our motivation

The textile plot transforms the data matrix X into Y; denote the map by Y = τ(X). What is the image τ(R^{n×p})? We can show that Y ∈ τ(R^{n×p}) satisfies two conditions:

∃λ ≥ 0 such that, for all i = 1, ..., p,   Σ_{j=1}^p y_i' y_j = λ ||y_i||²,   and   Σ_{j=1}^p ||y_j||² = 1.

This motivates the following definition of the textile set.
, p,p∑yi yj = λ yi2j=1andp∑yj2= 1.j=1This motivates the following deﬁnition of the textile set.8 / 23What is textile plot?Textile setMain resultOther resultsSummaryTextile setDeﬁnitionThe textile set is deﬁned byTn,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,∑yi yj = λ yij2,∑yj2= 1 },jThe unnormalized textile set is deﬁned by∑Un,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,yi yj = λ yi2}.jWe are interested in mathematical properties of Tn,p and Un,p .Bad news: statistical implication such is a future work.Let us begin with small p case.9 / 23What is textile plot?Textile setMain resultOther resultsSummaryTextile setDeﬁnitionThe textile set is deﬁned byTn,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,∑yi yj = λ yij2,∑yj2= 1 },jThe unnormalized textile set is deﬁned by∑Un,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,yi yj = λ yi2}.jWe are interested in mathematical properties of Tn,p and Un,p .Bad news: statistical implication such is a future work.Let us begin with small p case.9 / 23What is textile plot?Textile setMain resultOther resultsSummaryTextile setDeﬁnitionThe textile set is deﬁned byTn,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,∑yi yj = λ yij2,∑yj2= 1 },jThe unnormalized textile set is deﬁned by∑Un,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,yi yj = λ yi2}.jWe are interested in mathematical properties of Tn,p and Un,p .Bad news: statistical implication such is a future work.Let us begin with small p case.9 / 23What is textile plot?Textile setMain resultOther resultsSummaryTextile setDeﬁnitionThe textile set is deﬁned byTn,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,∑yi yj = λ yij2,∑yj2= 1 },jThe unnormalized textile set is deﬁned by∑Un,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,yi yj = λ yi2}.jWe are interested in mathematical properties of Tn,p and Un,p .Bad news: statistical implication such is a future work.Let us begin with small p case.9 / 23What is textile plot?Textile setMain resultOther resultsSummaryTextile setDeﬁnitionThe textile set is deﬁned byTn,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,∑yi yj = λ yij2,∑yj2= 1 },jThe unnormalized textile set is deﬁned by∑Un,p = { Y ∈ Rn×p  ∃λ ≥ 0, ∀i,yi yj = λ 
yi2}.jWe are interested in mathematical properties of Tn,p and Un,p .Bad news: statistical implication such is a future work.Let us begin with small p case.9 / 23What is textile plot?Textile setMain resultOther resultsSummaryTn,p with small pLemma (p = 1)Tn,1 = Sn−1 , the unit sphere.Lemma (p = 2)Tn,2 = A ∪ B, where√A = {(y1 , y2 )  y1 = y2 = 1/ 2},B = {(y1 , y2 )  y1 − y2 = y1 + y2 = 1},each of which is diﬀeomorphic to Sn−1 × Sn−1 . Their intersectionA ∩ B is diﬀeomorphic to the Stiefel manifold Vn,2 .→ See next slide for n = p = 2 case.10 / 23What is textile plot?Textile setMain resultOther resultsSummaryTn,p with small pLemma (p = 1)Tn,1 = Sn−1 , the unit sphere.Lemma (p = 2)Tn,2 = A ∪ B, where√A = {(y1 , y2 )  y1 = y2 = 1/ 2},B = {(y1 , y2 )  y1 − y2 = y1 + y2 = 1},each of which is diﬀeomorphic to Sn−1 × Sn−1 . Their intersectionA ∩ B is diﬀeomorphic to the Stiefel manifold Vn,2 .→ See next slide for n = p = 2 case.10 / 23What is textile plot?Textile setMain resultOther resultsSummaryExample (n = p = 2)T2,2 ⊂ R4 is the union of two tori, glued along O(2).ηφξθ{T2,2 =1√2(cos θsin θ)}cos φsin φ{ ()}1 cos ξ + cos η cos ξ − cos η∪2 sin ξ + sin η sin ξ − sin η11 / 23What is textile plot?Textile setMain resultOther resultsSummaryFor general dimension pTo state our main result, we deﬁne two concepts: noncompactStiefel manifold and canonical form.Deﬁnition (e.g. Absil et al. (2008))Let n ≥ p. Denote by V ∗ the set of all column fullrank matrices:V ∗ := { Y ∈ Rn×p  rank(Y) = p }.V ∗ is called the noncompact Stiefel manifold.Note that dim(V ∗ ) = np and V ∗ = Rn×p .The orthogonal group O(n) acts on V ∗ .By the GramSchmidt orthonormalization, the quotient spaceV ∗ /O(n) is identiﬁed with uppertriangular matrices withpositive diagonals. → see next slide.12 / 23What is textile plot?Textile setMain resultOther resultsSummaryFor general dimension pTo state our main result, we deﬁne two concepts: noncompactStiefel manifold and canonical form.Deﬁnition (e.g. Absil et al. 
(2008))Let n ≥ p. Denote by V ∗ the set of all column fullrank matrices:V ∗ := { Y ∈ Rn×p  rank(Y) = p }.V ∗ is called the noncompact Stiefel manifold.Note that dim(V ∗ ) = np and V ∗ = Rn×p .The orthogonal group O(n) acts on V ∗ .By the GramSchmidt orthonormalization, the quotient spaceV ∗ /O(n) is identiﬁed with uppertriangular matrices withpositive diagonals. → see next slide.12 / 23What is textile plot?Textile setMain resultOther resultsSummaryFor general dimension pTo state our main result, we deﬁne two concepts: noncompactStiefel manifold and canonical form.Deﬁnition (e.g. Absil et al. (2008))Let n ≥ p. Denote by V ∗ the set of all column fullrank matrices:V ∗ := { Y ∈ Rn×p  rank(Y) = p }.V ∗ is called the noncompact Stiefel manifold.Note that dim(V ∗ ) = np and V ∗ = Rn×p .The orthogonal group O(n) acts on V ∗ .By the GramSchmidt orthonormalization, the quotient spaceV ∗ /O(n) is identiﬁed with uppertriangular matrices withpositive diagonals. → see next slide.12 / 23What is textile plot?Textile setMain resultOther resultsSummaryNoncompact Stiefel manifold and canonical formDeﬁnition (Canonical form)Let us denote by V ∗∗ the set of all matrices written asy11 · · · y1p. . 0 .... ... .. ypp , y > 0, 1 ≤ i ≤ p. .ii 0 ··· 0 .. . ...0 ··· 0We call it a canonical form.Note that V ∗∗ ⊂ V ∗ and V ∗ /O(n)V ∗∗ .13 / 23What is textile plot?Textile setMain resultOther resultsSummaryNoncompact Stiefel manifold and canonical formDeﬁnition (Canonical form)Let us denote by V ∗∗ the set of all matrices written asy11 · · · y1p. . 0 .... ... .. ypp , y > 0, 1 ≤ i ≤ p. .ii 0 ··· 0 .. . 
...0 ··· 0We call it a canonical form.Note that V ∗∗ ⊂ V ∗ and V ∗ /O(n)V ∗∗ .13 / 23What is textile plot?Textile setMain resultOther resultsSummaryRestriction of unnormalized textile setV ∗ : noncompact Stiefel manifold,V ∗∗ : set of canonical forms.DeﬁnitionDenote the restriction of Un,p to V ∗ and V ∗∗ by∗Un,p = Un,p ∩ V ∗ ,∗∗Un,p = Un,p ∩ V ∗∗ ,respectively.∗The group O(n) acts on Un,p .∗∗∗The quotient space Un,p /O(n) is identiﬁed with Un,p .∗∗So it is essential to study Un,p .14 / 23What is textile plot?Textile setMain resultOther resultsSummaryRestriction of unnormalized textile setV ∗ : noncompact Stiefel manifold,V ∗∗ : set of canonical forms.DeﬁnitionDenote the restriction of Un,p to V ∗ and V ∗∗ by∗Un,p = Un,p ∩ V ∗ ,∗∗Un,p = Un,p ∩ V ∗∗ ,respectively.∗The group O(n) acts on Un,p .∗∗∗The quotient space Un,p /O(n) is identiﬁed with Un,p .∗∗So it is essential to study Un,p .14 / 23What is textile plot?Textile setMain resultOther resultsSummaryRestriction of unnormalized textile setV ∗ : noncompact Stiefel manifold,V ∗∗ : set of canonical forms.DeﬁnitionDenote the restriction of Un,p to V ∗ and V ∗∗ by∗Un,p = Un,p ∩ V ∗ ,∗∗Un,p = Un,p ∩ V ∗∗ ,respectively.∗The group O(n) acts on Un,p .∗∗∗The quotient space Un,p /O(n) is identiﬁed with Un,p .∗∗So it is essential to study Un,p .14 / 23What is textile plot?Textile setMain resultOther resultsSummary∗∗Un,p for small pLet us check examples.Example (n = p = 1)∗∗U1,1 = {(1)}.Example (n = p = 2)()y11 y12Let Y =with y11 , y22 > 0. Then0 y22∗∗222U2,2 = {y12 = 0} ∪ {y11 = y12 + y22 },union of a plane and a cone.15 / 23What is textile plot?Textile setMain resultOther resultsSummary∗∗Un,p for small pLet us check examples.Example (n = p = 1)∗∗U1,1 = {(1)}.Example (n = p = 2)()y11 y12Let Y =with y11 , y22 > 0. 
Then0 y22∗∗222U2,2 = {y12 = 0} ∪ {y11 = y12 + y22 },union of a plane and a cone.15 / 23What is textile plot?Textile setMain resultOther resultsSummaryMain theorem∗∗The diﬀerential geometrical property of Un,p is given as follows:TheoremLet n ≥ p ≥ 3. Then we have the following decomposition∗∗Un,p = M1 ∪ M2 ,where each Mi is a diﬀerentiable manifold, the dimensions of whichare given byp(p + 1)− (p − 1),2p(p + 1)dim M2 =− p,2dim M1 =respectively. M2 is connected while M1 may not.16 / 23What is textile plot?Textile setMain resultOther resultsSummaryExample∗∗U3,3 is the union of 4dim and 3dim manifolds.We look at a cross section with y11 = y22 = 1:y13y33y12Union of a surface and a vertical line.17 / 23What is textile plot?Textile setMain resultOther resultsSummaryCorollaryLet n ≥ p ≥ 3. Then we have∗Un,p = π −1 (M1 ) ∪ π −1 (M2 ),where π denotes the map of GramSchmidt orthonormalization.The dimensions aredim π −1 (M1 ) = np − (p − 1),dim π −1 (M2 ) = np − p.18 / 23What is textile plot?Textile setMain resultOther resultsSummaryOther resultsWe state other results. First we have n = 1 case.LemmaIf n = 1, then the textile set T1,p is the union of a(p − 2)dimensional manifold and 2(2p − 1) isolated points.Example∗∗U1,3 consists of a circle and 14 points:∗∗U1,3 = (S 2 ∩ {y1 + y2 + y3 = 1})111111111∪ {±( √3 , √3 , √3 ), ±( √2 , √2 , 0), ±( √2 , 0, √2 ), ±(0, √2 , √2 ),± (1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)}.19 / 23What is textile plot?Textile setMain resultOther resultsSummaryOther resultsWe state other results. 
First we have n = 1 case.LemmaIf n = 1, then the textile set T1,p is the union of a(p − 2)dimensional manifold and 2(2p − 1) isolated points.Example∗∗U1,3 consists of a circle and 14 points:∗∗U1,3 = (S 2 ∩ {y1 + y2 + y3 = 1})111111111∪ {±( √3 , √3 , √3 ), ±( √2 , √2 , 0), ±( √2 , 0, √2 ), ±(0, √2 , √2 ),± (1, 0, 0), ±(0, 1, 0), ±(0, 0, 1)}.19 / 23What is textile plot?Textile setMain resultOther resultsSummaryDiﬀerential geometrical characterization of fλ −1 (O)Fix λ ≥ 0 arbitrarily. We deﬁne the map fλ : Rn×p → Rp+1 by∑y1 yj − λ y1 2 j....fλ (y1 , . . . , yp ) := ∑2 j yp yj − λ yp ∑2−1j yjLemmaWe have a classiﬁcation of Tn,p , namelyfλ −1 (O) =Tn,p =λ≥0fλ −1 (O).0≤λ≤n20 / 23What is textile plot?Textile setMain resultOther resultsSummaryDiﬀerential geometrical characterization of fλ −1 (O)Fix λ ≥ 0 arbitrarily. We deﬁne the map fλ : Rn×p → Rp+1 by∑y1 yj − λ y1 2 j....fλ (y1 , . . . , yp ) := ∑2 j yp yj − λ yp ∑2−1j yjLemmaWe have a classiﬁcation of Tn,p , namelyfλ −1 (O) =Tn,p =λ≥0fλ −1 (O).0≤λ≤n20 / 23What is textile plot?Textile setMain resultOther resultsSummaryDiﬀerential geometrical characterization of fλ −1 (O)Lastly, we state a characterization of fλ −1 (O) from the viewpointof diﬀerential geometry.TheoremLet λ ≥ 0. fλ −1 (O) is a regular submanifold of Rn×p withcodimension p + 1 wheneverλ > 0,y11 yjj − y1j yj1 = 0,∃ ∈ { 2, . . . , p };p∑j = 2, . . . , p,yij + yi (1 − 2λ) = 0,i = 1, . . . 
, n.j=221 / 23What is textile plot?Textile setMain resultOther resultsSummaryPresent and future studySummary:We deﬁned the textile set Tn,p and ﬁnd its geometricproperties.Present and future study:1Characterize the classiﬁcation fλ −1 (O) with inducedRiemannian metric from Rnp by (global) Riemanniangeometry: geodesic, curvature etc.2Investigate diﬀerential geometrical and topological propertiesof Tn,p and fλ −1 (O), including its group action.3Can one ﬁnd statistical implication such as sample distributiontheory?.Merci beaucoup!.22 / 23What is textile plot?Textile setMain resultOther resultsSummaryPresent and future studySummary:We deﬁned the textile set Tn,p and ﬁnd its geometricproperties.Present and future study:1Characterize the classiﬁcation fλ −1 (O) with inducedRiemannian metric from Rnp by (global) Riemanniangeometry: geodesic, curvature etc.2Investigate diﬀerential geometrical and topological propertiesof Tn,p and fλ −1 (O), including its group action.3Can one ﬁnd statistical implication such as sample distributiontheory?.Merci beaucoup!.22 / 23What is textile plot?Textile setMain resultOther resultsSummaryPresent and future studySummary:We deﬁned the textile set Tn,p and ﬁnd its geometricproperties.Present and future study:1Characterize the classiﬁcation fλ −1 (O) with inducedRiemannian metric from Rnp by (global) Riemanniangeometry: geodesic, curvature etc.2Investigate diﬀerential geometrical and topological propertiesof Tn,p and fλ −1 (O), including its group action.3Can one ﬁnd statistical implication such as sample distributiontheory?.Merci beaucoup!.22 / 23What is textile plot?Textile setMain resultOther resultsSummaryReferences1 Absil, P.A., Mahony, R., and Sepulchre, R. (2008), OptimizationAlgorithms on Matrix Manifolds, Princeton University Press..2 Honda, K. and Nakano, J. (2007), 3 dimensional parallel coordinate plot,Proceedings of the Institute of Statistical Mathematics, 55, 69–83.3 Inselberg, A. 
(2009), Parallel Coordinates: VISUAL MultidimensionalGeometry and its Applications, Springer..4 Kumasaka, N. and Shibata, R. (2008), Highdimensional datavisualisation: The textile plot, Computational Statistics and DataAnalysis, 52, 3616–3644..23 / 23
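The textile plot map described in this talk (standardize, take $a = 0$ and $b$ the top eigenvector of the covariance matrix, then normalize) can be sketched in a few lines. This is an illustrative reconstruction under the talk's simplifying assumptions (numeric data, no missing values), not the authors' implementation:

```python
import numpy as np

def textile_plot(X):
    """Sketch of the textile plot map tau: X -> Y (Kumasaka & Shibata, 2008).

    Each column is standardized to sample mean 0 and variance 1, then
    y_j = b_j * x_j with a_j = 0, where b is the eigenvector of the
    covariance matrix belonging to its largest eigenvalue; finally Y is
    scaled so that sum_j ||y_j||^2 = 1.
    """
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # location-scale normalization
    S = np.cov(X, rowvar=False)                # p x p covariance matrix
    _, V = np.linalg.eigh(S)                   # eigenvalues in ascending order
    b = V[:, -1]                               # top eigenvector
    Y = X * b                                  # y_j = b_j * x_j  (a_j = 0)
    return Y / np.linalg.norm(Y)               # enforce sum_j ||y_j||^2 = 1
```

The output then lies in the textile set: the ratio $\sum_j y_i^\top y_j / \|y_i\|^2$ is one and the same $\lambda \ge 0$ for every column $i$.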
In anomalous statistical physics, deformed algebraic structures are important objects. Heavy-tailed probability distributions, such as Student's t-distributions, are characterized by deformed algebras. In addition, deformed algebras induce deformations of the expectations and of the independence of random variables. Hence, a generalization of independence for the multivariate Student's t-distribution is studied in this paper. Even if two random variables that follow univariate Student's t-distributions are independent, their joint probability distribution is not a bivariate Student's t-distribution. It is shown that a bivariate Student's t-distribution is obtained from two univariate Student's t-distributions under q-deformed independence.

A generalization of independence and multivariate Student's t-distributions
MATSUZOE Hiroshi, Nagoya Institute of Technology
joint work with SAKAMOTO Monta (Efrei, Paris)

Outline:

1. Deformed exponential family

2. Nonadditive differentials and expectation functionals

3. Geometry of deformed exponential families

4. Generalization of independence

5. q-independence and Student's t-distributions

6. Appendix

Notions of expectation and independence are determined by the choice of statistical model.
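The q-deformed machinery behind items 4 and 5 can be illustrated numerically. Below is a minimal sketch using the standard Tsallis-statistics definitions of the q-exponential and the q-product (these definitions are background knowledge, not code from the talk); q-deformed independence replaces the ordinary factorization of a joint density by a q-product, exploiting the identity $\exp_q(u) \otimes_q \exp_q(v) = \exp_q(u+v)$:

```python
import numpy as np

def exp_q(u, q):
    """q-exponential: [1 + (1-q) u]_+ ** (1/(1-q)), for q != 1."""
    return np.maximum(1.0 + (1.0 - q) * u, 0.0) ** (1.0 / (1.0 - q))

def q_product(x, y, q):
    """q-product: [x**(1-q) + y**(1-q) - 1]_+ ** (1/(1-q)), for x, y > 0."""
    return np.maximum(x ** (1.0 - q) + y ** (1.0 - q) - 1.0, 0.0) ** (1.0 / (1.0 - q))

# The q-product restores additivity of arguments for the q-exponential:
# exp_q(u) (x)_q exp_q(v) = exp_q(u + v), which is what makes q-independent
# q-exponential factors combine into a single q-exponential joint density.
```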
Hessian Information Geometry (chaired by Shun-ichi Amari, Michel Boyom)
We define a metric and a family of α-connections on statistical manifolds, based on the ϕ-divergence, which emerges in the framework of ϕ-families of probability distributions. This metric and these α-connections generalize the Fisher information metric and Amari's α-connections. We also investigate the parallel transport associated with the α-connection for α = 1.

2nd Conference on Geometric Science of Information, GSI2015, October 28–30, 2015, École Polytechnique, Paris-Saclay

New Metric and Connections in Statistical Manifolds
Rui F. Vigelis (Federal University of Ceará, Brazil), David C. de Souza (Federal Institute of Ceará, Brazil) and Charles C. Cavalcante (Federal University of Ceará, Brazil)
Session "Hessian Information Geometry", October 28

Outline: Introduction / ϕ-Functions / ϕ-Divergence / Generalized Statistical Manifold / Connections / ϕ-Families / Discussion

Introduction

In the paper R. F. Vigelis and C. C. Cavalcante, "On ϕ-families of probability distributions", J. Theor. Probab., 26(3):870–884, 2013, the authors proposed the so-called ϕ-divergence $D_\varphi(p \,\|\, q)$ for $p, q \in \mathcal{P}_\mu$. The ϕ-divergence is defined in terms of a ϕ-function. The metric and connections that we propose are derived from the ϕ-divergence $D_\varphi(\cdot \,\|\, \cdot)$.

The proposal of new geometric structures (metrics and connections) on statistical manifolds is a recurrent research topic. To cite a few:

- J. Zhang. Divergence function, duality, and convex analysis. Neural Computation, 16(1):159–195, 2004.

- J. Naudts. Estimators, escort probabilities, and φ-exponential families in statistical physics. JIPAM, 5(4): Paper No. 102, 15 p., 2004.

- S.-i. Amari, A. Ohara, H. Matsuzoe. Geometry of deformed exponential families: invariant, dually-flat and conformal geometries. Physica A, 391(18):4308–4319, 2012.

- H. Matsuzoe. Hessian structures on deformed exponential families and their conformal structures. Differential Geom. Appl., 35(suppl.):323–333, 2014.

Let $(T, \Sigma, \mu)$ be a measure space. All probability distributions considered lie in
$\mathcal{P}_\mu = \{\, p \in L^0 : p > 0 \text{ and } \int_T p \, d\mu = 1 \,\}$,
where $L^0$ denotes the set of all real-valued measurable functions on $T$, with equality $\mu$-a.e.

ϕ-Functions

A function $\varphi : \mathbb{R} \to (0, \infty)$ is said to be a ϕ-function if the following conditions are satisfied:

(a1) $\varphi(\cdot)$ is convex;

(a2) $\lim_{u \to -\infty} \varphi(u) = 0$ and $\lim_{u \to \infty} \varphi(u) = \infty$;

(a3) there exists a measurable function $u_0 : T \to (0, \infty)$ such that $\int_T \varphi(c(t) + \lambda u_0(t)) \, d\mu < \infty$ for all $\lambda > 0$, for each measurable function $c : T \to \mathbb{R}$ such that $\varphi(c) \in \mathcal{P}_\mu$.

Not all functions satisfying (a1) and (a2) admit the existence of $u_0$. Condition (a3) is imposed so that ϕ-families are parametrizations of $\mathcal{P}_\mu$ in the same manner as exponential families.

The κ-exponential function $\exp_\kappa : \mathbb{R} \to (0, \infty)$, for $\kappa \in [-1, 1]$, given by
$\exp_\kappa(u) = (\kappa u + \sqrt{1 + \kappa^2 u^2})^{1/\kappa}$ if $\kappa \neq 0$, and $\exp(u)$ if $\kappa = 0$,
is a ϕ-function. The q-exponential function $\exp_q(u) = [1 + (1-q)u]_+^{1/(1-q)}$, where $q > 0$ and $q \neq 1$, is not a ϕ-function ($\exp_q(u) = 0$ for $u \le -1/(1-q)$ when $q < 1$). A ϕ-function $\varphi(\cdot)$ may fail to be a φ-exponential function $\exp_\phi(\cdot)$, which is defined as the inverse of
$\ln_\phi(u) = \int_1^u \frac{1}{\phi(x)} \, dx, \quad u > 0,$
for some increasing function $\phi : [0, \infty) \to [0, \infty)$.

ϕ-Divergence

We define the ϕ-divergence as

$D_\varphi(p \,\|\, q) = \dfrac{\int_T \dfrac{\varphi^{-1}(p) - \varphi^{-1}(q)}{(\varphi^{-1})'(p)} \, d\mu}{\int_T \dfrac{u_0}{(\varphi^{-1})'(p)} \, d\mu}$

for any $p, q \in \mathcal{P}_\mu$. If $\varphi(\cdot) = \exp(\cdot)$ and $u_0 = 1$, then $D_\varphi(p \,\|\, q)$ coincides with the Kullback–Leibler divergence $D_{KL}(p \,\|\, q) = \int_T p \log \frac{p}{q} \, d\mu$.

Generalized Statistical Manifold

A metric $(g_{ij})$ can be derived from the ϕ-divergence:

$g_{ij} = -\left.\frac{\partial}{\partial \theta^i}\right|_p \left.\frac{\partial}{\partial \theta^j}\right|_q D_\varphi(p \,\|\, q)\Big|_{q=p} = -E_\theta\!\left[\frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j}\right],$

where $f_\theta = \varphi^{-1}(p_\theta)$ and

$E_\theta[\,\cdot\,] = \dfrac{\int_T (\cdot)\, \varphi'(f_\theta) \, d\mu}{\int_T u_0 \, \varphi'(f_\theta) \, d\mu}.$

Considering the log-likelihood $l_\theta = \log(p_\theta)$ in place of $f_\theta = \varphi^{-1}(p_\theta)$, we recover the Fisher information matrix.

A family of probability distributions $\mathcal{P} = \{p_\theta : \theta \in \Theta\} \subseteq \mathcal{P}_\mu$ is said to be a generalized statistical manifold if the following conditions are satisfied:

(P1) $\Theta$ is a domain (an open and connected set) in $\mathbb{R}^n$.

(P2) $p(t; \theta) = p_\theta(t)$ is differentiable with respect to $\theta$.

(P3) The operations of integration with respect to $\mu$ and differentiation with respect to $\theta^i$ commute.

(P4) The matrix $g = (g_{ij})$, defined by $g_{ij} = -E_\theta[\partial^2 f_\theta / \partial \theta^i \partial \theta^j]$, is positive definite at each $\theta \in \Theta$.

The matrix $(g_{ij})$ can also be expressed as $g_{ij} = E_\theta\!\left[\frac{\partial f_\theta}{\partial \theta^i} \frac{\partial f_\theta}{\partial \theta^j}\right]$. As a consequence, the mapping
$X = \sum_i a^i \frac{\partial}{\partial \theta^i} \;\mapsto\; \widetilde{X} = \sum_i a^i \frac{\partial f_\theta}{\partial \theta^i}$
is an isometry between the tangent space $T_\theta \mathcal{P}$ at $p_\theta$ and $\widetilde{T}_\theta \mathcal{P} = \mathrm{span}\{\partial f_\theta / \partial \theta^i : i = 1, \dots, n\}$, equipped with the inner product $\langle \widetilde{X}, \widetilde{Y} \rangle_\theta = E_\theta[\widetilde{X} \widetilde{Y}]$.

Connections

We use the ϕ-divergence $D_\varphi(\cdot \,\|\, \cdot)$ to define a pair of mutually dual connections $D^{(1)}$ and $D^{(-1)}$, whose Christoffel symbols are given by

$\Gamma^{(1)}_{ijk} = -\left.\frac{\partial^2}{\partial \theta^i \partial \theta^j}\right|_p \left.\frac{\partial}{\partial \theta^k}\right|_q D_\varphi(p \,\|\, q)\Big|_{q=p}, \qquad \Gamma^{(-1)}_{ijk} = -\left.\frac{\partial}{\partial \theta^k}\right|_p \left.\frac{\partial^2}{\partial \theta^i \partial \theta^j}\right|_q D_\varphi(p \,\|\, q)\Big|_{q=p}.$

The connections $D^{(1)}$ and $D^{(-1)}$ correspond to the exponential and mixture connections. Explicitly,

$\Gamma^{(1)}_{ijk} = E_\theta\!\left[\frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k}\right] - E_\theta\!\left[\frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j}\right] E_\theta\!\left[u_0 \frac{\partial f_\theta}{\partial \theta^k}\right]$

and

$\Gamma^{(-1)}_{ijk} = E_\theta\!\left[\frac{\partial^2 f_\theta}{\partial \theta^i \partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k}\right] + E_\theta\!\left[\frac{\partial f_\theta}{\partial \theta^i} \frac{\partial f_\theta}{\partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k}\right] - E_\theta\!\left[\frac{\partial f_\theta}{\partial \theta^j} \frac{\partial f_\theta}{\partial \theta^k}\right] E_\theta\!\left[u_0 \frac{\partial f_\theta}{\partial \theta^i}\right] - E_\theta\!\left[\frac{\partial f_\theta}{\partial \theta^i} \frac{\partial f_\theta}{\partial \theta^k}\right] E_\theta\!\left[u_0 \frac{\partial f_\theta}{\partial \theta^j}\right].$

The terms involving $u_0$ vanish if $\varphi(\cdot) = \exp(\cdot)$ and $u_0 = 1$.

Using the pair of mutually dual connections $D^{(1)}$ and $D^{(-1)}$, we can specify a family of α-connections $D^{(\alpha)}$ on generalized statistical manifolds, whose Christoffel symbols are

$\Gamma^{(\alpha)}_{ijk} = \frac{1+\alpha}{2} \Gamma^{(1)}_{ijk} + \frac{1-\alpha}{2} \Gamma^{(-1)}_{ijk}.$

The connections $D^{(\alpha)}$ and $D^{(-\alpha)}$ are mutually dual. For $\alpha = 0$, the connection $D^{(0)}$, which is clearly self-dual, corresponds to the Levi-Civita connection.

ϕ-Families

A parametric ϕ-family $\mathcal{F}_p = \{p_\theta : \theta \in \Theta\}$ centered at $p = \varphi(c)$ is defined by

$p_\theta(t) := \varphi\!\left(c(t) + \sum_{i=1}^n \theta^i u_i(t) - \psi(\theta) u_0(t)\right),$

where $\psi : \Theta \to [0, \infty)$ is a normalizing function. The functions $u_i$ satisfy some conditions, which imply $\psi \ge 0$, and the domain $\Theta$ can be chosen to be maximal. If $\varphi(\cdot) = \exp(\cdot)$ and $u_0 = 1$, then $\mathcal{F}_p$ corresponds to an exponential family.

The normalizing function and the ϕ-divergence are related by $\psi(\theta) = D_\varphi(p \,\|\, p_\theta)$. The matrix $(g_{ij})$ is the Hessian of the normalizing function $\psi$: $g_{ij} = \frac{\partial^2 \psi}{\partial \theta^i \partial \theta^j}$. As a result,

$\Gamma^{(0)}_{ijk} = \frac{1}{2} \frac{\partial g_{ij}}{\partial \theta^k} = \frac{1}{2} \frac{\partial^3 \psi}{\partial \theta^i \partial \theta^j \partial \theta^k}.$

In ϕ-families, the Christoffel symbols $\Gamma^{(1)}_{ijk}$ vanish identically, i.e. $(\theta^i)$ is an affine coordinate system and the connection $D^{(1)}$ is flat (and $D^{(-1)}$ is also flat). Thus $\mathcal{F}_p$ admits a coordinate system $(\eta_j)$ dual to $(\theta^i)$, and there exist potential functions $\psi$ and $\psi^*$ such that

$\theta^i = \frac{\partial \psi^*}{\partial \eta_i}, \qquad \eta_j = \frac{\partial \psi}{\partial \theta^j}, \qquad \psi(p) + \psi^*(p) = \sum_i \theta^i(p)\, \eta_i(p).$

Discussion

Advantages of $(g_{ij})$, $\Gamma^{(1)}_{ijk}$ and $\Gamma^{(-1)}_{ijk}$ being derived from $D_\varphi(\cdot \,\|\, \cdot)$: duality, a Pythagorean relation, and a projection theorem.

Open questions: an example of a generalized statistical manifold whose coordinate system is $D^{(-1)}$-flat; parallel transport with respect to $D^{(-1)}$; a divergence or ϕ-function associated with the α-connections.

Thank you!
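The claimed reduction of the ϕ-divergence to the Kullback–Leibler divergence when ϕ = exp and u₀ = 1 can be checked numerically on a finite sample space. This sketch assumes the ratio form $D_\varphi(p\|q) = \int_T \frac{\varphi^{-1}(p)-\varphi^{-1}(q)}{(\varphi^{-1})'(p)} d\mu \,/\, \int_T \frac{u_0}{(\varphi^{-1})'(p)} d\mu$ (my reading of the talk's formula) with counting measure, so that $\varphi^{-1} = \log$ and $(\varphi^{-1})'(p) = 1/p$:

```python
import numpy as np

def phi_divergence(p, q, phi_inv, phi_inv_prime, u0):
    """D_phi(p||q) on a finite sample space (counting measure).

    phi_inv is the inverse of the phi-function and phi_inv_prime its
    derivative; integrals become sums over the sample space.
    """
    num = np.sum((phi_inv(p) - phi_inv(q)) / phi_inv_prime(p))
    den = np.sum(u0 / phi_inv_prime(p))
    return num / den

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.4, 0.4, 0.2])
# phi = exp: phi_inv = log, (phi_inv)'(p) = 1/p, u0 = 1
d_phi = phi_divergence(p, q, np.log, lambda x: 1.0 / x, np.ones_like(p))
d_kl = np.sum(p * np.log(p / q))   # Kullback-Leibler divergence
```

With these choices the numerator becomes $\sum p (\log p - \log q)$ and the denominator $\sum p = 1$, so the two quantities agree.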
Curvature properties of statistical structures are studied. The study deals with the curvature tensors of statistical connections and their duals, as well as the Ricci tensors of these connections, Laplacians and the curvature operator. Two concepts of sectional curvature are introduced. The meaning of these notions is illustrated by presenting a few exemplary theorems.

Curvatures of statistical structuresBarbara OpozdaParis, October 2015Barbara Opozda ()Curvatures of statistical structuresParis, October 20151 / 29Statistical structures  statistical settingM  open subset of RnΛ  probability space with a ﬁxed σalgebrap : M × Λ (x, λ) → p(x, λ) ∈ R  smooth relative to x such thatpx (λ) := p(x, λ) is a probability measure on Λ — probability distribution(x, λ) := log (p(x, λ))gij (x) := Ex [(∂i )(∂j )], where Ex is the expectation relative to theprobability px ∀x ∈ M, ∂1 , ..., ∂n  the canonical frame on Mg – Fisher information metric tensor ﬁeld on MCijk (x) = Ex [(∂i )(∂j )(∂k )]  cubic form(g , C ) – statistical structure on MBarbara Opozda ()Curvatures of statistical structuresParis, October 20152 / 29Statistical structures (Codazzi structures)– geometric setting; threeequivalent deﬁnitionsM – manifold, dim M = nI) (g , C ), C  totally symmetric (0, 3)tensor ﬁeld on M, that is,C (X , Y , Z ) = C (Y , X , Z ) = C (Y , Z , X )∀X , Y , Z ∈ Tx M, x ∈ MC – cubic formII) (g , K ), K – symmetric (1, 2)tensor ﬁeld (i.e., K (X , Y ) = K (Y , X ))and symmetric relative to g , that is,g (X , K (Y , Z )) = g (Y , K (X , Z ))is symmetric for all arguments.C (X , Y , Z ) = g (X , K (Y , Z ))Barbara Opozda ()Curvatures of statistical structuresParis, October 20153 / 29III) (g , ), torsionfree connection such that(X g )(Y , Z )Y g )(X , Z )=((1)— statistical connectionT – any tensor ﬁeld of type (p, q) on M,T (X , Y1 , ..., Yq ) = (T – of type (p, q + 1)X T )(Y1 , ..., Yq )In particular, g (X , Y , Z ) = ( X g )(Y , Z )(1) ⇔ g is a symmetric cubic formˆ  LeviCivita connection for gK (X , Y ) :=XY− ˆXYK – diﬀerence tensorg (X , Y , Z ) = −2g (X , K (Y , Z )) = −2C (X , Y , Z )Barbara Opozda ()Curvatures of statistical structuresParis, October 20154 / 29A statistical structure is trivial if and only if K = 0 or equivalently C = 0or equivalently = ˆ .KX Y := K (X , Y )E := tr g K = K (e1 , e1 ) + ... + K (en , en ) = (tr Ke1 )e1 + ... 
+ (tr Ken )enE – mean diﬀerence vector ﬁeldE =0⇔tr KX = 0∀X ∈ TM⇔tr g C (X , ·, ·) = 0∀X ∈ TME = 0 ⇒ tracefree statistical structureFact. (g , ) – tracefree if and only ifdetermined by gBarbara Opozda ()νg = 0, where νg – volume formCurvatures of statistical structuresParis, October 20155 / 29ExamplesRiemannian geometry of the second fundamental formM – locally strongly hypersurface in Rn+1– the second fundamental form h satisﬁes the Codazzi equationh(X , Y , Z ) =h(Y , X , Z ),whereis the induced connection (the LeviCivita connection of the ﬁrstfundamental form)(h, )  statistical structureSimilarly one gets statistical structures on hypersurfaces in space forms.Barbara Opozda ()Curvatures of statistical structuresParis, October 20156 / 29Equiaﬃne geometry of hypersurfaces in the standard aﬃne spaceRn+1M – locally strongly convex hypersurface in Rn+1ξ – a transversal vector ﬁeldD – standard ﬂat connection on Rn+1 , X , Y ∈ X (M), ξ  transversalvector ﬁeldDX Y =XY+ h(X , Y )ξ− Gauss formula– induced connection, h – second fundamental form (metric tensor ﬁeld)DX ξ = −SX + τ (X )ξ− Weingarten formulaIf τ = 0, ξ is called equiaﬃne. 
In this case the Codazzi equation is satisﬁedh(X , Y , Z ) =h(Y , X , Z )(h, ) – statistical structureBarbara Opozda ()Curvatures of statistical structuresParis, October 20157 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20158 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20158 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20158 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20158 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20159 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20159 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 20159 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 201510 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 201510 / 29Barbara Opozda ()Curvatures of statistical structuresParis, October 201510 / 29Geometry of Lagrangian submanifolds in Kaehler manifoldsN – Kaehler manifold of real dimension 2n and with complex structure JM – Lagrangian submanifold of N  ndimensional submanifold such thatJTM orthogonal to TM, i.e. JTM is the normal bundle (in the metricsense) for M ⊂ ND – the Kaehler connection on NDX Y =XY+ JK (X , Y )g – induced metric tensor ﬁeld on M(g , K ) – statistical structureIt is tracefree ⇔ M is minimal in N.Barbara Opozda ()Curvatures of statistical structuresParis, October 201511 / 29Most of statistical structures are outside the three classes of examples. 
Forinstance, in order that a statistical structure is locally realizable on anequiaﬃne hypersurface it is necessary that is projectively ﬂat.Barbara Opozda ()Curvatures of statistical structuresParis, October 201512 / 29Dual connections, curvature tensorsg – metric tensor ﬁeld on M,– any connectionXg (Y , Z ) = g (XY , Z)+ g (Y ,XZ)(2)– dual connection(g , ) – statistical structure if and only if (g , ) – statistical structureR(X , Y )Z – (1, 3)  curvature tensor forIf R = 0 the structure is called HessianR(X , Y )Z – curvature tensor forg (R(X , Y )Z , W ) = −g (R(X , Y )W , Z )(3)In particular, R = 0 ⇔ R = 0.Barbara Opozda ()Curvatures of statistical structuresParis, October 201513 / 29ˆ – LeviCivita connection for g ,= ˆ + K, = ˆ − KˆR – curvature tensor for ˆˆR(X , Y ) = R(X , Y ) +( ˆ X K )Y − ( ˆ Y K )X+ [KX , KY ](4)+ [KX , KY ](5),where[KX , KY ] = KX KY − KY KXˆR(X , Y ) = R(X , Y ) −( ˆ X K )Y + ( ˆ Y K )XˆR(X , Y ) + R(X , Y ) = 2R(X , Y ) + 2[KX , KY ]Barbara Opozda ()Curvatures of statistical structuresParis, October 2015(6)14 / 29Sectional curvaturesR does not have to be skewsymmetric relative to g , i.e.g (R(X , Y )Z , W ) = −g (R(X , Y )W , Z ), in general.Lemma *The following conditions are equivalent:1) g (R(X , Y )Z , W ) = −g (R(X , Y )W , Z ) ∀X , Y , Z , W2) R = R3) ˆ K is symmetric, that is,( ˆ K )(X , Y , Z ) = ( ˆ X K )(Y , Z ) = ( ˆ Y K )(X , Z ) = ( ˆ K )(Y , X , Z )∀X , Y , Z .For hypersurfaces in Rn+1 each of the above conditions describes an aﬃnesphereBarbara Opozda ()Curvatures of statistical structuresParis, October 201515 / 29R :=R+R2[K , K ](X , Y )Z := [KX , KY ]ZR(X , Y )Z and [K , K ](X , Y )Z are Riemanncurvaturelike tensors – theyare skewsymmetric in X , Y , satisfy the ﬁrst Bianchi identity,R(X , Y ), [K , K ](X , Y ) are skewsymmetric relative to g ∀X , Yπ – vector plane in Tx M, X , Y – orthonormal basis of πˆˆsectional curvature for g – k(π) := g (R(X , Y )Y , X )sectional K curvature – k(π) := g ([K , K ](X , Y )Y , X 
)sectionalcurvature – k (π) := g (R(X , Y )Y , X )Barbara Opozda ()Curvatures of statistical structuresParis, October 201516 / 29In general, Schur’s lemma does not hold for kand k. We have, however,LemmaAssume that M is connected, dim M > 2 and the sectional  curvature(the sectional K curvature) is pointwise constant. If one of the equivalentconditions in Lemma * holds then the sectional curvature (the sectionalK curvature) is constant on M.sectional K curvatureThe easiest situation which should be taken into account is when thesectional K curvature is constant for all vector planes in Tx M. In thisrespect we haveBarbara Opozda ()Curvatures of statistical structuresParis, October 201517 / 29TheoremIf the sectional K curvature is constant and equal to A for all vectorplanes in Tx M then there is an orthonormal basis e1 , ..., en of Tx M andnumbers λ1 , ..., λn , µ1 , ..., µn−1 such thatµ1...K e1 = Ke = µ1iµ1λ1µ1...µi−1λi· · · µi−1µi...µiK en=µ1 · · · µn−1Barbara Opozda ()µ1. . . µn−1 λnCurvatures of statistical structuresParis, October 201518 / 29continuation of the theoremMoreoverµi =λi −λ2 − 4Ai−1i2,Ai = Ai−1 − µ2 ,ifor i = 1, ..., n − 1 where A0 = A. The above representation of K is notunique, in general. If additionally tr g K = 0 then A 0, λn = 0 and λi , µifor i = 1, ..., n − 1 are expressed as followsλi = (n − i)−Ai−1,n−i +1µi = −−Ai−1.n−i +1In particular, in the last case the numbers λi , µi depend only on A and thedimension of M.Barbara Opozda ()Curvatures of statistical structuresParis, October 201519 / 29Example 1.K e1= Ke = λ/2 · · · 0iλ/2λλ/2...λ/2...000...0K en=λ/2. . . λ/2 · · · 00 0The sectional K curvature is constant = λ2 /4Barbara Opozda ()Curvatures of statistical structuresParis, October 201520 / 29Example 2.K curvature vanishes, i.e. [K , K ] = 0. 
There is an orthonormal frame e_1, …, e_n such that K_{e_i} = λ_i E_{ii}, i.e. K_{e_i} e_i = λ_i e_i and K_{e_i} e_j = 0 for j ≠ i.

Some theorems on the sectional K-curvature
(g, K) – trace-free if E = tr_g K = 0
Theorem. Let (g, K) be a trace-free statistical structure on M with symmetric ∇̂K. If the sectional K-curvature is constant then either K = 0 (the statistical structure is trivial) or R̂ = 0 and ∇̂K = 0.
Theorem. Let ∇̂K = 0. Each of the following conditions implies that R̂ = 0:
1) the sectional K-curvature is negative,
2) [K, K] = 0 and K is non-degenerate, i.e. X → K_X is a monomorphism.
Theorem. Assume K is as in Example 1 at each point of M, ∇̂K is symmetric, and div E is constant on M (E = tr_g K). Then the sectional curvature for g by any plane containing E is non-positive. Moreover, if M is connected it is constant. If ∇̂E = 0 then ∇̂K = 0 and the sectional curvature (of g) by any plane containing E vanishes.
Theorem. If the sectional K-curvature is non-positive on M and [K, K] · K = 0 then the sectional K-curvature vanishes on M.
Corollary. If (g, K) is a Hessian structure on M with non-negative sectional curvature of g and such that R̂ · K = 0 then R̂ = 0.
Theorem. Assume the sectional K-curvature is negative on M and R̂ · K = 0. Then R̂ = 0.
Theorem. Let M be a Lagrangian submanifold of N, where N is a Kaehler manifold of constant holomorphic curvature 4c, the sectional curvature of the first fundamental form g on M is smaller than c on M, and R̂ · K = 0, where K is the second fundamental tensor of M ⊂ N. Then R̂ = 0.

Sectional ∇-curvature
All affine spheres are statistical manifolds of constant sectional ∇-curvature.
A Riemann-curvature-like tensor defines the curvature operator.
For instance, for the curvature tensor R = (R + R̄)/2 we have the curvature operator R : Λ²TM → Λ²TM given by
g(R(X ∧ Y), Z ∧ W) = g(R(Z, W)Y, X).
A curvature operator is symmetric relative to the canonical extension of g to the bundle Λ²TM. Hence it is diagonalizable; in particular, it can be positive definite, negative definite etc.
The assumption that R is positive definite is stronger than the assumption that the sectional curvature is positive.
Theorem. Let M be a connected compact oriented manifold and (g, ∇) a trace-free statistical structure on M. If R = R̄ and the curvature operator determined by the curvature tensor R̂ is positive definite on M then the sectional ∇-curvature is constant.
Theorem. Let M be a connected compact oriented manifold and (g, ∇) a trace-free statistical structure on M. If the curvature operator for R = (R + R̄)/2 is positive on M then the Betti numbers b_1(M) = … = b_{n−1}(M) = 0.

Sectional curvature for g
k̂(π) = g(R̂(X, Y)Y, X), X, Y – an orthonormal basis for π
Theorem. Let M be a compact manifold equipped with a trace-free statistical structure (g, ∇) such that R = R̄. If the sectional curvature k̂ for g is positive on M then the structure is trivial, that is ∇ = ∇̂.
In the 2-dimensional case we have
Theorem. Let M be a compact surface equipped with a trace-free statistical structure (g, ∇). If M is of genus 0 and R = R̄ then the structure is trivial.

B. Opozda, Bochner’s technique for statistical manifolds, Annals of Global Analysis and Geometry, DOI 10.1007/s10455-015-9475-z
B. Opozda, A sectional curvature for statistical structures, arXiv:1504.01279 [math.DG]

Hessian structures
(g, ∇) – Hessian if R = 0.
Then R̄ = 0 and R̂ = −[K, K].
(g, ∇) is Hessian if and only if ∇̂K is symmetric and R̂ = −[K, K].
All Hessian structures are locally realizable on affine hypersurfaces in R^{n+1} equipped with Calabi’s structure. If they are trace-free they are locally realizable on improper affine spheres.
If the difference tensor is as in Example 1 and the structure is Hessian then K = 0.
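Example 1 lends itself to a quick numerical spot-check: with g the standard inner product, the quantity k(π) = g([K_X, K_Y]Y, X) can be evaluated on a random orthonormal pair and compared with λ²/4. A minimal sketch (all helper names are ours, not from the talk):

```python
# Example 1: K_{e1} = diag(lam, lam/2, ..., lam/2) and, for i >= 2,
# K_{e_i} = (lam/2)(E_{1i} + E_{i1}). We check numerically that the
# sectional K-curvature g([K_X, K_Y]Y, X) equals lam^2/4 for a random plane.
import itertools, math, random

n, lam = 4, 1.6
random.seed(2)

# totally symmetric difference tensor A(i, j, k) = g(K_{e_i} e_j, e_k)
A = [[[0.0] * n for _ in range(n)] for _ in range(n)]
A[0][0][0] = lam
for i in range(1, n):
    for p in itertools.permutations((0, i, i)):
        A[p[0]][p[1]][p[2]] = lam / 2

def K(x):
    # matrix of K_X for X = sum_i x_i e_i (symmetric, since A is symmetric)
    return [[sum(x[i] * A[i][j][k] for i in range(n)) for j in range(n)]
            for k in range(n)]

def k_sec(x, y):
    # g([K_X, K_Y] Y, X) for an orthonormal pair x, y
    KX, KY = K(x), K(y)
    comm = [[sum(KX[i][m] * KY[m][j] - KY[i][m] * KX[m][j] for m in range(n))
             for j in range(n)] for i in range(n)]
    v = [sum(comm[i][j] * y[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * x[i] for i in range(n))

# random orthonormal pair via Gram-Schmidt
x = [random.gauss(0, 1) for _ in range(n)]
x = [c / math.sqrt(sum(t * t for t in x)) for c in x]
y = [random.gauss(0, 1) for _ in range(n)]
d = sum(a * b for a, b in zip(x, y))
y = [b - d * a for a, b in zip(x, y)]
y = [c / math.sqrt(sum(t * t for t in y)) for c in y]

print(k_sec(x, y), lam * lam / 4)  # equal up to rounding
```

The same scaffolding with the diagonal tensor of Example 2 gives commuting K-operators, i.e. [K, K] = 0 and vanishing sectional K-curvature.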
We show that Hessian manifolds of dimension 4 and above must have vanishing Pontryagin forms. This gives a topological obstruction to the existence of Hessian metrics. We find an additional explicit curvature identity for Hessian 4-manifolds. By contrast, we show that all analytic Riemannian 2-manifolds are Hessian.

The Pontryagin Forms of Hessian Manifolds
J. Armstrong, S. Amari
October 27, 2015

Summary
Question. Given a Riemannian metric g, under what circumstances is it locally a Hessian metric?
Question. When can we locally find a function f and coordinates x such that g_ij = ∂_i ∂_j f ?
Answer (Partial). In dimension 2 all analytic metrics g are Hessian. In dimension ≥ 3 the general metric is not Hessian. In dimension ≥ 4 there are even restrictions on the curvature tensor of g; in particular the Pontrjagin forms vanish.

Solving unusual partial differential equations
Question. Given a symmetric g, when can we locally find a function f and coordinates x such that g_ij = (∂_i f)(∂_j f)?
Answer. Only if g lies in the n-dimensional subspace Im φ ⊂ S²T, where φ : T → S²T is given by φ(x) = x·x. Sometimes we can’t find a solution even at a point.
Question. Given a one-form η, when can we locally find a function f such that df = η?
Answer. Since ddf = 0 we must have dη = 0 at x. Sometimes we can find a solution at a point, but can’t extend it even to first order around x.

Generalizing
Let E and F be vector bundles and let D : Γ(E) → Γ(F) be a differential operator, D : J_k(E) → F, where J_k is the bundle of k-jets.
Define D_1 : J_{k+1}(E) → J_1(F) to be the first prolongation; this is the operator which maps a section e to the one-jet j_1(De). Define D_i : J_{k+i}(E) → J_i(F) to be the i-th prolongation, e → j_i(De).
We can only hope to solve the differential equation De = f if we can find an algebraic solution to every equation D_i e = j_i(f) at the point x.
Applying the fact that derivatives commute may yield obstructions to the existence of solutions to a differential equation even locally.

Dimension counting
The dimension of the space of k-jets of 1 function of n real variables is
dim J_k := Σ_{i=0}^{k} dim(S^i T) = Σ_{i=0}^{k} C(n+i−1, i).
The reason for this is that derivatives commute. Note this fact is also encoded in the statement ddf = 0.

The counting argument
We wish to solve ∂²f/∂x_i ∂x_j = g_ij, which is a second-order equation for f and coords x.
So the input is n+1 functions of n variables.
Dimension of the space of (k+2)-jets of f and x:
d¹_k = dim J_{k+2}(x, f) = (n+1) Σ_{i=0}^{k+2} C(n+i−1, i).
Dimension of the space of k-jets of g:
d²_k = dim J_k(g) = (n(n+1)/2) Σ_{i=0}^{k} C(n+i−1, i).
If n > 2, d¹_k grows more slowly than d²_k. So most metrics are not Hessian metrics.

Informal version
A Riemannian metric depends on n(n+1)/2 functions of n variables. A Hessian metric depends on n+1 functions of n variables. “Therefore” if n > 2 there are more Riemannian metrics than Hessian metrics.
Note: this computation is suggestive but slightly wrong because we’ve ignored the diffeomorphism group. It would suggest that in dimension 1 there are more Hessian metrics than Riemannian metrics!

Curvature
Reminder: Hessian metrics locally correspond to g-dually-flat structures, and vice versa. g-dually-flat means ∇ is flat and its dual ∇* w.r.t. g is flat, the dual being defined by
Z g(X, Y) = g(∇_Z X, Y) + g(X, ∇*_Z Y).
Proposition. Let (M, g) be a Riemannian manifold. Let ∇̂ denote the Levi–Civita connection and let ∇ = ∇̂ + A be a g-dually-flat connection. Then
(i) the tensor A_ijk lies in S³T*; we shall call it the S³-tensor of ∇;
(ii) the S³-tensor determines the Riemann curvature tensor as follows:
R_ijkl = −g^{ab} A_ika A_jlb + g^{ab} A_ila A_jkb.
Proof. ∇ torsion-free implies A ∈ S²T* ⊗ T. Using the metric to identify T* and T, both ∇ and ∇* torsion-free implies A ∈ S³T*.
R^∇ = 0, but by definition R^∇(X, Y)Z = ∇_X ∇_Y Z − ∇_Y ∇_X Z − ∇_{[X,Y]} Z. Expanding in terms of Levi–Civita:
R^∇(X, Y)Z = R(X, Y)Z + 2(∇̂_[X A)_Y] Z + 2 A_[X A_Y] Z.
Curvature symmetries tell us (using g to identify T and T*) that R ∈ Λ²T ⊗ Λ²T. On the other hand, (∇̂_[· A)_·] ∈ Λ²T ⊗ S²T. Projecting the equation onto Λ²T ⊗ Λ²T gives the desired result.

Curvature obstruction
Define a quadratic equivariant map ρ from S³T* to Λ²T* ⊗ Λ²T* by
ρ(A_ijk) = −g^{ab} A_ika A_jlb + g^{ab} A_ila A_jkb.
If g is a Hessian metric, R lies in the image of ρ.
Corollary. In dimension ≥ 5, ρ is not onto.
Therefore the condition R ∈ Im ρ is an obstruction to a metric being a Hessian metric.
Proof. dim(space of algebraic curvature tensors) = n²(n² − 1)/12, while dim(S³T) = n(n+1)(n+2)/6. The former is strictly greater than the latter if n ≥ 5.

Dimension 4
Numerical observation: ρ is not onto in dimension 4 even though dim R = dim(S³T*) = 20.
Proof. Pick a random A ∈ S³T* and compute the rank of (ρ*)_A, the differential of ρ at A. It is 18, whereas the space of algebraic curvature tensors is 20-dimensional. (Proof with probability 1.)
Question. What are the conditions on the curvature tensor for it to lie in the image of ρ?

What does this question mean?
This is an implicitization question: Im ρ is given parametrically by the map ρ, and we want implicit equations on the curvature tensor that define Im ρ.
This is a real algebraic geometry question, so we should expect inequalities among our implicit equations (e.g. Im x² = {y : y ≥ 0}).
Complexify the vector spaces to get a complex algebraic geometry problem where we expect equalities for our implicit equations. This is how we choose to interpret the question.
Gröbner basis algorithms allow us to solve the latter problem in principle (for fixed n) but not in practice (doubly exponential time is common). Algorithms do exist for the real algebraic geometry problem too, but they’re even less practical.

Strategy
The space of algebraic curvature tensors R is associated to a representation of SO(n).
Decompose R into irreducible components under SO(n). Any invariant linear condition on R can be expressed as a linear combination of these irreducibles.
Decompose S²R ⊕ R into irreducibles. Any invariant quadratic condition on R can be expressed as a linear combination of these irreducibles. Etc.
If we have m irreducible components ρ_1(R), ρ_2(R), …, ρ_m(R), choose m+1 random tensors A and solve the equation Σ_i α_i ρ_i(R) = 0 for the α_i. (In fact we only need to check linear combinations over isomorphic components.)
This is feasible in dimension 4.
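The two counts used in this argument (the jet dimensions and the dimensions of S³T* versus the space of algebraic curvature tensors) are easy to reproduce numerically. A sketch under the stated formulas (helper names are ours):

```python
from math import comb

def dim_s3(n):
    # dim S^3 T* = n(n+1)(n+2)/6, totally symmetric 3-tensors
    return n * (n + 1) * (n + 2) // 6

def dim_curv(n):
    # dim of the space of algebraic curvature tensors = n^2(n^2-1)/12
    return n * n * (n * n - 1) // 12

# Equality 20 = 20 in dimension 4; strict inequality from dimension 5 on,
# so rho : S^3 T* -> curvature tensors cannot be onto for n >= 5.
for n in range(2, 8):
    print(n, dim_s3(n), dim_curv(n), dim_curv(n) > dim_s3(n))

def dim_jets(k, n, components):
    # dimension of the space of k-jets of `components` functions of n variables
    return components * sum(comb(n + i - 1, i) for i in range(k + 1))

# Jet counting for g_ij = d_i d_j f in n = 3:
# (k+2)-jets of the n+1 unknowns (x, f) vs k-jets of the n(n+1)/2 components of g.
n = 3
for k in (1, 8, 15):
    d1 = dim_jets(k + 2, n, n + 1)
    d2 = dim_jets(k, n, n * (n + 1) // 2)
    print(k, d1, d2)
# d2 overtakes d1 for large k: most metrics in dimension 3 are not Hessian.
```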
Representation theory of SU(2) × SU(2) is simple.

Hessian curvature tensors in dimension 4
Theorem. The space of possible curvature tensors for a Hessian 4-manifold is 18-dimensional. In particular the curvature tensor must satisfy the identities
α(R_ij{}^a{}_b R_kl{}^b{}_a) = 0,
α(R_iajb R_k{}^{bcd} R_l{}^d{}_{ac} − 2 R_iajb R_kc{}^{ad} R_l{}^d{}_{bc}) = 0,
where α denotes antisymmetrization over the i, j, k and l indices.
Proof. Using a symbolic algebra package, write the general tensor in S³T* with respect to an orthonormal basis in terms of its 20 components. Compute the curvature tensor using ρ. One can then directly check the above identities.
Both expressions define 4-forms on a general Riemannian manifold. The first is a well-known 4-form: it defines the first Pontrjagin class of the manifold.

Pontrjagin forms
The Gauss–Bonnet formula gives an important link between curvature and topology: the integral of scalar curvature is related to the Euler class. The theory of characteristic classes generalizes this.
To a complex vector bundle V over a manifold M one can associate topological invariants, the Chern classes c_i(V) ∈ H^{2i}(M). The Pontrjagin classes of a real vector bundle V_R are defined to be the Chern classes of the complexification, p_i(V_R) ∈ H^{4i}(M). The Pontrjagin classes of a manifold are defined to be the Pontrjagin classes of its tangent bundle.
It is possible to find explicit representatives for the De Rham cohomology classes of a bundle by computing appropriate polynomial expressions in a curvature tensor for the bundle. We call these explicit representatives Pontrjagin forms.

Relationship between Pontrjagin forms and curvature
Theorem. For each p, the form Q_p(R) defined by
(Q_p)_{i_1 i_2 … i_{2p}} = Σ_{σ∈S_{2p}} sgn(σ) R_{i_{σ(1)} i_{σ(2)}}{}^{a_1}{}_{a_2} R_{i_{σ(3)} i_{σ(4)}}{}^{a_2}{}_{a_3} … R_{i_{σ(2p−1)} i_{σ(2p)}}{}^{a_p}{}_{a_1}
is closed.
The Pontrjagin forms can all be written as algebraic expressions in these Q_p(R) using the ring structure of Λ*, and vice versa. This is a standard result from the theory of characteristic classes.

Main result
Theorem. The forms Q_p(R) vanish on Hessian manifolds, hence the Pontrjagin forms vanish on Hessian manifolds.
Corollary. If a manifold M admits a metric that is everywhere locally Hessian then its Pontrjagin classes all vanish.
Note that we are being careful to distinguish this from the case of a manifold which is globally dually flat, where the vanishing of the Pontrjagin classes is a trivial corollary of the existence of flat connections.

Graphical notation
ρ(A_ijk) = −g^{ab} A_ika A_jlb + g^{ab} A_ila A_jkb, so R_ijkl is a difference of two trivalent graphs: each vertex represents the tensor A, connecting vertices represents contraction with the metric, and the picture naturally incorporates the symmetries of A.

Proof. By definition,
(Q_p)_{i_1 i_2 … i_{2p}} = Σ_{σ∈S_{2p}} sgn(σ) R_{i_{σ(1)} i_{σ(2)}}{}^{a_1}{}_{a_2} … R_{i_{σ(2p−1)} i_{σ(2p)}}{}^{a_p}{}_{a_1}.
We can replace each R with its graph in A. Since the cycle 1 → 2 → 3 → … → 2p → 1 is an odd permutation, one sees that Q_p = 0.

Summary
In dimension 2 all metrics are locally Hessian (use Cartan–Kähler theory; proved independently by Robert Bryant).
In dimension ≥ 3 not all metrics are locally Hessian.
In dimension ≥ 4 there are conditions on the curvature.
In dimension 4 we have identified two conditions explicitly. These are necessary conditions and, working over the complex numbers, they characterize Im ρ.
In dimension n ≥ 4 we have identified a number of explicit curvature conditions in terms of the Pontrjagin forms. Dimension counting tells us that other curvature conditions exist, but we do not know them explicitly.
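The first identity, α(R_ij{}^a{}_b R_kl{}^b{}_a) = 0, can be spot-checked numerically in dimension 4: build a random totally symmetric A ∈ S³T*, form R = ρ(A) with g the Euclidean metric, and evaluate the antisymmetrized contraction. A sketch (function names are ours):

```python
import itertools, random

n = 4
random.seed(0)

# A totally symmetric 3-tensor A in S^3 T* (components A[i][j][k])
A = [[[0.0] * n for _ in range(n)] for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        for k in range(j, n):
            v = random.uniform(-1, 1)
            for p in itertools.permutations((i, j, k)):
                A[p[0]][p[1]][p[2]] = v

def R(i, j, k, l):
    # R = rho(A): R_ijkl = -sum_a A_ika A_jla + sum_a A_ila A_jka (g = identity)
    return sum(-A[i][k][a] * A[j][l][a] + A[i][l][a] * A[j][k][a]
               for a in range(n))

def sgn(perm):
    # sign of a permutation via inversion count
    s, p = 1, list(perm)
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def Q(i, j, k, l):
    # first Pontrjagin-type 4-form:
    # Q_ijkl = sum_{sigma in S4} sgn(sigma) sum_{a,b} R_{s(i)s(j)ab} R_{s(k)s(l)ba}
    idx = (i, j, k, l)
    total = 0.0
    for sigma in itertools.permutations(range(4)):
        m = [idx[s] for s in sigma]
        total += sgn(sigma) * sum(R(m[0], m[1], a, b) * R(m[2], m[3], b, a)
                                  for a in range(n) for b in range(n))
    return total

print(abs(Q(0, 1, 2, 3)))  # vanishes (up to rounding) for any A, as the theorem states
```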
Based on the theory of compact normal left-symmetric algebras (clans), we realize every homogeneous cone as a set of positive definite real symmetric matrices, in such a way that homogeneous Hessian metrics, as well as a transitive group action on the cone, are described efficiently.

Matrix realization of a homogeneous cone
Hideyuki ISHI (Nagoya University)

§1. Introduction
§2. Matrix realization and left-symmetric algebra
§3. Homogeneous Hessian metrics

V : real vector space
Ω : regular open convex cone in V, that is,
• V ⊃ Ω : open subset,
• x ∈ Ω, c > 0 ⇒ cx ∈ Ω,
• x, y ∈ Ω, 0 ≤ t ≤ 1 ⇒ (1−t)x + ty ∈ Ω,
• Ω̄ ∩ (−Ω̄) = {0}.
Ω : homogeneous cone if ∃ G : Lie group acting on Ω transitively as linear transforms.

Example 1. V = Sym(n, R), Ω = P_n := { X ∈ Sym(n, R) | X is positive definite }.
G = GL(n, R) acts on P_n transitively by ρ(A)X := AX ᵗA (A ∈ GL(n, R), X ∈ P_n).
H_n := { T ∈ GL(n, R) | T_ij = 0 (i < j), T_ii > 0 (i = 1, …, n) }
H_n acts on P_n simply transitively by ρ because of the Cholesky decomposition: ∀ X ∈ P_n ∃! T ∈ H_n s.t. X = T ᵗT.

Example 2.
V := { X ∈ Sym(3, R) | X_12 = X_21 = 0 } = { X = [[x_1, 0, x_4], [0, x_2, x_5], [x_4, x_5, x_3]] | x_1, …, x_5 ∈ R }
Ω := V ∩ P_3
H := { T = [[t_1, 0, 0], [0, t_2, 0], [t_4, t_5, t_3]] | t_1, t_2, t_3 > 0, t_4, t_5 ∈ R } ⊂ H_3
Then H acts on Ω simply transitively by ρ.

Example 3. n ≥ 3.
V := the space of “arrow-shaped” matrices X ∈ Sym(n−1, R) with first row and column (x_1, x_3, …, x_n), remaining diagonal entries all equal to x_2, and zeros elsewhere (x_1, …, x_n ∈ R).
Ω := V ∩ P_{n−1} = { X | x_1 > 0, x_1 x_2 − x_3² − … − x_n² > 0 }
H := { T lower triangular of the same shape, with first column (t_1, t_3, …, t_n) and remaining diagonal entries t_2 | t_1, t_2 > 0, t_3, …, t_n ∈ R }
H acts on Ω simply transitively.
The cone Ω ⊂ P_{n−1} is linearly isomorphic to the circular cone { y ∈ R^n | y_1 > √(y_2² + … + y_n²) } in R^n, because the arrow matrix with first row ((y_1 − y_2)/√2, y_3, …, y_n) and remaining diagonal entries (y_1 + y_2)/√2 belongs to Ω iff y_1 > √(y_2² + … + y_n²).
Roughly speaking, every homogeneous cone is realized similarly.

§2. Matrix realization and left-symmetric algebra
Put h_n := { T ∈ Mat(n, R) | T_ij = 0 (i < j) } = Lie(H_n).
For X ∈ Sym(n, R), define X∨ ∈ h_n by (X∨)_ij := X_ij (i > j), X_ii/2 (i = j), 0 (i < j). Then X = X∨ + ᵗ(X∨).
For X, Y ∈ Sym(n, R), define X△Y := X∨ Y + Y ᵗ(X∨) ∈ Sym(n, R).
Then △ gives a bilinear product on the vector space Sym(n, R), encoding the action ρ of H_n on P_n.

Main Theorem. (i) Let Z be a subspace of Sym(n, R) such that Z△Z ⊂ Z and E_n ∈ Z. Then P_Z := Z ∩ P_n is a homogeneous cone.
The set H_Z := H_n ∩ { X∨ | X ∈ Z } forms a subgroup of H_n and acts simply transitively on P_Z.
(ii) Every homogeneous cone is linearly isomorphic to such a P_Z.
Examples 2 and 3 are special cases. (Recall Z = { [[x_1, 0, x_4], [0, x_2, x_5], [x_4, x_5, x_3]] | x_1, …, x_5 ∈ R } in Example 2.)

The algebra (Sym(n, R), △) has the following properties:
(C1) X△(Y△Z) − (X△Y)△Z = Y△(X△Z) − (Y△X)△Z for all X, Y, Z (left-symmetry);
(C2) there exists a linear form ξ such that ξ(X△X) > 0 for all nonzero X (compactness);
(C3) for each X, the left-multiplication operator L_X : Y → X△Y has only real eigenvalues (normality);
(C4) E_n△X = X△E_n = X for all X (∃ unit element).
An R-algebra (V, △) satisfying (C1) is called a left-symmetric algebra (or Koszul–Vinberg algebra), while a left-symmetric algebra satisfying (C2) and (C3) is called a clan (a compact normal left-symmetric algebra).

Vinberg obtained a one-to-one correspondence between homogeneous cones and clans with unit element, up to natural isomorphisms.
Theorem. Every clan with a unit element is isomorphic to a subalgebra of (Sym, △).
Theorem. A subalgebra Z of (Sym, △) with E_n ∈ Z admits a specific block decomposition after an appropriate permutation of rows and columns (see Proceedings).

§3. Homogeneous Hessian metrics
For X ∈ Sym(n, R) and k = 1, …, n, let X^[k] := (X_ij)_{1≤i,j≤k} ∈ Sym(k, R).
For X ∈ P_n and s = (s_1, …, s_n) ∈ R^n_{>0}, define
Δ_s(X) := ∏_{k=1}^{n} (det X^[k])^{s_k − s_{k+1}}, where s_{n+1} := 0.
If X = T ᵗT with T ∈ H_n, then Δ_s(X) = ∏_{k=1}^{n} (T_kk)^{2s_k}. Therefore
Δ_s(ρ(T)Y) = (∏_{k=1}^{n} (T_kk)^{2s_k}) Δ_s(Y)   (Y ∈ P_n).
Let g_s be the Hessian metric on P_n whose potential is −log Δ_s(Y). Then g_s is H_n-invariant.
For X ∈ P_n and A, B ∈ T_X P_n ≡ Sym(n, R), we have
g_s(A, B)_X := Σ_{k=1}^{n} (s_k − s_{k+1}) Tr( A^[k] (X^[k])^{−1} B^[k] (X^[k])^{−1} ).

A Hessian metric g on a domain D ⊂ R^n is said to be homogeneous if ∃ G : Lie group acting on D transitively as affine isometries.
Clearly the Hessian metric g_s on P_n is homogeneous. Moreover, every homogeneous Hessian metric on P_n is equivalent to some g_s.
Namely, for a homogeneous Hessian metric g on P_n, there exists a linear transform f : P_n → P_n such that g = f*g_s.
Theorem. Let Z be a subalgebra of (Sym(n, R), △) with E_n ∈ Z. Then every homogeneous Hessian metric on the homogeneous cone P_Z is equivalent to the restriction g_s|_{P_Z} for some s ∈ R^n_{>0}.
This parametrization is redundant because different s may give the same metric on P_Z. See Proceedings for a precise parametrization.
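The relative invariance Δ_s(ρ(T)Y) = (∏_k T_kk^{2s_k}) Δ_s(Y) used above can be spot-checked numerically on P_3. A minimal sketch with hand-rolled linear algebra (all helper names are ours):

```python
import random
random.seed(1)

def det(M):
    # Laplace expansion; fine for the tiny matrices used here
    k = len(M)
    if k == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(k))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def delta_s(X, s):
    # Delta_s(X) = prod_k det(X^[k])^(s_k - s_{k+1}), with s_{n+1} = 0
    n = len(X)
    out = 1.0
    for k in range(1, n + 1):
        sk1 = s[k] if k < n else 0.0
        minor = [row[:k] for row in X[:k]]
        out *= det(minor) ** (s[k - 1] - sk1)
    return out

n, s = 3, [1.2, 0.7, 0.4]
# random T in H_n: lower triangular with positive diagonal
T = [[random.uniform(0.5, 2.0) if i > j else 0.0 for j in range(n)]
     for i in range(n)]
for i in range(n):
    T[i][i] = random.uniform(0.5, 2.0)
# random Y in P_n via Y = S S^t, nudged to be safely positive definite
S = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Y = matmul(S, transpose(S))
for i in range(n):
    Y[i][i] += n

rho_T_Y = matmul(matmul(T, Y), transpose(T))
factor = 1.0
for k in range(n):
    factor *= T[k][k] ** (2 * s[k])
print(delta_s(rho_T_Y, s), factor * delta_s(Y, s))  # equal up to rounding
```

The equality holds exactly because, for lower-triangular T, the leading principal minors satisfy det((T Y ᵗT)^[k]) = (det T^[k])² det Y^[k], and the exponents telescope to 2s_k.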
In this article, we derive an inequality satisfied by the squared norm of the imbedding curvature tensor of multiply CR-warped product statistical submanifolds N of holomorphic statistical space forms M. Furthermore, we prove that under certain geometric conditions, N and M become Einstein.

Multiply CR-Warped Product Statistical Submanifolds of a Holomorphic Statistical Space Form
By PROF. MOHAMMAD HASAN SHAHID
Department of Mathematics, Faculty of Natural Sciences, Jamia Millia Islamia (Central University), New Delhi, India
(with Prof. Michel Boyom and M. Jamali)

CR-submanifolds and CR-warped product submanifolds
Let B and F be two Riemannian manifolds with Riemannian metrics g_B and g_F, respectively, and f a positive differentiable function on B. The warped product manifold B × F is equipped with the Riemannian metric
g = g_B + f² g_F.
The function f is called the warping function. It is well known that the notion of warped product plays some important roles in differential geometry as well as in physics.

Let M̄ be a Kaehler manifold with complex structure J and N a Riemannian manifold isometrically immersed in M̄. For each x ∈ N, we denote by D_x the maximal holomorphic subspace of the tangent space T_x N of N. If the dimension of D_x is the same for all x ∈ N, the spaces D_x define a holomorphic distribution D on N, which is called the holomorphic distribution of N. A submanifold N of a Kaehler manifold M̄ is called a CR-submanifold if there exists a holomorphic distribution D on N whose orthogonal complement D⊥ is a totally real distribution, i.e., JD⊥ ⊂ T⊥N. A CR-submanifold is called a totally real submanifold if dim D_x = 0.

Statistical manifolds, introduced in 1985 by Amari, have been studied in terms of information geometry. Since the geometry of such manifolds includes the notion of dual connections, also called conjugate connections in affine geometry, it is closely related to affine differential geometry. Further, a statistical structure is a generalization of a Hessian structure.

Let (M̄, ḡ) be a Riemannian manifold and M a submanifold of M̄. If (M, ∇, g) is a statistical manifold, then we call (M, ∇, g) a statistical submanifold of (M̄, ḡ), where ∇ is an affine connection on M and g is the metric tensor on M induced from the Riemannian metric ḡ on M̄. Let ∇̄ be an affine connection on M̄.
If ( M , g , Ñ ) is astatistical manifold and M a submanifold ofM , then (M,Ñ, g) is also a statistical manifold by inducedconnection Ñ and metric g.In the case ( M , g ) is a semiRiemannian manifold, theinduced metric connection g has to be nondegenerated.In the geometry of submanifolds, Gauss formula, Weingartenformula and the equation of Gauss, Codazzi and Ricci areknow as fundamental equations. Corresponding fundamentalequations on statistical submanifolds were obtained .Let M be an ndimensional submanifold of M . Then, for anyX , Y Î G(TM ) , Gauss formula isÑ X Y = Ñ X Y + h( X , Y )*XÑ Y = Ñ* Y + h* ( X , Y )XWhere h and h * are symmetric and bilinear, called theimbedding curvature tensor of M in M for Ñ and the*imbedding curvature tensor of M in M for Ñ , respectively.it is also proved that (Ñ, g ) and (Ñ* , g) are dual statisticalstructure on M, where g is induced metric on G(TM) from theRiemannian metric g on M .Let us denote the normal bundle on M by G(TM^ ) . Since hand h * are bilinear, we have the linear transformation A xand A x* defined byg ( Ax X , Y ) = g ( h ( X , Y ), x )g ( Ax* X , Y ) = g ( h * ( X , Y ), x )Definition. Let N1, N2 ,....,Nk be Riemannian manifold of thedimensions n1, n2 ,....,nk respectively and let N = N1, N2 ,....,Nk bethe Cartesian product of N1, N2 ,....,Nk . For each a, denote bypa : N®Na the canonical projection N and Na . We denote theshorizontal lift of Na in N via p a by Na itself. If s2,....., k : N1 ®R+ arepositive valued functions, thenk(2.1)g( X ,Y ) = p1* X ,p1*Y + å(s a o p1)2 p a* X ,p a*Ya=1define a metric g on N. The product manifold N endowed withthis metric is denoted by N1 ´s 2 N2 ´.....´sk Nk . This product manifoldN is known as multiply warped product manifold.Definition. If N1, N2 ,....,Nk be k statistical manifolds, thenN= N1 ´s 2 N2 ´.....´sk Nk is again a statistical manifold with metricgiven by equation (2.1). 
This manifold N is called multiplywarped product statistical manifold.Now let us denote the part s2 N2 ´...´sk Nk by N^ and N1 by NT .Then N can be represented as N = NT ´N^. We denote byX,Y....ÎG(M) as the vector field on M and X, Y…. the inducedvector field on N.Definition. A multiply warped product statistical submanifoldN = NT ´N^ in an almost complex manifold M is called a multiplyCRwarped product statistical submanifold if NT is an invariantsubmanifold and N^ is an antiinvariant submanifold of M.We denote by, m ≥ 1 the Euclidean 2m space with thestandard metric. Then the canonical complex structure ofis defined byJ (x1, y1,..., xm , ym ) = ( y1, x1,..., ym , xm )Example. Consider inthe submanifold is given by theequations [B. Sahin, Geom. Dedicata 2006](*)From (*) one can obtain that TM is spanned bywhere,Using (*) one gets thatis invariant with respectto J. Moreover,are orthogonal to TM. Hence,is antiinvariant with respect to J. Thus M is aCRsubmanifold of . Furthermore, we can derive thatand are integralable.Denoting the integral manifold of D andbyrespectively, then the induced metric tensor isThus M is a CRwarped product submanifold ofwarping function.,with• A. Bejancu, CR submanifold of a Kaehler a manifold I, Proc.Amer. Math. Soc. 69 (1978), 135142.• A. Bejancu, CRsubmanifold of a Kaehler manifold II, Trans.Amer. Math. Soc. 69 (1979), 333345.• Chen BY (1981) CRsubmanifolds in Kaehler manifolds. I. JDiff Geometry 16: 305322; CRsubmanifolds in Kaehlermanifolds. II. Ibid 16: 493509.• Chen BY (2001) Geometry of warped product CRsubmanifolds in Kaehler manifolds I. Monatsh Math 133:177195; Geometry of warped product CRsubmanifolds inKaehler manifolds. II. Ibid 134: 103119.• S . Amari, DifferentialGeometrical methods in Statistics,SpringerVerlag, 1985.• Yano K. 
and Kon, M.: CRsubmanifolds of Kaehlerian andSasakian Manifolds, Birkhauser, Basel, 1983.From the decomposition of TN = D Å D^ and T ^ N = JD ^ Å lwe may writeh ( X , Y ) = h JD ^ ( X , Y ) + h l ( x , y )Also for multiply CR warped product statistical submanifolds N of astatistical manifold [L. Tod., Diff. Geom. – Dynamical system ,2006]z =kå ( X (log sa=2a)) z a and Ñ * Z =Xkå ( X (loga=2s a )) Zfor any vector fields X Î D and Z ÎD^ , where Zdenotes the N a component of Z .aa(4)Lemma 1. Letbe a multiply CRwarped product statistical submanifold of a holomorphicstatistical space form M. then we havek(i) hJD ( JX , Y ) = å ( X (logs a )) JZ a + JPz JX^a =2(ii)(iii)g ( PZ JX , W ) = g (Q Z JX , JW )g (h( JX , Z ), Jh( X , Z )) = hl ( Z , X ) + g (QZ X , Jhl ( X , Z ))2For any vector field X in D and Z , W in Ddenotes the N a  component of Z.^Za, whereProof. From Gauss formulawe can writeÑ Z JX + h( JX , Y ) = PZ X + QZ X + JÑ Z X + Jh( X , Z )kh( JX , Z ) = PZ X + QZ X + J (å ( X (log s a ) Z a ) + Jh ( Z , X )a =2k å ( JX (log s a )) JZ a(5)a=2where P and Q denotes the tangential and normal projection.Comparing the tangential part in the above equation and thentaking inner product with W Î D ^ , we getkhJD^ (JX, Z) = å( X (logsa ))Z + JP JX, "X ÎD, Z ÎDZa=2a^Now comparing normal parts of (5) and taking inner productwith JW for WÎD^kg (hJD^ ( JX , Z ), JW ) g (QZ X , JW ) + å ( X (logs a ) g ( JZ a , JW ))a =2Using part(i) of the lemma 1 we arrive atg ( PZ JX ,W ) = g (QZ X , JW ).Comparing normal part of h(JX, Z)  Jh (Z, X) = QZ X +lon both the sides and taking inner product withfindk(X(log a )JZaå sa=2weTheorem 2. Letbe multiplyCRwarped product statistical submanifold of holomorphicstatistical space form M with P ^ DÎD , then the square norm ofDimbedding curvature tensor of N in M satisfies the followinginequalities :Proof. 
Let { X 1 , X 2 ,...., X p , X p +1 = JX 1 ,..., JX 2 p = JX p } be localorthonormal frame of vector field frame of the vector field on N Tand {Z1, Z2 ,...,Zq}be such that Z D a is a basis for some N a ,a= 2,…..,k whereD 2 = {1, 2,..., n 2 } ,…., D k = {n2 + n3 + .... + nk 1 + 1,..., n1 + n2 + ... + nk }andn 2 + n 3 + ..... + n k = qThe above equation impliesNow using part (i) of the Lemma 1 we getIn the view of the above assumption PD D Î D , the above inequalitytakes the form^By the CuachySchwartz inequality the above equationbecomesThereforeTheorem 3. Letbe a compactorientable multiply CR warped product statistical submanifoldwithout boundary of holomorphic statistical space form M ofconstant curvature k. If PD D Î D andThen^And the equality holds if and if^Proof. Let XÎD , Z ÎD , then form holomorphic statisticalspace form of constant curvature k, we haveWhich implies.On the other hand from Codazzi equation, we may write(7)(8)Now, we calculate each term of (8) as(10)Similarly we replace X By JX in the last equation, we get(11)(12)(13)(14).kR( X , JX , Z , JZ ) = å [{X ( X log s a ) + ( X log s a ) 2 }g ( Z a , Z a )]a=2k*X g(h(JX, Z ),Ñ JZ) + å[{JX(JX logs a ) + (JX logs a )2}g(Z a , Z a )]a=2*JXk g(h(JX, Z),Ñ JZ)  å(ÑX X logsa )g(Z a , Z a )a=2k å{(ÑJX JX logs a ) g (Z a , Z a )}a =2(16)Combining (7) and (16) and taking summation over the rangefrom 1 to p, we havekpk()å Z4 a=2kåa = 22agrad=kåa=2DD (log s a ) Z(logsa)a22p^+ å [ g (h( Jei , Z ), Ñ ei * ) JZ  g (h(ei , Z ), Ñ ^*i JZ )]Jei =1(18)Integrating both the sides, Green’s and the hypothesis leads tokk= 4å Za 2a=2ò { gradpå Za =2på Z aSincea=2k2D£0Nkk(log s a ) }dv22a 2ò dvNò dv > 0Nå2 Z ò { grad D (log s a ) }dv ³ 0Anda=N2{ grad D (log s a ) }dv = 0Further the equality holds if and only if òa2NWhich implies that the equality holds ifproves the theorem.. ThisTheorem 4. 
Letbe acompact orientable antiinvariant multiply warped productstatistical submanifold without boundary of holomorphicstatistical space form M of constant curvature k. If PD D Î Dand AÑ^* JZJZ = AÑJX*JX X , then^XR( X ,Y , X ,Y ) ³ g (H , H * )and the equality holds if and only ifgradD (logs a ) = 0Proof. From the previous theorem we have. Since Nis antiinvariant , we have N T = 0 and N = N ^ .This implies that N becomes completely totally umbilicalsubmanifold of M. Furthermore, from the expression of theambient curvature we have, for two orthonormal vector X , Y Î TNkThen.R ( X ,Y , X ,Y ) = 4Furthermore, from Gauss equation and totally umbilicity of N, weobtainkR( X ,Y, X ,Y) = ( + g(H, H*))4R( X , Y , X , Y ) ³ g ( H , H * )and the equality holds ifTheorem 5. Letbe acompact orientable antiinvariant multiply warped productstatistical submanifold without boundary of holomorphicstatistical space form M of constant curvature k. If P D D Î Dand AÑ^*JZ JX = AÑ^ * JZ X , then M is Einstein and N is Einstein ifXJXand only if^k+ g (H , H4*) is constant.Proof. The proof is straight from the last theorem and theGauss equation which combinely givekRic(Y , Z ) = (n  1){ + g ( H , H * )}g (Y , Z )4References:[1]. S. AMARI, “Differential Geometric methods in statistics”, SpringerVerlag,1985.[2]. S. AMARI and H. NAGAOKA, “Methods of Information Geometry”, Transl.Math. Monogr., Vol191, Amer. Math. Soc., 2000.[3]. M. E. AYDIN, A. MIHAI and I. MIHAI, “Some inequalities on submanifolds instatistical manifolds of constant curvature”, Filomat (To appear).[4]. R.L. BISHOP and B. O’NEILL, “Manifolds of negative curvature”, Trans. ofAmer. Math. Soc., Vol145(1969), 149.[5]. B. Y. CHEN, “Geometry of warped product CRsubmanifolds in Kaehlermanifold”, Monatsh. Math., 133(2001), 177195.[6]. B. Y. CHEN, “Geometry of warped product CRsubmanifolds in Kaehlermanifold II”, Monatsh. Math., 134(2001), 103119.[7]. B. Y. 
CHEN and FRANKI DILLEN, “Optimal inequalities fir multiply warpedproduct submanifolds”, Int. Elect. J. of Geometry, Vol1 (2008), No1, 111.[8]. H. FURUHATA, “Hypersurfaces in statistical manifolds”, Diff. Geom. Appl., 27,(2009), 420429.[9]. L. Todgihounde, “Dualistic structures on warped product manifolds”,Differential GeometryDynamical Systems, Vol8 (2006), 278284.[10]. P. W. VOS, “Fundamental equations for statistical submanifolds withapplications to the Bartlett connection”, Ann. Inst. Statist. Math., 41(3) (1989),429450.
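The background above rests on two concrete objects that are easy to check numerically: the canonical complex structure J on R^{2m} (with the usual sign convention J(x_1, y_1, …) = (−y_1, x_1, …)) and the warped product metric g = g_B + f² g_F. A minimal sketch, with all names ours:

```python
# Canonical complex structure on R^{2m} (sign convention J(x, y) = (-y, x)
# pairwise) and a warped product metric g = g_B + f(b)^2 g_F; both are
# minimal numeric sketches, not taken from the talk's notation.
def J(v):
    out = []
    for i in range(0, len(v), 2):
        x, y = v[i], v[i + 1]
        out += [-y, x]
    return out

v = [1.0, 2.0, 3.0, 4.0]
assert J(J(v)) == [-c for c in v]   # J^2 = -Id, so J is a complex structure

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def warped_metric(f_at_b, X1, V1, X2, V2):
    # g((X1, V1), (X2, V2)) = g_B(X1, X2) + f(b)^2 g_F(V1, V2)
    # with g_B, g_F taken Euclidean for the sketch
    return dot(X1, X2) + f_at_b ** 2 * dot(V1, V2)

print(warped_metric(2.0, [1, 0], [0, 1], [1, 0], [0, 1]))  # 1 + 4*1 = 5.0
```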
Topological forms and Information (chaired by Daniel Bennequin, Pierre Baudot)
In this lecture we will present joint work with Ryan Thorngren on thermodynamic semirings and entropy operads, with Nicolas Tedeschi on Birkhoff factorization in thermodynamic semirings, ongoing work with Marcus Bintz on tropicalization of Feynman graph hypersurfaces and Potts model hypersurfaces, and their thermodynamic deformations, and ongoing work by the author on applications of thermodynamic semirings to models of morphology and syntax in Computational Linguistics.

Information Algebras and their Applications
Matilde Marcolli
Geometric Science of Information, Paris, October 2015

Based on:
M. Marcolli, R. Thorngren, Thermodynamic semirings, J. Noncommut. Geom. 8 (2014), no. 2, 337–392
M. Marcolli, N. Tedeschi, Entropy algebras and Birkhoff factorization, J. Geom. Phys. 97 (2015) 243–265

Min-Plus Algebra (Tropical Semiring)
The min-plus (or tropical) semiring is T = R ∪ {∞} with
• operation ⊕: x ⊕ y = min{x, y}, with identity ∞,
• operation ⊙: x ⊙ y = x + y, with identity 0.
These operations satisfy associativity, commutativity, left/right identity, and distributivity of the product ⊙ over the sum ⊕.

Thermodynamic semirings
T_{β,S} = (R ∪ {∞}, ⊕_{β,S}, ⊙)
• deformation of the tropical addition ⊕:
x ⊕_{β,S} y = min_{p∈[0,1]} { p x + (1−p) y − (1/β) S(p) }
with β the thermodynamic inverse-temperature parameter and S(p) = S(p, 1−p) a binary information measure, p ∈ [0, 1].
• For β → ∞ (zero temperature) one recovers the unperturbed idempotent addition ⊕.
• The multiplication ⊙ = + is undeformed.
• For S = Shannon entropy this was first considered in relation to F_1-geometry in:
A. Connes, C. Consani, From monoids to hyperstructures: in search of an absolute arithmetic, arXiv:1006.4810

Khinchin axioms
Sh(p) = −C (p log p + (1−p) log(1−p))
Axiomatic characterization of the Shannon entropy S(p) = Sh(p):
1. symmetry: S(p) = S(1−p)
2. minima: S(0) = S(1) = 0
3. extensivity: S(pq) + (1−pq) S(p(1−q)/(1−pq)) = S(p) + p S(q)
These correspond to algebraic properties of the semiring T_{β,S}:
1. commutativity of ⊕_{β,S}
2. left and right identity for ⊕_{β,S}
3. associativity of ⊕_{β,S}
⇒ T_{β,S} is commutative, unital, and associative iff S(p) = Sh(p).

Khinchin axioms, n-ary form
Given S as above, define S_n : Δ^{n−1} → R by S_n(p_1, …, p_n) = …
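The deformed addition x ⊕_{β,S} y can be computed directly. For the Shannon entropy the minimum over p has the closed form −(1/β) log(e^{−βx} + e^{−βy}) (soft-min via Legendre duality), and β → ∞ recovers min(x, y). A numeric sketch (helper names ours):

```python
# Thermodynamic deformation of tropical addition:
#   x (+)_{beta,S} y = min_{p in [0,1]} { p x + (1-p) y - S(p)/beta }
# with S the binary Shannon entropy.
import math

def shannon(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def trop_add(x, y, beta, S=shannon, steps=20000):
    # brute-force minimization over a grid in [0, 1]
    return min(p * x + (1 - p) * y - S(p) / beta
               for p in (i / steps for i in range(steps + 1)))

x, y = 1.0, 2.5
for beta in (1.0, 10.0, 100.0):
    closed = -math.log(math.exp(-beta * x) + math.exp(-beta * y)) / beta
    print(beta, trop_add(x, y, beta), closed)
# As beta grows, both tend to the undeformed tropical sum min(x, y) = 1.0.
```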
We show that the entropy function, and hence the finite 1-logarithm, behaves much like certain derivations. We recall its cohomological interpretation as a 2-cocycle and also deduce 2n-cocycles for any n. Finally, we give some identities for finite multiple polylogarithms together with number-theoretic applications.

Finite polylogarithms, their multiple analogues and the Shannon entropy
Geometric Sciences of Information 2015, Session "Topological Forms and Information"
École Polytechnique (France), 28 October 2015
Philippe Elbaz-Vincent (Université Grenoble Alpes) & Herbert Gangl (Durham University)

Content of this talk: information theory, entropy and polylogarithms (review of past works); algebraic interpretation of the entropy function; cohomological interpretation of formal entropy functions; finite multiple polylogarithms, applications and open problems.

Information theory, entropy and polylogarithms (1/4). The Shannon entropy can be characterised in the framework of information theory, assuming that the propagation of information follows a Markovian model (Shannon, 1948). If H is the Shannon entropy, it fulfils the equation often called the Fundamental Equation of Information Theory (FEITH):
H(x) + (1 − x) H(y/(1 − x)) − H(y) − (1 − y) H(x/(1 − y)) = 0.    (FEITH)
It is known (Aczél and Dhombres, 1989) that if g is a real function locally integrable on ]0, 1[ which fulfils FEITH, then there exists c ∈ R such that g = cH (the hypothesis can also be weakened to Lebesgue measurability).

(2/4). FEITH can be derived, in a precise formal sense (Elbaz-Vincent and Gangl, 2002), from the 5-term equation of the classical (or p-adic) dilogarithm. Cathelineau (1996) found that an appropriate derivative of the Bloch–Wigner dilogarithm coincides with the classical entropy function, and that the five-term relation satisfied by the former implies the four-term relation of the latter. More precisely, define the m-logarithm Li_m(z) = Σ_{n≥1} z^n / n^m for |z| < 1, and set
D₂(z) = Im( Li₂(z) + log(1 − z) log |z| ).
Then D₂ satisfies the 5-term equation
D₂(a) − D₂(b) + D₂(b/a) − D₂((1 − b)/(1 − a)) + D₂((1 − b⁻¹)/(1 − a⁻¹)) = 0
whenever such an expression makes sense. This relation is the famous five-term equation for the dilogarithm (first stated by Abel).

(3/4). It can be shown formally (see Cathelineau, Elbaz-Vincent and Gangl) that FEITH is an infinitesimal version of this 5-term equation. Kontsevich (1995) discovered that the truncated finite logarithm over a finite field F_p, with p prime, defined by
£₁(x) = Σ_{k=1}^{p−1} x^k / k,
satisfies FEITH. In our previous work we showed how to expand this relationship to "higher analogues" in order to produce and prove similar functional identities for finite polylogarithms from those for classical polylogarithms (using mod p reduction of p-adic polylogarithms and their infinitesimal versions). It was also shown that functional equations for finite polylogarithms often hold even as polynomial identities over finite fields.

(4/4). Entropy and FEITH arise from the infinitesimal picture (for both archimedean and non-archimedean structures) and their finite analogues associated to the dilogarithm. Does there exist a higher analogue of the Shannon entropy associated to m-logarithms? It could be connected to the higher degrees of the information cohomology of Baudot and Bennequin (Entropy, 2015).

Algebraic interpretation of the entropy function (1/2). Let R be a (commutative) ring and let D be a map from R to R. We say that D is a unitary derivation over R if the following axioms hold:
1. "Leibniz rule": for all x, y ∈ R, D(xy) = x D(y) + y D(x);
2. "additivity on partitions of unity": for all x ∈ R, D(x) + D(1 − x) = 0.
We denote by Der_u(R) the set of unitary derivations over R. We say that a map f : R → R is an abstract symmetric information function of degree 1 if the functional equation FEITH holds for all x, y ∈ R such that x, y, 1 − x, 1 − y ∈ R^×, and f(x) = f(1 − x) for all x ∈ R. Denote by IF₁(R) the set of abstract symmetric information functions of degree 1 over R; it is an R-module. Let Leib(R) be the set of Leibniz functions over R (those fulfilling the "Leibniz rule").

(2/2). Proposition: there is a morphism of R-modules h : Leib(R) → IF₁(R) defined by h(ϕ) = ϕ + ϕ ∘ τ, with τ(x) = 1 − x; furthermore Ker(h) = Der_u(R). Hence, if h is onto, abstract information functions are naturally associated to formal derivations. Nevertheless, h can also be 0: if R = F_q is a finite field, then Leib(F_q) = 0 but IF₁(F_q) ≠ 0 (it is generated by £₁).

Cohomological interpretation of formal entropy functions. The following results are classical in origin (Cathelineau 1988, Kontsevich 1995). Proposition: let F be a finite prime field and H : F → F a function which fulfils the following conditions: H(x) = H(1 − x), the functional equation (FEITH) holds for H, and H(0) = 0. Then the function ϕ : F × F → F defined by ϕ(x, y) = (x + y) H(x/(x + y)) if x + y ≠ 0, and 0 otherwise, is a nontrivial 2-cocycle. Sketch of proof: suppose ϕ is a 2-coboundary; then there exists a map Q : F → F such that ϕ(x, y) = Q(x + y) − Q(x) − Q(y). The function ψ_λ(x) = Q(λx) − λQ(x) is an additive morphism F → F, hence entirely determined by ψ_λ(1). The map λ ↦ ψ_λ(1) fulfils the Leibniz chain rule on F^×. We deduce that ϕ = 0, which is impossible, so ϕ is not a coboundary. It follows that £₁ is unique (up to a constant). In the real or complex case one uses other types of cohomological arguments (see also the relationship with Baudot and Bennequin, 2015).

Finite multiple polylogarithms (1/3). While classical polylogarithms play an important role in the theory of mixed Tate motives over a field, it turns out that it is often preferable to also consider the larger class of multiple polylogarithms (cf. Goncharov's work). In a similar way it is useful to investigate their finite analogues. We are mainly concerned with finite double polylogarithms, given as functions Z/p × Z/p → Z/p by
£_{a,b}(x, y) = Σ_{0<k<l<p} x^k y^l / (k^a l^b).
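Kontsevich's statement above, that the truncated logarithm £₁ satisfies FEITH over F_p, is easy to check by brute force for a small prime. A sketch (the function names are ours) verifying the four-term relation for p = 13 and all admissible x, y:

```python
def finite_log(x, p):
    """Truncated finite logarithm: sum_{k=1}^{p-1} x^k / k over F_p (1/k via Fermat)."""
    return sum(pow(x, k, p) * pow(k, p - 2, p) for k in range(1, p)) % p

def feith(H, x, y, p):
    """Left-hand side of FEITH over F_p:
       H(x) + (1-x) H(y/(1-x)) - H(y) - (1-y) H(x/(1-y))."""
    inv = lambda a: pow(a, p - 2, p)   # modular inverse, a not divisible by p
    return (H(x, p) + (1 - x) * H(y * inv(1 - x) % p, p)
            - H(y, p) - (1 - y) * H(x * inv(1 - y) % p, p)) % p

p = 13
for x in range(2, p - 1):              # avoid x, y in {0, 1} so 1-x, 1-y are invertible
    for y in range(2, p - 1):
        assert feith(finite_log, x, y, p) == 0
```

The same loop with any other prime p also returns without an assertion error.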
We present a dictionary between arithmetic geometry of toric varieties and convex analysis. This correspondence allows for effective computations of arithmetic invariants of these varieties. In particular, combined with a closed formula for the integration of a class of functions over polytopes, it gives a number of new values for the height (arithmetic analog of the degree) of toric varieties, with respect to interesting metrics arising from polytopes. In some cases these heights are interpreted as the average entropy of a family of random processes.

Heights of toric varieties, entropy and integration over polytopes
"GSI'15", École Polytechnique, October 28, 2015
José Ignacio Burgos Gil, Patrice Philippon & Martín Sombra
Patrice Philippon, IMJ-PRG, UMR 7586 CNRS

Toric varieties. Toric varieties form a remarkable class of algebraic varieties, endowed with an action of a torus having one Zariski dense open orbit; toric divisors are those invariant under the action of the torus. Together with their toric divisors, they can be described in terms of combinatorial objects such as lattice fans, support functions or lattice polytopes. (Figure: the fan of P² with support-function pieces (u₁, u₂) ↦ 0, (u₁, u₂) ↦ −u₁ and (u₁, u₂) ↦ −u₂.) Each cone corresponds to an affine toric variety and the fan encodes how they glue together; if the fan is complete, then the toric variety is proper. The support function determines a toric divisor D on each affine toric chart. By duality, the stability set of the support function is a polytope Δ, which may be empty but which is of dimension n as soon as D is nef, which is equivalent to the support function being concave. One fundamental result is: if D is a toric nef divisor, then
deg_D(X) = n! vol_n(Δ).

Heights. A height measures the complexity of objects over the field of rational numbers, say. For a/b ∈ Q^× and d = gcd(a, b),
h(a/b) = log max(|a/d|, |b/d|) = Σ_v log max(|a|_v, |b|_v),
thanks to the product formula Π_v |d|_v = 1 for any d ∈ Q^×, where v runs over all the (normalised) absolute values on Q (usual and p-adic). For points of a projective space x = (x₀ : … : x_N) ∈ P^N(Q),
h(x) = Σ_v log ‖x‖_v = − Σ_v log ‖s(x)‖_v,
where ‖·‖_v is a norm on Q^{N+1} compatible with the absolute value |·|_v on Q (usual or p-adic). Metrics on O_{P^N}(1): ‖s(x)‖_v = |s(x)|_v / ‖x‖_v.

On an abstract variety equipped with a divisor (X, D), defined over Q, the suitable arithmetic setting amounts to a collection of metrics on the space of rational sections of the divisor, compatible with the absolute values on Q (the collection is in bijection with the set of absolute values on Q). We denote by D̄ the resulting metrised divisor. Arithmetic intersection theory allows one to define the height of X relative to D̄ analogously to the degree deg_D(X):
h_D̄(X) = Σ_v h_v(X),
where the local heights h_v are defined through an arithmetic analogue of the Bézout formula. Local heights depend on the choice of auxiliary sections, but the global height does not.

Metrics on toric varieties. On toric divisors, a metric is said to be toric if it is invariant under the action of the compact subtorus of the principal orbit. There is a bijection between toric metrics and continuous functions on the fan whose difference with the support function is bounded; the metric is semipositive iff the corresponding function is concave. By Legendre duality, the semipositive toric metrics are also in bijection with the continuous, concave functions on the polytope associated to the toric divisor, dubbed roof functions. The roof function is the concave envelope of the graph of s ↦ −log ‖s‖_{v,sup}, for s running over the toric sections of the divisor and its multiples. (Figure: roof functions of the pullback of the canonical metric of P² on P¹ by t ↦ (1 : 2 : t), for v = 2, v = ∞ and the other places.) The support function itself corresponds to the so-called canonical metric; its roof function is the zero function on the polytope.

Heights of toric varieties. Let (X, D) be a toric variety with a toric divisor (over Q), equipped with a collection of toric metrics (a toric metrised divisor D̄). The (local) roof functions attached to the toric metrised divisor sum up to the so-called global roof function ϑ := Σ_v ϑ_v. We have the analogue of the formula seen for the degree:
h_D̄(X) = (n + 1)! ∫_Δ ϑ.

Metrics from polytopes. Let ℓ_F(x) = ⟨x, u_F⟩ + ℓ_F(0) be the linear forms defining a polytope Γ ⊂ R^n, with F running over its facets and |u_F| = vol_{n−1}(F) / (n vol_n(Γ)). Let Δ ⊂ Γ be another polytope; the restriction to Δ of
ϑ := −(1/c) Σ_F ℓ_F log(ℓ_F)
is the roof function of some (archimedean) metric on the toric variety X and divisor D defined by Δ, hence a metrised divisor D̄. Example: the roof function of the Fubini–Study metric on P^n is
−(1/2)(x₀ log x₀ + … + x_n log x_n), with x₀ = 1 − x₁ − … − x_n
(dual to −(1/2) log(1 + Σ_{i=1}^n e^{−2u_i})).

Height as average entropy. Let x ∈ Γ and let β_x be the (discrete) random variable that maps y ∈ Γ to the face F of Γ such that y ∈ Cone(x, F):
P(β_x = F) = dist(x, F) vol_{n−1}(F) / (n vol_n(Γ)).
The entropy E(β_x) = −Σ_F P(β_x = F) log P(β_x = F) satisfies
(1 / vol_n(Δ)) ∫_Δ E(β_x) dvol_n(x) = (c / (n + 1)) · h_D̄(X) / deg_D(X).

Integration over polytopes. An aggregate of Δ in a direction u ∈ R^n is the union of all the faces of Δ contained in {x ∈ R^n : ⟨x, u⟩ = λ} for some λ ∈ R. Definition: let V be an aggregate in the direction of u ∈ R^n; we set recursively: if u = 0, then C_n(Δ, 0, V) = vol_n(V) and C_k(Δ, 0, V) = 0 for k ≠ n; if u ≠ 0, then
C_k(Δ, u, V) = −Σ_F (⟨u_F, u⟩ / ‖u‖²) C_k(F, π_F(u), V ∩ F),
where the sum is over the facets F of Δ. This recursive formula implies that C_k(Δ, u, V) = 0 for all k > dim(V).
Proposition [2, Prop. 6.1.4]: let Δ ⊂ R^n be a polytope of dimension n and u ∈ R^n. Then, for any f ∈ C^n(R),
∫_Δ f^{(n)}(⟨x, u⟩) dvol_n(x) = Σ_{V ∈ Δ(u)} Σ_{k=0}^{dim(V)} C_k(Δ, u, V) f^{(k)}(⟨V, u⟩).
The coefficients C_k(Δ, u, V) are determined by this identity. Example: if Δ = Conv(ν₀, …, ν_n) = ∩_{i=0}^n {x : ⟨x, u_i⟩ ≥ λ_i} is a simplex and u ∈ R^n \ {0}, then C₀(Δ, u, ν₀) equals n! vol_n(Δ) …
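Two formulas above can be combined into a quick sanity check: for X = P¹ with the Fubini–Study metric, the roof function is ϑ(x) = −(1/2)(x log x + (1 − x) log(1 − x)) on Δ = [0, 1], and since ∫₀¹ −x log x dx = 1/4, the height formula gives h(P¹) = 2! ∫_Δ ϑ = 2 · 1/4 = 1/2. A numerical sketch of this computation (midpoint rule; not code from the talk):

```python
import math

def fs_roof(x):
    """Roof function of the Fubini-Study metric on P^1: -(1/2)(x log x + (1-x) log(1-x))."""
    if x in (0.0, 1.0):
        return 0.0
    return -0.5 * (x * math.log(x) + (1 - x) * math.log(1 - x))

def fs_height_p1(n_steps=100000):
    """h_FS(P^1) = (n+1)! * integral of the roof function = 2 * int_0^1 fs_roof(x) dx."""
    h = 1.0 / n_steps
    return 2 * sum(fs_roof((i + 0.5) * h) for i in range(n_steps)) * h
```

The canonical metric, whose roof function vanishes, gives height 0 by the same formula.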
In this paper we propose a method to characterize and estimate the variations of a random convex set Ξ₀ in terms of shape, size and direction. The mean n-variogram γ^{(n)}_{Ξ₀} : (u₁, …, u_n) ↦ E[ν_d(Ξ₀ ∩ (Ξ₀ − u₁) ∩ … ∩ (Ξ₀ − u_n))] of a random convex set Ξ₀ on R^d reveals information on the n-th order structure of Ξ₀. In particular, we will show that by considering the mean n-variograms of the dilated random sets Ξ₀ ⊕ rK by a homothetic convex family (rK)_{r>0}, it is possible to estimate characteristics of the n-th order structure of Ξ₀. A judicious choice of K provides relevant measures of Ξ₀. Fortunately, the germ-grain model is stable under convex dilations, and the mean n-variogram of the primary grain is estimable in several types of stationary germ-grain models via the so-called n-point probability function. Here we will only focus on the Boolean model. In the planar case, we will show how to estimate the n-th order structure of the random vector composed of the mixed volumes t(A(Ξ₀), W(Ξ₀, K)) of the primary grain, and we will describe a procedure to do so from a realization of the Boolean model in a bounded window. We will prove that this knowledge for all convex bodies K is sufficient to fully characterize the so-called difference body of the grain, Ξ₀ ⊕ Ξ̌₀. We will then discuss the choice of the element K: by choosing a ball, the mixed volumes coincide with the Minkowski functionals of Ξ₀, so we obtain the moments of the random vector composed of the area and perimeter, t(A(Ξ₀), U(Ξ₀)); by choosing a segment oriented by θ, we obtain estimates for the moments of the random vector composed of the area and the Feret diameter in the direction θ, t(A(Ξ₀), H_{Ξ₀}(θ)). Finally, we will evaluate the performance of the method on a Boolean model with rectangular grains for the estimation of the second-order moments of the random vectors t(A(Ξ₀), U(Ξ₀)) and t(A(Ξ₀), H_{Ξ₀}(θ)).

Characterization and Estimation of the Variations of a Random Convex Set by its Mean n-Variogram: Application to the Boolean Model
S. Rahmani, J.-C. Pinoli & J. Debayle
École Nationale Supérieure des Mines de Saint-Étienne, France. SPIN, LGF UMR CNRS 5307. 28/10/2015

Geometric stochastic modeling and objectives. Stochastic materials call for material modelling and material characterization. Germ-grain model [Matheron 1967]:
Ξ = ∪_{x_i ∈ Φ} (x_i + Ξ_i),    (1)
where the grains Ξ_i are i.i.d. and Φ is a point process. The law of Φ corresponds to the spatial distribution, the law of the primary grain Ξ₀ to the granulometry. Boolean model: Φ is a Poisson point process of intensity λ.

Objectives and state of the art. Goal: geometric characterization of Ξ₀ from measurements of Ξ ∩ M in a bounded window M, with no assumption on the shape of Ξ₀. State of the art: Miles' formulae [Miles 1967], the tangent points method [Molchanov 1995] and the minimum contrast method [Dupač & Diggle 1980] yield the mean geometric parameters λ, E[A(Ξ₀)], E[U(Ξ₀)]; a formula for the distribution exists for models of disks [Emery 2012].

Characterization and description of the grain. For homothetic grains: for a disk of radius r, E[r] = E[U(Ξ₀)]/(2π) and E[r²] = E[A(Ξ₀)]/π; for a square of side x, E[x] = E[U(Ξ₀)]/4 and E[x²] = E[A(Ξ₀)]; this gives a parametric distribution of the homothetic factor. For non-homothetic grains (rectangles, ellipses, …), different grains can share the same mean area and perimeter (Minkowski densities), so these means are insufficient to fully characterize Ξ₀. What about the variations of these geometrical characteristics?

Theoretical aspects: from the covariance of Ξ to the variations of Ξ₀. Mean covariogram: γ̄_{Ξ₀}(u) = E[A(Ξ₀ ∩ (Ξ₀ + u))]. Covariance: C_Ξ(u) = P(x ∈ Ξ ∩ (Ξ + u)). Relationship:
γ̄_{Ξ₀}(u) = (1/λ) log( 1 + (C_Ξ(u) − p_Ξ²) / (1 − p_Ξ)² ).    (2)
In addition, ∫_{R²} γ̄_{Ξ₀}(u) du = E[A(Ξ₀)²].

Stability under convex dilations. With X ⊕ Y = {x + y : x ∈ X, y ∈ Y}, dilating a Boolean model of grain Ξ₀ and intensity λ by a convex body K yields a Boolean model of grain Ξ₀ ⊕ K and the same intensity λ: the Boolean model is stable under convex dilations.

The proposed method. Consequently, for all r ≥ 0 we can estimate
ζ_{0,K}(r) = E[A(Ξ₀ ⊕ rK)²] = ∫_{R²} γ̄_{Ξ₀⊕rK}(u) du.
Steiner's formula (mixed volumes): A(Ξ₀ ⊕ rK) = A(Ξ₀) + 2r W(Ξ₀, K) + r² A(K). Hence ζ_{0,K} is the polynomial
ζ_{0,K}(r) = E[A₀²] + 4r E[A₀ W(Ξ₀, K)] + r² (4 E[W(Ξ₀, K)²] + 2 A(K) E[A₀]) + 4r³ A(K) E[W(Ξ₀, K)] + r⁴ A(K)²,
from which E[A₀²], E[A₀ W(Ξ₀, K)] and E[W(Ξ₀, K)²] can be estimated.

Generalization to n-th order moments. The mean n-variogram: for n ≥ 2,
γ^{(n)}_{Ξ₀}(u₁, …, u_{n−1}) = E[A( ∩_{i=1}^{n−1} (Ξ₀ − u_i) ∩ Ξ₀ )].
The n-variogram is related to the n-point probability function (see the proceedings), and of course
∫_{R²} … ∫_{R²} γ^{(n)}_{Ξ₀}(u₁, …, u_{n−1}) du₁ … du_{n−1} = E[A(Ξ₀)^n].
The development of E[A(Ξ₀ ⊕ K)^n] by Steiner's formula then gives, for every convex K, the n-th order moments of (A₀, W(Ξ₀, K)).

The interpretation of the mixed area. For Ξ₀ and K convex, W(Ξ₀, K) = ½ (A(Ξ₀ ⊕ K) − A(Ξ₀) − A(K)). For the unit ball B, 2 W(Ξ₀, B) = U(Ξ₀), the perimeter; for a unit segment S_θ, 2 W(Ξ₀, S_θ) = H_{Ξ₀}(θ), the Feret diameter in the direction θ (figure: H_{Ξ₀}(θ) as the width of Ξ₀ with respect to the axis O_x and angle θ); and for a polygon written as a Minkowski combination Σ_{i=1}^N α_i S_{θ_i} of segments, W(Ξ₀, Σ_i α_i S_{θ_i}) = Σ_i α_i W(Ξ₀, S_{θ_i}) by linearity of the mixed area.
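The mechanics of the method — reading the moments of (A(Ξ₀), W(Ξ₀, K)) off the polynomial ζ_{0,K}(r) = E[A(Ξ₀ ⊕ rK)²] — can be mimicked in a toy setting: take disks of random radius R dilated by the unit ball B (so W(Ξ₀, B) = U/2 = πR), evaluate ζ at five values of r, and solve for the quartic's coefficients. This is a hypothetical toy, not the estimation procedure from a Boolean-model realization:

```python
import math, random

random.seed(0)
radii = [random.uniform(1.0, 2.0) for _ in range(20000)]
A = [math.pi * R * R for R in radii]     # areas A(Xi_0)
W = [math.pi * R for R in radii]         # mixed areas W(Xi_0, B) = U/2 = pi R

def zeta(r):
    """Empirical E[A(Xi_0 (+) rB)^2], using Steiner: A(Xi_0 (+) rB) = A + 2rW + pi r^2."""
    return sum((a + 2 * r * w + math.pi * r * r) ** 2
               for a, w in zip(A, W)) / len(A)

def solve(M, b):
    """Gauss-Jordan elimination with partial pivoting (small Vandermonde system)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r_: abs(M[r_][c]))
        M[c], M[piv] = M[piv], M[c]
        for r_ in range(n):
            if r_ != c:
                f = M[r_][c] / M[c][c]
                M[r_] = [x - f * y for x, y in zip(M[r_], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

rs = [0.0, 0.5, 1.0, 1.5, 2.0]
coeffs = solve([[r ** k for k in range(5)] for r in rs], [zeta(r) for r in rs])

EA2 = coeffs[0]        # constant term  = E[A^2]
EAW = coeffs[1] / 4    # linear term /4 = E[A W]
```

Because ζ is evaluated from the same sample as the direct moments, the recovered coefficients match them up to floating-point error; with real data, ζ(r) would instead come from the empirical covariance via relation (2).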
Short course (chaired by Roger Balian)
Geometry on the set of quantum states and quantum correlations
Dominique Spehner
Institut Fourier et Laboratoire de Physique et Modélisation des Milieux Condensés, Grenoble
Short course, GSI'2015, École Polytechnique, Paris, 28/10/2015

Quantum correlations and quantum information. Quantum Information Theory (QIT) studies quantum systems that can perform information-processing tasks more efficiently than one can do with classical systems: computational tasks (e.g. factorizing into prime numbers) and quantum communication (e.g. quantum cryptography). A quantum computer works with qubits, i.e. two-level quantum systems in linear combinations of |0⟩ and |1⟩. Entanglement is a resource for quantum computation and communication [Bennett et al. '96, Jozsa & Linden '03]. However, other kinds of "quantum correlations" differing from entanglement could also explain the quantum efficiencies.

Outline: entangled and nonclassical states; contractive distances on the set of quantum states; geometrical measures of quantum correlations.

Basic mathematical objects in quantum mechanics. (1) A Hilbert space H (in this talk n = dim H < ∞). (2) States ρ are nonnegative operators on H with trace one. (3) Observables A are self-adjoint operators on H (in this talk A ∈ Mat(C, n), finite Hermitian matrices). (4) An evolution is given by a linear map Φ : Mat(C, n) → Mat(C, n) which is (TP) trace-preserving (so that tr(Φ(ρ)) = tr(ρ) = 1) and (CP) completely positive, i.e. for any integer d ≥ 1 and any d × d matrix (A_ij)_{i,j=1}^d ≥ 0 with elements A_ij ∈ Mat(C, n), one has (Φ(A_ij))_{i,j=1}^d ≥ 0. Special case: unitary evolution Φ(ρ) = U ρ U* with U unitary.

Pure and mixed quantum states. A pure state is a rank-one projector ρ_ψ = |ψ⟩⟨ψ| with |ψ⟩ ∈ H, ‖ψ‖ = 1 (actually |ψ⟩ belongs to the projective space PH). The set E(H) of all quantum states is convex; its extremal elements are the pure states. A mixed state is a non-pure state; it has infinitely many pure-state decompositions
ρ = Σ_i p_i |ψ_i⟩⟨ψ_i|, with p_i ≥ 0, Σ_i p_i = 1 and |ψ_i⟩ ∈ PH.
Statistical interpretation: the pure states |ψ_i⟩ have been prepared with probability p_i.

Quantum-classical analogy:
Hilbert space H ↔ finite sample space Ω;
state ρ ↔ probability p on (Ω, P(Ω));
observable ↔ random variable on (Ω, P(Ω));
set of quantum states E(H) ↔ probability simplex E_class = {p ∈ R^n₊ : Σ_k p_k = 1};
CPTP map Φ ↔ stochastic matrix (Φ_kl)_{k,l=1,…,n} (Φ_kl ≥ 0, Σ_k Φ_kl = 1 for all l).

Separable states. A bipartite system AB is composed of two subsystems A and B with Hilbert spaces H_A and H_B; it has Hilbert space H_AB = H_A ⊗ H_B. For instance, A and B can be the polarizations of two photons localized far from each other, so H_AB = C² ⊗ C² (two qubits). A pure state |Ψ⟩ of AB is separable if it is a product state |Ψ⟩ = |ψ⟩ ⊗ |φ⟩ with |ψ⟩ ∈ H_A and |φ⟩ ∈ H_B. A mixed state ρ is separable if it admits a pure-state decomposition ρ = Σ_i p_i |Ψ_i⟩⟨Ψ_i| with |Ψ_i⟩ = |ψ_i⟩ ⊗ |φ_i⟩ separable for all i.

Entangled states. Non-separable states are called entangled. Entanglement is the most specific feature of quantum mechanics, and is used as a resource in quantum information (e.g. quantum cryptography, teleportation, high-precision interferometry). Examples of entangled and separable states: let H_A = H_B = C² (qubits) with canonical basis {|0⟩, |1⟩}. The pure Bell states
|Ψ±_Bell⟩ = (1/√2)(|0 0⟩ ± |1 1⟩)
are maximally entangled; they lead to the maximal violation of the Bell inequalities, observed experimentally [Aspect et al. '82], hence to the nonlocality of quantum mechanics. In contrast, the mixed state ρ = ½|Ψ⁺_Bell⟩⟨Ψ⁺_Bell| + ½|Ψ⁻_Bell⟩⟨Ψ⁻_Bell| is separable (indeed, ρ = ½|0 0⟩⟨0 0| + ½|1 1⟩⟨1 1|).

Classical states. A state ρ of AB is classical if it has a spectral decomposition ρ = Σ_k p_k |Ψ_k⟩⟨Ψ_k| with product states |Ψ_k⟩ = |α_k⟩ ⊗ |β_k⟩. Classicality is equivalent to separability for pure states only. A state ρ is A-classical if ρ = Σ_i q_i |α_i⟩⟨α_i| ⊗ ρ_{B,i}, with {|α_i⟩} an orthonormal basis of H_A and ρ_{B,i} arbitrary states of B. The set C_AB (resp. C_A) of all (A-)classical states is not convex; its convex hull is the set of separable states S_AB. Some tasks impossible to do classically can be realized using separable nonclassical mixed states; such states are easier to produce and presumably more robust to a coupling with an environment.

Quantum vs classical correlations. A central question in quantum information theory: identify (and try to protect) the quantum correlations responsible for the exponential speed-up of quantum algorithms. For mixed states there are (at least) two kinds of quantum correlations beyond classical correlations: entanglement [Schrödinger '36] and nonclassicality (quantum discord) [Ollivier, Zurek '01; Henderson, Vedral '01].
Contractive distances. (Figure: two states ρ, σ and their images Φ(ρ), Φ(σ); a contractive distance can only shrink under Φ.) The set E_AB of all quantum states of a bipartite system AB (i.e. operators ρ ≥ 0 on H_AB with tr ρ = 1) can be equipped with many distances d. From a QI point of view, interesting distances must be contractive under CPTP maps, i.e. for any such map Φ on E_AB,
d(Φ(ρ), Φ(σ)) ≤ d(ρ, σ) for all ρ, σ ∈ E_AB.
Physically, irreversible evolutions can only decrease the distance between two states. A contractive distance is in particular unitarily invariant: d(UρU*, UσU*) = d(ρ, σ) for any unitary U on H_AB. The L^p distances d_p(ρ, σ) = ‖ρ − σ‖_p = (tr |ρ − σ|^p)^{1/p} are not contractive, except for p = 1 (trace distance) [Ruskai '94].

Petz's characterization of contractive distances. Classical setting: there exists a unique (up to a multiplicative factor) contractive Riemannian distance d_clas on the probability simplex E_clas, with Fisher metric ds² = Σ_k dp_k²/p_k [Čencov '82]. Quantum generalization: any Riemannian contractive distance on the set of states E(H), with n = dim H < ∞, has metric
ds² = g_ρ(dρ, dρ) = Σ_{k,l=1}^n c(p_k, p_l) |⟨k|dρ|l⟩|²,
where p_k and |k⟩ are the eigenvalues and eigenvectors of ρ,
c(p, q) = (p f(q/p) + q f(p/q)) / (2 p q f(p/q) f(q/p)),
and f : R⁺ → R⁺ is an arbitrary operator-monotone function such that f(x) = x f(1/x) [Morozova & Chentsov '90, Petz '96].

Distance associated to the von Neumann entropy. The quantum analogue of the Shannon entropy is the von Neumann entropy S(ρ) = −tr(ρ ln ρ). Since S is concave, the physically most natural metric is
ds² = g_S(dρ, dρ) = −(d²/dt²)|_{t=0} S(ρ + t dρ) = (d²/ds²)|_{s=0} F(X + s dX),
with F(X) = ln tr(e^X) and ρ = e^{X−F(X)} = e^X / tr(e^X) [Bogoliubov; Kubo & Mori; Balian, Alhassid & Reinhardt '86; Balian '14]. This ds² has the Petz form with f(x) = (x − 1)/ln x, so the corresponding distance is contractive. It measures the loss of information when mixing the neighboring equiprobable states ρ± = ρ ± ½ dρ:
ds²/8 = S(ρ) − ½ S(ρ₊) − ½ S(ρ₋).

Bures distance and Uhlmann fidelity. The fidelity (which generalizes F = |⟨ψ|φ⟩|² to mixed states) is [Uhlmann '76]
F(ρ, σ) = ( tr [ (√σ ρ √σ)^{1/2} ] )² = F(σ, ρ) ≤ 1.
The Bures distance [Bures '69] is d_Bu(ρ, σ) = (2 − 2√F(ρ, σ))^{1/2}. Its metric has the Petz form with f(x) = (x + 1)/2; it is the smallest contractive Riemannian distance [Petz '96]; it coincides with the Fubini–Study metric on PH for pure states; d_Bu(ρ, σ)² is jointly convex in (ρ, σ); and d_Bu(ρ, σ) = sup d_clas(p, q), the sup being over all measurements giving outcome k with probability p_k (for state ρ) and q_k (for state σ) [Fuchs '96].

Bures distance and Fisher information.
Geometric approach of quantum correlationsGeometric entanglement:E pρq min dpρ, σsepq2σsepSABρGeometric quantum discord :SABCACBCABDApρq min dpρ, σAclq2σAclCAProperties:E pρΨq DApρΨq for pure states ρΨE is convexÐ for Bures distanceÐ if d2 is jointly convexE pΦA ΦBpρqq ¤ E pρq for anyEntanglement monotonicity:TPCP maps ΦA and ΦB acting on A and B (also true for DA but¦only when ΦApρAq UA ρA UA).Ð if d is contractiveBures geometric measure of entanglementEBupρq dBupρ, SABq2 2 ¡ 2 F pρ, SABqwith F pρ, SABq maxσsepSAB F pρ, σsepq= maximal ﬁdelity between ρ and a separable state.ÝÑ Main physical question: determine F pρ, SABq explicitely. pb: it is not easy to ﬁnd the geodesics for the Bures distance! The closest separable state to a pure state ρΨ is a pure productstate, so that F pρΨ, SABq maxϕy,χy xϕ χΨy2Ñ easy! For mixed states ρ, F pρ, SABq coincides with the convex roofF pρ, SABq maxtΨiy,ηiu[Streltsov, Kampermann and Bruß’10]°i pi FpρΨ , SABqÑ not easy!°max. over all pure state decompositions ρ i piΨiyxΨi of ρ.iThe twoqubit caseAssume that both subsystems A and B are qubits, HA
Concurrence:HBC2 .[Wootters ’98]C pρq maxt0, λ1 ¡ λ2 ¡ λ3 ¡ λ4uwith λ2 ¥ λ2 ¥ λ2 ¥ λ2 the eigenvalues of ρ σy σy ρ σy σy1234σy¢0i¡i0= Pauli matrixρ = complex conjugate of ρ in the canonical (product) basis.
Then[Wei and Goldbart ’03, Streltsov, Kampermann and Bruß’10]F pρ, SABq 12 1 1 ¡ C pρq2¨Quantum State Discrimination"ancilla"ψ2 > ψ3 > ψ1 >?0>1111111111111111111111100000000000000000000000111111111111111111111110000000000000000000000011111111111111111111111000000000000000000000001111111111111111111111100000000000000000000000111111111111111111111110000000000000000000000011111111111111111111111000000000000000000000001111111111111111111111100000000000000000000000111111111111111111111110000000000000000000000011111111111111111111111000000000000000000000001111111111111111111111100000000000000000000000111111111111111111111110000000000000000000000011111111111111111111111000000000000000000000001111111111111111111111100000000000000000000000111111111111111111111110000000000000000000000011111111111111111111111000000000000000000000001111111111111111111111100000000000000000000000111111111111111111111110000000000000000000000011111111111111111111111000000000000000000000001111111111111111111111100000000000000000000000MEASUREMENTAPPARATUSãÑAreceiver gets a state ρi randomlychosen with probability ηi among aknown set of states tρ1, ¤ ¤ ¤ , ρmu. To determine the state he has in hands,he performs a measurement on it.Applications : quantum communication, cryptography,... If the ρi are u, one can discriminate them unambiguously Otherwise one succeeds with probability°PS i ηi trpMiρiqΠ2ψ2>Mi = nonnegative operators describing the°measurement, i Mi 1.Open pb (for n ¡ 2): ﬁnd the optimal measurementopttMioptu and highest success probability PS .ψ1>Π1Bures geometric quantum discordThe square Bures distance DApρq dBupρ, CAq2 to the set CA ofAclassical states is a geometric analog of the quantum discordcharacterizing the “quantumness” of states(actually, the Aclassical states are the states with zero discord)opt
The geometric quantum discord is given by solving a state discrimination problem [Spehner and Orszag '13]:
D_A(ρ) = 2 − 2 max_{{|α_i⟩}} √( P_S^opt({|α_i⟩}) ),
where P_S^opt({|α_i⟩}) is the optimal success probability in discriminating the states
ρ_i = η_i^{−1} √ρ (|α_i⟩⟨α_i| ⊗ 1) √ρ, with probabilities η_i = ⟨α_i| tr_B(ρ) |α_i⟩,
and {|α_i⟩} an orthonormal basis of H_A.

Closest A-classical states to a state ρ. The closest A-classical states to ρ are [Spehner and Orszag '13]
σ_ρ = (1/F(ρ, C_A)) Σ_i |α_i^opt⟩⟨α_i^opt| ⊗ ⟨α_i^opt| √ρ Π_i^opt √ρ |α_i^opt⟩,
where {Π_i^opt} is the optimal von Neumann measurement and {|α_i^opt⟩} the orthonormal basis of H_A maximizing P_S, i.e.
F(ρ, C_A) = Σ_{i=1}^n η_i tr(M_i^opt ρ_i^opt).
ρ can have either a unique or an infinity of closest A-classical states.

The qubit case. If A is a qubit, H_A = C², and dim H_B = n_B, then [Spehner and Orszag '14]
F(ρ, C_A) = ½ max_{u ∈ R³, ‖u‖=1} ( 1 − tr Λ(u) + 2 Σ_{l=1}^{n_B} λ_l(u) ),
where λ₁(u) ≥ … ≥ λ_{2n_B}(u) are the eigenvalues of the 2n_B × 2n_B matrix Λ(u) = √ρ (σ_u ⊗ 1) √ρ and σ_u = u₁σ₁ + u₂σ₂ + u₃σ₃, with σ_i the Pauli matrices. (Figure: the closest A-classical states in the Bloch-ball picture.)

Conclusions and perspectives. Contractive Riemannian distances on the set of quantum states provide useful tools for measuring quantum correlations in bipartite systems. Major challenges are to compute the geometric measures for simple systems, and to compare the measures obtained from different distances and look for universal properties.

References:
Review article: D. Spehner, J. Math. Phys. 55, 075211 (2014)
D. Spehner, M. Orszag, New J. Phys. 15, 103001 (2013)
D. Spehner, M. Orszag, J. Phys. A 47, 035302 (2014)
R. Roga, D. Spehner, F. Illuminati, arXiv:1510.06995
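For a pure two-qubit state |Ψ⟩ = a|00⟩ + b|01⟩ + c|10⟩ + d|11⟩, Wootters' concurrence reduces to C = 2|ad − bc| (a standard simplification quoted here from outside the slides), and combining it with F(ρ, S_AB) = ½(1 + √(1 − C²)) and E_Bu = 2 − 2√F gives the Bures geometric entanglement directly. A minimal sketch:

```python
import math

def concurrence_pure(a, b, c, d):
    """Concurrence of the pure two-qubit state a|00> + b|01> + c|10> + d|11>."""
    return 2 * abs(a * d - b * c)

def bures_entanglement(C):
    """E_Bu = 2 - 2 sqrt(F), with F(rho, S_AB) = (1 + sqrt(1 - C^2)) / 2."""
    F = 0.5 * (1 + math.sqrt(max(0.0, 1 - C * C)))
    return 2 - 2 * math.sqrt(F)

s = 1 / math.sqrt(2)
C_bell = concurrence_pure(s, 0, 0, s)   # maximally entangled Bell state: C = 1
C_prod = concurrence_pure(1, 0, 0, 0)   # product state |00>: C = 0
```

A product state gives C = 0, hence F = 1 and E_Bu = 0; a Bell state gives C = 1, hence F = ½ and E_Bu = 2 − √2.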
Keynote speech: Marc Arnaudon (chaired by Frank Nielsen)
We will prove an Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which generalizes the Lagrangian version of reduction and its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation.

Stochastic Euler–Poincaré reduction
Marc Arnaudon, Université de Bordeaux, France
GSI, École Polytechnique, 29 October 2015

References
- Arnaudon, Marc; Chen, Xin; Cruzeiro, Ana Bela: Stochastic Euler–Poincaré reduction. J. Math. Phys. 55 (2014), no. 8, 17 pp.
- Chen, Xin; Cruzeiro, Ana Bela; Ratiu, Tudor S.: Constrained and stochastic variational principles for dissipative equations with advected quantities. arXiv:1506.05024

Outline
1. Deterministic framework: Euler–Poincaré equations; the diffeomorphism group of a compact Riemannian manifold; the volume-preserving diffeomorphism group; Lagrangian paths; characterization of the geodesics on $(G^s_V, \langle\cdot,\cdot\rangle^0)$; the Euler–Poincaré equation on $G^s_V$.
2. Stochastic framework: semimartingales in a Lie group; stochastic Euler–Poincaré reduction; the group of volume-preserving diffeomorphisms; the Navier–Stokes and Camassa–Holm equations.

Euler–Lagrange equations. Let $M$ be a Riemannian manifold and $L : TM \times [0,T] \to \mathbb{R}$ a Lagrangian on $M$. Let $q \in C^1_{a,b}([0,T]; M) := \{q \in C^1([0,T], M),\ q(0)=a,\ q(T)=b\}$. The action functional $C : C^1_{a,b}([0,T]; M) \to \mathbb{R}$ is defined by
$$C(q(\cdot)) := \int_0^T L(q(t), \dot q(t), t)\, dt.$$
The critical points of $C$ satisfy the Euler–Lagrange equation
$$\frac{d}{dt}\frac{\partial L}{\partial \dot q} - \frac{\partial L}{\partial q} = 0.$$

Euler–Poincaré reduction on a Lie group. Suppose the configuration space $M = G$ is a Lie group and $L : TG \to \mathbb{R}$ is a left-invariant Lagrangian:
$$\ell(\xi) := L(e, \xi) = L(g, g\cdot\xi), \qquad \forall\, \xi \in T_eG,\ g \in G$$
(here and in the sequel, $g\cdot\xi = T_eL_g\,\xi$). The action functional $C : C^1_{a,b}([0,T]; G) \to \mathbb{R}$ is defined by
$$C(g(\cdot)) := \int_0^T L(g(t), \dot g(t))\, dt = \int_0^T \ell(\xi(t))\, dt,$$
where $\xi(t) := g(t)^{-1}\cdot\dot g(t)$. [J.E. Marsden, T. Ratiu 1994] [J.E. Marsden, J. Scheurle 1993]: $g(\cdot)$ is a critical point of $C$ if and only if it satisfies the Euler–Poincaré equation on $T_e^*G$,
$$\frac{d}{dt}\frac{d\ell}{d\xi} - \operatorname{ad}^*_{\xi(t)}\frac{d\ell}{d\xi} = 0,$$
where $\operatorname{ad}^*_\xi : T_e^*G \to T_e^*G$ is the dual of $\operatorname{ad}_\xi : T_eG \to T_eG$:
$$\langle \operatorname{ad}^*_\xi \eta, \theta\rangle = \langle \eta, \operatorname{ad}_\xi \theta\rangle, \qquad \eta \in T_e^*G,\ \theta \in T_eG.$$

Variations. We will be interested in variations $\xi(\cdot)$ satisfying
$$\delta\xi(t) = \dot\nu(t) + \operatorname{ad}_{\xi(t)}\nu(t) \qquad \text{for some } \nu \in C^1([0,T], T_eG),$$
which is equivalent to varying $g(\cdot)$ by the perturbation $g^\varepsilon(t) = g(t)\, e_{\varepsilon,\nu}(t)$, where $e_{\varepsilon,\nu}(t)$ is the unique solution of the following ODE on $G$:
$$\frac{d}{dt}\, e_{\varepsilon,\nu}(t) = \varepsilon\, e_{\varepsilon,\nu}(t)\cdot\dot\nu(t), \qquad e_{\varepsilon,\nu}(0) = e.$$

The diffeomorphism group of a compact Riemannian manifold. Let $M$ be an $n$-dimensional compact Riemannian manifold.
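The variational characterization above can be checked numerically on a toy example (my illustration, not part of the talk): for the harmonic-oscillator Lagrangian $L = \frac12\dot q^2 - \frac12 q^2$, the discretized action has vanishing first variation at the exact solution $q(t) = \sin t$ but not at a non-critical path with the same endpoints:

```python
import math

# Discretized action S[q] = sum_i h * ( ((q[i+1]-q[i])/h)^2/2 - q[i]^2/2 )
# for the harmonic oscillator L = (1/2) qdot^2 - (1/2) q^2.
def action(q, h):
    s = 0.0
    for i in range(len(q) - 1):
        qdot = (q[i+1] - q[i]) / h
        s += h * (0.5*qdot*qdot - 0.5*q[i]*q[i])
    return s

def first_variation(q, v, h, eps=1e-6):
    # central-difference directional derivative of S at q along v
    qp = [qi + eps*vi for qi, vi in zip(q, v)]
    qm = [qi - eps*vi for qi, vi in zip(q, v)]
    return (action(qp, h) - action(qm, h)) / (2*eps)

if __name__ == "__main__":
    n, T = 1000, 1.0
    h = T / n
    t = [i*h for i in range(n+1)]
    exact = [math.sin(ti) for ti in t]        # solves qddot = -q
    line = [math.sin(1.0)*ti for ti in t]     # same endpoints, not critical
    v = [math.sin(math.pi*ti) for ti in t]    # perturbation with v(0)=v(T)=0
    print(first_variation(exact, v, h))  # near 0: the exact path is critical
    print(first_variation(line, v, h))   # order 1: the straight line is not
```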
We define
$$G^s := \{g : M \to M \text{ bijective},\ g, g^{-1} \in H^s(M, M)\},$$
where $H^s(M, M)$ denotes the manifold of Sobolev maps of class $s$ from $M$ to itself. If $s > 1 + \frac{n}{2}$, then $G^s$ is a $C^\infty$ Hilbert manifold. $G^s$ is a group under composition of maps; right translation is smooth, while left translation and inversion are only continuous. Thus $G^s$ is a topological group (but not an infinite-dimensional Lie group).

The tangent space $T_\eta G^s$ at an arbitrary $\eta \in G^s$ is
$$T_\eta G^s = \{U : M \to TM \text{ of class } H^s,\ U(m) \in T_{\eta(m)}M\}.$$
The Riemannian structure on $M$ induces the weak $L^2$, or hydrodynamic, metric $\langle\cdot,\cdot\rangle^0$ on $G^s$ given by
$$\langle U, V\rangle^0_\eta := \int_M \langle U_\eta(m), V_\eta(m)\rangle_m\, d\mu_g(m)$$
for any $\eta \in G^s$, $U, V \in T_\eta G^s$. Here $U_\eta := U \circ \eta^{-1} \in T_eG^s$ and $\mu_g$ denotes the Riemannian volume associated with $(M, g)$. Clearly, $\langle\cdot,\cdot\rangle^0$ is a right-invariant metric on $G^s$.

Let $\nabla$ be the Levi-Civita connection of the Riemannian manifold $(M, g)$.
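Right invariance of the hydrodynamic metric can be illustrated on the circle: composing two vector fields with a rotation (a volume-preserving diffeomorphism of $S^1$) leaves their $L^2$ pairing unchanged. A small sketch, with illustrative trigonometric fields of my choosing:

```python
import math

N = 2048
H = 2*math.pi / N
XS = [i*H for i in range(N)]

def l2_pairing(U, V):
    # Riemann-sum approximation of <U,V>^0 = integral over S^1 of U(x)V(x) dx
    return sum(U(x)*V(x) for x in XS) * H

U = lambda x: math.sin(3*x) + 0.5*math.cos(x)
V = lambda x: math.sin(3*x)

alpha = 0.7                    # rotation x -> x + alpha, volume preserving
Ua = lambda x: U(x + alpha)    # fields composed with the rotation
Va = lambda x: V(x + alpha)

print(l2_pairing(U, V))                            # equals pi analytically
print(abs(l2_pairing(Ua, Va) - l2_pairing(U, V)))  # ~0: right invariance
```

The uniform Riemann sum is spectrally accurate for trigonometric polynomials, so both pairings agree with the exact integral to machine precision.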
We define a right-invariant connection $\nabla^0$ on $G^s$ by
$$(\nabla^0_{\tilde X}\tilde Y)(\eta) := \frac{\partial}{\partial t}\Big|_{t=0}\big(\tilde Y(\eta_t) \circ \eta_t^{-1}\big)\circ\eta + \big(\nabla_{X_\eta} Y_\eta\big)\circ\eta,$$
where $\tilde X, \tilde Y \in \mathcal{L}(G^s)$, $X_\eta := \tilde X \circ \eta^{-1}$, $Y_\eta := \tilde Y \circ \eta^{-1}$, and $\eta_t$ is a $C^1$ curve in $G^s$ such that $\eta_0 = \eta$ and $\frac{d}{dt}\big|_{t=0}\eta_t = \tilde X(\eta)$. Here $\mathcal{L}(G^s)$ denotes the set of smooth vector fields on $G^s$. $\nabla^0$ is the Levi-Civita connection associated with $(G^s, \langle\cdot,\cdot\rangle^0)$.

The volume-preserving diffeomorphism group. For $s > 1 + \frac{n}{2}$, let
$$G^s_V := \{g \in G^s,\ g \text{ volume preserving}\}.$$
$G^s_V$ is still a topological group. Its tangent space at the identity is
$$\mathcal{G}^s_V = T_eG^s_V = \{U \in T_eG^s,\ \operatorname{div}(U) = 0\}.$$
The $L^2$ metric $\langle\cdot,\cdot\rangle^0$ and its Levi-Civita connection $\nabla^{0,V}$ are defined on $G^s_V$ by orthogonal projection. More precisely, the Levi-Civita connection on $G^s_V$ is given by
$$\nabla^{0,V}_X Y = P_e(\nabla^0_X Y),$$
with $P_e$ the orthogonal projection onto $\mathcal{G}^s_V$ in the decomposition
$$H^s(TM) = \mathcal{G}^s_V \oplus dH^{s+1}(M).$$

Lagrangian paths. Consider the ODE on $M$
$$\frac{d}{dt}\, g_t(x) = u(t, g_t(x)), \qquad g_0(x) = x,$$
where $u(t,\cdot) \in T_eG^s$ for every $t > 0$. For every fixed $t > 0$, $g_t(\cdot) \in G^s$.
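The fact that $\operatorname{div}\, u = 0$ makes the flow $g_t$ volume preserving can be observed numerically on the 2-torus. Below, with an illustrative stream-function field of my choosing (not from the talk), the Jacobian determinant of the time-1 flow map is estimated by finite differences and found to be $\approx 1$:

```python
import math

# Divergence-free field on the 2-torus from the stream function
# psi(x, y) = cos x * cos y:  u = (-d psi/dy, d psi/dx).
def u(x, y):
    return (math.cos(x)*math.sin(y), -math.sin(x)*math.cos(y))

def flow(x, y, T=1.0, n=200):
    # RK4 integration of dg/dt = u(g), g_0 = (x, y)
    h = T / n
    for _ in range(n):
        k1 = u(x, y)
        k2 = u(x + 0.5*h*k1[0], y + 0.5*h*k1[1])
        k3 = u(x + 0.5*h*k2[0], y + 0.5*h*k2[1])
        k4 = u(x + h*k3[0], y + h*k3[1])
        x += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        y += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
    return x, y

def jacobian_det(x, y, d=1e-5):
    # finite-difference Jacobian of the time-1 flow map
    fx0, fy0 = flow(x, y)
    fx1, fy1 = flow(x + d, y)
    fx2, fy2 = flow(x, y + d)
    a, c = (fx1 - fx0)/d, (fy1 - fy0)/d
    b, e = (fx2 - fx0)/d, (fy2 - fy0)/d
    return a*e - b*c

print(jacobian_det(0.3, 1.1))  # ~1: the flow is volume preserving
```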
So $g \in C^1([0,T], G^s)$; if $\operatorname{div}(u(t)) = 0$ for every $t$, then $g \in C^1([0,T], G^s_V)$.

Characterization of the geodesics on $(G^s_V, \langle\cdot,\cdot\rangle^0)$. [V.I. Arnold 1966] [D.G. Ebin, J.E. Marsden 1970] A Lagrangian path $g \in C^2([0,T], G^s_V)$ satisfying the equation above is a geodesic on $(G^s_V, \langle\cdot,\cdot\rangle^0)$ (i.e. $\nabla^{0,V}_{\dot g(t)}\dot g(t) = 0$) if and only if the velocity field $u$ satisfies the Euler equation for incompressible inviscid fluids:
$$\text{(E)}\qquad \frac{\partial u}{\partial t} = -\nabla_u u - \nabla p, \qquad \operatorname{div} u = 0.$$
Notice that the term $-\nabla p$ corresponds to the use of $\nabla^0$ instead of $\nabla^{0,V}$: the first system rewrites as
$$\frac{\partial u}{\partial t} = -\nabla^{0,V}_u u, \qquad \operatorname{div} u = 0.$$

Euler–Poincaré equation on $G^s_V$. If we take $\ell : T_eG^s_V \to \mathbb{R}$ as
$$\ell(X) := \langle X, X\rangle, \qquad X \in T_eG^s_V,$$
and define the action functional $C : C^1_{e,e}([0,T], G^s_V) \to \mathbb{R}$ by
$$C(g(\cdot)) := \int_0^T \ell\big(\dot g(t)\cdot g(t)^{-1}\big)\, dt,$$
then a Lagrangian path $g \in C^2([0,T], G^s_V)$ that is an integral path of $u$ is a critical point of $C$ if and only if $u$ satisfies the Euler equation (E). [J.E. Marsden, T. Ratiu 1994] [J.E. Marsden, J. Scheurle 1993]

[S. Shkoller 1998] If instead we take $\ell : T_eG^s_V \to \mathbb{R}$ as the $H^1$ metric
$$\ell(X) := \int_M \langle X, X\rangle_m\, d\mu_g(m) + \alpha^2 \int_M \langle \nabla X, \nabla X\rangle_m\, d\mu_g(m), \qquad X \in T_eG^s_V,$$
and define the action functional $C : C^1_{e,e}([0,T], G^s_V) \to \mathbb{R}$ in the same way as before, then a Lagrangian path $g \in C^2([0,T], G^s_V)$ that is an integral path of $u$ is a critical point of $C$ if and only if $u$ satisfies the Camassa–Holm equation
$$\frac{\partial \nu}{\partial t} + \nabla_u \nu + \alpha^2 (\nabla u)^* \cdot \Delta\nu = -\nabla p, \qquad \nu = (1 + \alpha^2\Delta)u, \qquad \operatorname{div}(u) = 0.$$

Stochastic framework: aim. To establish a stochastic Euler–Poincaré reduction theorem in a general Lie group, and to apply it to volume-preserving diffeomorphisms of a compact symmetric space. For the Euler equation, the stochastic term will correspond to introducing viscosity.

Semimartingales in a Lie group. An $\mathbb{R}^n$-valued semimartingale $\xi_t$ has a decomposition
$$\xi_t(\omega) = N_t(\omega) + A_t(\omega),$$
where $(N_t)$ is a local martingale and $(A_t)$ has finite variation. If $(N_t)$ is a martingale, then $E[N_t \mid \mathcal{F}_s] = N_s$ for $t \ge s$. We are interested in semimartingales which furthermore satisfy
$$A_t(\omega) = \int_0^t a_s(\omega)\, ds.$$
Defining
$$\frac{D\xi_t}{dt} := \lim_{\varepsilon \to 0} E\Big[\frac{\xi_{t+\varepsilon} - \xi_t}{\varepsilon}\ \Big|\ \mathcal{F}_t\Big],$$
we have $\dfrac{D\xi_t}{dt} = a_t$.

The Itô formula reads
$$f(\xi_t) = f(\xi_0) + \int_0^t \langle df(\xi_s), dN_s\rangle + \int_0^t \langle df(\xi_s), dA_s\rangle + \frac12 \int_0^t \operatorname{Hess} f\,(d\xi_s \otimes d\xi_s).$$
From this we see that $\xi_t$ is a local martingale if and only if, for all $f \in C^2(\mathbb{R}^n)$,
$$f(\xi_t) - f(\xi_0) - \frac12\int_0^t \operatorname{Hess} f\,(d\xi_s \otimes d\xi_s)$$
is a real-valued local martingale. This property becomes a definition for manifold-valued martingales.

Definition. Let $a_t \in T_{\xi_t}M$ be an adapted process. If for all $f \in C^2(M)$
$$f(\xi_t) - f(\xi_0) - \int_0^t \langle df(\xi_s), a_s\rangle\, ds - \frac12\int_0^t \operatorname{Hess} f\,(d\xi_s \otimes d\xi_s)$$
is a real-valued local martingale, then $\dfrac{D\xi_t}{dt} = a_t$.

Let $G$ be a Lie group with right-invariant metric $\langle\cdot,\cdot\rangle$ and right-invariant connection $\nabla$, and let $\mathcal{G} := T_eG$ be the Lie algebra of $G$. Consider a countable family $H_i$, $i \ge 1$, of elements of $\mathcal{G}$, and $u \in C^1([0,T], \mathcal{G})$. Consider the Stratonovich equation
$$dg_t = \Big(\sum_{i\ge1} H_i \circ dW_t^i - \frac12 \nabla_{H_i}H_i\, dt + u(t)\, dt\Big)\cdot g_t, \qquad g_0 = e,$$
where the $(W_t^i)$ are independent real-valued Brownian motions.
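The drift $D\xi_t/dt$ introduced above can be estimated by simulation in the simplest Euclidean case. Below (illustrative parameters of my choosing), for $\xi_t = W_t + \int_0^t a\, ds$ with constant $a$, the conditional-increment estimator recovers $a$:

```python
import random

def estimate_drift(a=1.5, eps=0.01, n_paths=40000, seed=0):
    # xi_t = W_t + a*t ;  E[(xi_{t+eps} - xi_t)/eps | F_t] should equal a.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        # The increment xi_{t+eps} - xi_t = (W_{t+eps} - W_t) + a*eps,
        # with W_{t+eps} - W_t ~ N(0, eps) independent of F_t.
        dW = rng.gauss(0.0, eps ** 0.5)
        total += (dW + a*eps) / eps
    return total / n_paths

print(estimate_drift())  # close to a = 1.5
```

The Monte Carlo standard error is $1/\sqrt{\varepsilon\, n_{\text{paths}}} = 0.05$ here, so the estimate sits well within a few percent of $a$.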
The Itô formula writes
$$f(g_t) = f(g_0) + \sum_{i\ge1}\int_0^t \langle df(g_s), H_i(g_s)\rangle\, dW_s^i + \int_0^t \langle df(g_s), u(s)g_s\rangle\, ds + \frac12\sum_{i\ge1}\int_0^t \operatorname{Hess} f\,(H_i(g_s), H_i(g_s))\, ds.$$
This implies that $\dfrac{Dg_t}{dt} = u(t)g_t$.

Particular case. If $(H_i)$ is an orthonormal basis, $\nabla_{H_i}H_i = 0$, $\nabla$ is the Levi-Civita connection associated with the metric, and $u \equiv 0$, then $g_t$ is a Brownian motion in $G$.
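For a concrete matrix-group picture of this particular case, a Brownian motion on $SO(3)$ can be simulated with $(H_i)$ the standard basis of antisymmetric matrices, composing small group exponentials $\exp\big(\sum_i H_i\,\Delta W^i\big)$ at each step. The step rule below is the standard geometric Euler–Maruyama scheme, my own illustration rather than the talk's construction; the point is that the process stays exactly on the group:

```python
import math, random

# Basis of so(3) (antisymmetric 3x3 matrices).
H = [
    [[0, 0, 0], [0, 0, -1], [0, 1, 0]],
    [[0, 0, 1], [0, 0, 0], [-1, 0, 0]],
    [[0, -1, 0], [1, 0, 0], [0, 0, 0]],
]

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rodrigues(w):
    # exp of the antisymmetric matrix sum_k w[k]*H[k] (Rodrigues' formula)
    th = math.sqrt(sum(x*x for x in w))
    I = [[float(i == j) for j in range(3)] for i in range(3)]
    if th < 1e-14:
        return I
    K = [[sum(H[k][i][j]*w[k] for k in range(3)) / th for j in range(3)]
         for i in range(3)]
    K2 = matmul(K, K)
    return [[I[i][j] + math.sin(th)*K[i][j] + (1 - math.cos(th))*K2[i][j]
             for j in range(3)] for i in range(3)]

def brownian_so3(T=1.0, n=500, seed=1):
    rng = random.Random(seed)
    g = [[float(i == j) for j in range(3)] for i in range(3)]
    h = T / n
    for _ in range(n):
        dW = [rng.gauss(0.0, math.sqrt(h)) for _ in range(3)]
        g = matmul(rodrigues(dW), g)   # multiply a small rotation on the left
    return g

def orthogonality_defect(g):
    gtg = matmul([list(r) for r in zip(*g)], g)   # g^T g
    return max(abs(gtg[i][j] - (i == j)) for i in range(3) for j in range(3))

g = brownian_so3()
print(orthogonality_defect(g))  # ~0 (machine precision): g stays in SO(3)
```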
Stochastic Euler–Poincaré reduction. On the space $\mathcal{S}(G)$ of $G$-valued semimartingales define
$$J(\xi) = \frac12\, E\Big[\int_0^T \Big\|\frac{D\xi}{dt}\Big\|^2 dt\Big].$$
Perturbation: for $v \in C^1([0,T], \mathcal{G})$ satisfying $v(0) = v(T) = 0$ and $\varepsilon > 0$, let $e_{\varepsilon,v}(\cdot) \in C^1([0,T], G)$ be the flow generated by $\varepsilon\dot v$:
$$\frac{d}{dt}\, e_{\varepsilon,v}(t) = \varepsilon\dot v(t)\cdot e_{\varepsilon,v}(t), \qquad e_{\varepsilon,v}(0) = e.$$

Definition. We say that $g \in \mathcal{S}(G)$ is a critical point of $J$ if for all $v \in C^1([0,T], \mathcal{G})$ satisfying $v(0) = v(T) = 0$,
$$\frac{dJ}{d\varepsilon}\Big|_{\varepsilon=0}(g_{\varepsilon,v}) = 0, \qquad \text{where } g_{\varepsilon,v}(t) = e_{\varepsilon,v}(t)\, g(t).$$

Theorem. $g$ is a critical point of $J$ if and only if
$$\frac{du(t)}{dt} = -\operatorname{ad}^*_{\tilde u(t)} u(t) - K(u(t)),$$
with
$$\tilde u(t) = u(t) - \frac12 \sum_{i\ge1} \nabla_{H_i}H_i, \qquad \langle\operatorname{ad}^*_u v, w\rangle = \langle v, \operatorname{ad}_u w\rangle,$$
and $K : \mathcal{G} \to \mathcal{G}$ satisfying
$$\langle K(u), v\rangle = -\Big\langle u,\ \frac12\sum_{i\ge1}\big(\nabla_{\operatorname{ad}_v H_i} H_i + \nabla_{H_i}(\operatorname{ad}_v H_i)\big)\Big\rangle.$$

Remark 1. If $H_i = 0$ for all $i \ge 1$, or $\nabla_u v = 0$ for all $u, v \in \mathcal{G}$, then $K(u) = 0$ and we recover the standard Euler–Poincaré equation.

Proposition. If $\nabla_{H_i}H_i = 0$ for all $i \ge 1$, then
$$K(u) = -\frac12\sum_{i\ge1}\big(\nabla_{H_i}\nabla_{H_i} u + R(u, H_i)H_i\big).$$
In particular, if $(H_i)$ is an orthonormal basis of $\mathcal{G}$, then
$$K(u) = -\frac12\,\square u = -\frac12\Delta u + \frac12\operatorname{Ric} u,$$
with $\square$ the Hodge Laplacian.

Group of volume-preserving diffeomorphisms. Let
$$G^s_V = \{g : M \to M \text{ volume-preserving bijection, such that } g, g^{-1} \in H^s\}.$$
Assume $s > 1 + \frac{\dim M}{2}$. Then $G^s_V$ is a $C^\infty$ smooth manifold, with Lie algebra
$$\mathcal{G}^s_V = T_eG^s_V = \{X \in H^s(M, TM),\ \pi(X) = e,\ \operatorname{div}(X) = 0\}.$$
Notice that $\pi(X) = e$ means that $X$ is a vector field on $M$: $X(x) \in T_xM$.
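As a finite-dimensional sanity check of the operator $\sum_i \nabla_{H_i}\nabla_{H_i}$ appearing in the Proposition above (my illustration, not from the talk): on $\mathfrak{so}(3)\cong\mathbb{R}^3$, where $\operatorname{ad}_X Y = X\times Y$, the analogous sum over the standard basis $(e_i)$ satisfies the Casimir identity $\sum_i e_i\times(e_i\times u) = -2u$, i.e. the operator acts as a negative multiple of the identity:

```python
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def casimir(u):
    # sum over i of ad_{e_i} ad_{e_i} u = sum_i e_i x (e_i x u)
    basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    out = [0.0, 0.0, 0.0]
    for e in basis:
        w = cross(e, cross(e, u))
        out = [out[k] + w[k] for k in range(3)]
    return out

u = [0.3, -1.2, 2.0]
print(casimir(u))  # equals -2*u componentwise
```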
On $\mathcal{G}^s_V$ consider the two scalar products
$$\langle X, Y\rangle_0 = \int_M \langle X(x), Y(x)\rangle\, dx$$
and
$$\langle X, Y\rangle_1 = \int_M \langle X(x), Y(x)\rangle\, dx + \int_M \langle \nabla X(x), \nabla Y(x)\rangle\, dx.$$
The Levi-Civita connection on $G^s_V$ is given by $\nabla^{0,V}_X Y = P_e(\nabla^0_X Y)$, with $\nabla^0$ the Levi-Civita connection of $\langle\cdot,\cdot\rangle_0$ on $G^s$ and $P_e$ the orthogonal projection onto $\mathcal{G}^s_V$:
$$H^s(TM) = \mathcal{G}^s_V \oplus dH^{s+1}(M).$$
One can find $(H_i)_{i\ge1}$ such that for all $i \ge 1$, $\nabla_{H_i}H_i = 0$, $\operatorname{div}(H_i) = 0$, and
$$\sum_{i\ge1} H_i^2 f = \nu\Delta f, \qquad f \in C^2(M).$$

Corollary.
(1) $g$ is a critical point of $J_{\langle\cdot,\cdot\rangle_0}$ if and only if $u$ solves the Navier–Stokes equation
$$\frac{\partial u}{\partial t} = -\nabla_u u + \frac{\nu}{2}\Delta u - \nabla p, \qquad \operatorname{div} u = 0.$$
(2) Assume $M = \mathbb{T}^2$, the 2-dimensional torus. Then $g$ is a critical point of $J_{\langle\cdot,\cdot\rangle_1}$ if and only if $u$ solves the Camassa–Holm equation
$$\frac{\partial v}{\partial t} = -\nabla_u v - \sum_{j=1}^{2} v_j\nabla u_j + \frac{\nu}{2}\Delta v - \nabla p, \qquad v = u - \Delta u, \qquad \operatorname{div} u = 0.$$
For the proof, use the Itô formula and compute $\operatorname{ad}^*_v(u)$ and $K(u)$ in the different situations.
Information Geometry Optimization (chaired by Giovanni Pistone, Yann Ollivier)
When observing data x_1, …, x_t modelled by a probabilistic distribution p_θ(x), the maximum likelihood (ML) estimator θ_ML = arg max_θ Σ_{i=1}^t ln p_θ(x_i) cannot, in general, safely be used to predict x_{t+1}. For instance, for a Bernoulli process, if only "tails" have been observed so far, the probability of "heads" is estimated to be 0. (Thus, for the standard log-loss scoring rule, this results in an infinite loss the first time "heads" appears.)
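This failure mode is easy to check numerically. Below is a minimal sketch (my own illustration, not code from the paper; the function names are mine) comparing the plain ML predictor with Laplace's rule of succession p(heads) = (h + 1)/(t + 2) on a Bernoulli stream that starts with only tails: the ML predictor incurs an infinite cumulated log-loss at the first "heads", while the Laplace predictor stays finite.

```python
import math

def ml_estimate(heads, t):
    # ML probability of "heads" after t observations (0.5 before any data).
    return heads / t if t > 0 else 0.5

def laplace_estimate(heads, t):
    # Laplace's rule of succession: add one fictitious head and one tail.
    return (heads + 1) / (t + 2)

observations = [0, 0, 0, 0, 0, 1]  # five tails, then one head
losses = {}
for name, estimate in [("ML", ml_estimate), ("Laplace", laplace_estimate)]:
    loss, heads = 0.0, 0
    for t, x in enumerate(observations):
        p_heads = estimate(heads, t)
        p = p_heads if x == 1 else 1.0 - p_heads
        loss += float("inf") if p == 0.0 else -math.log(p)
        heads += x
    losses[name] = loss

print(losses)  # the ML log-loss is infinite; the Laplace log-loss is finite
```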

Laplace's Rule of Succession in Information Geometry
Yann Ollivier, CNRS & Paris-Saclay University, France

Sequential prediction. Sequential prediction problem: given observations x_1, …, x_t, build a probabilistic model p_{t+1} for x_{t+1}, iteratively. Example: given that w women and m men entered this room, what is the probability that the next person who enters is a woman/man?

A common performance criterion for prediction is the cumulated log-loss

L_T := −Σ_{t=0}^{T−1} log p_{t+1}(x_{t+1} | x_{1…t}),

to be minimized. This corresponds to compression cost, and is also equal to square loss for Gaussian models.

Maximum likelihood estimator. Maximum likelihood strategy: fix a parametric model p …
A divergence function defines a Riemannian metric G and dually coupled affine connections (∇, ∇*) with respect to it in a manifold M. When M is dually flat, a canonical divergence is known, which is uniquely determined from {G, ∇, ∇*}. We search for a standard divergence for a general non-flat M. It is introduced via the magnitude of the inverse exponential map, where the α = 1/3 connection plays a fundamental role. The standard divergence is different from the canonical divergence.
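In the dually flat case mentioned above, the canonical divergence is a Bregman divergence, and for an exponential family it coincides with the Kullback-Leibler divergence. A minimal numerical sketch of this fact (my own illustration, not from the paper), using the Bernoulli family with log-normalizer ψ(θ) = log(1 + e^θ):

```python
import math

def psi(theta):
    # Log-normalizer of the Bernoulli family in its natural parameter.
    return math.log1p(math.exp(theta))

def mu(theta):
    # Expectation parameter mu = psi'(theta), i.e. the sigmoid.
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta_q, theta_p):
    # Canonical (Bregman) divergence of the dually flat Bernoulli manifold:
    # B_psi(theta_q, theta_p) = psi(theta_q) - psi(theta_p) - psi'(theta_p)(theta_q - theta_p)
    return psi(theta_q) - psi(theta_p) - mu(theta_p) * (theta_q - theta_p)

def kl(m1, m2):
    # Closed-form KL divergence between Bernoulli(m1) and Bernoulli(m2).
    return m1 * math.log(m1 / m2) + (1 - m1) * math.log((1 - m1) / (1 - m2))

t1, t2 = 0.3, -1.2
print(bregman(t2, t1), kl(mu(t1), mu(t2)))  # the two numbers agree
```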

GSI 2015, Paris
Standard Divergence in Manifold of Dual Affine Connections
Shun-ichi Amari (RIKEN Brain Science Institute), Nihat Ay (Max Planck Institute for Mathematics in the Sciences)

Divergence and metric. A divergence D[p : q] ≥ 0 expands as

D[p : q] = (1/2) Σ g_ij dθ^i dθ^j + O(|dθ|³),

with G = (g_ij) a positive-definite Riemannian metric.

Divergence and dual affine connections. D induces a pair of dual affine connections Γ_ijk, Γ*_ijk. Dual geometry (M, g, ∇, ∇*) with cubic tensor T_ijk:

Γ_ijk = Γ⁰_ijk − (1/2) T_ijk,  Γ*_ijk = Γ⁰_ijk + (1/2) T_ijk,

where ∇⁰ is the Levi-Civita connection. When M is dually flat, the canonical divergence is a Bregman divergence.

Exponential map divergence. With X(p, q) = exp_p⁻¹(q) the inverse exponential map along the geodesic from p to q, define

D[p : q] = (1/2) |X(p, q)|².

Theorem 1: the exponential map divergence induces a geometry. Theorem 2: the α = 1/3 exponential map divergence recovers the original geometry. The standard divergence is

D_stan[p : q] = (1/2) |X_{1/3}(p, q)|²,  with D_stan[p : q] = D*_stan[q : p].

Remark (dually flat case): D[p : q] = ∫₀¹ t |θ̇(t)|² dt, and the standard divergence differs from the canonical divergence. Projection theorem: for p̂ = arg min_{q ∈ S} D[p : q], the inverse exponential map X is proportional to grad_q D[p : q].
The statistical structure on a manifold M is predicated upon a special kind of coupling between the Riemannian metric g and a torsion-free affine connection ∇ on the tangent bundle TM, such that ∇g is totally symmetric, forming, by definition, a "Codazzi pair" {∇, g}. In this paper, we first investigate various transformations of affine connections, including additive translation (by an arbitrary (1,2)-tensor K), multiplicative perturbation (through an arbitrary invertible operator L on TM), and conjugation (through a non-degenerate two-form h). We then study the Codazzi coupling of ∇ with h and its coupling with L, and the link between these two couplings. We introduce, as special cases of K-translations, various transformations that generalize traditional projective and dual-projective transformations, and study their commutativity with L-perturbation and h-conjugation transformations. Our derivations allow affine connections to carry torsion, and we investigate conditions under which torsions are preserved by the various transformations mentioned above. Our systematic approach establishes a general setting for the study of Information Geometry based on transformations and coupling relations of affine connections; in particular, we provide a generalization of the conformal-projective transformation.

This paper addresses the problem of online learning of finite statistical mixtures of exponential families. We give a short review of the Expectation-Maximization (EM) algorithm and its online extensions. From these extensions and the description of the k-Maximum Likelihood Estimator (k-MLE), we propose three online extensions of the latter. To illustrate them, we consider the case of mixtures of Wishart distributions, giving details and providing some experiments.

Online k-MLE for mixture modelling with exponential families
Christophe Saint-Jean, Frank Nielsen
Geometric Science of Information 2015, Oct 28-30, 2015, Ecole Polytechnique, Paris-Saclay

Application context. We are interested in building a system (a model) which evolves when new data is available: x_1, x_2, …, x_N, … The time needed for processing a new observation must be constant w.r.t. the number of observations, and the memory required by the system is bounded. Denote π the unknown distribution of X.

Reminder: (regular) exponential family (EF). First, π is approximated by a member of a regular exponential family

EF = {f(x; θ) = exp{⟨s(x), θ⟩ + k(x) − F(θ)} | θ ∈ Θ},

with λ the source parameters, θ the natural parameters, η the expectation parameters, s(x) the sufficient statistic, k(x) the auxiliary carrier measure, and F(θ) the log-normalizer (differentiable, strictly convex; Θ = {θ ∈ R^D | F(θ) < ∞} is an open convex set). Almost all common distributions are EF members, with exceptions such as the uniform and Cauchy distributions.

Reminder: maximum likelihood estimate (MLE). For a sample χ = {x_1, …, x_N} of i.i.d. observations,

θ̂^(N) = arg max_θ Π_{i=1}^N f(x_i; θ) = arg min_θ −(1/N) Σ_{i=1}^N log f(x_i; θ),

which for an EF is exactly solved in H, the space of expectation parameters:

η̂^(N) = ∇F(θ̂^(N)) = (1/N) Σ_i s(x_i)  ⟺  θ̂^(N) = (∇F)⁻¹((1/N) Σ_i s(x_i)).

Exact online MLE for an EF. A recursive formulation is easily obtained. Starting from η̂^(0) = 0, for each new observation x_N:

η̂^(N) = η̂^(N−1) + N⁻¹ (s(x_N) − η̂^(N−1)),

yielding η̂^(N), or θ̂^(N) = (∇F)⁻¹(η̂^(N)). Analytical expressions of (∇F)⁻¹ exist for most EFs (but not all). For the multivariate normal N(x; μ, Σ), one possible decomposition gives s(x) = (x, −xxᵀ), F(θ) = (1/4)θ_1ᵀθ_2⁻¹θ_1 + (d/2)log π − (1/2)log|θ_2|, and (∇F)⁻¹(η_1, η_2) = ((−η_1η_1ᵀ − η_2)⁻¹η_1, (1/2)(−η_1η_1ᵀ − η_2)⁻¹). See the paper for details on the Wishart distribution.

Finite (parametric) mixture models. Now π is approximated by a finite parametric mixture

π(x) ≈ f(x; θ) = Σ_{j=1}^K w_j f_j(x; θ_j),  0 ≤ w_j ≤ 1,  Σ_j w_j = 1,

where the w_j are the mixing proportions and the f_j are the component distributions. When all f_j's are EFs, it is called a mixture of EFs (MEF). The observable data χ = {x_1, …, x_N} are incomplete; the complete, unobservable data are χ_c = {y_i = (x_i, z_i)}, with Z_i ∼ cat_K(w) and X_i | Z_i = j ∼ f_j(·; θ_j). For a MEF, the joint density p(x, z; θ) is itself an EF.

Expectation-Maximization (EM) [1]. The EM algorithm iteratively maximizes Q(θ; θ̂^(t), χ) := E_{θ̂^(t)}[log p(χ_c; θ) | χ] (E-step), then sets θ̂^(t+1) = arg max_θ Q(θ; θ̂^(t), χ) (M-step), until convergence of the complete log-likelihood. For a mixture, the E-step is always explicit:

ẑ_{i,j}^(t) = ŵ_j^(t) f(x_i; θ̂_j^(t)) / Σ_j ŵ_j^(t) f(x_i; θ̂_j^(t)),

and for a MEF the M-step reduces to

ŵ_j^(t+1) = Σ_i ẑ_{i,j}^(t) / N,  η̂_j^(t+1) = ∇F(θ̂_j^(t+1)) = Σ_i ẑ_{i,j}^(t) s_j(x_i) / Σ_i ẑ_{i,j}^(t)  (weighted average of sufficient statistics).

k-Maximum Likelihood Estimator (k-MLE) [2]. The k-MLE introduces a geometric split χ = ∪_{j=1}^K χ_j^(t) to accelerate EM:

ẑ_{i,j}^(t) = [arg max_j̃ w_j̃ f(x_i; θ_j̃) = j].

Equivalently, it amounts to maximizing Q over partitions Z [3]. For a MEF, the M-step then reduces to

ŵ_j^(t+1) = |χ_j^(t)| / N,  η̂_j^(t+1) = Σ_{x_i ∈ χ_j^(t)} s_j(x_i) / |χ_j^(t)|  (cluster-wise unweighted average of sufficient statistics).

Online learning of mixtures. Consider now the online setting x_1, x_2, …, x_N, … Denote θ̂^(N) or η̂^(N) the parameter estimate after dealing with N observations, and θ̂^(0) or η̂^(0) their initial values. Remark: for a fixed-size dataset χ, one may apply multiple passes (with shuffle) on χ; the increase in the likelihood function is no longer guaranteed after an iteration.

Stochastic approximations of EM. Two main approaches to online EM-like estimation:

Stochastic M-step, the recursive EM (1984) [5]:

θ̂^(N) = θ̂^(N−1) + {N I_c(θ̂^(N−1))}⁻¹ ∇_θ log f(x_N; θ̂^(N−1)),

where I_c is the Fisher information matrix for the complete data. A justification for this formula comes from Fisher's identity ∇ log f(x; θ) = E_θ[∇ log p(x, z; θ) | x]. One can recognize a second-order stochastic gradient ascent, which requires updating and inverting I_c after each iteration.

Stochastic E-step, the online EM (2009) [7]:

Q̂^(N)(θ) = Q̂^(N−1)(θ) + α^(N) (E_{θ̂^(N−1)}[log p(x_N, z_N; θ) | x_N] − Q̂^(N−1)(θ)).

In the case of a MEF, the algorithm works only with the conditional expectation of the sufficient statistics for the complete data, and the M-step is unchanged. Some properties: initial values Ŝ^(0) may be used for introducing a "prior"; parameter constraints are automatically respected; no matrix to invert; a policy for α^(N) has to be chosen (see [7]); consistent, asymptotically equivalent to the recursive EM.

Stochastic approximations of k-MLE. To keep the advantages of online EM in an online k-MLE, the only remaining choice concerns the way to affect x_N to a cluster:

Strategy 1: maximize the likelihood of the complete data (x_N, z_N): z̃_{N,j} = [arg max_j̃ ŵ_j̃^(N−1) f(x_N; θ̂_j̃^(N−1)) = j]. Equivalent to online CEM, and similar to MacQueen's iterative k-means.

Strategy 2: maximize the likelihood of the complete data (x_N, z_N) after the M-step: z̃_{N,j} = [arg max_j̃ ŵ_j̃^(N) f(x_N; θ̂_j̃^(N)) = j]. Similar to Hartigan's method for k-means; the additional cost is precomputing all possible M-steps for the stochastic E-step.

Strategy 3: draw z̃_N from the categorical distribution Cat_K({p_j ∝ log(ŵ_j^(N−1) f_j(x_N; θ̂_j^(N−1)))}_j). Similar to the sampling in stochastic EM [3]; the motivation is to try to break the inconsistency of k-MLE.

For strategies 1 and 3, the M-step reduces to the update of the parameters of a single component.

Experiments. True distribution π = 0.5 N(0, 1) + 0.5 N(μ_2, σ_2²), with different values of μ_2, σ_2 for more or less overlap between components. A small subset of observations is taken for initialization (k-MLE++ / k-MLE). A video illustrates the inconsistency of online k-MLE; experiments on Wishart mixtures are also reported.

Conclusions and future work. On consistency: EM and online EM are consistent; k-MLE and online k-MLE (strategies 1, 2) are inconsistent, due to the Bayes error in maximizing the classification likelihood; the consistency of online stochastic k-MLE (strategy 3) is open. So, when components overlap, online EM > k-MLE > online k-MLE for parameter learning. How the dimension influences the inconsistency/convergence rate for online k-MLE needs study. The convergence rate is lower for online methods (sublinear convergence of SGD). Time for an update vs sample size: online k-MLE (1, 3) < online EM < online k-MLE (2) << k-MLE. Online EM appears to be the best compromise.

References
[1] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pp. 1-38, 1977.
[2] Nielsen, F.: On learning statistical mixtures maximizing the complete likelihood. Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), AIP Conference Proceedings, 1641, pp. 238-245, 2014.
[3] Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14(3), pp. 315-332, 1992.
[4] Samé, A., Ambroise, C., Govaert, G.: An online classification EM algorithm based on the mixture model. Statistics and Computing, 17(3), pp. 209-218, 2007.
[5] Titterington, D.M.: Recursive parameter estimation using incomplete data. Journal of the Royal Statistical Society, Series B (Methodological), 46(2), pp. 257-267, 1984.
[6] Amari, S.-I.: Natural gradient works efficiently in learning. Neural Computation, 10(2), pp. 251-276, 1998.
[7] Cappé, O., Moulines, E.: Online expectation-maximization algorithm for latent data models. Journal of the Royal Statistical Society, Series B (Methodological), 71(3), pp. 593-613, 2009.
[8] Neal, R.M., Hinton, G.E.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In Jordan, M.I., editor, Learning in Graphical Models, pp. 355-368. MIT Press, Cambridge, 1999.
[9] Bottou, L.: Online algorithms and stochastic approximations. In Saad, D., editor, Online Learning and Neural Networks, Cambridge University Press, 1998.
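The exact online MLE recursion for an exponential family is easy to check against the batch estimator. A small sketch (my own, not code from the paper), for the univariate Gaussian with sufficient statistic s(x) = (x, −x²), showing that the recursive update reproduces the batch average of sufficient statistics exactly:

```python
def s(x):
    # Sufficient statistic of the univariate Gaussian: s(x) = (x, -x^2).
    return (x, -x * x)

def online_mle(xs):
    # Exact online MLE in expectation parameters:
    # eta_N = eta_{N-1} + (1/N) * (s(x_N) - eta_{N-1})
    eta = (0.0, 0.0)
    for n, x in enumerate(xs, start=1):
        eta = tuple(e + (v - e) / n for e, v in zip(eta, s(x)))
    return eta

def batch_mle(xs):
    # Batch MLE: plain average of the sufficient statistics.
    n = len(xs)
    return (sum(xs) / n, sum(-x * x for x in xs) / n)

xs = [0.5, -1.2, 2.0, 0.1, 0.9]
eta = online_mle(xs)
# Recover (mu, sigma^2) from eta = (E[x], E[-x^2]):
mu_hat, var_hat = eta[0], -eta[1] - eta[0] ** 2
print(eta, batch_mle(xs), mu_hat, var_hat)
```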
We discuss the optimization of the stochastic relaxation of a real-valued function, i.e., we introduce a new search space given by a statistical model and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric, thus the stochastic relaxation can be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distributions, to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which come from the dually flat structure of an exponential family. This allows us to obtain different Taylor formulæ according to the choice of the Hessian and of the geodesic used, and thus different approaches to the design of second-order methods, such as the Newton method.
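For an exponential family, the Fisher information metric appearing here is the Hessian of the log-normalizer ψ(θ), which equals the covariance of the sufficient statistics. A small numerical sketch of this identity (my own parametrization of the univariate Gaussian with sufficient statistic (x, −x²); not code from the paper), comparing a finite-difference Hessian of ψ with the closed-form covariance:

```python
import math

def psi(t1, t2):
    # Log-normalizer of N(mu, sigma^2) with sufficient statistic (x, -x^2),
    # natural parameters t1 = mu/sigma^2, t2 = 1/(2 sigma^2), t2 > 0.
    return t1 * t1 / (4 * t2) + 0.5 * math.log(math.pi / t2)

def hessian_fd(t1, t2, h=1e-4):
    # Central finite-difference Hessian of psi at (t1, t2).
    def d2(i, j):
        steps = [(h, 0.0), (0.0, h)]
        (a1, a2), (b1, b2) = steps[i], steps[j]
        return (psi(t1 + a1 + b1, t2 + a2 + b2) - psi(t1 + a1 - b1, t2 + a2 - b2)
                - psi(t1 - a1 + b1, t2 - a2 + b2) + psi(t1 - a1 - b1, t2 - a2 - b2)) / (4 * h * h)
    return [[d2(i, j) for j in range(2)] for i in range(2)]

mu_, var = 1.5, 0.7
t1, t2 = mu_ / var, 1.0 / (2.0 * var)
# Fisher metric = covariance of the sufficient statistic (x, -x^2):
fisher = [[var, -2 * mu_ * var],
          [-2 * mu_ * var, 4 * mu_ * mu_ * var + 2 * var * var]]
print(hessian_fd(t1, t2))
print(fisher)  # the two matrices agree up to discretization error
```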

GSI2015, 2nd conference on Geometric Science of Information, 28-30 Oct 2015, Ecole Polytechnique, Paris-Saclay
Second-order Optimization over the Multivariate Gaussian Distribution
Luigi Malagò (Shinshu University, JP & INRIA Saclay, FR), Giovanni Pistone (de Castro Statistics, Collegio Carlo Alberto, Moncalieri, IT)

Introduction. This is the presentation by Giovanni of the paper with the same title in the proceedings. Giovanni's specific field of expertise is non-parametric Information Geometry and its applications in Probability and Statistical Physics; Luigi, currently working in Japan and unable to attend, is responsible for the idea of using gradient methods and, later, Newton methods, in black-box optimization. The collaboration started with the preparation of the FOGA 2011 paper: L. Malagò, M. Matteucci, G. Pistone, "Towards the geometry of estimation of distribution algorithms based on the exponential family", FOGA '11, pp. 230-242, ACM, 2011.

Summary: 1. Geometry of the exponential family; 2. Second-order optimization: the Newton method; 3. Applications to the Gaussian distribution; 4. Discussion and future work. A short introduction to Taylor formulæ on Gaussian exponential families is provided; the binary case was previously discussed in L. Malagò and G. Pistone, "Combinatorial optimization with information geometry: Newton method", Entropy, 16:4260-4289, 2014. Riemannian Newton methods are discussed in a session of this conference, cf. P.-A. Absil, R. Mahony, R. Sepulchre, "Optimization Algorithms on Matrix Manifolds", Princeton University Press, 2008. The focus of this short presentation is a specific framework for Information Geometry called the statistical bundle (Hilbert vs tangent vs statistical bundle; cf. S. Amari, "Dual connections on the Hilbert bundles of statistical models", in Geometrization of Statistical Theory, Lancaster, 1987, pp. 123-151; R.E. Kass and P.W. Vos, "Geometrical Foundations of Asymptotic Inference", Wiley, 1997).

Statistical bundle: Gaussian case. H_α(x), x ∈ R^m, are Hermite polynomials of order 1 and 2; e.g., for m = 3, H_010(x) = x_2, H_011(x) = x_2 x_3, H_020(x) = x_2² − 1. The Gaussian model with sufficient statistics B = {X_1, …, X_n} ⊂ {H_α : |α| = 1, 2} is

N = {p(x; θ) = exp(Σ_{j=1}^n θ_j X_j − ψ(θ))}.

The fibers are V_p = Span(X_j − E_p[X_j], j = 1, …, n), and the statistical bundle is

SN = {(p, U) : p ∈ N, U ∈ V_p}.

Each U ∈ V_p, p ∈ N, is a polynomial of degree up to 2, and t ↦ E_q[e^{tU}] is finite around 0 for q ∈ N; every polynomial X belongs to ∩_{q∈N} L²(q).

Parallel transports. Definition: the e-transport is ᵉU_p^q : V_p → V_q, U ↦ U − E_q[U]; the m-transport is defined by duality: for each U ∈ V_p and V ∈ V_q, ⟨U, ᵐU_q^p V⟩_p = ⟨ᵉU_p^q U, V⟩_q. Properties: ᵉU_q^r ᵉU_p^q = ᵉU_p^r; ᵐU_q^r ᵐU_p^q = ᵐU_p^r; if V ∈ L²(p), then ᵐU_q^p V is its orthogonal projection onto V_p.

Parallel transports in coordinates. We define on the statistical bundle SN a system of moving frames.
1. The exponential frame of the fiber S_pN = V_p is the vector basis B_p = {X_j − E_p[X_j], j = 1, …, n}.
2. Each element U ∈ V_p is uniquely written as U = Σ_{j=1}^n α_j(U)(X_j − E_p[X_j]) = α(U)ᵀ(X − E_p[X]).
3. The expression in the exponential frame of the scalar product is the Fisher information matrix: I_ij(p) = ⟨X_i − E_p[X_i], X_j − E_p[X_j]⟩_p = Cov_p(X_i, X_j) = ∂²ψ(θ)/∂θ_i∂θ_j.
4. U ↦ α(U) = I(p)⁻¹ Cov_p(X, U).
5. The mixture frame of the fiber S_pN = V_p is {Σ_{i=1}^n I(p)^{ij}(X_i − E_p[X_i]), j = 1, …, n}, with I(p)^{ij} the entries of I(p)⁻¹.
6. Each element V ∈ V_p is uniquely written as …
We prove the equivalence of two online learning algorithms, mirror descent and natural gradient descent. Both mirror descent and natural gradient descent are generalizations of online gradient descent when the parameter of interest lies on a non-Euclidean manifold. Natural gradient descent selects the steepest descent direction along a Riemannian manifold by multiplying the standard gradient by the inverse of the metric tensor. Mirror descent induces non-Euclidean structure by solving iterative optimization problems using different proximity functions. In this paper, we prove that mirror descent induced by a Bregman divergence proximity function is equivalent to the natural gradient descent algorithm on the Riemannian manifold in the dual coordinate system. We use techniques from convex analysis and connections between Riemannian manifolds, Bregman divergences and convexity to prove this result. This equivalence between natural gradient descent and mirror descent implies that (1) mirror descent is the steepest descent direction along the Riemannian manifold corresponding to the choice of Bregman divergence and (2) mirror descent with log-likelihood loss applied to parameter estimation in exponential families asymptotically achieves the classical Cramér-Rao lower bound.
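The equivalence can be verified numerically in one dimension. Below is a toy sketch of my own (not the paper's notation), using the Poisson-type potential G(θ) = e^θ: the dual coordinate is μ = G'(θ) = e^θ, the dual potential is H(μ) = μ log μ − μ, and the metric tensor is H''(μ) = 1/μ. Both update rules produce the same point in the dual coordinate:

```python
import math

def f(theta):
    # An arbitrary smooth cost in the primal coordinate (hypothetical example).
    return (theta - 1.0) ** 2

def df(theta):
    return 2.0 * (theta - 1.0)

alpha, theta0 = 0.1, 0.4
mu0 = math.exp(theta0)  # dual coordinate mu = G'(theta) = exp(theta)

# Mirror descent step with B_G as proximity function; its optimality condition is
#   G'(theta1) = G'(theta0) - alpha * f'(theta0),  i.e. an update on mu:
mu_mirror = mu0 - alpha * df(theta0)

# Natural gradient step in dual coordinates:
#   mu1 = mu0 - alpha * [H''(mu0)]^{-1} * d f/d mu,
# where d f/d mu = f'(theta) * d theta/d mu = f'(theta) / mu.
grad_f_mu = df(theta0) / mu0
mu_natural = mu0 - alpha * (1.0 / (1.0 / mu0)) * grad_f_mu

print(mu_mirror, mu_natural)  # identical
```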

Information geometry of mirror descent (Geometric Science of Information)
Anthea Monod, Department of Statistical Science, Duke University, Information Initiative at Duke; joint work with G. Raskutti (UW Madison) and S. Mukherjee (Duke). 29 Oct 2015.

Optimization of large-scale problems. Optimization of a function f(θ) where θ ∈ R^p. Standard subgradient descent has an O(√p) convergence rate, a problem in modern optimization, e.g., machine learning. Mirror descent [A. Nemirovski, 1979; A. Beck & M. Teboulle, 2003] achieves an O(log p) convergence rate and is a widely used tool in optimization and machine learning.

Differential geometry in statistics. (1) Cramér-Rao lower bound (Rao 1945): the lower bound on the variance of an estimator is a function of curvature; sometimes called the Cramér-Rao-Fréchet-Darmois lower bound. (2) Invariant (non-informative) priors (Jeffreys 1946): an uninformative prior distribution for a parameter space is based on a differential form. (3) Information geometry (Amari 1985): differential geometry of probability distributions.

Stochastic gradient descent. Given a convex differentiable cost function f : Θ → R, generate a sequence of parameters {θ_t}_{t=1}^∞ which incur a loss f(θ_t) and minimize the regret at time T, Σ_{t=1}^T f(θ_t). One solution:

θ_{t+1} = θ_t − α_t ∇f(θ_t),

where (α_t)_{t=0}^∞ denotes a sequence of step sizes.

Natural gradient. For certain cost functions (log-likelihoods of exponential family models), the set of parameters Θ is supported on a p-dimensional Riemannian manifold (M, H). Typically the metric tensor H = (h_jk) is determined by the Fisher information matrix

(I(θ))_ij = E_Data[ (∂f(x; θ)/∂θ_i)(∂f(x; θ)/∂θ_j) | θ ],  i, j = 1, …, p.

Given a cost function f : M → R on the Riemannian manifold, the natural gradient descent step is

θ_{t+1} = θ_t − α_t H⁻¹(θ_t) ∇f(θ_t),

where H⁻¹ is the inverse of the Riemannian metric. The natural gradient algorithm steps in the direction of steepest descent along the Riemannian manifold (M, H); it requires a matrix inversion.

Mirror descent. Gradient descent can be written

θ_{t+1} = arg min_{θ∈Θ} { ⟨θ, ∇f(θ_t)⟩ + (1/(2α_t)) ‖θ − θ_t‖₂² }.

For a (strictly) convex proximity function Ψ : R^p × R^p → R⁺, mirror descent is

θ_{t+1} = arg min_{θ∈Θ} { ⟨θ, ∇f(θ_t)⟩ + (1/α_t) Ψ(θ, θ_t) }.

Bregman divergence. Let G : Θ → R be a strictly convex, twice-differentiable function; the Bregman divergence is

B_G(θ, θ') = G(θ) − G(θ') − ⟨∇G(θ'), θ − θ'⟩.

Bregman divergences for the exponential family:

Family | G(θ) | B_G(θ, θ')
N(θ, I_{p×p}) | (1/2)‖θ‖₂² | (1/2)‖θ − θ'‖₂²
Poi(e^θ) | exp(θ) | exp(θ) − exp(θ') − ⟨exp(θ'), θ − θ'⟩
Be(1/(1+e^{−θ})) | log(1 + e^θ) | log((1 + e^θ)/(1 + e^{θ'})) − ⟨e^{θ'}/(1 + e^{θ'}), θ − θ'⟩

Mirror descent using the Bregman divergence as the proximity function:

θ_{t+1} = arg min_θ { ⟨θ, ∇f(θ_t)⟩ + (1/α_t) B_G(θ, θ_t) }.

Convex duals. The convex conjugate function for a function G is defined to be H(μ) := sup_{θ∈Θ} {⟨θ, μ⟩ − G(θ)}. Let μ = g(θ) ∈ Φ be the extremal point of the dual. The dual Bregman divergence B_H : Φ × Φ → R⁺ is

B_H(μ, μ') = H(μ) − H(μ') − ⟨∇H(μ'), μ − μ'⟩.

Dual Bregman divergences for the exponential family:

G(θ) | H(μ) | B_H(μ, μ')
(1/2)‖θ‖₂² | (1/2)‖μ‖₂² | (1/2)‖μ − μ'‖₂²
exp(θ) | ⟨μ, log μ⟩ − μ | μ log(μ/μ') − μ + μ'
log(1 + e^θ) | μ log μ + (1 − μ) log(1 − μ) | μ log(μ/μ') + (1 − μ) log((1 − μ)/(1 − μ'))

Manifolds in primal and dual coordinates. B_G(·, ·) induces a Riemannian manifold (Θ, ∇²G) in the primal coordinates. Let Φ be the image of Θ under the continuous map g = ∇G. B_H : Φ × Φ → R⁺ induces the same Riemannian manifold (Φ, ∇²H) under the dual coordinates Φ.

Equivalence. Theorem (Raskutti, Mukherjee): the mirror descent step with Bregman divergence defined by G applied to function f in the space Θ is equivalent to the natural gradient step along the Riemannian manifold (Φ, ∇²H) in dual coordinates.

Consequences. Exponential family with density p(y | θ) = h(y) exp(⟨θ, y⟩ − G(θ)). Consider the mirror descent step given y_t, with proximity function B_G. In dual coordinates one minimizes f_t(μ; y_t) = −log p(y_t | μ) = B_H(y_t, μ). The natural gradient step is

μ_{t+1} = μ_t − α_t [∇²H(μ_t)]⁻¹ ∇B_H(y_t, μ_t) = μ_t − α_t (μ_t − y_t):

the curvature of the loss B_H(y_t, μ_t) matches the metric tensor ∇²H(μ).

Statistical efficiency. Given independent samples Y_T = (y_1, …, y_T), a sequence of unbiased estimators μ_T is Fisher efficient if

lim_{T→∞} E_{Y_T}[(μ_T − μ)(μ_T − μ)ᵀ] → (1/T) ∇²H,

where ∇²H is the inverse of the Fisher information matrix. Theorem (Raskutti, Mukherjee): the mirror descent step applied to the log loss with step sizes α_t = 1/t asymptotically achieves the Cramér-Rao lower bound.

Challenges. (1) Information geometry on mixtures of manifolds. (2) Proximity functions for functions over the Grassmannian. (3) EM algorithms for mixtures.

Acknowledgements. Funding: Center for Systems Biology at Duke, NSF DMS and CCF, DARPA, AFOSR, NIH.
Geometry of Time Series and Linear Dynamical Systems (chaired by Bijan Afsari, Arshia Cont)
We present in this paper a novel non-parametric approach useful for clustering independent identically distributed stochastic processes. We introduce a preprocessing step consisting in mapping multivariate independent and identically distributed samples from random variables to a generic non-parametric representation which factorizes dependency and marginal distribution apart without losing any information. An associated metric is defined where the balance between random-variable dependency and distribution information is controlled by a single parameter. This mixing parameter can be learned or played with by a practitioner; such use is illustrated on the case of clustering financial time series. Experiments, implementation and results obtained on public financial time series are available online on the web portal http://www.datagrapple.com .
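The margin-free part of the representation in question is obtained by the copula transform, which empirically amounts to a rank transform of each coordinate. A minimal sketch (my own, assuming distinct values so that ties can be ignored), illustrating the key property that the empirical copula transform is invariant under strictly increasing transformations of the data:

```python
import math

def empirical_copula_transform(xs):
    # Map each observation to its normalized rank in (0, 1]:
    # an empirical version of U = P(X), the copula transform.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order, start=1):
        ranks[i] = r / len(xs)
    return ranks

xs = [0.3, -1.0, 2.5, 0.7, 1.1]
ys = [math.exp(x) for x in xs]  # a strictly increasing transformation of xs
print(empirical_copula_transform(xs))
print(empirical_copula_transform(ys))  # identical: ranks are invariant
```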

IntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionClustering Random Walk Time SeriesGSI 2015  Geometric Science of InformationGautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat29 October 2015Gautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusion1Introduction2Geometry of Random Walk Time Series3The Hierarchical Block Model4ConclusionGautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionContext (data from www.datagrapple.com)Gautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionWhat is a clustering program?DeﬁnitionClustering is the task of grouping a set of objects in such a waythat objects in the same group (cluster) are more similar to eachother than those in diﬀerent groups.Example of a clustering programWe aim at ﬁnding k groups by positioning k group centers{c1 , . . . , ck } such that data points {x1 , . . . 
, xn } minimizeminc1 ,...,ckni=1mink d(xi , cj )2j=1But, what is the distance d between two random walk time series?Gautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionWhat are clusters of Random Walk Time Series?French banks and building materialsCDS over 20062015Gautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionWhat are clusters of Random Walk Time Series?French banks and building materialsCDS over 20062015Gautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusion1Introduction2Geometry of Random Walk Time Series3The Hierarchical Block Model4ConclusionGautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionGeometry of RW TS ≡ Geometry of Random Variablesi.i.d. observations:X1 :X2 :XN :12TX1 , X1 , . . . , X11,2,TX2X2. . . , X2..., ..., ..., ..., ...12TXN , XN , . . . , XNWhich distances d(Xi , Xj ) between dependent random variables?Gautier Marti, Frank NielsenClustering Random Walk Time SeriesIntroductionGeometry of Random Walk Time SeriesThe Hierarchical Block ModelConclusionPitfalls of a basic distance2Let (X , Y ) be a bivariate Gaussian vector, with X ∼ N (µX , σX ),2 ) and whose correlation is ρ(X , Y ) ∈ [−1, 1].Y ∼ N (µY , σYE[(X − Y )2 ] = (µX − µY )2 + (σX − σY )2 + 2σX σY (1 − ρ(X , Y ))Now, consider the following values for correlation:22ρ(X , Y ) = 0, so E[(X − Y )2 ] = (µX − µY )2 + σX + σY .Assume µX = µY and σX = σY . 
[Figure: probability density functions of the Gaussian pairs N(−5, 1) and N(5, 1), N(−5, 3) and N(5, 3), N(−5, 10) and N(5, 10); the green, red and blue pairs are equidistant in the L2 geometry on the parameter space (µ, σ).]

Sklar's Theorem
Theorem (Sklar, 1959). For any random vector X = (X_1, ..., X_N) having continuous marginal cdfs P_i, 1 ≤ i ≤ N, its joint cumulative distribution P is uniquely expressed as

  P(X_1, ..., X_N) = C(P_1(X_1), ..., P_N(X_N)),

where C, the multivariate distribution with uniform marginals, is known as the copula of X.

The Copula Transform
Definition. Let X = (X_1, ..., X_N) be a random vector with continuous marginal cumulative distribution functions (cdfs) P_i, 1 ≤ i ≤ N. The random vector

  U = (U_1, ..., U_N) := P(X) = (P_1(X_1), ..., P_N(X_N))

is known as the copula transform.
The U_i, 1 ≤ i ≤ N, are uniformly distributed on [0, 1] (the probability integral transform): for P_i the cdf of X_i, we have x = P_i(P_i^{−1}(x)) = Pr(X_i ≤ P_i^{−1}(x)) = Pr(P_i(X_i) ≤ x), thus P_i(X_i) ~ U[0, 1].
[Figure: X ~ U[0, 1] and Y ~ ln(X) have ρ ≈ 0.84, while ρ(P_X(X), P_Y(Y)) = 1: the copula transform is invariant to strictly increasing transformations.]

Deheuvels' Empirical Copula Transform
Let (X_1^t, ..., X_N^t), 1 ≤ t ≤ T, be T observations from a random vector (X_1, ..., X_N) with continuous margins. Since one cannot directly obtain the corresponding copula observations (U_1^t, ..., U_N^t) = (P_1(X_1^t), ..., P_N(X_N^t)) without knowing a priori (P_1, ..., P_N), one can instead:
Definition (The Empirical Copula Transform). Estimate the N empirical margins P_i^T(x) = (1/T) Σ_{t=1}^{T} 1(X_i^t ≤ x), 1 ≤ i ≤ N, to obtain the T empirical observations (Ũ_1^t, ..., Ũ_N^t) = (P_1^T(X_1^t), ..., P_N^T(X_N^t)).
Equivalently, since Ũ_i^t = R_i^t / T, with R_i^t the rank of observation X_i^t, the empirical copula transform can be considered as the normalized rank transform. In practice:
  x_transform = rankdata(x)/len(x)

Generic Non-Parametric Distance

  d_θ²(X_i, X_j) = θ · 3 E[(P_i(X_i) − P_j(X_j))²] + (1 − θ) · (1/2) ∫_R (√(dP_i/dλ) − √(dP_j/dλ))² dλ

(i) 0 ≤ d_θ ≤ 1; (ii) for 0 < θ < 1, d_θ is a metric; (iii) d_θ is invariant under diffeomorphisms.
Special cases:
• d_0²: (1/2) ∫_R (√(dP_i/dλ) − √(dP_j/dλ))² dλ is the squared Hellinger distance between the margins;
• d_1²: 3 E[(P_i(X_i) − P_j(X_j))²] = (1 − ρ_S)/2 = 2 − 6 ∫_0^1 ∫_0^1 C(u, v) du dv, where ρ_S is the Spearman correlation and C the copula of (X_i, X_j).
Remark: if f(x, θ) = c_Φ(u_1, ..., u_N; Σ) ∏_{i=1}^{N} f_i(x_i; ν_i), then ds² = ds²_GaussCopula + Σ_{i=1}^{N} ds²_margins.

The Hierarchical Block Model
A model of nested partitions: the nested partitions defined by the model can be seen on the distance matrix, for a proper distance and the right permutation of the data points. In practice, one observes and works with a distance matrix which is identical to it up to a permutation of the data.

Results: Data from the Hierarchical Block Model
[Table: Adjusted Rand Index (mean ± standard deviation) for the HC-AL and AP clustering algorithms with the distances (1 − ρ)/2, E[(X − Y)²], GPR with θ ∈ {0, 1, 0.5} and GNPR with θ ∈ {0, 1, 0.5}, on data sets whose clusters are defined by the distributions only, by the correlations only, or by both.]

Results: Application to Credit Default Swap Time Series
Distance matrices computed on CDS time series exhibit a hierarchical block structure (Marti, Very, Donnat, Nielsen, IEEE ICMLA 2015).
[Figures: (un)stability of clusters with the L2 distance versus stability of clusters with the proposed distance.]

Consistency
Definition (Consistency of a clustering algorithm). A clustering algorithm A is consistent with respect to the Hierarchical Block Model defining a set of nested partitions P if the probability that the algorithm A recovers all the partitions in P converges to 1 when T → ∞.
Definition (Space-conserving algorithm). A space-conserving algorithm does not distort the space, i.e. the distance D_ij between two clusters C_i and C_j satisfies

  D_ij ∈ [ min_{x ∈ C_i, y ∈ C_j} d(x, y), max_{x ∈ C_i, y ∈ C_j} d(x, y) ].

Theorem (Consistency of space-conserving algorithms; Andler, Marti, Nielsen, Donnat, 2015). Space-conserving algorithms (e.g. Single, Average and Complete Linkage) are consistent with respect to the Hierarchical Block Model.
[Figures: recovered partitions for T = 100, T = 1000, T = 10000.]

Discussion and questions?
Avenues for research:
• distances on (copula, margins)
• clustering using multivariate dependence information
• clustering using multiwise dependence information
See also: Optimal Copula Transport for Clustering Multivariate Time Series, Marti, Nielsen, Donnat, 2015.
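As an illustration of the ingredients above, the empirical copula transform and the distance d_θ can be sketched in a few lines of plain Python. This is an illustrative discretization, not the authors' implementation: the histogram estimate of the margins and the bin count are assumptions.

```python
def rank_transform(x):
    """Empirical copula transform: normalized ranks R_t / T (ties assumed absent)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    for r, i in enumerate(order, start=1):
        ranks[i] = r / len(x)
    return ranks

def d_theta(x, y, theta=0.5, bins=20):
    """Sketch of the generic non-parametric distance d_theta between two samples."""
    T = len(x)
    u, v = rank_transform(x), rank_transform(y)
    # Dependence term: 3 * E[(P_i(X_i) - P_j(X_j))^2], i.e. (1 - rho_Spearman) / 2.
    dep = 3.0 * sum((a - b) ** 2 for a, b in zip(u, v)) / T
    # Distribution term: squared Hellinger distance between histogram margins.
    lo, hi = min(min(x), min(y)), max(max(x), max(y))
    w = (hi - lo) / bins or 1.0
    hx, hy = [0.0] * bins, [0.0] * bins
    for a in x:
        hx[min(int((a - lo) / w), bins - 1)] += 1.0 / T
    for b in y:
        hy[min(int((b - lo) / w), bins - 1)] += 1.0 / T
    hell2 = 0.5 * sum((p ** 0.5 - q ** 0.5) ** 2 for p, q in zip(hx, hy))
    return (theta * dep + (1 - theta) * hell2) ** 0.5

# Two series differing only by a strictly increasing transformation have
# identical ranks, so the theta = 1 (pure dependence) distance vanishes.
x = [0.1 * i for i in range(100)]
y = [v ** 3 for v in x]
print(d_theta(x, y, theta=1.0))  # 0.0
```

For θ = 0 only the margins are compared (Hellinger), matching the d_0 and d_1 special cases above.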
Operational Viewpoint on Consensus / Limits on Accelerating Consensus Algorithms
Alain Sarlette, INRIA/QUANTIC & Ghent University/SYSTeMS
Two topics: an operational viewpoint on consensus, inspired by the quantum consensus objective, which covers some more linear algorithms ("Symmetrization": L. Mazzarella, F. Ticozzi, A. Sarlette, arXiv:1311.3364 and arXiv:1303.4077); and a limit on accelerating consensus algorithms, with information-theoretic links (in press at IEEE Trans. Automatic Control).

Classical consensus algorithm

  x_k(t+1) = x_k(t) + α(t) (x_j(t) − x_k(t))

• Consensus, i.e. reaching agreement x_1 = x_2 = ... = x_N, is the basis for many distributed computing tasks.
• Very flexible and robust convergence: as long as the network integrated over some finite T forms a connected graph and α(t) ∈ [α_m, α_M] ⊂ (0, 1).
• Convergence proof idea: shrinking convex hull; the highest value can only decrease, the lowest can only increase.

Our initial goal: bringing consensus into the quantum regime
• How to define consensus in a tensor product space, e.g. with respect to correlations, entanglement, ...?
• How to write a consensus algorithm? In standard consensus the system states x_k are directly accessible for computation and can be linearly combined, copied, communicated. In quantum consensus the whole quantum state / probability distribution cannot be measured, so we must physically exchange "things".

Consensus viewed as partial swapping
A pairwise consensus interaction between agents (j, k) mixes two operations: stay in place, and swap j with k. Such a mixture of two unitary operations can be easily implemented physically in quantum systems, or, for that matter, in other information structures.

Consensus operation as discrete group action
• Linear action a(g, x) of a (finite) group G on a vector space X of objects "of interest".
• Target: symmetrization, i.e. reach a state x̄ ∈ X where a(g, x̄) = x̄ for all g ∈ G; the projection onto the symmetrization set can be written explicitly.
• Dynamics: a convex combination over G at each t, with weights s_g(t); usually s_g(t) ≠ 0 only for g belonging to a very restricted subset of G.

Lift from actions to group
• The state x(t) at any time can be written as a convex combination with weights p independent of x(0).
• The dynamics can then be lifted to the vector p(t): this yields consensus on group weights, with starting point p_g(0) = δ(g, e) and target p̄_g = 1/|G| for all g.
• Possibly large number of nodes, e.g. |G| = N! for the permutation group.
• The exact values of s_h(t), and even the selected interactions at each time step, need not be exactly controlled: strong robustness.
• Convergence to p̄ can be proved by analogy with classical consensus, or alternatively by using the entropy of p(t) as a strict Lyapunov function.

Various applications
• G = permutations: random consensus acting on classical state values (standard consensus), or on classical or quantum probability distributions;
• G = cyclic group: random Fourier transform (use?);
• G = decoupling group: links to quantum Dynamical Decoupling;
• G = operational gates: uniform random gate generation.

Consensus with antagonistic interactions
• G = permutation matrices with an arbitrary sign ±1 on each entry.
• Weights s_g: Birkhoff decomposition on the a_jk as for standard consensus; then swap the weight to the non-positive permutation if a_jk < 0.
• Non-trivial weight assignment and convergence result; solves previously not covered cases, distinguishing {x_k} = 0 from {x_k} = {x_j}.

Consensus towards a leader value
• Also covers other algorithms with substochastic (a_jk); again G = permutation matrices with an arbitrary sign ±1 on each entry.
• Non-trivial weight assignment (iterative procedure, see paper); operator conclusions about which components of x converge to zero (slightly more general than standard convergence to x = 0).

Gradient and coordinate descent
• Search for the minimum of f(x) by gradient iterations; assume (sorry) f(x) = xᵀ A x. In the eigenbasis of A this becomes a (if stable) substochastic iteration: not a big insight. Extension: cycle through the coordinates k.
• G = permutation matrices with an arbitrary sign ±1 on each entry; the weights follow from reflection matrices around non-orthogonal directions, and sum to 1 but may be negative.
• This lets one study coordinate descent convergence via a symmetric but possibly negative transition matrix, for which clear tools exist, e.g. in consensus.

Limits on accelerating consensus algorithms (arXiv:1412.0402)
Add one memory, no more:
• Properly using one memory x(t−1) in the update of x(t) allows consensus to converge quadratically faster (Muthukrishnan et al., 1998). What about more memories?
• Our result: if the graph eigenvalues can be anywhere in [a, b] with a, b known, then more memories do not improve the worst consensus eigenvalue. (The proof is not very information-theoretic; see arXiv:1412.0402.)

Interesting links
• Optimization: is the Nesterov method not further improvable by using m(t−2), ...?
• Robust control: design a plant to be stable under feedback u = k y, with k in an interval.
• Communication theory: a network improves by taking its direct feedback to itself into account; if the network is poorly known, there is no benefit in accounting for longer loops.
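The classical pairwise update x_k(t+1) = x_k(t) + α(t)(x_j(t) − x_k(t)) can be sketched as follows; the ring network, the fixed step size α = 0.5 and the random schedule are illustrative assumptions, not the talk's exact setting:

```python
import random

random.seed(0)
N = 8
alpha = 0.5  # alpha in (0, 1); 0.5 moves the pair to its midpoint

# Ring graph: agent k interacts with a random neighbor.
x = [random.uniform(-1.0, 1.0) for _ in range(N)]
avg = sum(x) / N

for _ in range(2000):
    k = random.randrange(N)
    j = random.choice([(k - 1) % N, (k + 1) % N])
    # Symmetric pairwise update: both values move toward each other, so the
    # convex hull of the values can only shrink (the convergence proof idea).
    xk, xj = x[k], x[j]
    x[k] = xk + alpha * (xj - xk)
    x[j] = xj + alpha * (xk - xj)

spread = max(x) - min(x)
print(spread)               # near 0: consensus reached
print(abs(sum(x) / N - avg))  # symmetric updates preserve the average
```

The shrinking spread and the preserved average are exactly the two invariants the convergence argument above relies on.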
Scaled Bregman distances (SBD) have turned out to be useful tools for simultaneous estimation and goodness-of-fit testing in parametric models of random data (streams, clouds). We show how SBD can additionally be used for model preselection (structure detection), i.e. for finding appropriate candidates of model (sub)classes in order to support a desired decision under uncertainty. For this, we exemplarily concentrate on the context of nonlinear recursive models with additional exogenous inputs; as special cases we include nonlinear regressions, linear autoregressive models (e.g. AR, ARIMA, SARIMA time series), and nonlinear autoregressive models with exogenous inputs (NARX). In particular, we outline a corresponding information-geometric 3D computer-graphical selection procedure. Some sample-size asymptotics are given as well.
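To make the preselection idea concrete: for an AR(2) candidate, the residuals X_{m+1} − ψ_1 X_m − ψ_2 X_{m−1} should behave like an i.i.d. sample from the innovation law Q_θ, and it is their empirical distribution that gets compared with Q_θ. A toy sketch (the ±1 innovation law and the parameter values are illustrative assumptions, not from the paper):

```python
import random

random.seed(0)
psi1, psi2 = 0.3, -0.2   # illustrative (stable) AR(2) parameters
N = 50_000

# Simulate X_{m+1} = psi1*X_m + psi2*X_{m-1} + eps_{m+1}, eps in {-1, +1} w.p. 1/2.
x = [0.0, 0.0]
for _ in range(N):
    eps = random.choice([-1.0, 1.0])
    x.append(psi1 * x[-1] + psi2 * x[-2] + eps)

# Residuals under the candidate parameters recover the innovations; their
# empirical distribution (relative frequencies) should match Q_theta = U{-1, +1}.
resid = [x[m + 1] - psi1 * x[m] - psi2 * x[m - 1] for m in range(1, N + 1)]
freq_plus = sum(1 for e in resid if e > 0) / N
print(freq_plus)  # close to 0.5
```

A wrong candidate (ψ_1, ψ_2) would leave serial dependence in the residuals and distort their empirical distribution, which is what the scaled-Bregman score surface below is designed to detect.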

New model search fornonlinear recursive models,regressions and autoregressionsWolfgang Stummer and AnnaLena KißlingerFAU University of ErlangenNürnbergTalk at GSI 2015, Palaiseau, 29/10/2015OutlineOutline• introduce a new method for model search (model preselection,structure detection) in data streams/clouds:key technical tool: densitybased probability distances/divergences with “scaling”• gives much ﬂexibility for interdisciplinary situationbased applications(also with cost functions, utility, etc.)• goalspeciﬁc handling of outliers and inliers (dampening, ampliﬁcation)not directly covered today• give new general parameterfree asymptotic distributions for involveddataderived distances/divergences• outline a corresponding informationgeometric 3D computergraphical selectionprocedure29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 20153WHY distances between (non)probability measures (1)• “distances” D (P , Q ) between two (non)probability measures P, Qplay a prominent role in modern statistical inferences:• parameter estimation,• testing for goodnessofﬁt resp. homogenity resp. independence,• clustering,• changepoint detection,• Bayesian decision proceduresas well as for other research ﬁelds such as• information theory,• signal processing including image and speech processing,• pattern recognition,• feature extraction,• machine learning,• econometrics, and• statistical physics.29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 20154WHY distances between (non)probability measures (2)• suppose we want to describe the proximity/distance/closeness/similarity D (P , Q )of two (non)probability distributions P and Q22e.g. P = N (µ1 , σ1 ), Q = N (µ2 , σ2 )• either two “theoretical” distributions• or two (empirical) distributions representing data(e.g. derived from frequencies,histograms, . . . )• or one of each −→ today• P, Q may live on Rd , or on “spaces of functions with appropriate properties”:e.g. 
potential future scenarios of a time series, or a cont.time stochastic processe.g. functional data• exemplary statistical uses of distances D (P , Q ) −→29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 20155WHY distances between probability measures (3)Applic. 1: plane = all probability distributions (on R, Rd , a path space, . . . )we have a “distance” on this, say D (P , Q )orige.g. P := PNemp:= PN :=1N·Ni =1 δXi [·]X1, . . . , XN of size N from Qθtrue ;. . . empirical distribution of an iid sampleputs equal “weight”θ = minimum distance estimator1Non each data point.emp(e.g. θ = MLE for D (PN , Qθ ) = KullbackLeib.)emphowever, D (PN , Qθ ) may still be large −→ “bad goodness of ﬁt” −→ test29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 20156Time Series and Nonlinear Regressions (1)in time series, the data (describing random var.) . . . , X1, X2, . . . are noniid:e.g. autoregressive model AR(2) of order 2:Xm+1 − ψ1 · Xm − ψ2 · Xm−1 = εm+1,m ≥ k,where (εm+1)m≥k is a family of independent and identically distributed (i.i.d.)random variables on some space Y having parametric distribution Qθ (θ ∈ Θ).compact notation: take the parameter vector £ := (2, ψ1, ψ2),the backshift operator B deﬁned by B Xm := Xm−1,the identity operator 1 given by 1Xm := Xmthe 2−polynomial ψ1 · B + ψ2 · B 2,−→ lefthand side becomes F£ Xm+1, Xm , Xm−1, . . . , Xk = 1 −2jj =1 ψj BXm+1−→ as dataderived distribution we take the empirical distribution of lefthand sideorigPN ,£ [ · ]1:= P [ · ; Xk −1, . . . , Xk +N ; £] := ·NNδi =1[·]F£ Xk +i ,Xk +i −1 ,...,Xkwith histogramaccording probability mass function (relative frequencies)£pN (y ) =29/10/2015# i ∈ {1, . . . , N } : F£ Xk +i , . . . 
, Xk = yNWolfgang Stummer and AnnaLena KißlingerGSI 2015# i : Xk +i − γ1 · Xk +i −1 − γ2 · Xk +i −2 = y=N7Time Series and Nonlinear Regressions (2)−→ 2 issues: which time series models Xi and which distances D (·, ·)29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 20158Time Series and Nonlinear Regressions (3)more general: nonlinear autorecursions in the sense ofF£m+1 m+1, Xm+1, Xm , Xm−1, . . . , Xk , Zk −, am+1, am , am−1, . . . , ak = εm+1, m ≥ k ,• where (F£m+1 )m≥k is a sequence of nonlinear functions parametrized by £m+1 ∈ Γ,• (εm+1)m≥k are iid with parametric distribution Qθ (θ ∈ Θ),• (ak )m≥k are independent variables which are nonstochastic (deterministic) today,• the “backloginput” Zk − denotes the additional input on X and a before k to get therecursion started.today, we assume k = −∞, and EQθ [εm+1] = 0, andthat the initial data Xk as well as the backloginput Zk − are deterministic.Special case: Xm+1 = g f£m+1 (m+1, Xm , Xm−1, . . . , Xk , Zk −, am+1, am , am−1, . . . , ak ), εm+1for some appropriate functions f£m+1 and g, e.g. 
g (u , v ) := u + v , g (u , v ) := u · v−→ (εm+1)m≥k can be interpreted as “randomnessdriving innovations (noise)”29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 20159Time Series and Nonlinear Regressions (4)our general context covers in particular• NARX models = nonlinear autoregressive models with exogenous input:is the above special case with constant parameter vector £m+1 ≡ £ and additive g.Especially:• nonlinear regressions with deterministic independent variables:the only involved X is Xm+1• AR(r) = linear autoregressive models (time series) of order r ∈ N(recall the above example with r = 2)• ARIMA(r,d,0) = linear autoregressive integrated models (time series) of order r ∈ N0and d ∈ N0• SARIMA(r,d,0)(R,D,0)s = linear seasonal autoregressive integrated models (timeseries) of order d ∈ N0 of nonseasonal differencing, order r ∈ N0 of the nonseasonalARpart, length s ∈ N0 of a season, order D ∈ N0 of seasonal differencing and orderR ∈ N0 of the seasonal ARpart.29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201510Divergences / similarity measures (1)• so far: motiviations for “WHY to measure theproximity/distance/closeness/similarity D (P , Q )”orighere: P = PN ,£ [ · ] (= empirical distribution of iid noises)Q = Qθ( = candidate for true distribution of iid noises)• now: “HOW to measure”, which “distance” D (P , Q ) to use ?• prominent examples for D (P , Q ):relative entropy (KullbackLeibler information discrimination) –> MDE = MLE !!,Hellinger distance, Pearson’s ChiSquare divergence, Csiszar’s f −divergences ...−→ all will be covered by our much more general context• DESIRE: to have a toolbox {Dφ,M (P , Q ) : φ ∈ Φ, M ∈ M} which is farreachingand ﬂexible (reﬂected bydifferent choices of the “generator” φ and the scaling measure M)should also cover robustness issues !!29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201511Divergences / similarity measures (2)• from now on: probability distributions P , Q on (X , A)nonprobability 
distribution/(σ−)ﬁnite measure M on (X , A)we assume that all three of them have densities w.r.t. a σ−ﬁnite measure λdPdQdMp (x ) =(x ), q (x ) =(x ) and m(x ) =(x ) for a. all x ∈ Xdλdλdλ(for today: mostly X ⊂ R)• furthermore we take a “divergence (distance) generating function”φ : (0, ∞) → R which (for today) is twice differentiable, strictly convexwithout loss of generality we also assume φ(1) = 0the limit φ(0) := limt ↓0 φ(t ) always exists (but may be ∞)29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201512Scaled Bregman Divergences (1)Deﬁnition (Stu. 07, extended in Stu. & Vajda 2012 IEEE Trans. Inf. Th.)The Bregman divergence (distance) of probability distributions P , Q scaled bythe (σ−)ﬁnite measure M on (X , A) is deﬁned byBφ (P , Q  M ) :=m (x ) φXp (x )m (x )−φq (x )m (x )−φq (x )q (x )p (x )−·m(x )m(x ) m(x )d λ(x )• if X = {x1, x2, . . . xs } where s may be inﬁnite, and “λ is a counting measure”−→ p(·), q (·), m(·) are classical probability mass functions (“counting densities”):sBφ (P , Q  M ) =m(xi ) φi =1p(xi )m(xi )−φq (xi )m(xi )e.g. φ(t ) = (t − 1) −→ Bφ (P , Q  M ) =1N·Ni =1 δεi [·]q (xi )p(xi )q (xi )·−m(xi )m(xi ) m(xi )(p(xi )−q (xi ))2si =1m(xi )2empEx.: P := PN :=−φweighted Pearson χ2. . . empirical distribution of an iid sampleof size N from Qθtrue ; corresponding pmf = relative frequencyempp(x ) := pN (x ) :=1N· #{j ∈ {1, . . . , N } : εj = x };Q := Qθ where the “hypothetical candidate distribution” Qθ has pmf q (x ) := qθ (x )empempM := W (PN , Qθ ) with pmf m(x ) = w (pN (x ), qθ (x )) > 0 for some funct. w (·, ·)29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201513discrete case with φ(t ) = φα (t ) and m(x ) = wβ (p(x ), q (x ))3D presentation; exemplary goal: ≈ 0 for allα, β103D presentation; exemplary goal: ≈ 0 for all α, β29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201514Bφ (P , Q  M ) with composite scalings M = W (P , Q ) (1)• from now on: M = W (P , Q ), i.e. 
m(x ) = w (p(x ), q (x )) for some function w (·, ·)• w (u , v ) = 1 −→ unscaled/classical Bregman distance (discr.: Pardo/Vajda 97,03)e.g. for generator φ1(t ) = t log t + 1 − t −→ KullbackLeibler divergence (MLE)e.g. for the power functions φα (t ) :=t α −1+α−α·tα(α−1) ,α = 0, 1,−→ density power divergences of Basu et al. 98, Basu et al. 2013/14/15• new example (Kißlinger/Stu. (2015c): scaling by weighted rthpower means:wβ,r (u , v ) := (β · u r + (1 − β) · v r )1/r , β ∈ [0, 1], r ∈ R\{0}• e.g. r = 1: arithmeticmeanscaling (mixture scaling)subcase β = 0: w0,1 (u , v ) = v −→ all Csiszar φ−divergences/disparitiesfor φ2 (t ) one gets Pearson’s chisquare divergencesubcase β = 1 and φ2 (t ) −→ Neyman’s chisquare divergencesubcase β ∈ [0, 1] and φ2 (t ) −→ blended weight chisquare divergence, Lindsay 94subcase β ∈ [0, 1] and φα (t ) −→ Stu./Vajda (2012), Kißlinger/Stu. (2013, 2015a)√√• e.g. r = 1/2: wβ,1/2(u , v ) = (β · u + (1 − β) · v )2subcase β ∈ [0, 1] and φ2 (t ) −→ blended weight Hellinger distance: Lindsay (1994),Basu/Lindsay (1994)• e.g. r → 0: geometricmean scaling wβ,0(u , v ) = u β · v 1−β29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 2015Kißlinger/Stu. (2015b)15Some scale connectors w (u , v ) (for any generator φ)(b). w0.45,1(u, v ) = 0.45 · u + 0.55 · v(a) w0,1 (u , v ) = v Csiszar diverg.(c) w0.45,0.5 (u , v )(d) w0.45,0 (u , v ) = u 0.45 · v 0.55√√= (0.45 u + 0.55 v )229/10/2015Wolfgang Stummer and AnnaLena Kißlinger(1)GSI 201516Scale connectors w (u , v ), NOT r −th power means(e) WEXPM: w0.45,˜6 (u , v )f=16med(g) w0.45 (u , v )log 0.45e6u + 0.55e6v= med{min{u , v }, 0.45, max{u , v }}.smooth(j) wadj(u , v ) with hin = −0.5, hout = 0.3, δ = 10−7 , etc.29/10/2015(k) Parameter description forwadj (u , v )Wolfgang Stummer and AnnaLena KißlingerGSI 201517Robustnessto obtain the robustness against outliers and inliers(i.e. 
high unusualnesses in data, surprising observations),as well as the (asymptotic) efﬁciency of our procedureis a question of a good choice of the scale connector w (·, ·)−→ another long paper Kiss. and Stu. 2015b −→ another talkwe end up with a new transparent, farreaching 3D computergraphical “geometric”method called densitypair adjustment functionthis is vaguely a similar task to choosing a good copulain (inter)dependencemodelling frameworks29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201518Universal model search UMSPD (1)recall: which time series model Xi and which distance D (·, ·)now: model search in detail;basic idea (for ﬁnite discrete distributions):under the correct (“true”) modelF£0 +1 , Qθ0 we get thatmthe sequence Fγk +i (k + i , Xk +i , Xk +i −1, ..., Xk , Zk −, ak +i , ..., ak )0i =1...Nbehaves like a sizeNsample from an iid sequence under the distribution Qθ0 , i.e.1PN [·] :=NN£0δF£0i =1k +i(k +i ,Xk +i ,Xk +i −1 ,...,Xk ,Zk − ,ak +i ,...,ak ) [·]N →∞− − Qθ0 [·]−→and thus£Dα,β PN , Qθ0 − − 0−→0N →∞for a very broad family D := Dα,β (·, ·) : α ∈ [α, α] , β ∈ β, βof distances,emp££where we use the SBDs Dα,β (PN , Qθ ) := Bφα PN , Qθ0  Wβ (PN , Qθ0 )00for a α−family of generators φα (·) (today: the above power functions)and a β−family of scale connectors Wβ (·, ·) (today: geometricmean scalingwβ,0(u , v ) = u β · v 1−β )29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201519Universal model search UMSPD (2)We introduce the universal modelsearch by probability distance (UMSPD):1. choose F£m+1m ≥kfrom a principal parametricfunctionfamily class2. choose some preﬁxed class of parametric candidate distributions {Qθ : θ ∈ Θ}3. ﬁnd a parameter sequence £ := (£m+1)m≥k (often constant) and a θ ∈ Θ such that£Dα,β PN , Qθ ≈ 0for large enough sample size Nand all (α, β) ∈ [α, α] × β, β4. 
preselect the modelF£m+1 , Qθ if the “3D score surface” (the “mountains”)£S := {(α, β, Dα,β (PN , Qθ )) : α ∈ [α, α] , β ∈ β, β } is smaller thansome appropriatly chosen threshold T29/10/2015Wolfgang Stummer and AnnaLena Kißlinger(namely, a chisquarequantile, see below)GSI 201520Universal model search UMSPD (3)Graphical implementation by plotting the 3D preselectionscore surface S29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201521Universal model search UMSPD (4)ADVANTAGE OF UMSPD:after the preselection process one can continue to work with the same Dα,β (·, ·)in order to perform amongst all preselected candidate modelsa statistically sound inference in terms ofsimultaneous exact parameterestimation and goodnessofﬁt.one issue remains to be discussed for UMSPD:the choice of the threshold T29/10/2015Wolfgang Stummer and AnnaLena KißlingerGSI 201522Universal model search UMSPD (5)exemplarily show how to quantify the abovementioned preselection criterion“the 3D surface S should be smaller than a threshold T ”by some sound asymptotic analysis for the above special choices φα (·) and wβ (·, ·)the cornerstone is the following limit theoremTheoremLet Qθ0 be a ﬁnite discrete distribution with c := Y ≥ 2 possible outcomes and strictlypositive densities qθ0 (y ) > 0 for all y ∈ Y . 
Then for each α > 0, α ≠ 1, and each β ∈ [0, 1[, the random scaled Bregman power distance

2N · B_{φα}( P_{N,£0} , Q_{θ0} | (P_{N,£0})^β · (Q_{θ0})^{1−β} ) =: 2N · B(α, β; £0, θ0; N)

is asymptotically chi-squared distributed, in the sense that

2N · B(α, β; £0, θ0; N) → χ²_{c−1} in law as N → ∞.

In terms of the corresponding χ²_{c−1}-quantiles, one can derive the threshold T_c which the 3D preselection score surface S has to (partially) exceed in order to believe, with an appropriate level of confidence, that the investigated model ((F_{£,m+1})_{m≥k}, Q_θ) is not good enough to be preselected.

29/10/2015, Wolfgang Stummer and Anna-Lena Kißlinger, GSI 2015

Further Topics
- scaled Bregman divergences can be used for robust statistical inference, with "completely general asymptotic results" for other choices of φ(·) and w(·,·) → Kißlinger & Stummer (2015b)
- scaled Bregman divergences can be used for change detection in data streams → Kißlinger & Stummer (2015c)
- explicit formulae for B_{φα}(P_{θ1}, P_{θ2} | P_{θ0}), where P_{θ1}, P_{θ2}, P_{θ0} stem from the same arbitrary exponential family; cf. Stummer & Vajda (2012), Kißlinger & Stummer (2013); including stochastic processes (Lévy processes)
- we can do Bayesian decision making with important processes
- non-stationary stochastic differential equations
- e.g. non-stationary branching processes → Kammerer & Stummer (2010)
- e.g. inhomogeneous binomial diffusion approximations → Stummer & Lao (2012)

Summary
- introduced a new method for model search (model preselection, structure detection) in data streams/clouds; key technical tool: density-based probability distances/divergences with "scaling"
- gives much flexibility for interdisciplinary, situation-based applications (also with cost functions, utility, etc.)
- gave a new parameter-free asymptotic distribution result for the involved data-derived distances/divergences
- outlined a corresponding information-geometric 3D computer-graphical selection procedure

References
Ali, S.M., Silvey, S.D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B 28, 131-140 (1966)
Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549-559 (1998)
Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011)
Billings, S.A.: Nonlinear System Identification. Wiley, Chichester (2013)
Csiszár, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. A 8, 85-108 (1963)
Kißlinger, A.-L., Stummer, W.: Some decision procedures based on scaled Bregman distance surfaces. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2013, LNCS 8085, pp. 479-486. Springer, Berlin (2013)
Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2015, LNCS 9389. Springer, Berlin (2015a)
Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman divergences. Preprint (2015b)
Kißlinger, A.-L., Stummer, W.: A new information-geometric method of change detection. Preprint (2015c)
Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987)
Nock, R., Piro, P., Nielsen, F., Ali, W.B.H., Barlaud, M.: Boosting k-NN for categorization of natural scenes. Int. J. Comput. Vis. 100, 294-314 (2012)
Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall, Boca Raton (2006)
Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Trans. Inform. Theory 49(7), 1860-1868 (2003)
Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988)
Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503-1050504 (2007)
Stummer, W., Vajda, I.: On Bregman distances and divergences of probability measures. IEEE Trans. Inform. Theory 58(3), 1277-1288 (2012)
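As a concrete illustration of the scaled Bregman divergences used above (my own sketch, not the authors' code), the function below computes B_φ(P, Q | W) for discrete distributions; with the generator φ(t) = t log t − t + 1 and scaling W = Q it reduces to the ordinary Kullback-Leibler divergence.

```python
import numpy as np

def scaled_bregman(p, q, w, phi, dphi):
    """B_phi(P, Q | W) = sum_i w_i * [phi(p_i/w_i) - phi(q_i/w_i)
                                      - phi'(q_i/w_i) * (p_i/w_i - q_i/w_i)]."""
    p, q, w = map(np.asarray, (p, q, w))
    s, t = p / w, q / w
    return float(np.sum(w * (phi(s) - phi(t) - dphi(t) * (s - t))))

# KL generator: phi(t) = t log t - t + 1, with derivative phi'(t) = log t
phi = lambda t: t * np.log(t) - t + 1.0
dphi = np.log

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])

# With scaling W = Q, the scaled Bregman divergence reduces to KL(P || Q).
b = scaled_bregman(p, q, q, phi, dphi)
kl = float(np.sum(p * np.log(p / q)))
```

Other choices of φ (e.g. the power functions φα) and of the scaling w(·,·) give the other members of the family discussed in the talk.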
In the context of sensor networks, gossip algorithms are a popular, well-established technique for achieving consensus when sensor data are encoded in linear spaces. Gossip algorithms also have several extensions to nonlinear data spaces. Most of these extensions deal with Riemannian manifolds and use Riemannian gradient descent. This paper, instead, studies gossip in a broader CAT(κ) metric setting, encompassing, but not restricted to, several interesting cases of Riemannian manifolds. As it turns out, convergence can be guaranteed as soon as the data lie in a small enough ball of a mere CAT(κ) metric space. We also study convergence speed in this setting and establish linear rates of convergence.

Gossip in CAT(κ) metric spaces
Anass Bellachehab, Jérémie Jakubowicz
Télécom SudParis, Institut Mines-Télécom & CNRS UMR 5157
GSI 2015, Palaiseau, October 28

Problem
We consider a network of N agents such that:
- The network is represented by a connected, undirected graph G = (V, E), where V = {1, ..., N} stands for the set of agents and E denotes the set of available communication links between agents.
- At any given time t, an agent v stores data represented as an element x_v(t) of a data space M.
- X_t = (x_1(t), ..., x_N(t)) is the tuple of data values of the whole network at instant t.

Problem (cont'd)
- Each agent has its own Poisson clock that ticks with a common intensity λ (the clocks are identically made), independently of the other clocks.
- When an agent's clock ticks, the agent is able to perform some computations and wake up some neighboring agents.
The goal is to take the system from an initial state X(0) to a consensus state, meaning a state of the form X∞ = (x∞, ..., x∞) with x∞ ∈ M.

Random Pairwise Gossip (Xiao & Boyd '04)
[Slides animate a worked numerical example on a small network: at each tick, the two awake agents i_n and j_n replace their values by their average.] The update rule is
x_n = [ I − (1/2)(δ_{i_n} − δ_{j_n})(δ_{i_n} − δ_{j_n})^T ] x_{n−1}.

A natural extension in a metric setting
[Slides illustrate replacing the pairwise average by the geodesic midpoint.]

Outline
1. Motivation
2. State of the art
3. CAT(κ) spaces
4. Previous result for κ = 0
5. Why the κ > 0 case is more complex
6. Our result

Motivation
In its Euclidean setting, Random Pairwise Midpoint cannot address several useful types of data:
- Sphere positions (sphere)
- Line orientations (projective space)
- Solid orientations (rotations)
- Subspaces (Grassmannians)
- Phylogenetic trees (metric space)
- Cayley graphs (metric space)
- Reconfigurable systems (metric space)

State of the art
- Consensus optimization on manifolds: [Sarlette-Sepulchre '08], [Tron et al. '12], [Bonnabel '13]
- Synchronization on the circle: [Sarlette et al. '08]
- Synchronization on SO(3): [Tron et al. '12]
- Our previous work: distributed pairwise gossip on CAT(0) spaces
Caveat: in this work, we deal with the problem of synchronization, i.e. attaining a consensus, whatever its value; contrarily to the Euclidean case, where it is known that random pairwise midpoints converge to x̄_0.

CAT(κ) spaces
Model spaces: consider a model surface M_κ with constant sectional curvature κ:
- κ < 0 corresponds to a hyperbolic space
- κ = 0 corresponds to a Euclidean space
- κ > 0 corresponds to a sphere
Geodesics: assume M is a metric space equipped with metric d.
A map γ : [0, l] → M such that d(γ(t), γ(t')) = |t − t'| for all 0 ≤ t, t' ≤ l is called a geodesic in M; a = γ(0) and b = γ(l) are its endpoints. If there exists one and only one geodesic linking a to b, it is denoted [a, b].

CAT(κ) spaces (cont'd)
Triangles: a triple of geodesics γ, γ', γ'' with respective endpoints a, b and c is called a triangle and is denoted (γ, γ', γ''), or (a, b, c) when there is no ambiguity.
Comparison triangles: when κ ≤ 0, given a triangle (γ, γ', γ''), there always exists a triangle (a_κ, b_κ, c_κ) in M_κ such that d(a, b) = d(a_κ, b_κ), d(b, c) = d(b_κ, c_κ) and d(c, a) = d(c_κ, a_κ), with a = γ(0), b = γ'(0) and c = γ''(0).

CAT(κ) inequality: a triangle (γ, γ', γ'') in a metric space M satisfies the CAT(κ) inequality if for any x ∈ [a, b] and y ∈ [a, c] one has
d(x, y) ≤ d(x_κ, y_κ),
where x_κ ∈ [a_κ, b_κ] is such that d(a_κ, x_κ) = d(a, x) and y_κ ∈ [a_κ, c_κ] is such that d(a_κ, y_κ) = d(a, y). A metric space is said to be CAT(κ) if every pair of points can be joined by a geodesic and every triangle with perimeter less than 2D_κ = 2π/√κ satisfies the CAT(κ) inequality.

Formal setting
Assumptions:
1. Time is discrete: t = 0, 1, ...
2. At each time, each agent holds a "value" x_{t,v} in a CAT(κ) metric space M.
3. At each time t, an agent V_t randomly wakes up and wakes up a neighbor W_t, according to the probability distribution
P[{V_t, W_t} = {v, w}] = P_{v,w} > 0 if v ∼ w, and 0 otherwise.
Algorithm description:
x_{t,v} = Midpoint(x_{t−1,V_t}, x_{t−1,W_t}) if v ∈ {V_t, W_t}, and x_{t,v} = x_{t−1,v} otherwise.

Previous result (κ = 0)
- The algorithm is sound, because geodesics exist and are unique in CAT(0) spaces.
- Convergence: the algorithm converges to a consensus with probability 1, whatever the initial state x_0.
- Rate of convergence: convergence occurs at a linear rate. Define σ²(x) = Σ_{v∼w} d²(x_v, x_w); then there exists a constant L < 0 such that E σ²(X_k) ≤ C_0 exp(Lk).

What changes for κ > 0 (the case of the sphere)
[Slides illustrate why geodesic existence, uniqueness and midpoint contraction can fail on the sphere without a restriction on the diameter of the data.]

Our result
Provided the diameter of the initial set of values is less than D_κ/2:
- The algorithm is sound, because geodesics exist and are unique under this restriction.
- Convergence: the algorithm converges to a consensus with probability 1.
- Rate of convergence: convergence occurs at a linear rate. Define σ²_κ(x) = Σ_{v∼w} χ_κ(d(x_v, x_w)) with χ_κ(x) = 1 − cos(√κ x); then there exists a constant L ∈ (−1, 0) such that E σ²_κ(X_k) ≤ C_0 exp(Lk).

Sketch of proof (net balance)
[Slides picture the agents before one iteration, after it, and the net balance.] Looking at the increments:
2N (σ²_κ(X_t) − σ²_κ(X_{t−1})) = −χ_κ(d(X_{V_t}(t−1), X_{W_t}(t−1))) + Σ_{u ∈ V, u ≠ V_t, u ≠ W_t} T_κ(V_t, W_t, u)
with
T_κ(V_t, W_t, u) = 2 χ_κ(d(X_u(t), M_t)) − χ_κ(d(X_u(t), X_{V_t}(t−1))) − χ_κ(d(X_u(t), X_{W_t}(t−1))),
using the inequality
χ_κ( d(Midpoint(p, q), r) ) ≤ [ χ_κ(d(p, r)) + χ_κ(d(q, r)) ] / 2.

Sketch of proof (two propositions)
We can prove a first proposition:
E[σ²_κ(X_{k+1}) − σ²_κ(X_k)] ≤ −(1/(2N)) E Δ_κ(X_k), with Δ_κ(x) = Σ_{{v,w}∈E} P_{v,w} χ_κ(d(x_v, x_w)).
Using graph connectedness we prove a second proposition: assume G = (V, E) is an undirected connected graph; then there exists a constant C_G ≥ 1, depending on the graph only, such that
∀x ∈ M^N,  (1/2) Δ_κ(x) ≤ σ²_κ(x) ≤ C_G Δ_κ(x).

Sketch of proof (cont'd)
The following lemma: assume (a_n) is a sequence of nonnegative numbers such that a_{n+1} − a_n ≤ −β a_n with β ∈ (0, 1); then a_n ≤ a_0 exp(−βn) for all n ≥ 0. Combined with the two propositions, this gives the desired result: E σ²_κ(X_k) ≤ C_0 exp(Lk).

Simulation results: sphere; rotations. [Figures.]

Summary
- We have proved that, when the data belong to a complete CAT(κ) metric space, provided the initial values are close enough, the same algorithm makes sense and also converges linearly.
- We have checked that our results are consistent with simulations.
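The Random Pairwise Midpoint algorithm is easy to simulate on the unit sphere (a CAT(1) space): for non-antipodal points the geodesic midpoint is the normalized chordal midpoint. The sketch below is my own illustration (complete graph, uniformly random pairs; not the authors' code); starting from a small ball around the north pole, the disagreement decays to numerical zero, consistent with the linear convergence rate.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Initial values: small perturbations of the north pole, renormalized to the sphere.
x = np.tile(np.array([0.0, 0.0, 1.0]), (N, 1))
x[:, :2] += 0.05 * rng.standard_normal((N, 2))
x /= np.linalg.norm(x, axis=1, keepdims=True)

def spread(x):
    # maximum pairwise geodesic (arc-length) distance between agents
    g = np.clip(x @ x.T, -1.0, 1.0)
    return float(np.arccos(g).max())

initial = spread(x)
for _ in range(2000):
    # a clock ticks: a uniformly random pair (complete graph) moves to its midpoint
    i, j = rng.choice(N, size=2, replace=False)
    m = x[i] + x[j]
    m /= np.linalg.norm(m)      # geodesic midpoint on the sphere
    x[i] = m
    x[j] = m
final = spread(x)
```

All iterates stay on the sphere, and the spread shrinks from its initial value to machine-precision consensus.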
Optimal Transport (chaired by Jean-François Marcotorchino, Alfred Galichon)
In this paper we relate the Equilibrium Assignment Problem (EAP), which underlies several models in economics, to a system of nonlinear equations that we call the "nonlinear Bernstein-Schrödinger system", which is well known in the linear case but whose nonlinear extension does not seem to have been studied. We apply this connection to derive an existence result for the EAP and an efficient computational method.

TOPICS IN EQUILIBRIUM TRANSPORTATION
Alfred Galichon (NYU and Sciences Po)
GSI, École polytechnique, October 29, 2015

This talk is based on the following two papers:
- AG, Scott Kominers and Simon Weber (2015a). Costly Concessions: An Empirical Framework for Matching with Imperfectly Transferable Utility.
- AG, Scott Kominers and Simon Weber (2015b). The Nonlinear Bernstein-Schrödinger Equation in Economics, GSI proceedings.

Agenda:
1. Economic motivation
2. The mathematical problem
3. Computation
4. Estimation

Section 1: ECONOMIC MOTIVATION

Motivation: a model of the labour market
Consider a very simple model of the labour market. Assume that a population of workers is characterized by their type x ∈ X, where X = R^d for simplicity. There is a distribution P over the workers, which is assumed to sum to one. A population of firms is characterized by their types y ∈ Y (say Y = R^d), and their distribution Q. It is assumed that there is the same total mass of workers and firms, so Q sums to one. Each worker must work for one firm; each firm must hire one worker. Let π(x, y) be the probability of observing a matched (x, y) pair. π should have marginals P and Q, which is denoted
π ∈ M(P, Q).

Optimality
In the simplest case, the utility of a worker x working for a firm y at wage w(x, y) will be α(x, y) + w(x, y), while the corresponding profit of firm y is γ(x, y) − w(x, y). In this case, the total surplus generated by a pair (x, y) is
α(x, y) + w + γ(x, y) − w = α(x, y) + γ(x, y) =: Φ(x, y),
which does not depend on w (no transfer frictions). A central planner may thus like to choose the assignment π ∈ M(P, Q) so as to
max_{π ∈ M(P,Q)} ∫ Φ(x, y) dπ(x, y).
But why would this be the equilibrium solution?

Equilibrium
The equilibrium assignment is determined by an important quantity: the wages. Let w(x, y) be the wage of employee x working for a firm of type y. Let the indirect surpluses of worker x and firm y be respectively
u(x) = max_y { α(x, y) + w(x, y) }
v(y) = max_x { γ(x, y) − w(x, y) }
so that (π, w) is an equilibrium when
u(x) ≥ α(x, y) + w(x, y), with equality if (x, y) ∈ Supp(π)
v(y) ≥ γ(x, y) − w(x, y), with equality if (x, y) ∈ Supp(π).
By summation,
u(x) + v(y) ≥ Φ(x, y), with equality if (x, y) ∈ Supp(π).

The Monge-Kantorovich theorem of Optimal Transportation
One can show that the equilibrium outcome (π, u, v) is such that π is a solution to the primal Monge-Kantorovich optimal transportation problem
max_{π ∈ M(P,Q)} ∫ Φ(x, y) dπ(x, y)
and (u, v) is a solution to the dual OT problem
min_{u,v} ∫ u(x) dP(x) + ∫ v(y) dQ(y)  s.t. u(x) + v(y) ≥ Φ(x, y).
Feasibility + complementary slackness yield the desired equilibrium conditions:
π ∈ M(P, Q)
u(x) + v(y) ≥ Φ(x, y)
(x, y) ∈ Supp(π) ⟹ u(x) + v(y) = Φ(x, y).
"Second welfare theorem", "invisible hand", etc.

Equilibrium vs. optimality
Is equilibrium always the solution to an optimization problem? It is not. This is why this talk is about "Equilibrium Transportation", which contains, but is strictly more general than, "Optimal Transportation".

Imperfectly transferable utility
Consider the same setting as above, but instead of assuming that workers' and firms' payoffs are linear in surplus, assume
u(x) = max_y { U_xy(w(x, y)) }
v(y) = max_x { V_xy(w(x, y)) }
where U_xy(w) is nondecreasing and continuous, and V_xy(w) is nonincreasing and continuous. Motivation: taxes, decreasing marginal returns, risk aversion, etc. Of course, the Optimal Transportation case is recovered when
U_xy(w) = α_xy + w,  V_xy(w) = γ_xy − w.

For (u, v) ∈ R², let
Ψ_xy(u, v) = min { t ∈ R : ∃w, u − t ≤ U_xy(w) and v − t ≤ V_xy(w) },
so that Ψ is nondecreasing in both variables, and (u, v) = (U_xy(w), V_xy(w)) for some w if and only if Ψ_xy(u, v) = 0. The Optimal Transportation case is recovered when Ψ_xy(u, v) = (u + v − Φ_xy)/2. As before, (π, w) is an equilibrium when
u(x) ≥ U_xy(w(x, y)), with equality if (x, y) ∈ Supp(π)
v(y) ≥ V_xy(w(x, y)), with equality if (x, y) ∈ Supp(π).
We have therefore that (π, u, v) is an equilibrium when
Ψ_xy(u(x), v(y)) ≥ 0, with equality if (x, y) ∈ Supp(π).

Section 2: THE MATHEMATICAL PROBLEM

Equilibrium transportation: definition
(π, u, v) is an equilibrium outcome when:
π ∈ M(P, Q)
Ψ_xy(u(x), v(y)) ≥ 0
(x, y) ∈ Supp(π) ⟹ Ψ_xy(u(x), v(y)) = 0.
Problem: existence of an equilibrium outcome? This paper: yes in the discrete case (X and Y finite), via entropic regularization.

Remark 1: link with Galois connections
As soon as Ψ_xy is strictly increasing in both variables, Ψ_xy(u, v) = 0 expresses as
u = G_xy(v) and v = G_xy^{-1}(u),
where the generating functions G_xy and G_xy^{-1} are decreasing and continuous. In this case, the relations
u(x) = max_{y∈Y} G_xy(v(y)) and v(y) = max_{x∈X} G_xy^{-1}(u(x))
generalize the Legendre-Fenchel conjugacy. This pair of relations forms a Galois connection; see Singer (1997) and Noeldeke and Samuelson (2015).

Remark 2: Trudinger's local theory of prescribed Jacobians
Assuming everything is smooth, and letting f_P and f_Q be the densities of P and Q, we have under some conditions that the equilibrium transportation plan is given by y = T(x), where mass balance yields
det DT(x) = f(x) / g(T(x))
and optimality yields
∂_x G_{xT(x)}^{-1}(u(x)) + ∂_u G_{xT(x)}^{-1}(u(x)) ∇u(x) = 0,
which thus inverts into T(x) = e(x, u(x), ∇u(x)). Trudinger (2014) studies Monge-Ampère equations of the form
det De(·, u, ∇u) = f / g(e(·, u, ∇u))
(more general than Optimal Transport, where there is no dependence on u).

Discrete case
Our work (GKW 2015a and b) focuses on the discrete case, when P and Q have finite support. Call p_x and q_y the mass of x ∈ X and y ∈ Y respectively. In the discrete case, the problem boils down to looking for (π, u, v) such that:
π_xy ≥ 0, Σ_y π_xy = p_x, Σ_x π_xy = q_y
Ψ_xy(u_x, v_y) ≥ 0
π_xy > 0 ⟹ Ψ_xy(u_x, v_y) = 0.

Section 3: COMPUTATION

Entropic regularization
Take a temperature parameter T > 0 and look for π of the form
π_xy = exp( −Ψ_xy(u_x, v_y) / T ).
Note that when T → 0, the limit of Ψ_xy(u_x, v_y) is nonnegative, and the limit of π_xy Ψ_xy(u_x, v_y) is zero.

The nonlinear Bernstein-Schrödinger equation
If π_xy = exp(−Ψ_xy(u_x, v_y)/T), the condition π ∈ M(P, Q) boils down to a set of nonlinear equations in (u, v):
Σ_{y∈Y} exp( −Ψ_xy(u_x, v_y)/T ) = p_x
Σ_{x∈X} exp( −Ψ_xy(u_x, v_y)/T ) = q_y,
which we call the nonlinear Bernstein-Schrödinger equation. In the optimal transportation case, this becomes the classical BS equation
Σ_{y∈Y} exp( (Φ_xy − u_x − v_y)/(2T) ) = p_x
Σ_{x∈X} exp( (Φ_xy − u_x − v_y)/(2T) ) = q_y.

Algorithm
Note that F_x : u_x ↦ Σ_{y∈Y} exp(−Ψ_xy(u_x, v_y)/T) is a decreasing and continuous function. Mild conditions on Ψ therefore ensure the existence of u_x such that F_x(u_x) = p_x. Our algorithm is thus a nonlinear Jacobi algorithm:
- Make an initial guess of v_y^0.
- Determine the u_x^{k+1} to fit the p_x margins, based on the v^k.
- Update the v_y^{k+1} to fit the q_y margins, based on the u^{k+1}.
- Repeat until v^{k+1} is close enough to v^k.
One can prove that if v_y^0 is high enough, then the v_y^k decrease to a fixed point. Convergence is very fast in practice.

Section 4: STATISTICAL ESTIMATION

Maximum likelihood estimation
In practice, one observes π̂_xy and would like to estimate Ψ. Assume that Ψ belongs to a parametric family Ψ^θ, so that
π_xy^θ = exp( −Ψ_xy^θ(u_x^θ, v_y^θ) ) ∈ M(P, Q).
The log-likelihood l(θ) associated to the observation π̂_xy is
l(θ) = Σ_xy π̂_xy log π_xy^θ = −Σ_xy π̂_xy Ψ_xy^θ(u_x^θ, v_y^θ),
and thus the maximum likelihood procedure consists in
min_θ Σ_xy π̂_xy Ψ_xy^θ(u_x^θ, v_y^θ).
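A minimal numerical sketch of the nonlinear Jacobi algorithm in the optimal-transportation case Ψ_xy(u, v) = (u + v − Φ_xy)/2 (illustrative Φ, p, q and temperature T of my own choosing; not code from the paper). Each u-step solves Σ_y exp((Φ_xy − u_x − v_y)/(2T)) = p_x in closed form, and symmetrically for v.

```python
import numpy as np

rng = np.random.default_rng(1)
nx, ny, T = 4, 5, 0.5                # illustrative sizes and temperature
Phi = rng.random((nx, ny))           # illustrative joint surplus Phi_xy
p = np.full(nx, 1.0 / nx)            # worker margin p_x
q = np.full(ny, 1.0 / ny)            # firm margin q_y

u, v = np.zeros(nx), np.zeros(ny)
for _ in range(300):
    # fit the p margins: solve sum_y exp((Phi - u - v)/(2T)) = p_x for u_x
    u = 2 * T * (np.log(np.exp((Phi - v[None, :]) / (2 * T)).sum(axis=1)) - np.log(p))
    # fit the q margins: solve sum_x exp((Phi - u - v)/(2T)) = q_y for v_y
    v = 2 * T * (np.log(np.exp((Phi - u[:, None]) / (2 * T)).sum(axis=0)) - np.log(q))

pi = np.exp((Phi - u[:, None] - v[None, :]) / (2 * T))   # equilibrium matching
```

In the genuinely nonlinear case each closed-form step would be replaced by a one-dimensional root-finding on the decreasing map F_x, as in the talk.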
This note presents a short review of the Schrödinger problem and of the first steps that might lead to interesting consequences in terms of geometry. We stress the analogies between this entropy minimization problem and the renowned optimal transport problem, in the search for a theory of lower-bounded curvature for metric spaces, including discrete graphs.

Some geometric aspects of the Schrödinger problem
Christian Léonard, Université Paris Ouest
GSI'15, École Polytechnique, October 28-30, 2015

Interpolations in P(X)
X: Riemannian manifold (state space); P(X): set of all probability measures on X; µ0, µ1 ∈ P(X). Goal: interpolate between µ0 and µ1.

Standard affine interpolation between µ0 and µ1:
µ_t^aff := (1 − t) µ0 + t µ1 ∈ P(X), 0 ≤ t ≤ 1.
[Slides animate t = 0, 0.25, 0.5, 0.75, 1.] Affine interpolations require mass transference with infinite speed: a denial of the geometry of X. We need interpolations built upon trans-portation, not tele-portation. [Slides then animate the kind of interpolation we seek, where mass travels across the space.]

Displacement interpolation
Mass is moved by a transport map y = T(x), and each particle travels along a geodesic from µ0 to µ1.

Curvature
Geodesics and curvature are intimately linked: several geodesics give information on the curvature. For two geodesics issued from p at angle θ, the distance δ(t) between them satisfies
δ(t) = √(2(1 − cos θ)) · t · ( 1 − [σ_p(S) cos²(θ/2) / 6] t² + O(t⁴) ),
where σ_p(S) denotes the sectional curvature at p.

Respect the geometry
We have already used geodesics; how should we choose y = T(x) so that the interpolations encrypt the curvature as best as possible, with no shock? Perform optimal transport. Monge's problem:
∫_X d²(x, T(x)) µ0(dx) → min;  T : T#µ0 = µ1,
with d the Riemannian distance.

Lazy gas experiment [slides animate the experiment starting at t = 0].
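In dimension one the displacement interpolation has an explicit form: the optimal transport map is the monotone rearrangement, so sorted samples are matched in order and each particle moves at constant speed along its segment. A minimal sketch of this special case (my own illustration):

```python
import numpy as np

def displacement_interpolation(x0, x1, t):
    """McCann interpolation of two equally weighted 1-D point clouds:
    the optimal (monotone) coupling matches sorted samples, and each
    particle travels at constant speed between its endpoints."""
    a = np.sort(np.asarray(x0, dtype=float))
    b = np.sort(np.asarray(x1, dtype=float))
    return (1.0 - t) * a + t * b

x0 = np.array([-2.0, -1.0, 0.0, 1.0])
x1 = np.array([3.0, 4.0, 5.0, 6.0])
mid = displacement_interpolation(x0, x1, 0.5)   # -> [0.5, 1.5, 2.5, 3.5]
```

Unlike the affine mixture (1 − t)µ0 + tµ1, which splits mass between the two clouds, the interpolant here is a single cloud travelling across the line at finite speed.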
This article leans on previous results already presented in [10], based on Fréchet's works, Wilson's entropy and Minimal Trade models, in connection with the MKP transportation problem (MKP stands for Monge-Kantorovich Problem). Using the duality between "independence" and "indetermination" structures, shown in that former paper, we are in a position to derive a novel approach to designing a copula, suitable and efficient for anomaly detection in IT systems analysis.

Optimal Transport, Independence versus Indetermination duality, impact on a new Copula Design
Benoit Huyot, Yves Mabiala
Thales Communications and Security, 29 October 2015

Outline
1. Cybersecurity problem overview: current Intrusion Detection Systems; anomaly-based IDS; IDS as a classification problem
2. Properties of the copula function: historical background; Sklar's theorem and Fréchet's bounds; regularity properties
3. Copula theory used in anomaly detection applications: classification AUC with the copula paradigm; experimental results

Current Intrusion Detection Systems
Rule-based approaches: suitable to detect previously known patterns; rules are easily understandable; easy addition of new rules. But: unable to detect unknown patterns.

Anomaly-based IDS
Anomaly-based approaches: suitable to detect unknown patterns. But: time-consuming to update the model; alerts are difficult to understand through existing tools; too many false alerts. Our approach is an attempt to overcome these problems.

Anomaly detection as a classification problem
- Y is a binary random variable, where Y = 0 if the event is abnormal and Y = 1 otherwise.
- p0 is the a priori attack probability, defined by p0 = P(Y = 0).
- X represents the different characteristics of the network event.
- If X is a p-dimensional random vector, its cumulative distribution function will be denoted F(x) = P(X1 ≤ x1, ..., Xp ≤ xp).

Scoring function
The scoring function is defined as P(Y = 0 | X = x). By definition we have
P(Y = 0 | X = x) = P(Y = 0, X = x) / P(X = x).
Anomalies are identified thanks to the classical Bayes rule model. Empirical estimation is difficult due to the "curse of dimensionality"; joint probabilities will therefore be computed using copula theory to ease computations.

Introduction to copula theory
Originated by M. Fréchet in 1951: Fréchet, M. (1951): "Sur les tableaux de corrélations dont les marges sont données", Annales de l'Université de Lyon, Section A no 14, 53-77. A. Sklar gave a breakthrough in 1959: Sklar, A. (1959), "Fonctions de répartition à n dimensions et leurs marges", Publ. Inst. Statist. Univ. Paris 8: 229-231.

Main results on the copula function
Theorem (Sklar's theorem). Given two continuous random variables X and Y in L¹, with cumulative distribution functions written F and G, there exists a unique function C, called a copula, such that
P(X ≤ x, Y ≤ y) = C(F(x), G(y)).
Theorem (Fréchet-Hoeffding bounds). Given a copula function C, for all (u, v) ∈ [0, 1]² we have the Fréchet bounds
max(u + v − 1, 0) ≤ C(u, v) ≤ min(u, v).

Regularity properties of the copula function
2-increasing property (Monge's condition): for all (u1, v1) with 0 ≤ u1 ≤ v1 ≤ 1 and (u2, v2) with 0 ≤ u2 ≤ v2 ≤ 1,
C(v1, v2) − C(u1, v2) − C(v1, u2) + C(u1, u2) ≥ 0.
The copula is 1-Lipschitz (hence Hölderian): for all (u1, v1, u2, v2) ∈ [0, 1]⁴,
|C(u2, v2) − C(u1, v1)| ≤ |u2 − u1| + |v2 − v1|.

Copula theory used in anomaly detection
Only infrequent events can have a score greater than 1/2, so looking for attacks amounts to looking for rare events. Fréchet's bounds give us
P(Y = 0 | X) ≤ min(P(X), P(Y = 0)) / P(X),
and we get
P(Y = 0 | X) ≥ 1/2 ⟹ P(X) ≤ 2 · P(Y = 0).

Lower bound for anomaly detection
The "lower tail dependence" is defined as λ_L = lim_{v→0} C(v, v)/v, and it is possible to show the limit bound λ_L ≤ lim_{v→0} C(u, v)/v.

Variation of the score function
We want to study the variation of v ↦ C(u, v)/v on [0, 2p0]:
(1/v²) [ v ∂C/∂v(u, v) − C(u, v) ] ≤ 0
⇔ ∂C/∂v(u, v) ≤ C(u, v)/v   (link to convexity)
⇔ v (∂/∂v) log C(u, v) ≤ 1   (link to Fisher's information).

Classification AUC with the copula paradigm
ROC curve and AUC:
- Sensitivity (true positive rate): PD(s) = C(p0, s)/p0.
- Anti-specificity (false positive rate): PF(s) = (s − C(p0, s))/(1 − p0).
From these expressions,
AUC = ∫_0^1 PD(s) (∂PF/∂s)(s) ds = [1/(p0 (1 − p0))] ( ∫_0^1 C(p0, s) ds − p0²/2 ).
In the case of a bivariate random vector X we get a formula of the form
AUC = K1(p0) − K2(p0) ∫_0^1 ∫_0^1 (C2(s1, s2) − 1)² (∂²C2/∂s1∂s2)(s1, s2) ds1 ds2
for constants K1(p0), K2(p0).

Optimal transport problem
In the Monge-Kantorovich problem we want to minimize the quantity
min_h ∫_0^A ∫_0^B ( h(x, y) − 1/(AB) )² dx dy
under the constraints
∫_0^A ∫_0^B h(x, y) dx dy = 1,  ∫_0^A h(x, y) dx = g(y),  ∫_0^B h(x, y) dy = f(x).
The solution is given by
h*(x, y) = f(x)/B + g(y)/A − 1/(AB),
and the cumulative distribution function associated to the solution is
H*(x, y) = y F(x)/B + x G(y)/A − xy/(AB).

Algorithm principle [figure].

Experimental results
Quantile levels used for the copula benchmark:

Quantile level                | 10^-4  | 5.10^-4 | 10^-3  | 5.10^-3 | 10^-2
Optimal Transport Copula
  Detection rate              | 18.64% | 73.86%  | 74.32% | 74.82%  | 75.09%
  False alarm rate            | 23.15% | 2.32%   | 4.38%  | 3.72%   | 4.71%
Clayton Copula
  Detection rate              | 0.0%   | 0.0%    | 19.28% | 71.73%  | 79.86%
  False alarm rate            | 0.0%   | 0.0%    | 0.63%  | 36.76%  | 34.20%
Fréchet's upper bound Copula
  Detection rate              | 30.35% | 31.39%  | 32.73% | 36.93%  | 79.11%
  False alarm rate            | 41.26% | 38.68%  | 31.89% | 27.48%  | 27.95%

Thanks for your attention!

Appendix: link to Fisher's information
We use the equation
∂C/∂v(u, v) = C(u, v) · (∂/∂v) log C(u, v).
This quantity is the statistical score; its variance gives the Fisher information.

Appendix: sensitivity
Sensitivity represents how many events are correctly assigned to anomalies: P(Ŷ = 0 | Y = 0), where Ŷ = 0 when F(X) ≤ s for a given threshold s, i.e. X ∈ F^{-1}([0; s]). Then
P(Ŷ = 0 | Y = 0) = P(Y = 0, Ŷ = 0)/P(Y = 0) = P(Y = 0, X ≤ F_X^{-1}(s))/P(Y = 0) = C(p0, s)/p0.

Appendix: specificity / anti-specificity
Anti-specificity represents how many misclassifications are produced by the algorithm: with Ŷ = 1 when F(X) ≥ s,
1 − P(Ŷ = 1 | Y = 1) = P(Ŷ = 0 | Y > 0) = (s − C(p0, s))/(1 − p0).

Appendix: area under the ROC curve (AUC)
AUC = ∫_0^1 PD(PF) dPF; using integration by substitution,
AUC = ∫_0^1 PD(s) (∂PF/∂s)(s) ds,
with sensitivity PD(s) = C(p0, s)/p0 and anti-specificity PF(s) = (s − C(p0, s))/(1 − p0).
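The Fréchet-Hoeffding bounds above can be checked numerically on an empirical copula built from ranks (my own illustration on synthetic dependent data, not on network events):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.standard_normal(n)
y = 0.5 * x + rng.standard_normal(n)      # a dependent pair (X, Y)

# ranks -> pseudo-observations; empirical copula evaluated on the grid {k/n}
rx = np.argsort(np.argsort(x)) + 1        # ranks of x in 1..n
ry = np.argsort(np.argsort(y)) + 1        # ranks of y in 1..n

def C_emp(u, v):
    # empirical copula: fraction of points whose normalized ranks are <= (u, v)
    return np.mean((rx / n <= u) & (ry / n <= v))

grid = np.arange(1, n + 1) / n
ok_upper = True
ok_lower = True
for u in grid[::10]:
    for v in grid[::10]:
        c = C_emp(u, v)
        ok_upper = ok_upper and (c <= min(u, v) + 1e-12)
        ok_lower = ok_lower and (c >= max(u + v - 1.0, 0.0) - 1e-12)
```

On grid points u = k/n the bounds hold exactly by counting: the joint count is at most each marginal count (upper bound) and at least their sum minus n (lower bound).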
Optimal Mass Transport over Bridges
Michele Pavon, Department of Mathematics, University of Padova, Italy
GSI'15, Paris, October 29, 2015
Joint work with Yongxin Chen and Tryphon Georgiou, Department of Electrical and Computer Engineering, University of Minnesota

[Figure: a Venetian Schrödinger bridge.]

Dynamic version of OMT
"Fluid-dynamic" version of OMT (Benamou and Brenier (2000)):
inf_{(ρ,v)} ∫_{R^n} ∫_0^1 (1/2) |v(x, t)|² ρ(x, t) dt dx,   (1a)
∂ρ/∂t + ∇·(vρ) = 0,   (1b)
ρ(x, 0) = ρ0(x),  ρ(y, 1) = ρ1(y).   (1c)

Proposition 1. Let ρ*(x, t), with t ∈ [0, 1] and x ∈ R^n, satisfy
∂ρ*/∂t + ∇·(∇ψ ρ*) = 0,  ρ*(x, 0) = ρ0(x),
where ψ is the (viscosity) solution of the Hamilton-Jacobi equation
∂ψ/∂t + (1/2) |∇ψ|² = 0
for some boundary condition ψ(x, 1) = ψ1(x). If ρ*(x, 1) = ρ1(x), then the pair (ρ*, v*) with v*(x, t) = ∇ψ(x, t) is optimal for (1).

Schrödinger's bridges
- Cloud of N independent Brownian particles;
- empirical distributions ρ0(x)dx and ρ1(y)dy at t = 0 and t = 1, respectively;
- ρ0 and ρ1 not compatible with the transition mechanism
ρ1(y) = ∫ p(t0, x, t1, y) ρ0(x) dx,
where
p(s, y, t, x) = [2π(t − s)]^{−n/2} exp( −|x − y|² / (2(t − s)) ),  s < t.
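The (linear) Schrödinger system behind the last slide can be solved on a grid by the classical Fortet/IPF iteration: look for scalings a, b such that π(x, y) = a(x) p(0, x, 1, y) b(y) has the prescribed marginals ρ0 and ρ1. A sketch on an illustrative one-dimensional grid (grid and marginals are my own choices, not from the talk):

```python
import numpy as np

# 1-D grid and (unnormalized) Brownian transition kernel between t = 0 and t = 1
z = np.linspace(-3.0, 3.0, 61)
K = np.exp(-0.5 * (z[:, None] - z[None, :]) ** 2)

# prescribed endpoint marginals (in general not compatible with the prior kernel)
rho0 = np.exp(-0.5 * (z + 1.0) ** 2); rho0 /= rho0.sum()
rho1 = np.exp(-0.5 * (z - 1.0) ** 2); rho1 /= rho1.sum()

# Fortet / iterative proportional fitting for the Schrödinger system
a = np.ones_like(z)
b = np.ones_like(z)
for _ in range(1000):
    a = rho0 / (K @ b)        # enforce the t = 0 marginal
    b = rho1 / (K.T @ a)      # enforce the t = 1 marginal

pi = a[:, None] * K * b[None, :]   # bridge coupling between rho0 and rho1
```

The fixed point gives the Schrödinger bridge's endpoint coupling; the kernel variance plays the role of the diffusion intensity of the prior Brownian particles.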
Information Geometry in Image Analysis (chaired by Yannick Berthoumieu, Geert Verdoolaege)
The current paper introduces new prior distributions on the zero-mean multivariate Gaussian model, with the aim of applying them to the classification of populations of covariance matrices. These new prior distributions are based entirely on the Riemannian geometry of the multivariate Gaussian model. More precisely, the proposed Riemannian Gaussian distribution has two parameters, the centre of mass Ȳ and the dispersion parameter σ. Its density with respect to the Riemannian volume element is proportional to exp(−d²(Y; Ȳ)/2σ²), where d²(Y; Ȳ) is the square of Rao's Riemannian distance. We derive its maximum likelihood estimators and present an experiment on the VisTex database for the classification of texture images.
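For zero-mean Gaussians, Rao's Riemannian distance reduces (up to a convention-dependent constant factor) to the affine-invariant distance between SPD covariance matrices, computable from the generalized eigenvalues of the pair. A minimal numpy/scipy sketch, not taken from the paper itself:

```python
import numpy as np
from scipy.linalg import eigvalsh

def rao_spd_distance(A, B):
    # Affine-invariant Riemannian distance between SPD matrices:
    # d(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F = sqrt(sum_i log^2(lam_i)),
    # where the lam_i are the generalized eigenvalues solving B v = lam A v
    # (equivalently, the eigenvalues of A^{-1} B).
    lam = eigvalsh(B, A)
    return np.sqrt(np.sum(np.log(lam) ** 2))
```

The distance is invariant under congruence, d(A, B) = d(MAMᵀ, MBMᵀ) for any invertible M, which is the key property behind its use on covariance populations.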

Non-supervised classification in the space of SPD matrices. Salem Said, Lionel Bombrun, Yannick Berthoumieu; Laboratoire IMS, CNRS UMR 5218, Université de Bordeaux. Geometric Science of Information 2015, 29 October 2015.

Context of our work. Project: statistical learning in the space of SPD matrices. Team: 3 members of the IMS laboratory + 2 post-docs (Hatem Hajri, Paolo Zanini). Target applications: remote sensing, radar signal processing, neuroscience (BCI). Partners: IMB (Marc Arnaudon + PhD student), Gipsa-lab, École des Mines. Recent work: "Riemannian Gaussian distributions on the space of SPD matrices" (in review, IEEE IT), http://arxiv.org/abs/1507.01760. Problems addressed: given a population of SPD matrices (of any size or structure), (i) non-supervised learning of its class structure and (ii) semi-parametric learning of its density.

Geometric tools. Statistical manifold: Θ = SPD, Toeplitz, block-Toeplitz, etc., matrices. Hessian (Fisher) metric: ds²(θ) = Hess Φ(dθ, dθ), where Φ is the model entropy. With this metric, Θ becomes a Riemannian homogeneous space of negative curvature. Example — 2 × 2 correlation matrices ("baby Toeplitz"), parameterized by the off-diagonal entry θ with |θ| < 1:
$$\Phi(\theta) = -\log\big(1 - |\theta|^2\big) \;\Rightarrow\; ds^2(\theta) = \frac{|d\theta|^2}{\big(1 - |\theta|^2\big)^2},$$
which is the Poincaré disc model. Why this framework: suitable mathematical properties, relation to entropy or "information", and it often leads to excellent performance (first place in the IEEE BCI challenge).

Contribution I: introduction of Riemannian Gaussian distributions. A statistical model of a class/cluster [Pennec 2006] is the Riemannian Gaussian distribution G(θ̄, σ):
$$p(\theta \mid \bar\theta, \sigma) = Z^{-1}(\sigma) \,\exp\!\left(-\frac{d^2(\theta, \bar\theta)}{2\sigma^2}\right),$$
where d(θ, θ̄) is the Riemannian distance; the expression of Z(σ) was previously unknown in the literature. Computing Z(σ) in the case where Θ is the space of m × m covariance matrices:
$$Z(\sigma) = \int_\Theta \exp\!\left(-\frac{d^2(\theta, \bar\theta)}{2\sigma^2}\right) dv(\theta), \qquad d^2(\theta, \bar\theta) = \mathrm{tr}\big[\big(\log \theta^{-1}\bar\theta\big)^2\big], \qquad dv(\theta) = \det(\theta)^{-\frac{m+1}{2}} \prod_{i\le j} d\theta_{ij}.$$
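For the 2 × 2 correlation example, the metric |dθ|²/(1 − |θ|²)² is a half-scaled version of the classical Poincaré-disc metric, so its geodesic distance is half the classical one. A small sketch (the scaling convention is our reading of the slide, not stated explicitly in it):

```python
import numpy as np

def poincare_distance(t1, t2):
    # Geodesic distance for ds^2 = |dtheta|^2 / (1 - |theta|^2)^2 on the unit
    # disc; this is half the classical Poincare-disc distance
    # 2 * artanh |(t1 - t2) / (1 - conj(t2) * t1)|.
    return np.arctanh(abs((t1 - t2) / (1.0 - np.conj(t2) * t1)))
```

Along a radius the formula reduces to d(0, r) = artanh(r) = ∫₀ʳ dt/(1 − t²), consistent with the stated metric.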
We present a new texture discrimination method for textured color images in the wavelet domain. In each wavelet subband, the correlation between the color bands is modeled by a multivariate generalized Gaussian distribution with fixed shape parameter (Gaussian, Laplacian). On the corresponding Riemannian manifold, the shape of texture clusters is characterized by means of principal geodesic analysis, specifically by the principal geodesic along which the cluster exhibits its largest variance. Then, the similarity of a texture to a class is defined in terms of the Rao geodesic distance on the manifold from the texture’s distribution to its projection on the principal geodesic of that class. This similarity measure is used in a classification scheme, referred to as principal geodesic classification (PGC). It is shown to perform significantly better than several other classifiers.

Color Texture Discrimination using the Principal Geodesic Distance on a Multivariate Generalized Gaussian Manifold. Geert Verdoolaege¹,² and Aqsa Shabbir¹,³. ¹ Department of Applied Physics, Ghent University, Ghent, Belgium; ² Laboratory for Plasma Physics, Royal Military Academy (LPP–ERM/KMS), Brussels, Belgium; ³ Max-Planck-Institut für Plasmaphysik, D-85748 Garching, Germany. Geometric Science of Information, Paris, October 28–30, 2015.

Overview: 1. Color texture; 2. Geometry of wavelet distributions; 3. Principal geodesic classification; 4. Classification experiments; 5. Conclusions.

Color texture. VisTex database: 128 × 128 subimages extracted from RGB images from 40 classes (textures). CUReT database: 200 × 200 RGB images from 61 classes with varying illumination and viewpoint. Texture modeling must capture structure at various scales, stochasticity, and correlations between colors, neighboring pixels, etc. ⇒ multivariate wavelet distributions.

Geometry of wavelet distributions. Univariate zero-mean generalized Gaussian distribution:
$$p(x \mid \alpha, \beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\!\left(-\left|\frac{x}{\alpha}\right|^\beta\right).$$
m-variate zero-mean multivariate generalized Gaussian distribution (MGGD):
$$p(\mathbf{x} \mid \Sigma, \beta) = \frac{\Gamma(m/2)}{\pi^{m/2}\,\Gamma\!\big(\tfrac{m}{2\beta}\big)\, 2^{m/(2\beta)}}\; \frac{\beta}{|\Sigma|^{1/2}}\, \exp\!\left[-\frac12\big(\mathbf{x}^\top \Sigma^{-1} \mathbf{x}\big)^{\beta}\right].$$
Shape parameter: β = 1 gives the Gaussian, β = 1/2 the Laplace distribution (heavy tails).

MGGD geometry: coordinate system. For (Σ₁, β₁) → (Σ₂, β₂), find K such that K Σ₁ Kᵀ = I_m and K Σ₂ Kᵀ ≡ Φ₂ ≡ diag(λ₁², …, λ_m²), the λ_i² being the eigenvalues of Σ₁⁻¹Σ₂. In fact, for any path Σ(t), t ∈ [0, 1]: K Σ(t) Kᵀ ≡ Φ(t) ≡ diag(λ₁²(t), …, λ_m²(t)), with λ_i²(t) the eigenvalues of Σ₁⁻¹Σ(t); define r_i(t) ≡ ln[λ_i²(t)]. [M. Berkane et al., J. Multivar. Anal., 63, 35–46, 1997; G. Verdoolaege and P. Scheunders, J. Math. Imaging Vis., 43, 180–193, 2012.]

MGGD geometry: Fisher information metric. For fixed β, the components are (with b_h ≡ ¼ (m + 2β)/(m + 2)):
$$g_{ii}(\beta) = 3b_h - \tfrac14, \qquad g_{ij}(\beta) = b_h - \tfrac14 \;\; (i \ne j),$$
while g_ββ(β) and g_βi(β) have closed forms involving the digamma function Ψ and trigamma function Ψ₁ evaluated at m/(2β) and 1 + m/(2β).

MGGD geometry: geodesics and exponential map. The geodesic equations for fixed β give r_i(t) ≡ ln(λ_i²)·t, and the geodesic distance is
$$GD(\Sigma_1, \Sigma_2) = \left[\Big(3b_h - \tfrac14\Big) \sum_i (r_2^i)^2 + 2\Big(b_h - \tfrac14\Big) \sum_{i<j} r_2^i\, r_2^j\right]^{1/2}.$$
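The geodesic-distance formula above can be sketched directly from the generalized eigenvalues of the covariance pair. This follows the slide's convention r_i = ln(λ_i²) (conventions differ across papers, so treat the overall scale as assumption-laden):

```python
import numpy as np
from scipy.linalg import eigvalsh

def mggd_geodesic_distance(S1, S2, beta):
    # Geodesic distance between zero-mean MGGDs with common shape beta,
    # using r_i = ln(lambda_i^2), the lambda_i^2 being the eigenvalues
    # of S1^{-1} S2 (generalized eigenvalues of the pair).
    m = S1.shape[0]
    bh = 0.25 * (m + 2.0 * beta) / (m + 2.0)
    r = np.log(eigvalsh(S2, S1))
    sr, sr2 = r.sum(), (r ** 2).sum()
    # 2 * sum_{i<j} r_i r_j = (sum_i r_i)^2 - sum_i r_i^2
    gd2 = (3.0 * bh - 0.25) * sr2 + (bh - 0.25) * (sr * sr - sr2)
    return np.sqrt(gd2)
```

For β = 1 the cross term vanishes (b_h = 1/4) and the distance reduces to the classical Gaussian form proportional to the Frobenius norm of log(Σ₁⁻¹Σ₂).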
Practical estimation of mixture models can be problematic when a large number of observations is involved: in such cases, online versions of Expectation-Maximization may be preferred, since they avoid storing all the observations before running the algorithm. We introduce a new online method, well suited when the number of observations is large and many mixture models need to be learned from different sets of points. Inspired by dictionary methods, our algorithm begins with a training step which is used to build a dictionary of components. The next step, which can be done online, amounts to populating the weights of the components as each observation arrives. The dictionary of components proves most useful when many mixtures are learned from the same dictionary, maximizing the return on investment of the training step. We evaluate the proposed method on an artificial dataset built from random Gaussian mixture models.

Bag-of-components: an online algorithm for batch learning of mixture models. Olivier Schwander (Université Pierre et Marie Curie, Paris, France), Frank Nielsen (École polytechnique, Palaiseau, France). October 29, 2015.

Exponential families. Definition:
$$p(x; \lambda) = p_F(x; \theta) = \exp\big(\langle t(x), \theta\rangle - F(\theta) + k(x)\big),$$
with λ the source parameter, t(x) the sufficient statistic, θ the natural parameter, F(θ) the log-normalizer and k(x) the carrier measure; F is a strictly convex and differentiable function and ⟨·,·⟩ is a scalar product.

Multiple parameterizations: dual parameter spaces. Besides the (non-unique) source parameterizations λ₁ ∈ Λ₁, λ₂ ∈ Λ₂, …, there are two canonical parameterizations linked by the Legendre transform (F, Θ) ↔ (F*, H): the natural parameters θ ∈ Θ and the expectation parameters η ∈ H, with θ = ∇F*(η) and η = ∇F(θ).

Bregman divergences. Definition and properties:
$$B_F(x \,\|\, y) = F(x) - F(y) - \langle x - y,\, \nabla F(y)\rangle,$$
with F strictly convex and differentiable. No symmetry! This family contains a lot of common divergences: squared Euclidean, Mahalanobis, Kullback–Leibler, Itakura–Saito, …

Bregman centroids. Left-sided centroid: min_c Σᵢ ωᵢ B_F(c ‖ xᵢ); right-sided centroid: min_c Σᵢ ωᵢ B_F(xᵢ ‖ c). Both are in closed form:
$$c_L = \nabla F^*\Big(\sum_i \omega_i \nabla F(x_i)\Big), \qquad c_R = \sum_i \omega_i x_i.$$

Link with exponential families [Banerjee 2005]. Bijection with exponential families:
$$\log p_F(x \mid \theta) = -B_{F^*}\big(t(x) \,\|\, \eta\big) + F^*(t(x)) + k(x).$$
Kullback–Leibler divergence between members of the same exponential family:
$$KL\big(p_F(x, \theta_1),\, p_F(x, \theta_2)\big) = B_F(\theta_2 \,\|\, \theta_1) = B_{F^*}(\eta_1 \,\|\, \eta_2),$$
so Kullback–Leibler centroids are available in closed form through the Bregman divergence.

Maximum likelihood estimator. The MLE is a Bregman centroid:
$$\hat\eta = \arg\max_\eta \sum_i \log p_F(x_i, \eta) = \arg\min_\eta \sum_i B_{F^*}\big(t(x_i) \,\|\, \eta\big) = \frac1N \sum_i t(x_i)$$
(the terms −F*(t(xᵢ)) − k(xᵢ) do not depend on η), and θ̂ = ∇F*(η̂).

Mixtures of exponential families:
$$m(x; \omega, \theta) = \sum_{1\le i\le k} \omega_i\, p_F(x; \theta_i),$$
where the family of the components P_F and the number of components k are fixed (k chosen by model-selection techniques), and the parameters are the weights (Σᵢ ωᵢ = 1) and the component parameters θᵢ. Learning a mixture: input, observations x₁, …, x_N; output, the ωᵢ and θᵢ.

Bregman soft clustering: EM for exponential families [Banerjee 2005]. E-step: p(i, j) = ωⱼ p_F(xᵢ, θⱼ)/m(xᵢ). M-step:
$$\eta_j = \arg\max_\eta \sum_i p(i,j) \log p_F(x_i, \theta_j) = \arg\min_\eta \sum_i p(i,j)\, B_{F^*}\big(t(x_i) \,\|\, \eta\big) = \frac{\sum_i p(i,j)\, t(x_i)}{\sum_u p(u,j)}.$$

Co-mixture models. Motivation: joint estimation of mixture models, exploiting shared information between multiple point sets to improve quality and speed; inspiration from efficient algorithms and dictionary methods (building, transfer learning, comparing). Co-mixtures share the components of all the mixtures:
$$m_1(x \mid \omega^{(1)}, \eta) = \sum_{i=1}^k \omega_i^{(1)} p_F(x \mid \eta_i), \;\;\ldots,\;\; m_S(x \mid \omega^{(S)}, \eta) = \sum_{i=1}^k \omega_i^{(S)} p_F(x \mid \eta_i),$$
with the same η₁ … η_k everywhere but different weights ω⁽ˡ⁾.

co-Expectation-Maximization maximizes the mean of the likelihoods of the mixtures. E-step: a posterior matrix for each dataset, p⁽ˡ⁾(i, j) = ωⱼ⁽ˡ⁾ p_F(xᵢ, θⱼ)/m(xᵢ | ω⁽ˡ⁾, η). M-step: maximization on each dataset, ηⱼ⁽ˡ⁾ = Σᵢ p⁽ˡ⁾(i, j) t(xᵢ)/Σᵤ p⁽ˡ⁾(u, j), followed by aggregation: ηⱼ = (1/S) Σₗ ηⱼ⁽ˡ⁾.

Variational approximation of Kullback–Leibler [Hershey & Olsen 2007]:
$$KL_{\mathrm{var}}(m_1 \,\|\, m_2) = \sum_i \omega_i^{(1)} \log \frac{\sum_j \omega_j^{(1)} e^{-KL(p_F(\cdot;\theta_i)\,\|\,p_F(\cdot;\theta_j))}}{\sum_j \omega_j^{(2)} e^{-KL(p_F(\cdot;\theta_i)\,\|\,p_F(\cdot;\theta_j))}}.$$
With shared parameters, precompute D_ij = e^{−KL(p_F(·|ηᵢ) ‖ p_F(·|ηⱼ))} for a fast version.

Applications: co-segmentation (segmentation from 5-D RGBxy mixtures; original vs EM vs co-EM) and transfer learning to increase the quality of one particular mixture of interest (first image with only 1% of the points, two other images with the full set of points — not enough points for plain EM).

Bag of components. Training step: run co-mix on some training set and keep the parameters — costly but offline — yielding a dictionary D = {θ₁, …, θ_K}. Online learning of mixtures: for a new point set, for each arriving observation take arg max_{θ∈D} p_F(xⱼ, θ), or equivalently arg min_{θ∈D} B_F(t(xⱼ), θ).

Nearest-neighbor search. Naive version: linear search, O(number of samples × number of components) — the same order of magnitude as one step of EM. Improvements: computational Bregman geometry to speed up the search (Bregman ball trees, hierarchical clustering, approximate nearest neighbor).

Experiments: image segmentation on a random subset of the pixels (100%, 10%, 1%; EM vs BoC) and computation times (training, EM, BoC at 100%, 10%, 1%).

Summary. Comix: mixtures with shared components, a compact description of a lot of mixtures, fast KL approximations, dictionary-like methods. Bag of components: an online method with predictable time (no iteration) that works with only a few points, and is fast.
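The online step can be sketched in a few lines. The dictionary is assumed already trained offline (hard-coded 1-D Gaussian components below stand in for the co-mix output); each arriving observation votes for its most likely component, and the normalized votes populate the mixture weights:

```python
import numpy as np

def boc_weights(observations, dictionary):
    # dictionary: list of (mu, sigma2) pairs -- a pre-trained set of 1-D
    # Gaussian components (a hypothetical stand-in for the co-mix output).
    mus = np.array([mu for mu, _ in dictionary])
    sig2 = np.array([s2 for _, s2 in dictionary])
    counts = np.zeros(len(dictionary))
    for x in observations:                      # online: one pass over the data
        loglik = -0.5 * np.log(2 * np.pi * sig2) - (x - mus) ** 2 / (2 * sig2)
        counts[np.argmax(loglik)] += 1          # arg max over theta in D of p_F(x, theta)
    return counts / counts.sum()                # populated mixture weights
```

Each observation costs O(K) with K dictionary components (no EM iterations), which is what makes the running time predictable.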
Stochastic watershed is an image segmentation technique based on mathematical morphology which produces a probability density function of image contours. The estimated probabilities depend mainly on local distances between pixels. This paper introduces a variant of stochastic watershed where the probabilities of contours are computed from a Gaussian model of image regions. In this framework, the basic ingredient is the distance between pairs of regions, and hence a distance between normal distributions. Several alternative statistical distances for normal distributions are therefore compared, namely the Bhattacharyya distance, the Hellinger metric distance and the Wasserstein metric distance.

Statistical Gaussian Model of Image Regions in Stochastic Watershed Segmentation. Jesús Angulo (jesus.angulo@mines-paristech.fr; http://cmm.ensmp.fr/∼angulo), MINES ParisTech, PSL Research University, CMM-Centre de Morphologie Mathématique. GSI'2015, 2nd Conference on Geometric Science of Information, École Polytechnique, Paris-Saclay (France), October 28th–30th 2015.

Motivation: unsupervised segmentation of generic images. The "Custard" color image contains large homogeneous areas and well-contrasted objects, as well as textured zones and fuzzy boundaries. Using watershed-based techniques on its color gradient, large homogeneous areas are over-segmented and textured zones are not always well contoured (illustrated with the pdf of contours from stochastic watershed and h-dynamics watershed cuts at h = 0.1 and h = 0.3).

Context: segmentation approaches based on statistical modeling of pixels and regions (e.g., mean shift, statistical region merging); hierarchical contour detection and segmentation (e.g., machine-learned edge detection, watershed transform); stochastic watershed (SW), which estimates a probability density function of contours. Goal: take regional information into account in the probability estimated by SW by means of a statistical Gaussian model ⇒ a more perceptual strength function of contours.

Plan: 1. Stochastic watershed using Monte Carlo simulations; 2. Multivariate Gaussian model of regions in SW; 3. Perspectives.

1. Stochastic watershed using Monte Carlo simulations.

Regionalized Poisson points — uniform random germs. Generate realizations of a Poisson point process with constant intensity θ (average number of points per unit area). The random number of points N(D) falling in a domain D (a bounded Borel set) of area |D| follows a Poisson distribution with parameter θ|D|:
$$\Pr\{N(D) = n\} = e^{-\theta|D|}\,\frac{(\theta|D|)^n}{n!}.$$
Conditionally on N(D) = n, the n points are independently and uniformly distributed over D (the mean and the variance of a Poisson distribution equal its parameter).

Regionalized random germs. Suppose the density θ is not constant but a measurable positive-valued function defined on ℝᵈ, and write θ(D) = ∫_D θ(x)dx. The number of points falling in a Borel set then follows a Poisson distribution with parameter θ(D):
$$\Pr\{N(D) = n\} = e^{-\theta(D)}\,\frac{(\theta(D))^n}{n!},$$
and given N(D) = n, the points are independently distributed over D with probability density θ̃(x) = θ(x)/θ(D).

Sampling N random germs in an image m : E → {0, 1} according to a density θ(x), by inverse transform sampling: (1) initialization: m(xᵢ) = 0 for all xᵢ ∈ E, P = Card(E); (2) compute the cumulative distribution function cdf(xᵢ) = Σ_{k≤i} θ(x_k)/Σ_{k=1}^P θ(x_k); (3) for j = 1 to N: draw rⱼ uniformly, find the value sⱼ such that rⱼ ≤ cdf(x_{sⱼ}), and set m(x_{sⱼ}) = 1.

Stochastic watershed paradigm. Random germs are spread as markers for the watershed segmentation. This arbitrary choice is stochastically balanced by using a given number M of realizations, in order to filter out non-significant fluctuations. Each piece of contour may then be assigned the number of times it appears during the various simulations, to estimate a probability density function (pdf) of contours. With uniformly distributed random germs, large regions are sampled more frequently than smaller regions and are selected more often. Using the image gradient as the density for the regionalization of the random germs samples highly contrasted image areas: the probability of selecting a contour then offers a trade-off between the strength of the contours and the size of the adjacent regions.

Probability density of contours using Monte Carlo simulations of watershed. Let {mrkₙ(x)}ₙ₌₁ᴹ be a series of M realizations of N spatially distributed random markers according to the gradient image g. Each realization of random germs is used as the marker image for a watershed segmentation of g, giving the binary image WS(g, mrkₙ)(x) = 1 if x is on a watershed line and 0 otherwise. The pdf of contours is computed by kernel density estimation:
$$\mathrm{pdf}(x) = \frac1M \sum_{n=1}^M WS(g, \mathrm{mrk}_n)(x) * K_\sigma(x),$$
where the smoothing kernel K_σ(x) is a spatial Gaussian function of width σ. (Illustrated on the color image f(x), its color gradient g(x), M realizations of N regionalized Poisson points with θ(x) = g(x), the M watershed segmentations, the resulting density of contours, and segmentations at h = 0.1 and h = 0.3.)

2. Multivariate Gaussian model of regions in SW.

Watershed transform ⇒ tessellation. A tessellation τ of E from the watershed WS(x) is a (finite) family of disjoint open sets (classes, or regions) τ = {R_r}_{1≤r≤N}, with Rᵢ ∩ Rⱼ = ∅ for i ≠ j, such that WS(x) = E \ ∪_r R_r = ∪ l_{i,j}. The boundary between regions Rᵢ and Rⱼ (1 ≤ i, j ≤ N, i ≠ j) is the irregular arc segment l_{i,j} = ∂Rᵢ ∩ ∂Rⱼ.

Color regions as multivariate normal distributions. The color image values restricted to each region of the partition, Pᵢ = f(Rᵢ), can be modeled by different statistical distributions; here we focus on a multivariate normal model Pᵢ ∼ N(μᵢ, Σᵢ) of mean μᵢ and covariance matrix Σᵢ. Different (statistical) distances are defined on the space of N(μᵢ, Σᵢ); the boundary l_{i,j} will be weighted by a function of the distance between N(μᵢ, Σᵢ) and N(μⱼ, Σⱼ).

Bhattacharyya distance D_B(P₁, P₂). It measures the similarity of two discrete or continuous probability distributions P₁ and P₂ by computing the amount of overlap between the two statistical populations:
$$D_B(P_1, P_2) = -\log \int \sqrt{P_1(x)\,P_2(x)}\; dx.$$
For multivariate normal distributions,
$$D_B(P_1, P_2) = \frac18 (\mu_1 - \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + \frac12 \log \frac{\det \Sigma}{\sqrt{\det \Sigma_1 \det \Sigma_2}}, \qquad \Sigma = \frac{\Sigma_1 + \Sigma_2}{2}.$$
Note that the first term is related to the Mahalanobis distance; the two coincide when both distributions share the same covariance.

Hellinger metric distance D_H(P₁, P₂). One has 0 ≤ D_B ≤ ∞ and D_B(P₁, P₂) is symmetric, but D_B does not obey the triangle inequality and therefore is not a metric. The Bhattacharyya distance can be metrized by transforming it into the Hellinger metric distance
$$D_H(P_1, P_2) = \sqrt{1 - \exp\big(-D_B(P_1, P_2)\big)}.$$
For multivariate normal distributions,
$$D_H(P_1, P_2) = \left[1 - \sqrt{\frac{\sqrt{\det \Sigma_1 \det \Sigma_2}}{\det \Sigma}}\; e^{-\frac14 (\mu_1 - \mu_2)^T (\Sigma_1 + \Sigma_2)^{-1} (\mu_1 - \mu_2)}\right]^{1/2}.$$
The Hellinger distance is an α-divergence, corresponding to the case α = 0, and is the only one that is a metric distance; it can be related to measure theory and asymptotic statistics.

Wasserstein metric distance D_W(P₁, P₂). The Wasserstein metric is a distance function defined between probability measures μ and ν on ℝⁿ, based on the notion of optimal transport:
$$W_2(\mu, \nu) = \inf \big(\mathbb{E}\,\|X - Y\|^2\big)^{1/2},$$
where the infimum runs over all random vectors (X, Y) ∈ ℝⁿ × ℝⁿ with X ∼ μ and Y ∼ ν. For discrete distributions it corresponds to the well-known earth mover's distance. For two multivariate normal distributions:
$$D_W(P_1, P_2)^2 = \|\mu_1 - \mu_2\|^2 + \mathrm{Tr}\big(\Sigma_1 + \Sigma_2 - 2\Sigma_{1,2}\big), \qquad \Sigma_{1,2} = \big(\Sigma_1^{1/2}\, \Sigma_2\, \Sigma_1^{1/2}\big)^{1/2}.$$
In particular, in the commutative case Σ₁Σ₂ = Σ₂Σ₁ one has
$$D_W(P_1, P_2)^2 = \|\mu_1 - \mu_2\|^2 + \big\|\Sigma_1^{1/2} - \Sigma_2^{1/2}\big\|_F^2.$$

Monte Carlo estimation of the pdf. Each piece of contour l_{i,j} between regions Rᵢ and Rⱼ is assigned the normalized statistical distance between the color Gaussian distributions Pᵢ and Pⱼ:
$$\pi_{i,j} = \frac{D(P_i, P_j)}{\sum_{l_{k,l} \in WS} D(P_k, P_l)},$$
where D(Pᵢ, Pⱼ) is any of the distances discussed above. For any realization n of SW, denoted WS(x, n), one can compute an image of weighted contours: Pr(x, n) = π_{i,j} if x ∈ l_{i,j}ⁿ, and 0 otherwise. Integrating across the M realizations, the Monte Carlo estimate of the probability density function of contours is
$$\mathrm{pdf}(x) = \frac1M \sum_{n=1}^M \Pr(x, n) * K_\sigma(x).$$
(Illustrated with the M watershed segmentations and their weighted contours under the Bhattacharyya distance; the density of contours and an h-dynamics cut at h = 0.02; and comparisons — including robust estimation — of the distance of means, Bhattacharyya, Hellinger and Wasserstein distances, against F. López-Mir, V. Naranjo, S. Morales, J. Angulo, "Probability Density Function of Object Contours Using Regional Regularized Stochastic Watershed", IEEE ICIP'14.)

Comparison with SRM. Statistical Region Merging (SRM) [R. Nock, F. Nielsen, "Statistical Region Merging", IEEE Trans. on PAMI, 26(11):1452–1458, 2004] depends on a scale parameter Q: shown are segmentations for Q = 128 and Q = 32, and the sum of contours from nine values Q = 256, 128, 64, 32, 16, 8, 4, 2, 1.

3. Perspectives. In addition to its color, each pixel x can also be described by its structure tensor T(x) ∈ SPD(2): each region Rᵢ ∼ N(0, Σᵢ) with Σᵢ = |Rᵢ|⁻¹ Σ_{x∈Rᵢ} T(x), or each region described by the histogram of structure tensors {T(x)}_{x∈Rᵢ}. From color to multi/hyperspectral images: high-dimensional covariance matrices estimated locally on regions. Supervised segmentation: distance learning from training images with annotated contours. MATLAB code available.
A technique of spatial-spectral quantization of hyperspectral images is introduced, whereby a quantized hyperspectral image is summarized by K spectra that represent the spatial and spectral structures of the image. The proposed technique is based on α-connected components on a region adjacency graph. Its main ingredient is a dissimilarity metric; in order to choose the metric that best fits the hyperspectral data manifold, different probabilistic dissimilarity measures are compared.
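Once spectra are L1-normalized into probability vectors, probabilistic dissimilarities apply directly. As one representative example (the paper compares several; this block's sample spectra are hypothetical), the discrete Hellinger distance:

```python
import numpy as np

def hellinger_spectra(px, py):
    # Hellinger distance between two L1-normalized spectra seen as probability
    # vectors; equals sqrt(1 - BC), where BC = sum_i sqrt(px_i * py_i) is the
    # Bhattacharyya coefficient.
    return np.sqrt(0.5 * np.sum((np.sqrt(px) - np.sqrt(py)) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0])     # hypothetical raw spectra for illustration
y = np.array([4.0, 3.0, 2.0, 1.0])
px, py = x / x.sum(), y / y.sum()
d = hellinger_spectra(px, py)
```

The distance is bounded in [0, 1], with 0 for identical spectra, which makes it convenient as an edge dissimilarity on a region adjacency graph.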

Quantization of hyperspectral image manifolds using probabilistic distances
Gianni Franchi, Jesús Angulo
CMM-Centre de Morphologie Mathématique, MINES ParisTech, PSL Research University
GSI 2015, 2nd Conference on Geometric Science of Information, October 2015

Plan: 1. Introduction (hyperspectral images; state of the art; model of the data). 2. The probability distances. 3. Quantization technique. 4. Results.

Introduction: hyperspectral images. A hyperspectral image consists of the simultaneous acquisition of a spectrum of radiance at each pixel of the image [Manolakis, Marden & Shaw, 2003]. This type of image contains redundant information and correlated variables, and suffers from high dimensionality (hundreds of channels in hyper/ultraspectral pictures). State of the art: spectral detection and classification [Chang, 2003]; dissimilarity-based classification of spectra [Paclík & Duin, 2003]; local manifold learning [Ma, Crawford & Tian, 2010], [Crawford, Ma & Kim, 2011]; local mutual information for dissimilarity-based image segmentation [Gueguen, Velasco-Forero & Soille, 2014]; nonlinear mixing models for hyperspectral images [Altmann, Dobigeon, McLaughlin & Tourneret].

Model of the data. Let n ∈ N be the number of pixels of the image, and let x = (x_1, …, x_d) ∈ R^d and y = (y_1, …, y_d) ∈ R^d be two spectra of the image. It is possible to normalize them to unit mass; let P_x = (P_{x,1}, …, P_{x,d}) ∈ R^d and P_y = (P_{y,1}, …, P_{y,d}) ∈ R^d denote the normalized versions of these vectors, which can be viewed as discrete probability distributions. Let R ∈ N be the number of classes in the image, linked with the number of materials, and assume a ground truth is available. The goal is first to find a distance that separates the different clusters well, without any prior knowledge, and then to quantize the hyperspectral image according to this distance and to spatial information. (A scatter plot of the distances of the pixels of cluster 1 of the Pavia image to the centroids of clusters 1, 2 and 3, using the L2 norm in (a) and the Kullback-Leibler divergence in (b), illustrates how the choice of distance affects cluster separability.)

The probability distances.
• Minkowski norms: the p-norm ||P_x − P_y||_p = (Σ_{i=1}^d |P_{x,i} − P_{y,i}|^p)^{1/p}.
• Fisher-Rao distance and SAM: if each pixel is considered to follow a multinomial distribution of parameter P_x, the Fisher-Rao distance between x and y, represented by P_x and P_y, is the spherical distance d_Spherical(x, y) = 2 arccos(Σ_{i=1}^d (P_{x,i} P_{y,i})^{1/2}). It shares some similarities with the distance classically used on hyperspectral images, the Spectral Angle Mapper, SAM(x, y) = arccos(⟨x, y⟩ / (||x||₂ ||y||₂)). The SAM is invariant to spectral multiplication, SAM(αx, y) = SAM(x, y) for all α ∈ R*, hence invariant to illumination changes, which can be problematic in remote sensing.
• Rényi divergences: the Rényi divergence of order α > 0 of the distribution P_x from the distribution P_y is S_α(P_x || P_y) = (1/(α−1)) log Σ_{i=1}^d P_{x,i}^α P_{y,i}^{1−α}. As α → 1 it tends to the Kullback-Leibler divergence S(P_x || P_y) = Σ_i P_{x,i} log(P_{x,i}/P_{y,i}). For α = 1/2 we have S_{1/2}(P_x || P_y) = −2 log(1 − d_Hellinger(x, y)²), where d_Hellinger(x, y) = (1/√2)(Σ_{i=1}^d (P_{x,i}^{1/2} − P_{y,i}^{1/2})²)^{1/2} is the Hellinger distance. The case α = 2 leads to the quadratic Rényi divergence S_{α=2}(P_x || P_y) = log(1 + χ²(x, y)), where the χ² statistic is computed as χ²(x, y) = Σ_{i=1}^d (P_{x,i} − m_i)²/m_i with m_i = (P_{x,i} + P_{y,i})/2.
• Mahalanobis distance: consider a classical model on hyperspectral images which assumes that each observed spectrum is the true spectrum corrupted by additive multivariate normal noise of mean 0 and with a covariance Σ common to all the spectra. The observation thus writes x = t + N, so x ∼ N(µ_x, Σ) with µ_x = t. It turns out that the Fisher-Rao distance between x and y then happens to be the Mahalanobis distance,
d_Mahalanobis(x, y)² = (µ_x − µ_y)ᵀ Σ⁻¹ (µ_x − µ_y).   (1)
The interesting question is how to assess Σ.
• Earth Mover's distance: given two spectra x and y represented by P_x and P_y, EMD(P_x, P_y) = min_{α∈M} Σ_{i,j} α_{i,j} c(i, j), where M = {α_{i,j} ≥ 0; Σ_i α_{i,j} = P_{y,j}; Σ_j α_{i,j} = P_{x,i}} and c is a cost function. Different choices of cost function can be considered; two are used here. The first one is c₁(i, j) = |x_i − y_j|, for which the Earth Mover's distance admits a closed form; this distance is written EMD₁. The other cost function selected is the thresholded cost c₂(i, j) = min(|x_i − y_j|, t), where t is a threshold; this distance is written EMD₂.

Quantization technique. Quantization is the process which allows a signal with a large set of values to be approximated by a signal on a smaller set. Images are signals on a spatial domain, so their quantization should take the expected spatial coherence into account. To achieve this goal, an α-connected components representation is used, which produces an image partition into homogeneous spatial classes [Soille, 2008], [Gueguen, Velasco-Forero & Soille, 2014]. Given a distance d: R^d × R^d → R, two pixels (f(x), f(y)) belong to the same α-connected component if and only if there exists a path (p_1, …, p_n) such that p_1 = x, p_n = y and d(f(p_i), f(p_{i+1})) ≤ α for all i ∈ [1, n−1], with α ∈ R⁺ (producing quasi-flat zones of the clusters). In practice, a first partition of the image into superpixels is computed; the image is then transformed into a graph representation called the region adjacency graph (RAG), in which each node is a superpixel and each edge represents the dissimilarity between superpixels. Each node (which is a set of pixels P) is represented by its centroid according to the metric d,
m = arg min_{m∈P} Σ_{x∈P} d(m, x).   (2)
If all the weights are equal, m is simply the median of the node. The α-connected components of the resulting metric graph then give the quantization.

Results. To evaluate the distance measures, different criteria computed on the confusion matrix T between predicted and actual classes are used: the overall accuracy (the trace of T divided by the total number of samples, × 100) and the per-class accuracies. Tables compare the probabilistic distances on the Pavia and Indian Pines hyperspectral images for the L1, L2, L∞, SAM, spherical, Mahalanobis, Hellinger, Kolmogorov, χ², EMD₁, EMD₂ and Rényi (α = 1/2 and α = 2) distances. False RGB color compositions (using three spectral bands) of the Indian Pines and Pavia images and of the quantized images (with K = 3000 classes for Pavia), obtained with the L2 norm, the SAM, the χ² distance and the EMD, illustrate the results.

References: Manolakis, D., Marden, D., & Shaw, G. A. (2003). Hyperspectral image processing for automatic target detection applications. Lincoln Laboratory Journal, 14(1), 79-116. Chang, C. I. (2003). Hyperspectral imaging: techniques for spectral detection and classification (Vol. 1). Springer Science & Business Media. Paclík, P., & Duin, R. P. (2003). Dissimilarity-based classification of spectra: computational issues. Real-Time Imaging, 9(4), 237-244. Ma, L., Crawford, M. M., & Tian, J. (2010). Local manifold learning-based k-nearest-neighbor for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 48(11), 4099-4109. Crawford, M. M., Ma, L., & Kim, W. (2011). Exploring nonlinear manifold learning for classification of hyperspectral data. In Optical Remote Sensing (pp. 207-234). Springer Berlin Heidelberg. Gueguen, L., Velasco-Forero, S., & Soille, P. (2014). Local mutual information for dissimilarity-based image segmentation. Journal of Mathematical Imaging and Vision, 48(3). Soille, P. (2008). Constrained connectivity for hierarchical image partitioning and simplification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1132-1145.
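Two of the closed-form links within the Rényi family compared in the talk can be checked numerically on toy spectra (assumed data; the χ² divergence is taken with respect to the second distribution here, which is the version for which the Rényi identity is exact):

```python
import numpy as np

# Two assumed toy spectra, normalized to unit mass (discrete distributions)
p = np.array([4.0, 3.0, 2.0, 1.0]); p /= p.sum()
q = np.array([1.0, 2.0, 3.0, 4.0]); q /= q.sum()

def renyi(p, q, a):
    """Renyi divergence of order a (a > 0, a != 1) between discrete
    distributions p and q."""
    return np.log(np.sum(p ** a * q ** (1 - a))) / (a - 1)

def hellinger(p, q):
    """Hellinger distance: (1/sqrt(2)) * ||sqrt(p) - sqrt(q)||_2."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def chi2(p, q):
    """Chi-square divergence of p from q: sum (p - q)^2 / q."""
    return np.sum((p - q) ** 2 / q)

# Order 1/2 relates to the Hellinger distance, order 2 to chi-square:
assert np.isclose(renyi(p, q, 0.5), -2 * np.log(1 - hellinger(p, q) ** 2))
assert np.isclose(renyi(p, q, 2.0), np.log(1 + chi2(p, q)))
```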
Optimal Transport and applications in Imagery/Statistics (chaired by Bertrand Maury, Jérémie Bigot)
Optimal transport (OT) is a major statistical tool, used to measure the similarity between features or to match and average features. However, OT requires some relaxation and regularization to be robust to outliers. With relaxed methods, as one feature can be matched to several others, significant interpolations between different features arise. This is not an issue for comparison purposes, but it introduces strong, unwanted smoothing in transfer applications. We therefore introduce a new regularized method, based on a non-convex formulation, that minimizes transport dispersion by enforcing a one-to-one matching of features. The interest of the approach is demonstrated for color transfer purposes.

Adaptive color transfer with relaxed optimal transport
Julien Rabin¹, Sira Ferradans² and Nicolas Papadakis³
¹GREYC, University of Caen; ²Data group, ENS; ³CNRS, Institut de Mathématiques de Bordeaux
Conference on Geometric Science of Information

Optimal transport on histograms. Monge-Kantorovich (MK) discrete mass transportation problem: map µ0 onto µ1 while minimizing the total transport cost. The two histograms must have the same mass. The optimal transport cost is called the Wasserstein distance (Earth Mover's Distance); the optimal transport map is the application mapping µ0 onto µ1.

Applications in image processing and computer vision. Optimal transport provides a framework to define statistical-based tools for many imaging and computer vision problems:
• Robust dissimilarity measure (optimal transport cost): image retrieval [Rubner et al., 2000], [Pele and Werman, 2009]; SIFT matching [Pele and Werman, 2008], [Rabin et al., 2009]; 3D shape recognition and feature detection [Tomasi]; object segmentation [Ni et al., 2009], [Swoboda and Schnorr, 2013].
• Tool for matching/interpolation (optimal transport map): non-rigid shape matching and image registration [Angenent et al., 2004]; texture synthesis and mixing [Ferradans et al., 2013]; histogram specification and averaging [Delon, 2004]; color transfer [Pitié et al., 2007], [Rabin et al., 2011b].
Not to mention other applications (physics, economics, etc.).

Color transfer. Optimal transport of the target palette µ onto the source palette ν yields the target image after color transfer. Limitations: mass-conservation artifacts, and irregularity of the optimal transport map.

Outline. Part I: computation of optimal transport between histograms. Part II: optimal transport relaxation and regularization; application to color transfer.

Part I. Wasserstein distance between histograms.

Formulation for clouds of points. Definition (L2 Wasserstein distance): given two clouds of points X, Y ⊂ R^{d×N} of N elements in R^d with equal masses 1/N, the quadratic Wasserstein distance is defined as

W2(X, Y)² = min_{σ∈Σ_N} (1/N) Σ_{i=1}^{N} ||X_i − Y_{σ(i)}||²   (1)

where Σ_N is the set of all permutations of N elements. This is equivalent to an optimal assignment problem, and can be computed using standard sorting algorithms when d = 1.

Exact solution in the unidimensional case (d = 1) for histograms. Histograms may be seen as clouds of points with non-uniform masses, so that µ(x) = Σ_{i=1}^{M} m_i δ_{X_i}(x).
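The quadratic Wasserstein distance between equal-size point clouds is an optimal assignment problem. A minimal sketch (not the authors' relaxed formulation) solves it with the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`, and checks that for d = 1 the optimal matching is indeed given by sorting:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_clouds(X, Y):
    """Quadratic Wasserstein distance W2(X, Y)^2 between two clouds of
    N points in R^d (one point per row), solved as optimal assignment."""
    # cost[i, j] = ||X_i - Y_j||^2
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)   # optimal permutation
    return cost[rows, cols].mean()             # (1/N) sum of matched costs

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 1))
Y = rng.normal(size=(50, 1))

# In dimension d = 1 the optimal assignment simply matches sorted samples.
w2_sort = ((np.sort(X[:, 0]) - np.sort(Y[:, 0])) ** 2).mean()
assert np.isclose(w2_clouds(X, Y), w2_sort)
```

For color transfer, the same assignment computed between two color palettes gives the (unrelaxed) transport map whose limitations the talk addresses.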
We introduce the generalized Pareto distributions as a statistical model to describe thresholded edge-magnitude image filter results. Compared to the more common Weibull or generalized extreme value distributions, these distributions have at least two important advantages: the use of a high threshold value ensures that only the most important edge points enter the statistical analysis, and the estimation is computationally more efficient since a much smaller number of data points has to be processed. The generalized Pareto distributions with a common threshold zero form a two-dimensional Riemannian manifold with the metric given by the Fisher information matrix. We compute the Fisher matrix for shape parameters greater than −0.5 and show that the determinant of its inverse is a product of a polynomial in the shape parameter and the squared scale parameter. We apply this result by using the determinant as a sharpness function in an autofocus algorithm. We test the method on a large database of microscopy images with given ground-truth focus results. We found that for a vast majority of the focus sequences the results are in the correct focal range. Cases where the algorithm fails are specimens with too few objects and sequences where contributions from different layers result in a multimodal sharpness curve. Using the geometry of the manifold of generalized Pareto distributions, more efficient autofocus algorithms can be constructed, but these optimizations are not included here.
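The peaks-over-threshold pipeline described in the abstract can be sketched with `scipy.stats.genpareto`. As a stand-in for the paper's sharpness function (the determinant of the inverse Fisher matrix is not reproduced here), this hypothetical sketch simply returns the fitted shape and scale, using a plain gradient filter instead of the dihedral filter bank:

```python
import numpy as np
from scipy.stats import genpareto

def gpd_focus_stats(image, q=0.95):
    """Peaks-over-threshold: threshold the edge magnitudes at a high
    quantile and fit a generalized Pareto distribution (threshold zero)
    to the exceedances.  Returns the fitted (shape, scale)."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    mag = np.hypot(gx, gy).ravel()
    u = np.quantile(mag, q)            # high threshold
    exceed = mag[mag > u] - u          # exceedances over the threshold
    shape, _, scale = genpareto.fit(exceed, floc=0.0)
    return shape, scale

rng = np.random.default_rng(1)
sharp = rng.normal(size=(64, 64))                      # toy in-focus image
blurred = (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, 1, 1)) / 3.0
_, scale_sharp = gpd_focus_stats(sharp)
_, scale_blur = gpd_focus_stats(blurred)
assert scale_sharp > scale_blur > 0    # blur shrinks the exceedance scale
```

The fitted scale grows with edge strength, which is why a scale-dependent function of the fitted parameters can serve as a sharpness score along a focal sequence.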

Generalized Pareto Distributions, Image Statistics and Autofocusing in Automated Microscopy
Reiner Lenz

Microscopy: 34 slices changing focus along the optical axis (focal sequence of 4x16 images shown, together with the total-focus image).

Observations:
• Autofocus is easy: it is independent of the image content (what is in the image), independent of the imaging method (how the image is produced), fast ('real-time'), and local (which part of the image is in focus).
• It is obviously useful in applications (microscopy, cameras, ...).
• It is useful for understanding low-level vision processes.
• It illustrates the relation between scene statistics and vision.

Processing pipeline / techniques: filtering (group representations), thresholding (extreme value statistics), critical points (information geometry).

Filtering: representations of dihedral groups. Most images are defined on square grids. The symmetry group of square grids is the dihedral group D(4), consisting of 8 elements: 4 rotations and 4 rotation+reflections. For a 5x5 array, six filter pairs are chosen, resulting in a 6x2 vector at each pixel.
We study barycenters in the Wasserstein space Pp(E) of a locally compact geodesic space (E, d). In this framework, we define the barycenter of a measure ℙ on Pp(E) as its Fréchet mean. The paper establishes its existence and states its consistency with respect to ℙ. This extends previous results on ℝd, with conditions on ℙ or on the sequence converging to ℙ for consistency.
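In the special case E = R with p = 2 the barycenter has a well-known closed form: its quantile function is the weighted average of the input quantile functions. A sketch of this one-dimensional case only (not the general construction of the paper):

```python
import numpy as np

def barycenter_1d(samples, weights, grid_size=1000):
    """Wasserstein-2 barycenter of one-dimensional empirical measures:
    on the real line the barycenter's quantile function is the weighted
    average of the input quantile functions."""
    t = (np.arange(grid_size) + 0.5) / grid_size      # quantile levels
    return sum(w * np.quantile(np.asarray(x), t)
               for x, w in zip(samples, weights))

# Equal-weight barycenter of samples from N(-2, 1) and N(+2, 1):
rng = np.random.default_rng(2)
a = rng.normal(-2.0, 1.0, 20000)
b = rng.normal(+2.0, 1.0, 20000)
q = barycenter_1d([a, b], [0.5, 0.5])
# The barycenter is close to N(0, 1), not the bimodal 50/50 mixture.
assert abs(q.mean()) < 0.05 and abs(q.std() - 1.0) < 0.1
```

The returned array samples the barycenter's quantile function; averaging quantiles rather than densities is what distinguishes the Wasserstein barycenter from the mixture.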

Barycenter in Wasserstein spaces: existence and consistency
Thibaut Le Gouic and Jean-Michel Loubes*
Institut de Mathématiques de Marseille, École Centrale Marseille; *Institut de Mathématiques de Toulouse
October 29th, 2015

Barycenter. The barycenter of a set {x_i}_{1≤i≤J} of J points of R^d endowed with weights (λ_i)_{1≤i≤J} is defined as Σ_{1≤i≤J} λ_i x_i. It is characterized as the minimizer of x → Σ_{1≤i≤J} λ_i ||x − x_i||². Replacing (R^d, ||·||) by a metric space (E, d), one minimizes x → Σ_{1≤i≤J} λ_i d(x, x_i)². Likewise, given a random variable/vector of law µ on R^d, its expectation EX is characterized as the minimizer of x → E||X − x||², which extends to a metric space (it summarizes the information while staying in a geodesic space).

Definition (p-barycenter). Given a probability measure µ on a geodesic space (E, d), the set arg min{x ∈ E; ∫ d(x, y)^p dµ(y)} is called the set of p-barycenters of µ. Does it exist?

Geodesic space. Definition: a complete metric space (E, d) is said to be geodesic if for all x, y ∈ E there exists z ∈ E such that d(x, z) = d(z, y) = ½ d(x, y). This includes many spaces (normed vector spaces, compact manifolds, ...).

Proposition (existence). The p-barycenter of any probability measure with finite moments of order p on a locally compact geodesic space exists. It is not unique in general (e.g. on the sphere); on a non-positively curved space the barycenter is unique and 1-Lipschitz on the 2-Wasserstein space.

Wasserstein metric. Definition: let µ and ν be two probability measures on a metric space (E, d) and p ≥ 1. The p-Wasserstein distance between µ and ν is defined as Wp(µ, ν)^p = inf_{π∈Γ(µ,ν)} ∫ d_E(x, y)^p dπ(x, y), where Γ(µ, ν) is the set of all probability measures on E × E with marginals µ and ν. It is defined for any measure with finite moments of order p, i.e. E d(X, x_0)^p < ∞ (denote this set Pp(E)); it is a metric on Pp(E), and (Pp(E), Wp) is called the Wasserstein space. The topology of this metric is that of weak convergence together with convergence of moments of order p.

The Wasserstein space of a complete geodesic space is a complete geodesic space; (Pp(E), Wp) is locally compact ⇔ (E, d) is compact; (E, d) ⊂ (Pp(E), Wp) isometrically. Does the barycenter exist on (Pp(E), Wp)?

Definition (measurable barycenter application). A geodesic space (E, d) is said to admit measurable barycenter applications if for any J ≥ 1 and any weights (λ_j)_{1≤j≤J} there exists a measurable application T from E^J to E such that for all (x_1, ..., x_J) ∈ E^J,

Σ_{j=1}^{J} λ_j d(T(x_1, ..., x_J), x_j)^p = min_{x∈E} Σ_{j=1}^{J} λ_j d(x, x_j)^p.

Locally compact geodesic spaces admit measurable barycenter applications.

Theorem (existence of barycenter). Let (E, d) be a geodesic space that admits measurable barycenter applications. Then any probability measure P on (Pp(E), Wp) has a barycenter. The barycenter is not unique, e.g. E = R² with P = ½ δ_{µ1} + ½ δ_{µ2}, µ1 = ½ δ_{(−1,−1)} + ½ δ_{(1,1)} and µ2 = ½ δ_{(1,−1)} + ½ δ_{(−1,1)}. Is the barycenter consistent?

Three steps for existence: 1. multi-marginal problem → existence of the barycenter for P finitely supported; 2. weak consistency → existence of the barycenter for probabilities that can be approximated by measures with barycenters; 3. approximation by finitely supported measures → any probability can be approximated by a finitely supported probability measure.

Definition (push forward). Given a measure ν on E and a measurable application T: E → (F, F), the push forward of ν by T is given by T#ν(A) = ν(T⁻¹(A)) for all A ∈ F. Probabilist version: if X is a r.v. on (Ω, A, P), then P_X = X#P.

Theorem (barycenter and multi-marginal problem [Agueh and Carlier, 2011]). Let (E, d) be a complete separable geodesic space, p ≥ 1 and J ∈ N*. Given (µ_i)_{1≤i≤J} ∈ Pp(E)^J and weights (λ_i)_{1≤i≤J}, there exists a measure γ ∈ Γ(µ_1, ..., µ_J) minimizing γ̂ → ∫ inf_{x∈E} Σ_{1≤i≤J} λ_i d(x_i, x)^p dγ̂(x_1, ..., x_J). If (E, d) admits a measurable barycenter application T: E^J → E, then the measure ν = T#γ is a barycenter of (µ_i)_{1≤i≤J}; if T is unique, every barycenter ν is of the form ν = T#γ.

Theorem (weak consistency of the barycenter). Let (E, d) be a geodesic space that admits measurable barycenters. Take (P_j)_{j≥1} ⊂ Pp(E) converging to P ∈ Pp(E), and take any barycenter µ_j of P_j. Then the sequence (µ_j)_{j≥1} is (weakly) tight and any limit point is a barycenter of P.

Proposition (approximation by finitely supported measures). For any measure P on Pp(E) there exists a sequence of finitely supported measures (P_j)_{j≥1} ⊂ Pp(E) such that Wp(P_j, P) → 0 as j → ∞.

Theorem (consistency of the barycenter). Let (E, d) be a geodesic space that admits measurable barycenters. Take (P_j)_{j≥1} ⊂ Pp(E) and P ∈ Pp(E), and take any barycenter µ_j of P_j. Then the sequence (µ_j)_{j≥1} is totally bounded in (Pp(E), Wp) and any limit point is a barycenter of P. This implies the continuity of the barycenter when barycenters are unique, and the compactness of the set of barycenters; no rate of convergence is obtained (a barycenter that is Lipschitz on (E, d) is Lipschitz on Pp(E)).

Applications.
• Statistical application, improvement of measure accuracy: take (µ_i^n)_{1≤i≤J} → µ_i as n → ∞ and weights (λ_j)_{1≤j≤J}; set µ_B^n the barycenter of (µ_i^n)_{1≤i≤J}. Then, as n → ∞, µ_B^n → µ_B. Example: texture mixing [Rabin et al., 2011].
• Statistical application, growing number of measures: take (µ_n)_{n≥1} such that (1/n) Σ_{i=1}^n µ_i → P; set µ_B^n the barycenter of (1/n) Σ_{i=1}^n δ_{µ_i}. Then, as n → ∞, µ_B^n → µ_B. Example: averaging of template deformations [Bigot and Klein, 2012], [Agulló-Antolín et al., 2015].

References:
Agueh, M. and Carlier, G. (2011). Barycenters in the Wasserstein space. SIAM Journal on Mathematical Analysis, 43(2):904-924.
Agulló-Antolín, M., Cuesta-Albertos, J. A., Lescornel, H., and Loubes, J.-M. (2015). A parametric registration model for warped distributions with Wasserstein's distance. J. Multivariate Anal., 135:117-130.
Bigot, J. and Klein, T. (2012). Consistent estimation of a population barycenter in the Wasserstein space. ArXiv e-prints.
Rabin, J., Peyré, G., Delon, J., and Bernot, M. (2011). Wasserstein barycenter and its application to texture mixing. SSVM'11, pages 435-446.
Univariate L-moments are expressed as projections of the quantile function onto an orthogonal basis of univariate polynomials. We present multivariate versions of L-moments, expressed as collections of orthogonal projections of a multivariate quantile function on a basis of multivariate polynomials. We propose to consider quantile functions defined as transports from the uniform distribution on [0, 1]^d onto the distribution of interest, and present some properties of the subsequent L-moments. The properties of estimated L-moments are illustrated for heavy-tailed distributions.
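For reference, the univariate L-moments and their U-statistic estimators can be computed directly from the definition λ_r = (1/r) Σ_k (−1)^k C(r−1, k) E[X_{r−k:r}]; this brute-force sketch enumerates all size-r subsamples, which is only practical for tiny samples:

```python
from itertools import combinations
from math import comb

def sample_l_moments(data, rmax=4):
    """U-statistic estimators of the first rmax univariate L-moments:
    the r-th estimate averages (1/r) sum_k (-1)^k C(r-1,k) X_{r-k:r}
    over all size-r subsamples drawn without replacement."""
    x = sorted(data)
    n = len(x)
    estimates = []
    for r in range(1, rmax + 1):
        total = 0.0
        for sub in combinations(x, r):        # subsamples come out sorted
            total += sum((-1) ** k * comb(r - 1, k) * sub[r - 1 - k]
                         for k in range(r)) / r
        estimates.append(total / comb(n, r))
    return estimates

l1, l2, l3, l4 = sample_l_moments([2.0, 0.0, 1.0, 4.0, 3.0])
# l1 is the sample mean; l2 is half Gini's mean difference;
# l3 vanishes here because the sample is symmetric about its mean.
assert abs(l1 - 2.0) < 1e-12 and abs(l2 - 1.0) < 1e-12 and abs(l3) < 1e-12
```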

Multivariate L-Moments Based on Transports
Alexis Decurninge, Huawei Technologies
Geometric Science of Information, October 29th, 2015

Outline: 1. L-moments (definition). 2. Quantiles and multivariate L-moments (definitions and properties; Rosenblatt quantiles and L-moments; monotone quantiles and L-moments; estimation of L-moments; numerical applications).

Definition of L-moments. If X_1, ..., X_r are real random variables with common cumulative distribution function F,

λ_r = (1/r) Σ_{k=0}^{r−1} (−1)^k C(r−1, k) E[X_{r−k:r}]

with X_{1:r} ≤ X_{2:r} ≤ ... ≤ X_{r:r} the order statistics. They exist if ∫ |x| dF(x) < ∞.
• λ_1 = E[X]: localization
• λ_2 = ½ E[X_{2:2} − X_{1:2}]: dispersion
• τ_3 = λ_3/λ_2, with λ_3 = (1/3) E[X_{3:3} − 2X_{2:3} + X_{1:3}]: asymmetry
• τ_4 = λ_4/λ_2, with λ_4 = (1/4) E[X_{4:4} − 3X_{3:4} + 3X_{2:4} − X_{1:4}]: kurtosis

Characterization of L-moments. L-moments are projections of the quantile function on an orthogonal basis:

λ_r = ∫₀¹ F⁻¹(t) L_{r−1}(t) dt

where F⁻¹ is the generalized inverse of F, F⁻¹(t) = inf{x ∈ R such that F(x) ≥ t}, and L_r is the shifted Legendre polynomial of degree r (an orthogonal basis of L²([0, 1])):

L_r(t) = Σ_{k=0}^{r} (−1)^k C(r, k)² t^{r−k} (1 − t)^k.

L-moments completely characterize a distribution:

F⁻¹(t) = Σ_{r≥1} (2r − 1) λ_r L_{r−1}(t).

Definition of L-moments (discrete distributions). For a multinomial distribution of support x_1 ≤ x_2 ≤ ... ≤ x_n and weights π_1, ..., π_n (Σ_{i=1}^n π_i = 1),

λ_r = Σ_{i=1}^n w_i^{(r)} x_i with w_i^{(r)} = K_r(Σ_{a=1}^{i} π_a) − K_r(Σ_{a=1}^{i−1} π_a),

where K_r is the respective primitive of L_{r−1} (K_r′ = L_{r−1}).

Empirical L-moments. U-statistics: the empirical L-moment is the mean over all subsequences of size r drawn without replacement from the sample,

λ̂_r = C(n, r)⁻¹ Σ_{1≤i_1<...<i_r≤n} (1/r) Σ_{k=0}^{r−1} (−1)^k C(r−1, k) X_{i_{r−k}:n}.
Probability Density Estimation (chaired by Jesús Angulo, S. Said)
The two main techniques of probability density estimation on symmetric spaces are reviewed in the hyperbolic case. For computational reasons we chose to focus on kernel density estimation, and we provide the expression of the Pelletier estimator on the hyperbolic space. The method is applied to the density estimation of reflection coefficients derived from radar observations.
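The Pelletier estimator on the Poincaré disk needs only the hyperbolic distance and the volume-change term θ(r) = sinh(r)/r. A minimal sketch in dimension 2, with an assumed Epanechnikov-type kernel (the kernel choice here is an assumption, not taken from the source):

```python
import numpy as np

def hyp_dist(z, w):
    """Hyperbolic distance between two points of the Poincaré disk,
    represented as complex numbers with |z|, |w| < 1."""
    return np.arccosh(1 + 2 * abs(z - w) ** 2
                      / ((1 - abs(z) ** 2) * (1 - abs(w) ** 2)))

def K(u):
    """Epanechnikov-type radial kernel, normalized so that the induced
    density on R^2 integrates to 1 (assumed choice)."""
    return np.where(u < 1.0, (2.0 / np.pi) * (1.0 - u ** 2), 0.0)

def pelletier_kde(x, samples, r):
    """Pelletier estimator on the hyperbolic plane (d = 2):
    f(x) = (1/k) sum_i r^{-2} (1/theta(d_i)) K(d_i / r),
    with d_i = d(x, x_i) and theta(d) = sinh(d)/d the volume change
    of the exponential map."""
    d = np.array([hyp_dist(x, xi) for xi in samples])
    theta = np.where(d > 1e-12, np.sinh(d) / np.maximum(d, 1e-12), 1.0)
    return float(np.mean(K(d / r) / theta) / r ** 2)

# Hypothetical samples drawn near the origin of the disk:
rng = np.random.default_rng(3)
pts = 0.3 * rng.uniform(size=200) * np.exp(2j * np.pi * rng.uniform(size=200))
f_center = pelletier_kde(0j, pts, r=0.5)
f_edge = pelletier_kde(0.9 + 0j, pts, r=0.5)
assert f_center > f_edge >= 0.0
```

The factor 1/θ corrects the Riemannian volume distortion of the exponential map, so the estimate integrates to one with respect to the hyperbolic volume rather than the Euclidean one.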

Probability density estimation on the hyperbolic space applied to radar processing
Emmanuel Chevallier¹, Frédéric Barbaresco², Jesús Angulo¹
¹CMM-Centre de Morphologie Mathématique, MINES ParisTech, France; ²Thales Air Systems, Surface Radar Domain, Technical Directorate, Advanced Developments Department, 91470 Limours, France
emmanuel.chevallier@mines-paristech.fr
October 28, 2015

Outline: three techniques of non-parametric probability density estimation; the hyperbolic space of dimension 2; histograms, kernels and orthogonal series in the hyperbolic space; density estimation of radar data in the Poincaré disk.

Three techniques of non-parametric probability density estimation.
• Histograms: partition the space into a set of bins and count the number of samples per bin.
• Kernels: a kernel is placed over each sample; the density is evaluated by summing the kernels.
• Orthogonal series: the true density f is studied through the estimation of the scalar products ⟨f, g⟩ = ∫ f g dµ between f and an orthonormal basis {e_i} of real functions: f = Σ_i ⟨f, e_i⟩ e_i, and since ⟨f, e_i⟩ = ∫ f e_i dµ = E(e_i(I)) ≈ (1/n) Σ_{j=1}^n e_i(I(p_j)), f can be estimated by f̂ = Σ_{|i|≤N} [(1/n) Σ_j e_i(I(p_j))] e_i.

Homogeneity and isotropy considerations. In the absence of a prior on f, the estimation should be as homogeneous and isotropic as possible (avoiding non-homogeneous or non-isotropic bins); this guides the choice of bins, kernels or orthogonal basis. Remark: when the underlying space is not homogeneous and isotropic, e.g. a random variable on a circle, the density estimation cannot treat every point and direction in an equivalent way.

The 2-dimensional hyperbolic space and the Poincaré disk. The hyperbolic plane is the only space of constant negative sectional curvature. The Poincaré disk is a model of hyperbolic geometry with metric ds²_D = 4(dx² + dy²)/(1 − x² − y²)²; it is homogeneous and isotropic.

Density estimation in the hyperbolic space: histograms. A good tiling should be homogeneous and isotropic. There are many polygonal tilings, but there are no homothetic transformations for all λ ∈ R. Problem: it is not always possible to scale the tiling to the studied density.

Density estimation in the hyperbolic space: orthogonal series. The standard choice of basis consists of eigenfunctions of the Laplacian operator ∆. In R^n, (e_i) is the Fourier basis, leading to the characteristic-function density estimator. In the compact case the estimation is a sum; in the non-compact case it is an integral. On the Poincaré disk D, the solutions of ∆f = λf are known for f: D → R, but not for f: D′ → R with D′ ⊂ D compact. Computational problem: the estimation involves an integral, even for bounded-support functions.

Kernel density estimation on Riemannian manifolds. Let K: R⁺ → R⁺ be such that (i) ∫_{R^d} K(||x||) dx = 1, (ii) ∫_{R^d} x K(||x||) dx = 0, (iii) K(x > 1) = 0 and sup(K(x)) = K(0). The Euclidean kernel estimator f̂(x) = (1/k) Σ_i (1/r^d) K(||x − x_i||/r) becomes, in the Riemannian case, a function of the geodesic distance d(x, x_i). Let θ_x denote the volume change induced by the exponential map exp_x, from (T_x M, Lebesgue) to (M, vol). The kernel density estimator proposed by Pelletier is

f̂(x) = (1/k) Σ_i (1/r^d) (1/θ_{x_i}(x)) K(d(x, x_i)/r).

θ in the hyperbolic space. θ_x can easily be computed in hyperbolic geometry. In polar coordinates at p ∈ D (where (r, α) is the point reached by the geodesic of angle α and length r), ds² = dr² + sinh(r)² dα², thus dvol_polar = sinh(r) dr dα and θ_p((r, α)) = sinh(r)/r. The kernel estimator on the hyperbolic space is therefore f̂(x) = (1/k) Σ_i (1/r^d) [d(x, x_i)/sinh(d(x, x_i))] K(d(x, x_i)/r). It can be formulated as a convolution (Fourier-Helgason transform ↔ orthogonal series) and has a reasonable computational cost.

Radar data. A succession of input vectors z = (z_0, ..., z_{n−1}) ∈ C^n: background or target? Assumption: z is a centered Gaussian process; being centered, it is defined by its covariance R_n = E[ZZ*], a Toeplitz (additional stationarity assumption) and SPD matrix in T_n. Autoregressive model of order k: ẑ_l = −Σ_{j=1}^{k} a_j^{(k)} z_{l−j}; the k-th reflection coefficient is µ_k = a_k^{(k)}. The diffeomorphism φ: T_n → R*₊ × D^{n−1}, R_n ↦ (P_0, µ_1, ..., µ_{n−1}), induces a metric on T_n as the product metric on R*₊ × D^{n−1}. From multiple acquisitions of an identical background, the distribution of the µ_k can be estimated; a potential use is the identification of non-background objects.

Application of density estimation to radar data. The estimated densities of µ_1, µ_2, µ_3 (first row: ground, second row: rain) differ markedly between the two backgrounds.

Conclusion. Density estimation on the hyperbolic space is not a fundamentally difficult problem; the easiest solution is kernels. Future work: computation of the volume change in kernels for Riemannian manifolds; deepening the application to radar signals.
on the hyperbolic space
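The Pelletier-type estimator above, specialized to the Poincaré disk with inverse volume change d/sinh(d), can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the 2-D Epanechnikov profile and the sample points are my own choices.

```python
import math

def poincare_distance(z, w):
    """Geodesic distance between two points of the unit disk (complex numbers)."""
    num = 2 * abs(z - w) ** 2
    den = (1 - abs(z) ** 2) * (1 - abs(w) ** 2)
    return math.acosh(1 + num / den)

def epanechnikov_2d(u):
    """Radial Epanechnikov profile, normalized so K(|x|) integrates to 1 on R^2."""
    return (2 / math.pi) * (1 - u * u) if u < 1 else 0.0

def hyperbolic_kde(x, samples, r):
    """Pelletier kernel estimator on the Poincare disk:
    f_hat(x) = (1/k) sum_i (1/r^2) (d_i / sinh d_i) K(d_i / r),
    where d_i / sinh(d_i) is the inverse volume change theta^{-1}."""
    total = 0.0
    for xi in samples:
        d = poincare_distance(x, xi)
        inv_theta = d / math.sinh(d) if d > 0 else 1.0
        total += inv_theta * epanechnikov_2d(d / r) / r ** 2
    return total / len(samples)
```

Evaluating the estimator near a cluster of samples gives a larger value than far from it, and the distances agree with the closed form d(0, x) = log((1 + |x|)/(1 − |x|)).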
We address here the problem of perceptual colour histograms. The Riemannian structure of perceptual distances is measured through standard sets of ellipses, such as the MacAdam ellipses. We propose an approach based on local Euclidean approximations that takes the Riemannian structure of perceptual distances into account without introducing computational complexity during the construction of the histogram.

Color histograms using the perceptual metric
October 28, 2015
Emmanuel Chevallier (a), Ivar Farup (b), Jesús Angulo (a)
(a) CMM - Centre de Morphologie Mathématique, MINES ParisTech, France
(b) Gjøvik University College, Norway
emmanuel.chevallier@mines-paristech.fr

Plan of the presentation:
- formalization of the notion of image histogram
- perceptual metric and MacAdam ellipses
- density estimation in the space of colors

Image histogram: formalization. I : Ω → V, p ↦ I(p), where Ω is the support space of pixels (a rectangle or parallelepiped) and V is the value space. The measures µ_Ω and µ_V on (Ω, µ_Ω) and (V, µ_V) are induced by the chosen geometries on Ω and V. Transport of µ_Ω onto V: I_*(µ_Ω). The image histogram is an estimation of

f = d I_*(µ_Ω) / dµ_V.

The pixels p ∈ Ω are uniformly distributed with respect to µ_Ω, so {I(p), p a pixel} is a set of independent draws of the "random variable" I. Estimating f from {I(p), p a pixel} is therefore a standard problem of probability density estimation.

Perceptual color histograms: I : Ω → (M = colors, g_perceptual). Assumption: the perceptual distances between colors are induced by a Riemannian metric. The manifold of colors was one of the first examples of a Riemannian manifold, suggested by Riemann himself.

MacAdam ellipses: just noticeable differences. In the chromaticity diagram (constant luminance), the ellipses are elementary unit balls, defining a local L2 metric.

Lab space: the Euclidean metric of the Lab parametrization is supposed to be more perceptual than other parametrizations (figure: MacAdam ellipses in the ab plane). However, the ellipses are clearly not balls.

Modification of the density estimator: density is a local notion, so there is no need to know long geodesics. Small distances admit a local approximation by a Euclidean metric. Notations:
- d_R: perceptual metric;
- ‖·‖_Lab: canonical Euclidean metric of Lab;
- ‖·‖_c: Euclidean metric on Lab induced by the ellipse at c.
For small distances around c, ‖·‖_c is "better" than ‖·‖_Lab.

Standard kernel estimator:

f̂(x) = (1/k) Σ_{p_i ∈ pixels} (1/r²) K(‖x − I(p_i)‖_Lab / r).

Possible modification:

K(‖x − I(p_i)‖_Lab / r) → K(‖x − I(p_i)‖_{I(p_i)} / r),

where ‖·‖_{I(p_i)} is the Euclidean distance defined by the interpolated ellipse at I(p_i).

Generally, at a color c:

lim_{x→c} ‖x − c‖_c / d_R(x, c) = 1, while lim_{x→c} ‖x − c‖_Lab / d_R(x, c) ≠ 1.

Thus there exists A > 0 such that for all R > 0 there is x ∈ B_Lab(c, R) with |‖x − c‖_Lab / d_R(x, c) − 1| > A, while there exists R_c = R_{c,A} such that for all x ∈ B_Lab(c, R_c), |‖x − c‖_c / d_R(x, c) − 1| < A. Hence

sup_{B_Lab(c,R_c)} |‖x − c‖_c / d_R(x, c) − 1| < A < sup_{B_Lab(c,R_c)} |‖x − c‖_Lab / d_R(x, c) − 1|.

When the scaling factor r is small enough (r ≤ R_c and B_c(c, r) ⊂ B_Lab(c, R_c)), for x ∈ B(c, R_c) the kernel K(‖x − c‖_c / r) is better than K(‖x − c‖_Lab / r), while outside both kernels vanish.

Interpolation of a set of local metrics: a deep question. What is a good interpolation? Interpolating a function means minimizing variation with respect to a metric. Interpolating a metric? No intrinsic method: it depends on a choice of parametrization. This is the subject of the next study. Here: barycentric interpolation in the Lab space.

Volume change (figure: (a) color photograph; (b) zoom of the density change adapted to the colors present in the photograph).

Experimental results (figure: the canonical Euclidean metric of the ab projective plane in (a), the canonical metric followed by a division by the local density of the perceptual metric in (b), and the modified kernel formula in (c)).

Conclusion: a simple observation which improves the consistency of the histogram without requiring additional computational cost. Future work will focus on:
- the interpolation of the ellipses
- the construction of the geodesics and their applications
Thank you for your attention.
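The modified kernel of the slides replaces the Lab norm by the norm of the interpolated ellipse at each sample. A minimal Python sketch follows; the metric tensors, kernel profile and omitted normalization constant are illustrative assumptions, not the paper's implementation.

```python
import math

def local_norm(v, g):
    """Norm of v = (x, y) under a local metric tensor g = ((a, b), (b, c)),
    i.e. the quadratic form of an (interpolated) MacAdam-type ellipse."""
    x, y = v
    (a, b), (_, c) = g
    return math.sqrt(a * x * x + 2 * b * x * y + c * y * y)

def perceptual_kde(x, samples, metrics, r):
    """Kernel estimator where ||x - I(p_i)||_Lab is replaced by the
    local norm ||x - I(p_i)||_{I(p_i)} of the ellipse at the sample."""
    total = 0.0
    for s, g in zip(samples, metrics):
        u = local_norm((x[0] - s[0], x[1] - s[1]), g) / r
        if u < 1:
            total += 1 - u * u  # Epanechnikov profile (normalization constant omitted)
    return total / (len(samples) * r ** 2)
```

With the identity tensor the local norm reduces to the ordinary Euclidean norm, so the modification is transparent wherever the ellipses are actually circles.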
Air traffic management (ATM) aims at providing companies with a safe and ideally optimal aircraft trajectory planning. Air traffic controllers act on flight paths in such a way that no pair of aircraft comes closer than the regulatory separation norm. With the increase of traffic, it is expected that the system will reach its limits in the near future: a paradigm change in ATM is planned with the introduction of trajectory-based operations. This paper investigates a means of producing realistic air routes from the output of an automated trajectory design tool. For that purpose, an entropy associated with a system of curves is defined and a means of iteratively minimizing it is presented. The network produced is suitable for use in a semi-automated ATM system with a human in the loop.

Entropy minimizing curves: application to automated flight path design
S. Puechmorel, ENAC
29th October 2015

Problem statement. Flight path planning:
- traffic is expected to double by 2050;
- in future systems, trajectories will be negotiated and optimized well before the flights start;
- but humans will be in the loop: generated flight plans must comply with operational constraints.
Multi-agent systems:
- a promising approach to address the planning problem;
- but they do not end up with a human-friendly traffic!
- Idea: start with the proposed solution and rebuild a route network from it.

A curve optimization problem. An entropy criterion:
- route networks are currently made of straight segments connecting beacons;
- such a network may be viewed as a maximally concentrated spatial density distribution;
- minimizing the entropy of such a density will intuitively yield a flight path system close to what is expected.

Problem modeling. Density associated with a curve system:
- a classical measure: counting the number of aircraft in each bin of a spatial grid and averaging over time;
- it suffers from a severe flaw: aircraft with low velocity over-contribute;
- this may be corrected by enforcing invariance under reparametrization of the curves;
- combined with a non-parametric kernel estimate, this yields:

d̃(x) = [ Σ_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) ‖γ′_i(t)‖ dt ] / [ ∫_Ω Σ_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) ‖γ′_i(t)‖ dt dx ]   (1)

The entropy criterion:
- the kernel K is normalized over the domain Ω so as to have unit integral;
- the density is then directly related to the lengths l_i, i = 1 … N, of the curves γ_i:

d̃(x) = [ Σ_{i=1}^{N} ∫_0^1 K(‖x − γ_i(t)‖) ‖γ′_i(t)‖ dt ] / Σ_{i=1}^{N} l_i   (2)

- the associated entropy is:

E(γ_1, …, γ_N) = − ∫_Ω d̃(x) log d̃(x) dx   (3)

Optimal curve displacement field. Entropy variation:
- d̃ has integral 1 over the domain Ω;
- it implies that, for an admissible variation ε of the curve γ_j:

− ∂E/∂γ_j (ε) = ∫_Ω ∂d̃(x)/∂γ_j (ε) log d̃(x) dx   (4)

The denominator in the expression of d̃ has derivative ∫_{[0,1]} ⟨γ′_j(t)/‖γ′_j(t)‖, ε′(t)⟩ dt, and the numerator has a derivative involving the unit vector (γ_j(t) − x)/‖γ_j(t) − x‖ and the kernel terms K′(‖γ_j(t) − x‖) ‖γ′_j(t)‖ and K(‖γ_j(t) − x‖) (equations (5)-(7)). Collecting terms, the final expression yields a displacement field normal to the curve, combining the log-density-weighted kernel integrals over Ω with the length term Σ_i l_i (equations (8)-(10)).

Implementation. A gradient algorithm:
- the move is based on a tangent vector in the tangent space to Imm([0,1], R³)/Diff⁺([0,1]);
- it is not directly implementable on a computer;
- a simple, landmark-based approach with evenly spaced points was used;
- a compactly supported kernel (Epanechnikov) was selected: it allows the computation of the density d̃ on GPUs as a texture operation, which is very fast.

An output from the multi-agent system. Integration in the complete system: route building from initially conflicting trajectories (figure: initial flight plans and final ones).

Conclusion and future work. An integrated algorithm:
- the entropy minimizer is now a part of the overall route design system;
- only a simple post-processing is necessary to output a usable airways network;
- the complete algorithm is being ported to GPU.
Future work: take the headings into account:
- the behavior is not completely satisfactory when routes are converging in opposite directions;
- an improved version will make use of the entropy of a distribution in a Lie group (publication in progress).
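Equation (1) above is a kernel density weighted by the arc-length element ‖γ′(t)‖ dt, so that slowly flown portions of a trajectory do not over-contribute. It can be sketched for polyline trajectories as follows; the midpoint discretization and the Epanechnikov profile are my own toy choices, not ENAC's implementation.

```python
import math

def curve_density(x, curves, h):
    """Reparametrization-invariant density of a curve system, in the spirit
    of eq. (1)-(2): each polyline segment contributes a kernel value at its
    midpoint, weighted by its length; the denominator is the total length."""
    num = 0.0
    total_len = 0.0  # sum of curve lengths l_i, the denominator of eq. (2)
    for pts in curves:
        for p, q in zip(pts, pts[1:]):
            seg = math.hypot(q[0] - p[0], q[1] - p[1])
            mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
            u = math.hypot(x[0] - mid[0], x[1] - mid[1]) / h
            if u < 1:
                num += (1 - u * u) * seg  # compactly supported (Epanechnikov) kernel
            total_len += seg
    return num / total_len
```

Because the weights are lengths, refining the sampling of a curve (a discrete reparametrization) leaves the estimate essentially unchanged, which is exactly the invariance the slides ask for.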
We introduce a novel kernel density estimator for a large class of symmetric spaces and prove a minimax rate of convergence as fast as the minimax rate on Euclidean space. The rate is proven without any compactness assumptions on the space or Hölder-class assumptions on the densities. A main tool used in proving the convergence rate is the Helgason-Fourier transform, a generalization of the Fourier transform for semisimple Lie groups modulo maximal compact subgroups. This paper obtains a simplified formula in the special case when the symmetric space is the two-dimensional hyperboloid.

Kernel Density Estimation on Symmetric Spaces
Dena Marie Asta, Department of Statistics, Ohio State University
Supported by NSF grant DMS-1418124 and an NSF Graduate Research Fellowship under grant DGE-1252522.

Geometric methods for statistical analysis:
- classical statistics assumes data is unrestricted on Euclidean space: X̄ = (1/n) Σ_{i=1}^{n} X_i, var[X] = E[X²] − E[X]²;
- exploiting the geometry of the data leads to faster and more accurate tools (implicit geometry in non-Euclidean data, explicit geometry in networks).

Motivation: non-Euclidean data:
- directional headings: sphere;
- material stress, gravitational lensing: 3×3 symmetric positive definite matrices;
- diffusion tensor imaging: 3×3 symmetric positive definite matrices;
- normal distributions: hyperboloid.

Non-parametric methods for non-Euclidean data: classical non-parametric estimators (kernel density estimator, kernel regression, conditional density estimator) assume Euclidean structure. Sometimes the given data has other geometric structure to exploit.

Motivation: non-Euclidean distances. Euclidean distances are often not the right notion of distance between data points. The distance between directional headings should be a shortest path length on the sphere. An isometric representation of the hyperboloid is the Poincaré half-plane; each point in either model represents a normal distribution (mean, standard deviation), and the distance is the Fisher distance, which is similar to the KL-divergence.

Euclidean distance not being the right distance, Euclidean volume is not the right volume either. We want to minimize the risk for density estimation on a (Riemannian) manifold M:

E ∫_M (f − f̂_n)² dµ,

where f is the true density, f̂_n an estimator based on n samples, and µ the volume measure based on the intrinsic distance.

Existing estimators: the Euclidean KDE

f̂^h_{(X_1,…,X_n)}(x) = (1 / (n h^d)) Σ_{i=1}^{n} K((x − X_i) / h)

achieves the optimal rate of convergence O(n^{−2s/(2s+d)}) (s = smoothness parameter, d = dimension), but subtraction x − X_i is undefined for a general M, and so is division by h.

Exploiting geometry: symmetries:
- symmetries = geometry;
- symmetries make the smoothing of data (convolution by a kernel) tractable;
- translations in Euclidean space are specific examples of symmetries;
- other spaces call for other symmetries.

Exploiting symmetries to convolve: kernel density estimation is about convolving a kernel with the data,

f̂^h_{(X_1,…,X_n)} = K_h ∗ empirical(X_1, …, X_n), with (g ∗ f)(x) = ∫_{R^n} g(t) f(x − t) dt,

where K_h is a density on the space of translations of R^n. Writing T_v(w) = v + w, identify t with T_t and interpret g as a density on the space of the T_t's:

(g ∗ f)(x) = ∫_{R^n} g(T_t) f(T_t^{−1}(x)) dt.

More general spaces, depending on their geometry, require symmetries other than translations. Generalized kernel density estimation convolves a generalized kernel (a density on the space G of symmetries) with the data:

(g ∗ f)(x) = ∫_G g(T) f(T^{−1}(x)) dT,

where X is a symmetric space, a space having a suitable space G of symmetries.

G-kernel density estimator, general form:

f̂^{h,C}_{(X_1,…,X_n)} = K_h ∗ empirical(X_1, …, X_n),

with bandwidth and cutoff parameters h and C, sample observations X_1, …, X_n on the symmetric space X, and K_h a density on the group of symmetries G. Harmonic analysis on symmetric spaces can be used to define and analyze this estimator [Asta, 2014].

Harmonic analysis on symmetric spaces: the Fourier transform is an isometry F : L²(R) ⇄ L²(R). The Helgason-Fourier transform is, for a symmetric space X, an isometry H : L²(X) ⇄ L²(· · ·), where the frequency space depends on the geometry of X [Terras, 1985]. The (Helgason-)Fourier transform sends convolutions to products.

Assumptions on the kernel and the true density:
- the true density is sufficiently smooth (in a Sobolev ball);
- the kernel transforms nicely with the space of data;
- the kernel is sufficiently smooth.

THEOREM [Asta, 2014]: the G-KDE

f̂^{h,C}_{(X_1,…,X_n)} = H^{−1}[ H[empirical(X_1,…,X_n)] · H[K_h] · I_C ]

achieves the same minimax rate O(n^{−2s/(2s+d)}) (s = Sobolev smoothness parameter, d = dimension) on symmetric spaces as the ordinary KDE achieves on R^d.

Kernels on symmetries:
- Symmetric positive definite (n×n) matrices SPD_n: kernels are densities on the space G = GL_n of n×n invertible matrices. Each GL_n matrix M determines an isometry (distance-preserving function) M : SPD_n → SPD_n, M(X) = M^T X M.
- Hyperboloid H²: kernels are densities on the space G = SL_2 of 2×2 invertible matrices of determinant 1. Each SL_2 matrix M determines an isometry M : H² → H², M(x) = (M_11 x + M_12) / (M_21 x + M_22).
- Example of a kernel K (hyperbolic version of the Gaussian): the solution to the heat equation on SL_2, with H[K_h](s, k_θ) ∝ e^{−h²s²} (figure: samples from K, points in SL_2, represented in H² = SL_2/SO_2).

Recap: G-KDE. Exploiting the geometric structure of the data type:
- tractable data smoothing = convolving a kernel on a space of symmetries;
- harmonic analysis on symmetric spaces allows us to prove the minimax rate;
- symmetric spaces are general enough to include directional headings, material stress and gravitational lensing, diffusion tensor imaging, and normal distributions [Asta, 2014].
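The claim that each M ∈ SL₂ acts as an isometry of the hyperbolic plane can be checked numerically on the Poincaré half-plane model. This is only a sanity check of the slides' statement; the particular matrix and points are arbitrary.

```python
import math

def hyp_dist(z, w):
    """Hyperbolic distance on the Poincare half-plane (Im z, Im w > 0)."""
    return math.acosh(1 + abs(z - w) ** 2 / (2 * z.imag * w.imag))

def mobius(m, z):
    """Action of an SL(2, R) matrix m = ((a, b), (c, d)) on the half-plane."""
    (a, b), (c, d) = m
    return (a * z + b) / (c * z + d)

m = ((2.0, 1.0), (1.0, 1.0))  # determinant 2*1 - 1*1 = 1, so m lies in SL(2, R)
z, w = 1j, 2.0 + 3.0j
```

Here hyp_dist(mobius(m, z), mobius(m, w)) agrees with hyp_dist(z, w) to machine precision, which is exactly the distance preservation used to transport a kernel along the data.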
Keynote speech: Tudor Ratiu (chaired by Xavier Pennec)
The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics. Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena for Hamiltonian systems is based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity.

SYMMETRY METHODS INGEOMETRIC MECHANICSTudor S. RatiuSection de Math´matiqueseEcole Polytechnique F´d´rale de Lausanne, Switzerlande etudor.ratiu@epfl.chGeometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20151PLAN OF THE PRESENTATION• Lie group actions and reduction of dynamics• The above in the Hamiltonian case• Properties of the momentum map• Regular reduction• Singular reduction• Regular cotangent bundle reductionGeometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20152M , N manifolds, N ⊂ M as subsets.N is an initial submanifold of M if the inclusion map i : N → Mis an immersion satisfying the following condition: for any smoothmanifold P and any map g : P → N , g is smooth if and only ifi ◦ g : P → M is smooth. The smooth manifold structure thatmakes N into an initial submanifold of M is unique.g◦iPg.//iM==NThe integral manifolds of an integrable generalized distribution(thus forming a generalized foliation) are initial.Inﬁnitesimal generator ξM ∈ X(M ) associated to ξ ∈ g : Lie(G)dξM (m) :=Φexp tξ (m) = TeΦm · ξ.dt t=0ξM is a complete vector ﬁeld with ﬂow (t, m) → exp tξ · m.ξ ∈ g → ξM ∈ X(M ) is a Lie algebra antihomomorphismGeometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20153Isotropy, stabilizer, symmetry subgroup of m ∈ MGm := {g ∈ G  Φg (m) = m} ⊂ G,Gg·m = gGmg −1, ∀g ∈ Gclosed subgroup of G whose Lie algebra gm equalsgm = {ξ ∈ g  ξM (m) = 0}.Om ≡ G · m := {Φg (m)  g ∈ G} Gorbit of m∼Om g · m ←→ gGm ∈ G/Gm diﬀeomorphismOm initial submanifold of M• Transitive action: only one orbit, that is, Om = M• Free action: Gm = {e} for all m ∈ M(g, m) −→ (m, g · m) ∈ M × M is• Proper action: if Φ : G × Mproper. 
Equivalent to: for any two convergent sequences {mn} and{gn · mn} in M , there exists a convergent subsequence {gnk } in G.Examples of proper actions: compact group actions, SE(n) actingon Rn, Lie groups acting on themselves by translation.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20154Fundamental facts about proper Lie group actions(i) The isotropy subgroups Gm are compact.(ii) The orbit space M/G is a Hausdorﬀ topological space.(iii) If the action is free, M/G is a smooth manifold, and the canonical projection π : M → M/G deﬁnes on M the structure of a smoothleft principal G–bundle.(iv) If all the isotropy subgroups of the elements of M under the G–action are conjugate to a given subgroup H, then M/G is a smoothmanifold and π : M → M/G deﬁnes the structure of a smooth locallytrivial ﬁber bundle with structure group N (H)/H and ﬁber G/H.Normalizer of H is N (H) := {g ∈ G  gH = Hg}.(v) If the manifold M is paracompact then there exists a GinvariantRiemannian metric on it. (Palais)(vi) If the manifold M is paracompact then smooth Ginvariantfunctions separate the Gorbits.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20155Twisted productH ⊂ G Lie subgroup acting (left) on the manifold A. Right twistedaction of H on G × A, deﬁned by(g, a) · h = (gh, h−1 · a),g, h ∈ G,a ∈ A,is free and proper. Twisted product G ×H A := (G × A)/H.TubeG acts properly on M . For m ∈ M , let H := Gm. A tube aroundthe orbit G · m is a Gequivariant diﬀeomorphismϕ : G ×H A −→ U,where U is a Ginvariant neighborhood of G · m and A is somemanifold on which H acts.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20156Slice TheoremLet G be a Lie group acting properly on M at the point m ∈ M ,H := Gm. Then there exists a tubeϕ : G ×H B −→ Uabout G · m. 
B is an open Hinvariant neighborhood of 0 in a vectorspace which is Hequivariantly isomorphic to TmM/Tm(G·m), wherethe Hrepresentation is given byh · (v + Tm(G · m)) := TmΦh · v + Tm(G · m).SliceS := ϕ([e, B]) so that U = G · S.From now on, we assume that G acts on M properly.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20157Dynamical consequencesLet X ∈ X(U )G, U ⊂ M open Ginvariant, S slice at m ∈ U . Then:• ∃ XT ∈ X(G·S)G, XT (z) = ξ(z)M (z) for z ∈ G·S, where ξ : G·S → gis smooth Gequivariant and ξ(z) ∈ Lie(N (Gz )) for all z ∈ G · S. Theﬂow Tt of XT is given by Tt(z) = exp tξ(z) · z, so XT is complete.• ∃ XN ∈ X(S)Gm .• If z = g · s, for g ∈ G and s ∈ S, thenX(z) = XT (z) + TsΦg (XN (s)) = TsΦg (XT (s) + XN (s))• If Nt is the ﬂow of XN (on S) then the integral curve of X ∈ X(U )Gthrough g · s ∈ G · S isFt(g · s) = g(t) · Nt(s),where g(t) ∈ G is the solution ofg(t) = TeLg(t) ξ(Nt(s)) ,˙g(0) = g.This is the tangentialnormal decomposition of a Ginvariant vector ﬁeld (or Krupa decomposition in bifurcation theory).Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20158Geometric consequencesM(H) = {z ∈ M  Gz ∈ (H)},orbit type setM H = {z ∈ M  H ⊂ Gz },ﬁxed point setMH = {z ∈ M  H = Gz },isotropy type setare (embedded) submanifolds of M , MH open in M H , but, in general, MH is not closed in M .Let N (H) := {g ∈ G  gH = Hg} be the normalizer of H in G.N (H)/H acts freely and properly on MH .m ∈ M is regular if ∃Um such that dim Oz = dim Om, ∀z ∈ U .Principal Orbit Theorem: M connected. M reg := {m ∈ M m regular} is connected, open, and dense in M . M/G contains onlyone principal orbit type, which is connected, open, dense in M/G.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 20159The Stratiﬁcation Theorem: The connected components of allorbit type manifolds M(H) and their projections onto M(H)/G constitute a Whitney stratiﬁcation of M and M/G, respectively. 
Thisstratiﬁcation of M/G is minimal among all Whitney stratiﬁcationsof M/G.GCodostribution Theorem: Let G be a Lie group acting properly on the smooth manifold M and m ∈ M a point with isotropysubgroup H := Gm. ThenTm(G · m)◦ H= df (m)  f ∈ C ∞(M )G .This is due to Ortega [1998].Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201510Reduction of general vector ﬁeldsG × M → M proper, X ∈ X(M )G (Gequivariant) with ﬂow FtLaw of conservation of isotropy: Every isotropy type submanifoldMH := {m ∈ M  Gm = H} is preserved by Ft.πH : MH → MH /(N (H)/H) projection , iH : MH → M inclusionX induces a unique Hisotropy type reduced vector ﬁeld X H onMH /(N (H)/H) byX H ◦ πH = T πH ◦ X ◦ iH ,whose ﬂow FtH is given byFtH ◦ πH = πH ◦ Ft ◦ iH .G compact linear action, then the construction of MH /(N (H)/H)can be implemented by using the invariant polynomials of the action and the theorems of Hilbert and SchwarzMather.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201511The Hamiltonian case(M, ω) symplectic manifold, G connected Lie group with Lie algebrag, G × M → M left free proper symplectic action: Φ∗ ω = ω, ∀g ∈ G.gJ : M → g∗ momentum map: XJξ = ξM , where Jξ := J, ξ .Nonequivariance (Souriau) group g∗valued 1cocycle:c(g) := J(g · m) − Ad∗−1 J(m), independent of m ∈ M if M connected.gΘ(M, ω) connected. G × g∗(g, µ) −→ Ad∗−1 µ + c(g) ∈ g∗ aﬃneg∗ is Θequivariant.action. 
J : M → gNoether’s Theorem: J is conserved along the ﬂow of any Ginvariant Hamiltonian.g∗ is an aﬃne LiePoisson space±{f, h}(µ) := ± µ,δf δh,δµ δµΣδf δh,,δµ δµf, h ∈ C ∞(g∗)Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201512The inﬁnitesimal nonequivariance twococycle Σ ∈ Z 2(g, R) isΣ:g×g(ξ, η) −→ dση (e) · ξ ∈ R,where ση : G → R deﬁned by ση (g) = σ(g), η .Its symplectic leaves (reachable sets) are the Θorbits Oµ:±ωOµ (ν)(ξg∗ (ν), ηg∗ (ν)) = ± ν, [ξ, η]Σ(ξ, η).J : M → g∗ is a Poisson map.+Example: lifted actions on cotangent bundles. G acts on themanifold Q and then by lift on its cotangent bundle T ∗Q.J(αq ), ξ = αq , ξQ(q) ,∀ αq ∈ T ∗Q, ∀ ξ ∈ g. This is an Ad∗equivariant momentum map.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201513Special case 1: linear momentum. Conﬁguration space of Nparticles in space is R3N . R3 acts on R3N by v · (qi, ) = (qi + v).Then J : T ∗R3N → R3 is the linear momentum J(qi, pi) = N pi.i=1Special case 2: angular momentum. SO(3) acts naturally onR3. Then J : T ∗R3N → R3 is the angular momentum J(q, p) = q × p.Example: symplectic linear actions. (V, ω) symplectic vectorspace, G ⊆ Sp(V, ω), acting naturally on V . Ad∗equivariant momentum map J : V → sp(V, ω)∗ is1J(v), ξ = ω(ξV (v), v).2Special case: CayleyKlein parameters and the Hopf ﬁbration.SU(2) acts on C2, J : C2 → su(2)∗ given, as above, by1J(z, w), ξ = ω(ξ(z, w)T, (z, w)),2z, w ∈ C, ξ ∈ su(2).Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201514Lie algebra isomorphism (su(2), [ , ]) → (R3, ×) given byR3∼x = (x1, x2, x3) ←→1−ix3−ix1 − x2x :=ix32 −ix1 + x2∈ su(2).Identify su(2)∗ with R3 by the map µ ∈ su(2)∗ → µ ∈ R3 deﬁned byˇµ · x := −2 µ, x ,ˇ∀ x ∈ R3 .Then ˇ : C2 → R3 has the expressionJ1ˇ(z, w) = − (2wz, z2 − w2) ∈ R3.J2(z, w) are the CayleyKlein parameters or the Kustaanheimo2Stiefel coordinates. ˇS 3 : S 3 → S1/2 is the Hopf ﬁbration. 
SimilarJconstruction in ﬂuid dynamics: Clebsch variables.The momentum map of the SU(2)action on C2, the CayleyKleinparameters, the KustaanheimoStiefel coordinates, and the familyof Hopf ﬁbrations on concentric threespheres in C2 are the samemap.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201515Properties of the momentum map• range TmJ = (gm)◦. Points with symmetry are points of bifurcation. Freeness of the action is equivalent to the regularity of J.• ker TmJ = (g · m)ω .• The obstruction to the existence of J is the vanishing of the mapH 1(g, R) := g/[g, g] [ξ] −→ iξM ω ∈ H 1(M, R).• J[ξ, η] = {Jξ , Jη } ⇐⇒ TmJ (ξM (m)) = − ad∗ J(m) ∀m ∈ M, ξ, η ∈ gξAmong all possible choices of momentum maps for a given action,there is at most one inﬁnitesimally Ad∗equivariant one.G connected, then inﬁnitesimal Ad∗equivariance ⇔ Ad∗equivariance.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201516• H 1(g; R) = 0 or H 1(M, R) = 0 ⇒ J exists. H 2(g; R) = 0 ⇒ J equiv.Whitehead lemmas: g is semisimple =⇒ H 1(g; R) = H 2(g; R) = 0.• If G is compact J can always be chosen to be Ad∗equivariant• Reduction Lemma: gJ(m) · m = g · m ∩ ker TmJ = g · m ∩ (g · m)ω .Gµ • z•zJ–1(µ)symplecticallyorthogonal spacesG•zGeometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201517Momentum maps and isotropy type manifolds• MGm is a symplectic submanifold of M for any m ∈ M .This is based on: H compact Lie group and (V, ω) symplectic representation space. Then V H is a symplectic subspace of V .m• Let MGm be the connected component of MGm containing m andmmN (Gm)m := {n ∈ N (Gm)  n · z ∈ MGm for all z ∈ MGm }.N (Gm)m is a closed subgroup of N (Gm) that contains the connected component of the identity. 
So it is also open and henceLie(N (Gm)m) = Lie(N (Gm)).In addition, (N (Gm)/Gm)m = N (Gm)m/Gm so thatLie (N (Gm)m/Gm) = Lie (N (Gm)/Gm) .m• Lm := N (Gm)m/Gm acts freely properly and canonically on MGmby Ψ(nGm, z) := n · z.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201518m• The free proper canonical action of Lm := N (Gm)m/Gm on MGmmhas a momentum map JLm : MGm → (Lie(Lm))∗ given byJLm (z) := Λ(JM m (z) − J(m)),Gmmz ∈ M Gm .In this expression Λ : (g◦ )Gm → (Lie(Lm))∗ denotes the naturalmLmequivariant isomorphism given bydΛ(β),(exp tξ ) Gm = β, ξ ,dt t=0for any β ∈ (g◦ )Gm , ξ ∈ Lie(N (Gm)m) = Lie(N (Gm)).mm• The nonequivariance onecocycle τ : MGm → (Lie(Lm))∗ of themomentum map JLm is given by the mapτ (l) = Λ(c(n) + n · J(m) − J(m)),l = nGm ∈ Lm, n ∈ N (Gm)m.So, even if J is equivariant, the induced momentum map JLm isnot, in general!Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201519ConvexityJ : M → g∗ equivariant, G, M compact connected. The intersectionof range J with a Weyl chamber is a compact and convex polytope,the momentum polytope (Atiyah, Guillemin, Kirwan, Sternberg).Delzant polytope in Rn is a convex polytope that is also:(i) Simple: there are n edges meeting at each vertex.(ii) Rational: the edges meeting at a vertex p are of the formp + tui, 0 ≤ t < ∞, ui ∈ Zn, i ∈ {1, . . . , n}.(iii) Smooth: the vectors {u1, . . . 
, un} can be chosen to be anintegral basis of Zn.Delzant’s Theorem: There is a biection∼{symplectic toric manifolds} ←→ {Delzant polytopes}∼(M, ω, Tn, J : M → Rn)←→J(M )Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201520MarsdenWeinstein Reduction Theorem• If µ ∈ J(M ) ⊂ g∗ regular value of J and• Gµaction on J−1(µ) is free and proper; Gµ := {g ∈ G  Θg µ = µ},then (Mµ := J−1(µ)/Gµ, ωµ) is symplectic:∗πµωµ = i∗ ω,µiµ : J−1(µ) → M inclusion πµ : J−1(µ) → J−1(µ)/Gµ projection.The ﬂow Ft of Xh, h ∈ C ∞(M )G, leaves the connected componentsof J−1(µ) invariant and commutes with the Gaction, so it inducesµa ﬂow Ft on Mµ byµπµ ◦ Ft ◦ iµ = Ft ◦ πµ.µFt is Hamiltonian on (Mµ, ωµ) for the reduced Hamiltonian hµ ∈C ∞(Mµ) given byhµ ◦ πµ = h ◦ iµ.Moreover, if h, k ∈ C ∞(M )G, then {h, k}µ = {hµ, kµ}Mµ .Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201521Orbit symplectic form from reductionG a Lie group, Lg (h) = gh, Rg (h) = hg, left and right translationsOµ := {Ad∗ µ  g ∈ G} coadjoint Gorbit through µ ∈ g∗gTake the special case M = G and the left action g · h := gh, for allg, h ∈ G. The momentum map JL : T ∗G → g∗ has the expression∗JL(αg ) = Te Rg (αg ) ∈ g∗, ∀αg ∈ T ∗G.−∼Then, (J−1(µ)/Gµ, Ωµ) = (Oµ, ωOµ ); orbit symplectic form isL±ωOµ (ν)(ad∗ ν, ad∗ ν) = ± ν, [ξ, η] ,ηξ∀ ξ, η ∈ g, ν ∈ OµJR∗ is a LiePoisson space for the bracket ((T ∗ G)/G ←→ g∗ )g−{f, h}(µ) := ± µ,δf δh,δµ δµ,f, h ∈ C ∞(g∗)and its symplectic leaves (reachable sets) are Oµ.Geometric Science of Information, Ecole Polytechnique, ParisSaclay, October 2830, 201522Reconstruction of dynamicsGiven is an integral curve cµ(t) of Xhµ ∈ X(Mµ). Let m0 ∈ J−1(µ).Find integral curve c(t) of Xh ∈ X(M ) with initial condition m0.Pick a smooth curve d(t) ⊂ J−1(µ) such that d(0) = m0 andπµ(d(t)) = cµ(t). If c(t) is the integral curve of Xh with initialcondition c(0) = m0, then there is a curve g(t) ⊂ Gµ such thatc(t) = g(t) · d(t).˙1.) 
Find a smooth curve ξ(t) ⊂ g_µ such that ξ(t)_M(d(t)) = X_h(d(t)) − ḋ(t).
2.) With this ξ(t), solve ġ(t) = T_e L_{g(t)} ξ(t), g(0) = e.

Let A ∈ Ω¹(J^{-1}(µ); g_µ) be a connection on the G_µ-principal bundle J^{-1}(µ) → M_µ. Choose d(t) to be the horizontal lift of c_µ(t) through m_0, i.e., A(d(t))(ḋ(t)) = 0, π_µ(d(t)) = c_µ(t), d(0) = m_0. Then the solution of 1.) is

  ξ(t) = A(d(t))(X_h(d(t))).

Orbit reduction

• Φ : G × M → M a free proper symplectic action.
• The action admits a momentum map J : M → g*.
• M is connected; if J is equivariant, this is not needed.
• The affine coadjoint orbit O_µ := {Ad*_{g^{-1}} µ + c(g) | g ∈ G} is an initial submanifold of g*.
• Bifurcation Lemma (range(T_m J) = (g_m)°) plus the freeness of the action (hence g_m = {0}) imply that J is a submersion onto some open subset of g*. So J is transversal to O_µ, i.e., for any z ∈ J^{-1}(O_µ) we have (T_z J)(T_z M) + T_{J(z)} O_µ = g*. So J^{-1}(O_µ) is an initial submanifold of M of dimension

  dim(J^{-1}(O_µ)) = dim M − dim G_µ

whose tangent space at z ∈ J^{-1}(O_µ) equals

  T_z(J^{-1}(O_µ)) = (T_z J)^{-1}(T_{J(z)} O_µ) = g · z + ker(T_z J).

• The G-action restricts to a free and proper G-action on the G-invariant initial submanifold J^{-1}(O_µ). Why is this restricted action smooth? The action Φ : G × M → M is smooth, so Φ_{O_µ} : G × J^{-1}(O_µ) → M is smooth (a restriction: the composition of the smooth maps J^{-1}(O_µ) → M). But Φ_{O_µ}(G × J^{-1}(O_µ)) ⊂ J^{-1}(O_µ). Since J^{-1}(O_µ) is initial, it follows that Φ_{O_µ} : G × J^{-1}(O_µ) → J^{-1}(O_µ) is smooth.

• Hence M_{O_µ} := J^{-1}(O_µ)/G is a manifold and the projection π_{O_µ} : J^{-1}(O_µ) → M_{O_µ} is a surjective submersion.

(i) On M_{O_µ} := J^{-1}(O_µ)/G there is a unique symplectic form ω_{O_µ} characterized by

  ι_{O_µ}* ω = π_{O_µ}* ω_{O_µ} + J_{O_µ}* ω_{O_µ}^+,

where ι_{O_µ} : J^{-1}(O_µ) → M, J_{O_µ} := J|_{J^{-1}(O_µ)}, and ω_{O_µ}^+ is the +symplectic structure on the affine orbit O_µ. (M_{O_µ}, ω_{O_µ}) is the symplectic orbit reduced space.

(ii) h ∈ C^∞(M)^G. The flow F_t of X_h leaves the connected components of J^{-1}(O_µ) invariant and commutes with the G-action, so it induces a flow F_t^{O_µ} on M_{O_µ}, uniquely determined by

  π_{O_µ} ∘ F_t ∘ i_{O_µ} = F_t^{O_µ} ∘ π_{O_µ}.

(iii) The vector field generated by the flow F_t^{O_µ} on (M_{O_µ}, ω_{O_µ}) is Hamiltonian with associated reduced Hamiltonian h_{O_µ} ∈ C^∞(M_{O_µ}) defined by h_{O_µ} ∘ π_{O_µ} = h ∘ i_{O_µ}. The vector fields X_h and X_{h_{O_µ}} are π_{O_µ}-related.

(iv) h, k ∈ C^∞(M)^G implies {h, k} ∈ C^∞(M)^G and {h, k}_{O_µ} = {h_{O_µ}, k_{O_µ}}_{M_{O_µ}}, where {·, ·}_{M_{O_µ}} denotes the Poisson bracket associated to the symplectic form ω_{O_µ} on M_{O_µ}.

This is a theorem in the Poisson category, whereas the point reduction theorem is in the symplectic category.

Problems with the hypotheses of the Reduction Theorem

The hypotheses are too restrictive, even in classical examples such as Jacobi's elimination of the nodes. Properness of the action cannot be eliminated, because one needs the theory of G-manifolds.

1.) How does one recover the conservation of isotropy? The momentum map seems incapable of capturing this. The sets J^{-1}(µ) are not the smallest invariant sets. Reduction completely ignores this point.
2.) If the G-action is not free, M_µ is not a smooth manifold. What then is the structure of the reduced topological space? What is left that remains symplectic?
3.) If G is discrete, the momentum map is zero.
What is reduction in that case?

These are questions in bifurcation theory with symmetry. For generic vector fields, a lot is known. For Hamiltonian vector fields, almost nothing (a few papers).

Singular point reduction

Given: (M, ω) connected, m ∈ M; G acting symplectically on M; J : (M, ω) → g* a momentum map; c : G → g* the group one-cocycle defined by c(g) := J(g · z) − Ad*_{g^{-1}} J(z); the affine G-action Θ(g, ν) := Ad*_{g^{-1}} ν + c(g) on g*; G_µ the Θ-isotropy at µ.

Notation: M_H^m is the connected component of M_H containing m; H := G_m ⊆ G_µ; µ := J(m) ∈ g*.

Singular symplectic point strata

(i) J^{-1}(µ) ∩ (G_µ · M_H^m) is embedded in M.

(ii) M_µ^{(H)} := [J^{-1}(µ) ∩ (G_µ · M_H^m)]/G_µ has a unique quotient manifold structure such that

  π_µ^{(H)} : J^{-1}(µ) ∩ (G_µ · M_H^m) → M_µ^{(H)}

is a surjective submersion.

(iii) There is a unique symplectic form ω_µ^{(H)} on M_µ^{(H)} characterized by

  (ι_µ^{(H)})* ω = (π_µ^{(H)})* ω_µ^{(H)},

where ι_µ^{(H)} : J^{-1}(µ) ∩ (G_µ · M_H^m) → M is the inclusion. The (M_µ^{(H)}, ω_µ^{(H)}) are the singular symplectic point strata.

(iv) h ∈ C^∞(M)^G. The flow F_t of X_h leaves the connected components of J^{-1}(µ) ∩ (G_µ · M_H^m) invariant and commutes with the G_µ-action, so it induces a flow F_t^µ on M_µ^{(H)}:

  π_µ^{(H)} ∘ F_t ∘ i_µ^{(H)} = F_t^µ ∘ π_µ^{(H)}.

(v) F_t^µ is Hamiltonian on M_µ^{(H)} for the reduced Hamiltonian h_µ^{(H)} : M_µ^{(H)} → R, h_µ^{(H)} ∘ π_µ^{(H)} = h ∘ i_µ^{(H)}. X_h and X_{h_µ^{(H)}} are π_µ^{(H)}-related.

(vi) h, k ∈ C^∞(M)^G implies {h, k} ∈ C^∞(M)^G and

  {h, k}_µ^{(H)} = {h_µ^{(H)}, k_µ^{(H)}}_{M_µ^{(H)}},

where {·, ·}_{M_µ^{(H)}} is the Poisson bracket induced by the symplectic structure on M_µ^{(H)}.

Sjamaar point reduction principle

GOAL: realize the strata as usual reduced spaces.

Recall: we start with a proper symplectic G-action on (M, ω).
• m ∈ M is fixed, H := G_m, µ := J(m).
• N(H)^m := {n ∈ N(H) | n · M_H^m ⊂ M_H^m}. N(H)^m is open, hence closed, in N(H). Also H ⊂ N(H)^m. Thus Lie(N(H)^m/H) = Lie(N(H)/H) =: l^m.
• L^m := N(H)^m/H acts freely, properly, and symplectically on M_H^m with momentum map

  J_{L^m} : M_H^m ∋ z ↦ Λ(J_{M_H^m}(z) − µ) ∈ (Lie(L^m))*.

• Λ : (g_m°)^H → (Lie(L^m))* is the L^m-equivariant isomorphism

  ⟨Λ(β), (d/dt)|_{t=0} (exp tξ)H⟩ = ⟨β, ξ⟩,  β ∈ (g_m°)^H, ξ ∈ Lie(N(H)^m) = Lie(N(H)).

• g_m° ⊆ g* denotes the annihilator of g_m in g*; (g_m°)^H are the H-fixed points in g_m°.
• The non-equivariance one-cocycle of J_{L^m} is

  τ : L^m ∋ l ↦ Λ(c(n) + n · µ − µ) ∈ (Lie(L^m))*,  for l = nH ∈ L^m, n ∈ N(H)^m.

(i) π_µ^{(H)} : J^{-1}(µ) ∩ (G_µ · M_H^m) → M_µ^{(H)} := [J^{-1}(µ) ∩ (G_µ · M_H^m)]/G_µ is a smooth fiber bundle with fiber G_µ/H and structure group N_{G_µ}(H)^m/H.

(ii) (M_H^m)_0 := J_{L^m}^{-1}(0)/L_0^m = [J^{-1}(µ) ∩ M_H^m]/(N_{G_µ}(H)^m/H). In general L_0^m ≠ L^m (recall, the L^m-action is affine).

(iii) π_0 : J_{L^m}^{-1}(0) → (M_H^m)_0 is a principal L_0^m-bundle.
G_µ/H is a right (N_{G_µ}(H)^m/H)-space and J^{-1}(µ) ∩ M_H^m is a left (N_{G_µ}(H)^m/H)-space. The associated bundle with fiber G_µ/H,

  G_µ/H ×_{N_{G_µ}(H)^m/H} (J^{-1}(µ) ∩ M_H^m) → [J^{-1}(µ) ∩ M_H^m]/(N_{G_µ}(H)^m/H),

is G_µ-symplectomorphic to π_µ^{(H)} : J^{-1}(µ) ∩ (G_µ · M_H^m) → M_µ^{(H)}, which means that

• G_µ/H ×_{N_{G_µ}(H)^m/H} (J^{-1}(µ) ∩ M_H^m) ≅ J^{-1}(µ) ∩ (G_µ · M_H^m) is a G_µ-diffeomorphism;
• (M_H^m)_0 = J_{L^m}^{-1}(0)/L_0^m = (J^{-1}(µ) ∩ M_H^m)/(N_{G_µ}(H)^m/H) is symplectomorphic to M_µ^{(H)}.

• The sets J^{-1}(µ) ∩ (G_µ · M_H^m) form a Whitney (B) stratification of J^{-1}(µ).
• {M_µ^{(H)} | (H)} is a symplectic Whitney (B) stratification of the cone space M_µ := J^{-1}(µ)/G_µ.
• Each connected component of M_µ contains a unique open stratum that is connected, open, and dense in the connected component of M_µ that contains it.

There are similar theorems for orbit reduction. In the diagram, at every level, the corresponding spaces are isomorphic and in the respective category. In the diagram below:
• L_µ is an isomorphism of cone (hence Whitney (B)) stratified spaces; in particular, L_µ is a homeomorphism.
• L_µ^{(H)} is the restriction of L_µ to the stratum determined by H := G_m.
• f_µ^{(H)} and f_{O_µ}^{(H)} are the Sjamaar principle symplectomorphisms.

[Commutative diagram: the inclusion l_µ : J^{-1}(µ) → J^{-1}(O_µ) and the projections π_µ, π_{O_µ} induce the stratified isomorphism L_µ : J^{-1}(µ)/G_µ → J^{-1}(O_µ)/G; at the stratum level, L_µ^{(H)} relates [J^{-1}(µ) ∩ (G_µ · M_H^m)]/G_µ and [G · (J^{-1}(µ) ∩ M_H^m)]/G, with the Sjamaar symplectomorphisms f_µ^{(H)} and f_{O_µ}^{(H)} to J_{L^m}^{-1}(0)/L_0^m = [J^{-1}(µ) ∩ M_H^m]/(N_{G_µ}(H)^m/H) and to J_{L^m}^{-1}(O_0)/L^m = [J^{-1}(N(H)^m · µ) ∩ M_H^m]/(N(H)^m/H), respectively.]

Cotangent bundle reduction – embedding

Φ : G × Q → Q a left free proper action implies that Q_µ := Q/G_µ is a smooth manifold and π_{Q,Q_µ} : Q → Q_µ is a principal G_µ-bundle.

Lift Φ to a G-action on (T*Q, ω_Q); it is free, proper, and it admits an equivariant momentum map J : T*Q → g* given by

  ⟨J(α_q), ξ⟩ = α_q(ξ_Q(q)),  ∀ α_q ∈ T*Q, ξ ∈ g.

Reduce at µ ∈ g* to get a symplectic
manifold ((T*Q)_µ, Ω_µ).

HYPOTHESIS: there exists α_µ ∈ Ω¹(Q), G_µ-invariant, taking values in J^{-1}(µ).

Then there is a unique β_µ ∈ Ω²(Q_µ) such that π_{Q,Q_µ}* β_µ = dα_µ. β_µ is closed (not exact, in general). Note: α_µ does not drop to Q_µ, whereas dα_µ does.

Define B_µ := π_{Q_µ}* β_µ ∈ Ω²(T*Q_µ), where π_{Q_µ} : T*Q_µ → Q_µ is the projection.

Cotangent Bundle Reduction – Embedding Version. There is a symplectic embedding

  φ_µ : ((T*Q)_µ, (ω_Q)_µ) → (T*Q_µ, ω_{Q_µ} − B_µ)

onto the vector subbundle [T π_{Q,G_µ}(V)]° ⊆ T*Q_µ, where V ⊂ TQ is the vector subbundle consisting of vectors tangent to the G-orbits, i.e., its fiber at q ∈ Q equals V_q = {ξ_Q(q) | ξ ∈ g}, and ° denotes the annihilator for the natural duality pairing between TQ_µ and T*Q_µ.

φ_µ : ((T*Q)_µ, (ω_Q)_µ) → (T*Q_µ, ω_can − B_µ) is a symplectic diffeomorphism if and only if g = g_µ.

Let A ∈ Ω¹(Q; g) be a principal connection on the G-principal bundle π_{Q,Q/G} : Q → Q/G and B ∈ Ω²(Q; g) its curvature. One can choose α_µ(q) := A(q)* µ, which gives dα_µ = ⟨µ, B + ½[A ∧ A]⟩ ∈ Ω²(Q). Recall: B_µ = π_{Q_µ}* β_µ ∈ Ω²(T*Q_µ), with β_µ ∈ Ω²(Q_µ) determined by π_{Q,Q_µ}* β_µ = dα_µ.

Cotangent bundle reduction – fibration

Φ : G × Q → Q a left free proper action.

Cotangent Bundle Reduction – Bundle Version. The reduced space (T*Q)_µ → T*(Q/G) is a locally trivial fiber bundle with typical fiber O_µ. This is not good enough, because it says nothing about the symplectic form on (T*Q)_µ in terms of the symplectic structure of T*(Q/G) and the orbit symplectic structure on O_µ. One needs to study the Poisson situation first to fix the setup; it is also easier.

Let A ∈ Ω¹(Q; g) be a principal connection on π_{Q,Q/G} : Q → Q/G.

  H_q = {v_q ∈ T_q Q | A(v_q) = 0}  the horizontal space at q ∈ Q
  V_q = {ξ_Q(q) | ξ ∈ g}  the vertical space at q ∈ Q

  T_q Q ∋ v_q ↦ ver_q(v_q) := [A(q)(v_q)]_Q(q) ∈ V_q  the vertical projection
  T_q Q ∋ v_q ↦ hor_q(v_q) := v_q − ver_q(v_q)  the horizontal projection

T_q π_{Q,Q/G}|_{H_q} : H_q → T_{[q]}(Q/G) is an isomorphism with inverse

  Hor_q := (T_q π_{Q,Q/G}|_{H_q})^{-1} : T_{[q]}(Q/G) → H_q,

the horizontal lift at q ∈ Q.

π_{Q×g,Q/G} : g̃ := (Q × g)/G → Q/G is the adjoint bundle, a vector bundle with fibers isomorphic to g; π_{Q×g,g̃} : Q × g → g̃ is the projection.

Vector bundle isomorphism

  α_A : TQ/G ∋ [v_q] ↦ (T_q π(v_q), [q, A(q)(v_q)]) ∈ T(Q/G) ⊕ g̃

with inverse

  α_A^{-1} : T(Q/G) ⊕ g̃ ∋ (v_{[q]}, [q, ξ]) ↦ [Hor_q v_{[q]} + ξ_Q(q)] ∈ TQ/G

and dual

  (α_A^{-1})* : T*Q/G ∋ [α_q] ↦ (Hor_q* α_q, [q, J(α_q)]) ∈ T*(Q/G) ⊕ g̃*,

where Hor_q* : T_q*Q → T_{[q]}*(Q/G) is dual to Hor_q : T_{[q]}(Q/G) → T_q Q.

A ∈ Ω¹(Q; g) induces an affine connection on T*(Q/G) ⊕ g̃* → Q/G. For f ∈ C^∞(T*(Q/G) ⊕ g̃*), w = (α_{[q]}, [q, µ]) ∈ W := T*(Q/G) ⊕ g̃*, and v_{α_{[q]}} ∈ T_{α_{[q]}}(T*(Q/G)), the exterior covariant derivative d^A f(w) is given by

  d^A f(w)(v_{α_{[q]}}) := df(w)(v_{α_{[q]}}, T_{(q,µ)} π_{Q×g*,g̃*}(Hor_q(T_{α_{[q]}} π̃_{Q/G} v_{α_{[q]}}), 0)),

where π̃_{Q/G} : T*(Q/G) → Q/G.

Push forward the Poisson bracket by (α_A^{-1})*. If f, g ∈ C^∞(T*(Q/G) ⊕ g̃*), then

  {f, g}_W(w) = ω_{Q/G}(α_{[q]})(d^A f(w), d^A g(w)) + ⟨[q, µ], B̃(α_{[q]})(d^A f(w), d^A g(w))⟩ − ⟨w, [δf/δw, δg/δw]⟩,

where δf/δw ∈ (T(Q/G) ⊕ g̃)_{α_{[q]}} is the fiber derivative,

  ⟨w', δf/δw⟩ := (d/dt)|_{t=0} f(w + t w'),  w, w' ∈ (T*(Q/G) ⊕ g̃*)_{α_{[q]}},

and B̃ := π̃_{Q/G}* B̌ ∈ Ω²(T*(Q/G); g̃), with B̌ ∈ Ω²(Q/G; g̃) defined by

  B̌([q])(T_q π_{Q/G} u_q, T_q π_{Q/G} v_q) := [q, Curv_A(q)(u_q, v_q)].

How does one determine the symplectic leaves of the gauged Lie-Poisson bracket on T*(Q/G) ⊕ g̃*? Solved by Perlmutter in his 1999 thesis, and in final form by Marsden-Perlmutter [2000].

O ⊂ g* a coadjoint orbit, Õ := (Q × O)/G → Q/G the associated fiber bundle.
π_{Q×O,Õ} : Q × O → Õ is the projection.

  T*(Q/G) ×_{Q/G} Õ := {(α_{[q]}, [q, ν]) | q ∈ Q, ν ∈ O, α_{[q]} ∈ T_{[q]}*(Q/G)}

is a fiber bundle over Q/G whose fiber at [q] ∈ Q/G is T_{[q]}*(Q/G) × Õ_{[q]}.

  (α_A^{-1})*(J^{-1}(O)/G) = T*(Q/G) ×_{Q/G} Õ ⊂ T*(Q/G) ⊕ g̃*.

So the reduced symplectic form ω_O on (T*Q)_O := J^{-1}(O)/G pushes forward by (α_A^{-1})* to a symplectic form ω_A on T*(Q/G) ×_{Q/G} Õ:

  ω_A = ω_{Q/G} − β,

where β ∈ Ω²(Õ) is uniquely determined by

  π_{Q×O,Õ}* β = dα + π_{Q×O,O}* ω_O^+,  α ∈ Ω¹(Q × O),
  α(q, ν)(u_q, −ad*_ξ ν) := ⟨ν, A(q)(u_q)⟩,  q ∈ Q, u_q ∈ T_q Q, ν ∈ O, ξ ∈ g.

dα has the explicit expression

  dα(q, ν)((u_q, −ad*_{ξ'} ν), (v_q, −ad*_{η'} ν)) = ⟨ν, [η', ξ] + [η, ξ'] + [ξ, η] + Curv_A(q)(u_q, v_q)⟩,

q ∈ Q, ν ∈ O, ξ, ξ', η, η' ∈ g, u_q, v_q ∈ T_q Q, where

  u_q = ξ_Q(q) + hor_q u_q',  v_q = η_Q(q) + hor_q v_q'

is the vertical-horizontal splitting on T_q Q given by A.

So ((T*Q)_O, (ω_Q)_O) ≅ (T*(Q/G) ×_{Q/G} Õ, ω_A) as symplectic manifolds.

Reconstruction of dynamics

Given: an integral curve c_µ(t) of X_{h_µ} ∈ X((T*Q)_µ). Let α_q ∈ J^{-1}(µ). Find the integral curve c(t) of X_h ∈ X(T*Q) with initial condition α_q. Solution: c(t) = g(t) · d(t). Let A ∈ Ω¹(J^{-1}(µ); g_µ) be a connection and take d(t) to be the horizontal lift through α_q of c_µ(t). Solve ġ(t) = T_e L_{g(t)} ξ(t), g(0) = e. So it all comes down to:

• the choice of a convenient connection A ∈ Ω¹(J^{-1}(µ); g_µ);
• finding ξ(t) ⊂ g_µ in terms of d(t).

1.) G_µ ≅ S¹ or R. Let ζ ∈ g_µ be a basis and identify R ∋ a ↔ aζ ∈ g_µ. Take the connection A = (1/⟨µ, ζ⟩) θ_µ ∈ Ω¹(J^{-1}(µ)), where θ_µ is the pull-back to J^{-1}(µ) of the canonical θ_Q ∈ Ω¹(T*Q); ω_Q = −dθ_Q is the canonical symplectic form on T*Q. The curvature is Curv_A = −(1/⟨µ, ζ⟩) ω_µ ∈ Ω²((T*Q)_µ). Then ξ(t) = dh(Λ)(d(t)), where Λ = p_i ∂/∂p_i (the unique vector field on T*Q satisfying dθ_Q(Λ, ·) = θ_Q).

2.) Let A ∈ Ω¹(Q; g_µ) be a connection on the left G_µ-principal bundle Q → Q/G_µ.
A induces a connection Ā ∈ Ω¹(J^{-1}(µ); g_µ) by

  Ā(α_q)(V_{α_q}) := A(q)(T_{α_q} π_Q(V_{α_q})),  q ∈ Q, α_q ∈ T_q*Q, V_{α_q} ∈ T_{α_q}(T*Q).

Then ξ(t) = A(q(t))(Fh(d(t))) ⊂ g_µ, where Fh : T*Q → TQ is the fiber derivative of h and q(t) := π_Q(d(t)) ⊂ Q.

3.) Let (Q, ⟨·, ·⟩) be a Riemannian manifold and let G act by isometries. The mechanical connection is defined by requiring that its horizontal bundle be the orthogonal complement of the vertical bundle:

  A_mech(q)(u_q) := I_µ(q)^{-1} J(u_q^♭),  q ∈ Q, u_q ∈ T_q Q,

where u_q^♭ := ⟨u_q, ·⟩ ∈ T_q*Q (with ♯ the inverse identification T_q*Q → T_q Q) and I_µ(q) : g_µ → g_µ* is the µ-locked inertia tensor defined for each q ∈ Q by I_µ(q)(ζ)(η) := ⟨ζ_Q(q), η_Q(q)⟩. This is a special situation of 2.). Then

  ξ(t) = A_mech(q(t))(d(t)^♯).

4.) Simple mechanical systems. The Hamiltonian is of the form h = k + v ∘ π_Q, where k is the kinetic energy of the cometric on T*Q determined by a Riemannian metric ⟨·, ·⟩ on Q, and v ∈ C^∞(Q). G acts by isometries on Q and the potential energy v is G-invariant. The reconstruction method is quite explicit in this case.

Given is α_q ∈ J^{-1}(µ) ⊂ T_q*Q and the solution c_µ(t) ⊂ (T*Q)_µ of X_{h_µ} with initial condition [α_q] ∈ (T*Q)_µ.

Step 1.) φ_µ : ((T*Q)_µ, (ω_Q)_µ) → (T*(Q/G_µ), ω_{Q/G_µ} − B_µ) is a symplectic embedding onto a vector subbundle, with B_µ induced by the mechanical connection. Then φ_µ(c_µ(t)) is an integral curve of the Hamiltonian system on (T*(Q/G_µ), ω_{Q/G_µ} − B_µ) given by the kinetic energy of the quotient Riemannian metric on Q/G_µ and the quotient of the amended potential v_µ := h ∘ α_µ ∈ C^∞(Q). Compute the curves

  φ_µ(c_µ(t)) ⊂ T*(Q/G_µ)  and  q_µ(t) := π_{Q/G_µ}(φ_µ(c_µ(t))) ⊂ Q/G_µ.

Step 2.) Using the mechanical connection A_mech ∈ Ω¹(Q; g_µ), horizontally lift q_µ(t) ∈ Q/G_µ to a curve q_h(t) ⊂ Q with q_h(0) = q.

Step 3.) Determine ξ(t) ⊂ g_µ from the algebraic equation

  ⟨ξ(t)_Q(q_h(t)), η_Q(q_h(t))⟩ = ⟨µ, η⟩,  ∀ η ∈ g_µ.

So q̇_h(0) and ξ(0)_Q(q) are the horizontal and vertical components of the vector α_q^♯ ∈ T_q Q.

Step 4.) Solve ġ(t) = T_e L_{g(t)} ξ(t) in G_µ with g(0) = e.

Step 5.) With q_h(t) from Step 2.) and g(t) from Step 4.), define q(t) := g(t) · q_h(t). This is the base integral curve of the simple mechanical system with Hamiltonian h = k + v ∘ π_Q satisfying q(0) = q. The curve q̇(t)^♭ ⊂ T*Q is the integral curve of X_h with q̇(0)^♭ = α_q.

Interesting special cases

(a) If G_µ is Abelian, the equation in Step 4.) has the solution

  g(t) = exp(∫_0^t ξ(s) ds).

(b) G_µ = S¹, ζ a basis of g_µ. One can solve for ξ(t) in Step 3.), namely

  ξ(t) = (⟨µ, ζ⟩ / ‖ζ_Q(q_h(t))‖²) ζ,

and hence

  q(t) = exp((∫_0^t ⟨µ, ζ⟩ / ‖ζ_Q(q_h(s))‖² ds) ζ) · q_h(t).

(c) If G is compact and (·, ·) is a positive definite metric on g, invariant under the adjoint G-action and satisfying

  (ζ, η) = ⟨ζ_Q(q), η_Q(q)⟩,  ∀ q ∈ Q, ζ, η ∈ g,

then ξ ∈ g_µ is uniquely determined by (ξ, ·) = µ|_{g_µ}, and g(t) = exp(tξ).

(d) If G is solvable, let {ξ_1, ..., ξ_n} ⊂ g be a basis and write

  g(t) = exp(f_1(t) ξ_1) ··· exp(f_n(t) ξ_n).

Wei and Norman [1964] have shown that ġ(t) = T_e L_{g(t)} ξ(t) can then be solved by quadratures for all the functions f_1(t), ..., f_n(t).

(e) If ξ̇(t) = α(t) ξ(t) for a known function α(t), then g(t) = exp(f(t) ξ(t)) solves ġ(t) = T_e L_{g(t)} ξ(t), where

  f(t) = ∫_0^t exp(−∫_s^t α(r) dr) ds.

The conditions in (c) are very strong, but they hold for the Kaluza-Klein construction. Many of these formulas are very useful when one wants to compute geometric phases.

What happens if the action of G on Q is not free? There are only partial results of Perlmutter and Rodríguez-Olmos; the general case is open.
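Special cases (a)-(b), where the group variable is recovered by a single quadrature, can be illustrated on planar central-force motion: G_µ = S¹ acts by rotations, µ is the conserved angular momentum, the reduced dynamics is radial (with the amended potential), and θ(t) = ∫_0^t µ/r(s)² ds reconstructs the full trajectory. A minimal sketch (the harmonic potential V(r) = r²/2, the step size, and the RK4 integrator are illustrative choices, not from the lecture):

```python
import numpy as np

# Reduced radial dynamics r'' = mu^2/r^3 - V'(r) for V(r) = r^2/2, plus the
# reconstruction quadrature theta' = mu / r^2 from special case (b).
mu = 1.0

def rhs(y):
    r, pr, th = y
    return np.array([pr, mu**2 / r**3 - r, mu / r**2])

def rk4(y, dt):
    k1 = rhs(y); k2 = rhs(y + dt/2*k1)
    k3 = rhs(y + dt/2*k2); k4 = rhs(y + dt*k3)
    return y + dt/6*(k1 + 2*k2 + 2*k3 + k4)

dt, n = 1e-3, 4000
ys = [np.array([1.5, 0.0, 0.0])]      # r(0), r'(0), theta(0)
for _ in range(n):
    ys.append(rk4(ys[-1], dt))
ys = np.array(ys)

# reconstructed trajectory q(t) = g(t) . q_h(t): rotate the radial curve
q = np.stack([ys[:, 0] * np.cos(ys[:, 2]), ys[:, 0] * np.sin(ys[:, 2])], axis=1)

# check: q solves the full (unreduced) harmonic equations q'' = -q
qdd = (q[2:] - 2*q[1:-1] + q[:-2]) / dt**2
assert np.max(np.abs(qdd + q[1:-1])) < 1e-3

# check: reduced energy (kinetic + amended potential) is conserved
E = 0.5*ys[:, 1]**2 + mu**2 / (2*ys[:, 0]**2) + 0.5*ys[:, 0]**2
assert np.max(np.abs(E - E[0])) < 1e-8
```

The second assertion is exactly the amended potential v_µ at work: the reduced system conserves ½ṙ² + µ²/(2r²) + V(r), while the quadrature for θ supplies the geometric/dynamic phase of the reconstruction.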
Dimension reduction on Riemannian manifolds (chaired by Xavier Pennec, Alain Trouvé)
This paper presents derivations of evolution equations for the family of paths that, in the Diffusion PCA framework, are used for approximating the data likelihood. The paths, formally interpreted as most probable paths, generalize geodesics in extremizing an energy functional on the space of differentiable curves on a manifold with connection. We discuss how the paths arise as projections of geodesics for a (non-bracket-generating) sub-Riemannian metric on the frame bundle. Evolution equations in coordinates for both the metric and cometric formulations of the sub-Riemannian geometry are derived. We furthermore show how rank-deficient metrics can be mixed with an underlying Riemannian metric, and we use the construction to show how the evolution equations can be implemented on finite-dimensional LDDMM landmark manifolds.

Faculty of Science
Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations
GSI 2015, Paris, France
Stefan Sommer (sommer@diku.dk), Department of Computer Science, University of Copenhagen
October 29, 2015

Intrinsic Statistics in Geometric Spaces

Statistics on Manifolds
• Fréchet mean: argmin_{x∈M} (1/N) Σ_{i=1}^N d(x, y_i)²
• PGA (Fletcher et al., '04); GPCA (Huckemann et al., '10); HCA (Sommer, '13); PNS (Jung et al., '12); BS (Pennec, '15)

Infinitesimally Defined Distributions; MLE
• Aim: construct a family N_M(µ, Σ) of anisotropic Gaussian-like distributions; fit by MLE/MAP.
• In R^n, Gaussian distributions are the transition distributions of diffusion processes dX_t = dW_t.
• On (M, g), Brownian motion is the transition distribution of a stochastic process (Eells-Elworthy-Malliavin construction), or the solution to the heat diffusion equation

  ∂p(t, x)/∂t = ½ Δp(t, x).

• Infinitesimal dX_t vs. global p_t(x; y) ∝ e^{−‖x−y‖²}.

MLE of Diffusion Processes
• The Eells-Elworthy-Malliavin construction gives a map Diff : FM → Dens(M).
• Diff(FM) = N_M ⊂ Dens(M): the set of (normalized) transition densities from FM diffusions.
• With γ = Diff(x, X_α) = p_γ γ_0, the log-likelihood is

  ln L(x, X_α) = ln L(γ) = Σ_{i=1}^N ln p_γ(y_i).

• Estimated template: argmax_{(x,X_α)∈FM} ln L(x, X_α), the MLE of the data y_i under the assumption y ∼ γ ∈ N_M.
• Diffusion PCA (Sommer '14): argmax ln L(x, X_α + εI), generalizing Probabilistic PCA (Tipping, Bishop '99; Zhang, Fletcher '13).

Most Probable Paths to Samples
• Euclidean: the density p_t(x; y) ∝ e^{−(x−y)^T Σ^{-1} (x−y)} is the transition density of a diffusion process with stationary generator, and x − y is the most probable path from y to x.
• Manifolds: which distributions correspond to anisotropic Gaussian distributions N(x, Σ)? What is the most probable path from y to x?

Anisotropic Diffusions and Holonomy
• driftless diffusion SDE in R^n with stationary generator: dX_t = σ dW_t, σ ∈ M_{n×d}
• diffusion field σ, infinitesimal generator σσ^T
• curvature: a stationary field/generator cannot be defined, due to holonomy

Stochastic Development: Eells-Elworthy-Malliavin Construction
• X_t: R^n-valued Brownian motion (driving process)
• U_t: FM-valued (subelliptic) diffusion
• Y_t: M-valued stochastic process (target process)

The Frame Bundle
• the manifold together with frames (bases) for the tangent spaces T_p M
• F(M) consists of pairs u = (x, X_α), x ∈ M, X_α a frame for T_x M
• curves in the horizontal part of F(M) correspond to curves in M and parallel transport of frames

Driving Process, FM-Valued Process and Target Process
• H_i, i = 1, ...
, n, horizontal vector fields on F(M): H_i(u) = π_*^{-1}(u_i)
• SDE in R^n (driving): dX_t = Id_n dB_t, X_0 = 0
• SDE in FM: dU_t = H_i(U_t) ∘ dX_t^i, U_0 = (x_0, X_α), X_α ∈ GL(R^n, T_{x_0}M)
• process on M (target): Y_t = π_{FM}(U_t)

U_t: Frame Bundle Diffusion [figure]

Estimated Templates [figure: MLE template]

Most Probable Paths
• in R^n, straight lines are most probable for stationary diffusion processes
• Onsager-Machlup functional, σ_t a curve on M:

  L(σ_t) = −½ ‖σ̇(t)‖²_g + (1/12) R(σ(t))

• MPP for the target process; MPP for the driving process (R = 0)

Definition (MPPs for the Driving Process). Let X_t be the driving process for the diffusion Y_t and x ∈ M, i.e. Y_t = π(φ(X_t)). Then σ is a most probable path for the driving process if it satisfies

  σ = argmin_{c ∈ H(R^d), φ(c)(1) = x} ∫_0^1 −L(c_t) dt.

Proposition. Let Y_α be a frame for T_y M, and let Y_t = π(φ_{(y,Y_α)}(X_t)), i.e. Y_t is the development of X_t starting at (y, Y_α). Then MPPs for the driving process X_t map to geodesics of a lifted sub-Riemannian metric on FM:

  ⟨w̃, w̃⟩_{FM} = ⟨X_α^{-1} π_* w̃, X_α^{-1} π_* w̃⟩_{R^n}.

• In the isotropic case, MPPs for the driving process map to geodesics.
• If −ln L(x, X_α) ≈ c + (1/N) Σ_{i=1}^N p(MPP(x, y_i)), then Fréchet mean ≈ MLE in the isotropic case.

MPPs on S², increasing anisotropy: (a) cov. diag(1, 1), (b) cov. diag(2, .5), (c) cov. diag(4, .25) [figure]

Sub-Riemannian Geometry on FM
• X_α : R^n → T_x M gives the inner product ⟨v, w⟩_{X_α} = ⟨X_α^{-1} v, X_α^{-1} w⟩_{R^n}
• optimal control problem with nonholonomic constraints:

  x_t = argmin_{c_t, c_0 = x, c_1 = y} ∫_0^1 ‖ċ_t‖²_{X_{α,t}} dt

• let

  ⟨ṽ, w̃⟩_{HFM} = ⟨X_{α,t}^{-1} π_*(ṽ), X_{α,t}^{-1} π_*(w̃)⟩_{R^n}

on H_{(x_t, X_{α,t})} FM.
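In the isotropic, full-rank case the minimizers of this energy are ordinary Riemannian geodesics, which can be integrated in Hamiltonian form, ẏ^k = G^{kj}(y) ξ_j, ξ̇_k = −½ (∂G^{pq}/∂y^k) ξ_p ξ_q. A minimal sketch on S² with the round cometric G = diag(1, 1/sin²θ) (the coordinates, RK4 integrator, and step sizes are illustrative choices):

```python
import numpy as np

# Geodesics on S^2 in polar coordinates y = (theta, phi), Hamiltonian form
#   y_dot = G(y) xi,   xi_dot = -1/2 xi^T (dG/dy) xi,
# with cometric G = diag(1, 1/sin^2 theta).

def rhs(state):
    th, ph, xth, xph = state
    s, c = np.sin(th), np.cos(th)
    return np.array([
        xth,                  # theta_dot = G^{11} xi_theta
        xph / s**2,           # phi_dot   = G^{22} xi_phi
        c * xph**2 / s**3,    # xi_theta_dot = -1/2 d_theta(1/sin^2 th) xi_phi^2
        0.0,                  # phi is cyclic: xi_phi is conserved
    ])

def rk4(y, dt):
    k1 = rhs(y); k2 = rhs(y + dt/2*k1)
    k3 = rhs(y + dt/2*k2); k4 = rhs(y + dt*k3)
    return y + dt/6*(k1 + 2*k2 + 2*k3 + k4)

def hamiltonian(state):
    th, ph, xth, xph = state
    return 0.5 * (xth**2 + xph**2 / np.sin(th)**2)

state = np.array([1.0, 0.0, 0.3, 0.8])   # generic initial point and covector
H0 = hamiltonian(state)
dt = 1e-3
for _ in range(3000):
    state = rk4(state, dt)

# the Hamiltonian (half the squared speed) is conserved along geodesics
assert abs(hamiltonian(state) - H0) < 1e-8

# sanity check: equator start with purely azimuthal momentum gives the
# great circle phi(t) = t
eq = np.array([np.pi/2, 0.0, 0.0, 1.0])
for _ in range(2000):
    eq = rk4(eq, dt)
assert abs(eq[0] - np.pi/2) < 1e-9 and abs(eq[1] - 2.0) < 1e-6
```

The sub-Riemannian case on FM follows the same Hamiltonian pattern, with the rank-deficient cometric G built from the frame (the W^{kl} = δ^{αβ} X_α^k X_β^l of the talk) in place of the full Riemannian cometric.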
This defines a sub-Riemannian metric G on TFM and the equivalent problem

  (x_t, X_{α,t}) = argmin_{(c_t, C_{α,t}), c_0 = x, c_1 = y} ∫_0^1 ‖(ċ_t, Ċ_{α,t})‖²_{HFM} dt

with horizontality constraint (ċ_t, Ċ_{α,t}) ∈ H_{(c_t, C_{α,t})} FM.

MPP Evolution Equations
• sub-Riemannian Hamilton-Jacobi equations:

  ẏ_t^k = G^{kj}(y_t) ξ_{t,j},  ξ̇_{t,k} = −½ (∂G^{pq}/∂y^k) ξ_{t,p} ξ_{t,q}

• in coordinates (x^i) for M, X_α^i for X_α, and W encoding the inner product, W^{kl} = δ^{αβ} X_α^k X_β^l, the position and frame equations take the form

  ẋ^i = W^{ij} ξ_j − W^{ih} Γ_h^{jβ} ξ_{jβ},
  Ẋ_α^i = −Γ_h^{iα} W^{hj} ξ_j + Γ_k^{iα} W^{kh} Γ_h^{jβ} ξ_{jβ},

together with momentum equations for ξ̇_i and ξ̇_{iα} that are quadratic in ξ, with coefficients built from W, the Christoffel symbols Γ, and their derivatives (the full expressions are given in the GSI 2015 paper, reference 3 below).

Landmark LDDMM
• Christoffel symbols (Micheli et al. '08):

  Γ^k_{ij} = ½ g_{ir} (g^{kl} g^{rs}_{,l} − g^{sl} g^{rk}_{,l} − g^{rl} g^{ks}_{,l}) g_{sj}

• mix of transported frame and cometric: F^d M, the bundle of rank-d linear maps R^d → T_x M; for ξ, ξ̃ ∈ T*F^d M, the cometric g_{F^d M} + λ g_R is

  ⟨ξ, ξ̃⟩ = δ^{αβ} (ξ(π_*^{-1} X_α))(ξ̃(π_*^{-1} X_β)) + λ ⟨ξ, ξ̃⟩_{g_R}

• the whole frame need not be transported

LDDMM Landmark MPPs: isotropic; + horizontal variation; + vertical variation [figure]

Statistical Manifold: Geometry of Γ
• densities: Dens(M) = {γ ∈ Ω^n(M) : ∫_M γ = 1, γ > 0}
• Fisher-Rao metric: G^{FR}_γ(α, β) = ∫_M (α/γ)(β/γ) γ
• Γ is a finite-dimensional subset of Dens(M); Diff : FM → Dens(M)
• naturally defined on the bundle of symmetric positive rank-2 tensors

Summary
• infinitesimal definition of anisotropic normal distributions N_M(µ, Σ) on M
• diffusion map Diff : FM → Dens(M) from the Eells-Elworthy-Malliavin construction (stochastic development)
• MLE of template / covariance (in FM)
• MPPs for driving processes generalize geodesics, being sub-Riemannian geodesics

1. Sommer: Diffusion Processes and PCA on Manifolds, Oberwolfach extended abstract (Asymptotic Statistics on Stratified Spaces), 2014.
2. Sommer: Anisotropic Distributions on Manifolds: Template Estimation and Most Probable Paths, Information Processing in Medical Imaging (IPMI) 2015.
3. Sommer: Evolution Equations with Anisotropic Distributions and Diffusion PCA, Geometric Science of Information (GSI) 2015.
4. Svane, Sommer: Similarities, SDEs, and Most Probable Paths, SIMBAD 2015 extended abstract.
5. Sommer, Svane: Holonomy, Curvature, and Anisotropic Diffusions, MOTR 2015 extended abstract.
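The stochastic development (Eells-Elworthy-Malliavin) construction used throughout the talk is straightforward to simulate: carry a point and a tangent frame, map Euclidean Brownian increments through the frame, step along a geodesic, and parallel-transport the frame. A minimal sketch on S² with exact exponential map and transport along great circles (the step size, seed, and isotropic frame are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def geodesic_step(x, v):
    """Exact exponential map on S^2 in R^3, plus the parallel transport
    along the connecting great circle."""
    t = np.linalg.norm(v)
    if t < 1e-12:
        return x.copy(), (lambda w: w.copy())
    u = v / t
    x_new = np.cos(t) * x + np.sin(t) * u
    def transport(w):
        a = np.dot(w, u)                      # component along the geodesic
        return a * (np.cos(t) * u - np.sin(t) * x) + (w - a * u)
    return x_new, transport

# start point and an orthonormal frame for T_x S^2 (an anisotropic X_alpha
# would scale these columns; here the frame is isotropic)
x = np.array([0.0, 0.0, 1.0])
e1, e2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])

dt = 1e-3
for _ in range(1000):
    dB = rng.normal(scale=np.sqrt(dt), size=2)  # driving process in R^2
    v = dB[0] * e1 + dB[1] * e2                 # frame maps R^2 -> T_x M
    x, transport = geodesic_step(x, v)
    e1, e2 = transport(e1), transport(e2)

# the horizontal lift stays on the sphere with an orthonormal tangent frame
assert abs(np.dot(x, x) - 1) < 1e-9
for e in (e1, e2):
    assert abs(np.dot(e, x)) < 1e-9 and abs(np.dot(e, e) - 1) < 1e-9
assert abs(np.dot(e1, e2)) < 1e-9
```

The pair (x, (e1, e2)) is the FM-valued process U_t, x alone is the target process Y_t, and the holonomy discussed in the talk is visible in the fact that the frame returned after a loop generally differs from the initial one.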
This paper addresses the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. Current methods like Principal Geodesic Analysis (PGA) and Geodesic PCA (GPCA) minimize the distance to a "geodesic subspace". This allows building sequences of nested subspaces which are consistent with a forward component analysis approach. However, these methods cannot be adapted to a backward analysis, and they are not symmetric in the parametrization of the subspaces. We propose in this paper a new and more general type of family of subspaces in manifolds: barycentric subspaces are implicitly defined as the locus of points which are weighted means of k + 1 reference points. Depending on the generalization of the mean that we use, we obtain the Fréchet/Karcher barycentric subspaces (FBS/KBS) or the affine span (with exponential barycenter). This definition restores the full symmetry between all parameters of the subspaces, in contrast to the geodesic subspaces, which intrinsically privilege one point. We show that this definition locally defines a submanifold of dimension k and that it generalizes geodesic subspaces in some sense. Like PGA, barycentric subspaces allow the construction of a forward nested sequence of subspaces which contains the Fréchet mean. However, the definition also allows the construction of a backward nested sequence which may not contain the mean. As this definition relies on points and does not explicitly refer to tangent vectors, it can be extended to non-Riemannian geodesic spaces. For instance, principal subspaces may naturally span several strata in stratified spaces, which is not the case with more classical generalizations of PCA.

Xavier Pennec
Asclepios team, INRIA Sophia-Antipolis – Méditerranée, France, and Côte d'Azur University (UCA)
Barycentric Subspaces and Affine Spans in Manifolds
GSI, 30/10/2015

Statistical Analysis of Geometric Features
Computational anatomy deals with noisy geometric measures: tensors (covariance matrices), curves and tracts, surfaces and shapes, images, deformations. The data live on non-Euclidean manifolds.

Low-dimensional subspace approximation?
Manifold of cerebral ventricles (Etyngier, Keriven, Segonne 2007); manifold of brain images (S. Gerber et al., Medical Image Analysis, 2009). Manifold dimension reduction: when the embedding structure is already a manifold (e.g. Riemannian), this is not manifold learning (LLE, Isomap, ...) but submanifold learning.

Barycentric Subspaces and Affine Spans in Manifolds
Outline: PCA in manifolds (tPCA / PGA / GPCA / HCA); affine span and barycentric subspaces; conclusion.

Bases of Algorithms in Riemannian Manifolds
Exponential map (normal coordinate system): Exp_x = geodesic shooting parameterized by the initial tangent vector; Log_x = development of the manifold in the tangent space along geodesics. Geodesics = straight lines, with Euclidean distance locally. From local to global: the domain is star-shaped, limited by the cut locus, and covers the whole manifold if it is geodesically complete. Reformulate algorithms with Exp_x and Log_x; vector → bipoint (no more equivalence classes).

Statistical tools: moments. The Fréchet / Karcher mean minimizes the variance.
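The Exp/Log reformulation is exactly what makes the Fréchet/Karcher mean computable in practice: iterate x ← Exp_x(mean_i Log_x(y_i)) until the tangent-space gradient vanishes. A minimal sketch on S² (the symmetric data configuration, step size, and iteration count are illustrative choices):

```python
import numpy as np

def exp_map(x, v):
    """Exp_x(v) on the unit sphere S^2, v in T_x S^2."""
    t = np.linalg.norm(v)
    return x if t < 1e-15 else np.cos(t) * x + np.sin(t) * v / t

def log_map(x, y):
    """Log_x(y): tangent vector at x pointing to y, with |Log| = d(x, y)."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    w = y - c * x
    n = np.linalg.norm(w)
    return np.zeros(3) if n < 1e-15 else np.arccos(c) * w / n

def frechet_mean(pts, iters=100, step=1.0):
    """Fixed-point iteration x <- Exp_x(step * mean_i Log_x(y_i)),
    i.e. gradient descent on the variance."""
    x = pts[0] / np.linalg.norm(pts[0])
    for _ in range(iters):
        g = np.mean([log_map(x, y) for y in pts], axis=0)
        x = exp_map(x, step * g)
    return x

# four points placed symmetrically around the north pole
a = 0.4
pts = [np.array([np.sin(a), 0, np.cos(a)]), np.array([-np.sin(a), 0, np.cos(a)]),
       np.array([0, np.sin(a), np.cos(a)]), np.array([0, -np.sin(a), np.cos(a)])]

m = frechet_mean(pts)
assert abs(np.linalg.norm(m) - 1) < 1e-12                  # stays on the sphere
assert np.linalg.norm(m - np.array([0, 0, 1.0])) < 1e-8    # symmetric data: pole
# stationarity: the mean of the Log maps vanishes at the Frechet mean
assert np.linalg.norm(np.mean([log_map(m, y) for y in pts], axis=0)) < 1e-8
```

The same Exp/Log pattern underlies the higher moments and, with several reference points instead of one, the barycentric subspaces of the talk.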
We present a novel method that adaptively deforms a polysphere (a product of spheres) into a single high-dimensional sphere, which then allows for principal nested spheres (PNS) analysis. Applying our method to skeletal representations of simulated bodies, as well as to data from real human hippocampi, yields promising results with a view to dimension reduction. Specifically, in comparison to composite PNS (CPNS), our method of principal nested deformed spheres (PNDS) captures essential modes of variation with lower-dimensional representations.

Dimension Reduction on Polyspheres with Application to Skeletal Representations
Benjamin Eltzner, University of Göttingen; joint work with Stephan Huckemann and Sungkyu Jung
Conference on Geometric Science of Information, 2015-10-30

Outline: Introduction; Deformation; Skeletal Representations; Conclusion.

Dimension reduction on manifolds. PCA relies on linearity. Tangent space approaches ignore geometry and periodic topology. Intrinsic approaches rely on manifold geometry and come in two classes:

Forward methods: submanifold dimension d = 1, 2, 3, ...; they need "good" geodesics and a construction scheme.

Backward methods: d = D − 1, D − 2, D − 3, ...; they need a rich (parametric) set of submanifolds.

Polysphere dimension reduction. Almost all geodesics of the polysphere P^D = S^{d_1}_{r_1} × ... × S^{d_K}_{r_K} are dense in a torus (S^1)^K. Low symmetry, isom(P^D) = SO(d_1 + 1) × ... × SO(d_K + 1): there is no generic rich set of submanifolds.

Deformation for unit spheres. Dimension reduction methods exist for single spheres: GPCA [1], HPCA [2], PNS [3]. Idea: recursively deform the polysphere to a sphere, f : P^D → S^D. The squared line elements of two unit spheres are

ds_1² = Σ_{k=1}^{d_1} ( Π_{j=1}^{k−1} sin² φ_{1,j} ) dφ_{1,k}² ,  ds_2² = Σ_{k=1}^{d_2} ( Π_{j=1}^{k−1} sin² φ_{2,j} ) dφ_{2,k}² .

Deformation: ds² = ds_2² + ( Π_{j=1}^{d_2} sin² φ_{2,j} ) ds_1².

Remaining degrees of freedom: rotation and ordering of the spheres.

Fixing the degrees of freedom. Rotation: embed S^{d_i}_{r_i} into R^{d_i + 1}, determine the Fréchet mean μ_i and rotate along a geodesic to move it to the positive x_{i, d_i + 1} direction (north pole). Ordering: with data spread s_i = Σ_{n=1}^N d²(ψ_{i,n}, μ_i), choose a permutation p such that s_{p^{−1}(1)} is maximal and s_{p^{−1}(K)} is minimal. This minimizes the distortion due to the factors sin² φ_j, i.e. the deviation from polysphere geometry.

Mapping data points. Embedding S^{d_i} ⊂ R^{d_i + 1}, we get

y_j = x_{2,j} for 1 ≤ j ≤ d_2 ,  y_{d_2 + k} = x_{2, d_2 + 1} x_{1,k} for 1 ≤ k ≤ d_1 + 1 .

For different radii, rescale x_{1,j} → x̃_{1,j} = R_1 x_{1,j} (1 ≤ j ≤ d_1 + 1) and x_{i,j} → x̃_{i,j} = R_i x_{i,j} (i > 1, 1 ≤ j ≤ d_i), and use x̃ in the definition of the y coordinates. This yields an ellipsoid

{ x ∈ R^{d_2 + d_1 + 1} : Σ_{k=1}^{d_2} R_2^{−2} x_{2,k}² + Σ_{k=1}^{d_1 + 1} R_1^{−2} (x_{2, d_2 + 1} x_{1,k})² = 1 } .

As a final step, normalize all y vectors to length R := ( Π_{j=1}^K R_j )^{1/K}.

Illustration for different radii: 1. map from the blue polysphere to the green ellipsoid; 2. map to the red sphere.

A brief review of Principal Nested Spheres (PNS). PNS determines a sequence S^K ⊃ S^{K−1} ⊃ ... ⊃ S^2 ⊃ S^1 ⊃ {μ}: recursively fit a small subsphere S^{d−1} ⊂ S^d minimizing the sum of squared geodesic projection distances. At every projection, save the signed projection distance (the residuals). The parameter space dimension for S^{d−1} ⊂ S^d is p = d + 1, compared to linear PCA, where for R^{d−1} ⊂ R^d it is p = d.

Skeletal representation (s-rep) parameter space. An s-rep consists of 1. a two-dimensional mesh of m × n skeletal points and 2. spokes from the mesh points to the surface (image from J. Schulz et al., Journal of Computational and Graphical Statistics 24.2 (2015), p. 539). Parameters: size of the centered mesh, spoke lengths, normalized mesh points, spoke directions:

Q = R_+ × R^K × S^{3mn−1} × (S²)^K .

The polysphere deformation on S^{3mn−1} × (S²)^K yields Q = S^{5mn+2m+2n−5}.

Dimension reduction for real s-reps. PNDS: deform the polysphere to a sphere and apply PNS. CPNS: PNS on the spheres individually and linear PCA on the joint residuals. Figure: PNDS vs. CPNS, residual variances for s-reps of 51 hippocampi [5].

Dimension reduction for simulated s-reps. Figure: PNDS vs. CPNS for simulated twisted ellipsoids: scatter plots of residual signed distances for the first three components (component variances 92.02%, 5.95%, 0.64% vs. 62.73%, 32.10%, 2.17%).

Reflection on parameter space dimension. Figure: simulated twisted ellipsoid data projected to the second component (a small two-sphere) in PNDS, with the first component (a small circle) inside. Parameter space dimensions: PNS on S^D: p = D(D + 3)/2 − 1; PCA on R^D: p = D(D + 1)/2.

Conclusion. We propose a deformation procedure mapping data on a polysphere to a sphere. The construction aims at minimizing geometric distortion. We achieve lower-dimensional representations than CPNS. The success of our method is rooted in the higher parameter space dimension.

[1] S. Huckemann and H. Ziezold. Advances in Applied Probability 38.2 (2006), pp. 299-319.
[2] S. Sommer. Geometric Science of Information. Vol. 8085, Lecture Notes in Computer Science, 2013, pp. 76-83.
[3] S. Jung, I. L. Dryden, and J. S. Marron. Biometrika 99.3 (2012), pp. 551-568.
[5] S. M. Pizer et al. In: M. Breuß, A. Bruckstein, and P. Maragos (eds.). Springer, Berlin, 2013, pp. 93-115.
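For two unit spheres, the data-point mapping of the talk sends (x_1, x_2) ∈ S^{d_1} × S^{d_2} to y with y_j = x_{2,j} for j ≤ d_2 and y_{d_2+k} = x_{2,d_2+1} x_{1,k}, which has unit norm. A minimal numpy sketch of that map (our illustration for unit radii, not the authors' implementation):

```python
import numpy as np

def polysphere_to_sphere(x1, x2):
    # Map (x1, x2) in S^{d1} x S^{d2}, given as unit vectors in
    # R^{d1+1} and R^{d2+1}, to a unit vector y in S^{d1+d2}:
    # the first d2 coordinates of x2, then x2's last coordinate times x1.
    # ||y||^2 = sum_{j<=d2} x2_j^2 + x2_{d2+1}^2 * ||x1||^2 = 1.
    return np.concatenate([x2[:-1], x2[-1] * x1])
```

For different radii, the slides first rescale the coordinates (producing an ellipsoid) and then renormalize to the geometric-mean radius.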
This paper studies the affine-invariant Riemannian distance on the Riemann-Hilbert manifold of positive definite operators on a separable Hilbert space. This is the generalization of the Riemannian manifold of symmetric positive definite matrices to the infinite-dimensional setting. In particular, in the case of covariance operators in a Reproducing Kernel Hilbert Space (RKHS), we provide a closed-form solution, expressed via the corresponding Gram matrices.
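In the finite-dimensional case that the paper generalizes, the affine-invariant distance between SPD matrices is d_aiE(A, B) = ||log(A^{-1/2} B A^{-1/2})||_F. A minimal numpy sketch (ours, for illustration) computing it via eigendecompositions:

```python
import numpy as np

def spd_power(A, p):
    # A^p for a symmetric positive definite matrix, via A = U diag(w) U^T.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** p) @ V.T

def spd_log(A):
    # Principal matrix logarithm of an SPD matrix: U diag(log w_i) U^T.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.log(w)) @ V.T

def d_aiE(A, B):
    # Affine-invariant distance: || log(A^{-1/2} B A^{-1/2}) ||_F.
    Ais = spd_power(A, -0.5)
    M = Ais @ B @ Ais
    M = 0.5 * (M + M.T)  # symmetrize against round-off
    return np.linalg.norm(spd_log(M))  # Frobenius norm
```

The distance is invariant under congruence: d_aiE(C A C^T, C B C^T) = d_aiE(A, B) for any invertible C.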

Affine-invariant Riemannian distance between infinite-dimensional covariance operators
Hà Quang Minh, Istituto Italiano di Tecnologia, Italy

From finite to infinite dimensions.

Outline:
1. Review of the finite-dimensional setting: the affine-invariant Riemannian metric on the manifold of symmetric positive definite matrices.
2. Infinite-dimensional generalization: the Riemann-Hilbert manifold of positive definite unitized Hilbert-Schmidt operators.
3. Affine-invariant Riemannian distance between Reproducing Kernel Hilbert Space (RKHS) covariance operators.

Positive definite matrices. Sym++(n) = symmetric positive definite n × n matrices. They have been studied extensively mathematically and have numerous practical applications: brain imaging (Arsigny et al. 2005, Dryden et al. 2009, Qiu et al. 2015); computer vision: object detection (Tuzel et al. 2008, Tosato et al. 2013), image retrieval (Cherian et al. 2013), visual recognition (Jayasumana et al. 2015); radar signal processing (Barbaresco 2013, Formont et al. 2013); machine learning: kernel learning (Kulis et al. 2009).

Differentiable manifold viewpoint. The tangent space T_P(Sym++(n)) ≅ Sym(n), the vector space of symmetric matrices. The affine-invariant Riemannian metric on T_P(Sym++(n)) is

⟨A, B⟩_P = ⟨P^{−1/2} A P^{−1/2}, P^{−1/2} B P^{−1/2}⟩_F = tr[P^{−1} A P^{−1} B] ,

with the Frobenius inner product ⟨A, B⟩_F = tr(A^T B). Affine invariance:

⟨C A C^T, C B C^T⟩_{C P C^T} = ⟨A, B⟩_P for any matrix C ∈ GL(n)

(Siegel 1943, Mostow 1955, Pennec et al. 2006, Bhatia 2007, Moakher and Zéraï 2011, Bini and Iannazzo 2013).

The manifold is geodesically complete, with non-positive curvature. The geodesic joining P, Q ∈ Sym++(n) is

γ_PQ(t) = P^{1/2} (P^{−1/2} Q P^{−1/2})^t P^{1/2} .

The exponential map Exp_P : T_P(Sym++(n)) → Sym++(n),

Exp_P(V) = P^{1/2} exp(P^{−1/2} V P^{−1/2}) P^{1/2} ,

is defined on all of T_P(Sym++(n)).

Riemannian distance:

d_aiE(A, B) = || log(A^{−1/2} B A^{−1/2}) ||_F ,

where log(A) is the principal logarithm of A: if A = U D U^T = U diag(λ_1, ..., λ_n) U^T, then log(A) = U log(D) U^T = U diag(log λ_1, ..., log λ_n) U^T. Affine invariance: d_aiE(C A C^T, C B C^T) = d_aiE(A, B) for any matrix C ∈ GL(n).

Other metrics/distances. The Log-Euclidean metric, a bi-invariant Riemannian metric (Arsigny et al. 2007): d_logE(A, B) = || log(A) − log(B) ||_F. Non-Riemannian metrics: Bregman divergences, e.g. the Stein divergence (whose square root is a metric, Sra 2012):

d_stein(A, B) = log [ det((A + B)/2) / sqrt(det(A) det(B)) ] .

Covariance matrices. Let ρ be a Borel probability distribution on R^n with ∫ ||x||² dρ(x) < ∞. Mean vector (in R^n): μ = E_ρ[x] = ∫ x dρ(x). Covariance matrix (n × n): C = E_ρ[(x − μ)(x − μ)^T] = E[x x^T] − μ μ^T. For ρ_1 ~ N(μ, C_1) and ρ_2 ~ N(μ, C_2),

d_aiE(C_1, C_2) = 2 (Fisher-Rao distance between ρ_1 and ρ_2) .

Empirical covariance matrices. x = [x_1, ..., x_m] = data matrix randomly sampled from X = R^n, with m observations, x_i ∈ R^n. Empirical mean vector: μ_x = (1/m) Σ_{i=1}^m x_i.
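The affine-invariant geodesic between SPD matrices, γ_PQ(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2}, can be checked numerically. A short numpy sketch (ours), assuming SPD inputs:

```python
import numpy as np

def spd_power(A, p):
    # A^p for a symmetric positive definite matrix via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** p) @ V.T

def geodesic(P, Q, t):
    # gamma_PQ(t) = P^{1/2} (P^{-1/2} Q P^{-1/2})^t P^{1/2};
    # gamma(0) = P and gamma(1) = Q, up to round-off.
    Ps = spd_power(P, 0.5)
    Pis = spd_power(P, -0.5)
    M = Pis @ Q @ Pis
    M = 0.5 * (M + M.T)  # symmetrize against round-off
    return Ps @ spd_power(M, t) @ Ps
```

The point γ_PQ(1/2) is the geodesic midpoint, i.e. the Riemannian (geometric) mean of P and Q.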
We develop a generic framework to build large deformations from a combination of base modules. These modules constitute a dynamical dictionary with which to describe transformations. The method, built on a coherent sub-Riemannian framework, defines a metric on modular deformations and characterises optimal deformations as geodesics for this metric. We present a generic way to build local affine transformations as deformation modules, and display examples.
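To make the notion of a base module concrete, here is a minimal numpy sketch (our illustration, not the paper's code) of the vector field generated by a "local translation" module: a translation vector carried by a Gaussian kernel of scale sigma centred at a geometrical descriptor z:

```python
import numpy as np

def local_translation_field(x, z, d, sigma):
    # Vector field of a 'local translation' deformation module:
    # a fixed translation d weighted by a Gaussian kernel of scale
    # sigma centred at z.  x: array of points, shape (N, dim).
    w = np.exp(-np.sum((x - z) ** 2, axis=-1) / sigma ** 2)
    return w[:, None] * d
```

Local scalings and rotations are obtained similarly by replacing the constant direction d with radial or rotational directions around z.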

A sub-Riemannian modular approach for diffeomorphic deformations
GSI 2015
Barbara Gris; advisors: Alain Trouvé (CMLA) and Stanley Durrleman (ICM)
gris@cmla.ens-cachan.fr
October 30, 2015

Outline:
1. Introduction
2. Deformation modules: definition and first examples; modular large deformations; combining deformation modules
3. Numerical results

Introduction. "Is it possible to mechanize human intuitive understanding of biological pictures that typically exhibit a lot of variability but also possess characteristic structure?" (Ulf Grenander, Hands: A Pattern Theoretic Study of Biological Shapes, 1991.) Structure in the data should translate into structure in the deformations, i.e. into the flow φ̇_t = v_t ∘ φ_t, φ_{t=0} = Id, through the type of the vector fields v_t.

Previous works: locally affine deformations. Polyaffine [C. Seiler, X. Pennec, and M. Reyes. Capturing the multiscale anatomical shape variability with polyaffine transformation trees. Medical Image Analysis, 2012]: v(x) = Σ_i w_i(x) A_i(x). Here the deformation structure does not evolve with the flow.

Previous works: shape spaces [S. Arguillère. Géométrie sous-riemannienne en dimension infinie et applications à l'analyse mathématique des formes (Sub-Riemannian geometry in infinite dimensions and applications to the mathematical analysis of shapes). PhD thesis, 2014]: the deformation structure is imposed by the shapes and the action of the vector fields. Related works: LDDMM [M. I. Miller, L. Younes, and A. Trouvé. Diffeomorphometry and geodesic positioning systems for human anatomy, 2014]; higher-order momentum [S. Sommer, M. Nielsen, F. Lauze, and X. Pennec. Higher-order momentum distributions and locally affine LDDMM registration. SIAM Journal on Imaging Sciences, 2013]; sparse LDDMM [S. Durrleman, M. Prastawa, G. Gerig, and S. Joshi. Optimal data-driven sparse parameterization of diffeomorphisms for population analysis. In Information Processing in Medical Imaging, pages 123-134, Springer, 2011]. In these the deformation structure evolves with the flow, but there is no control on the deformation structure. Constraints: diffeons [L. Younes. Constrained diffeomorphic shape evolution. Foundations of Computational Mathematics, 2012].

Our model: deformation modules. Purpose: incorporate constraints in the deformation model, and merge different constraints into a complex one.

Definition and first examples. A deformation module contains a space of shapes and can generate vector fields that are of a particular type (the deformation structure) and depend on the state of the shape (so the deformation structure evolves with the flow).

Example: local translation of scale σ. A module is a tuple M = (O, H, V, ζ, ξ, c), where O is a shape space (in the sense of S. Arguillère), and there exists C > 0 such that for all (o, h) ∈ O × H: ||ζ(o, h)||²_V ≤ C c(o, h). Further examples: local scaling of scale σ, local rotation of scale σ, local translation of scale σ with fixed direction.

Modular large deformations. Studied trajectories: t → (o_t, h_t) ∈ O × H such that ȯ_t = ξ_{o_t}(v_t), where v_t = ζ_{o_t}(h_t) ∈ ζ_{o_t}(H). Solutions of φ̇_t^v = v_t ∘ φ_t^v, φ^v_{t=0} = Id, exist; φ^v is called a modular large deformation.

Combining deformation modules. Features: if c^i_{o_i}(h_i) = ||ζ^i_{o_i}(h_i)||²_{V_i}, then the cost of the combination is c_o(h) = Σ_i ||ζ^i_{o_i}(h_i)||²_{V_i}. The geometrical descriptors are transported by the global vector field. The framework is mathematically coherent: any modules can be combined.

Numerical results: matching problem. Minimize the energy

∫_0^1 c_{o_t}(h_t) dt + g(φ^v · f_source, f_target) , with v = ζ_o(h),

where the data-attachment term g uses the varifold representation [N. Charon and A. Trouvé. The varifold representation of non-oriented shapes for diffeomorphic registration, 2013].

Conclusion. We have presented a coherent mathematical framework to build modular large deformations, showing how to easily incorporate constraints into a deformation model and to merge different constraints into a global one. Thank you for your attention!
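A modular large deformation is obtained by integrating the flow equation φ̇_t = v_t ∘ φ_t, φ_0 = Id. A minimal Euler-integration sketch (ours; practical implementations use higher-order schemes), applied to a point set:

```python
import numpy as np

def integrate_flow(v, points, n_steps=100):
    # Euler integration of  phi_dot_t = v_t o phi_t,  phi_0 = Id,
    # tracked on a set of points.  v(t, x) returns the field at points x,
    # where x has shape (N, dim).
    dt = 1.0 / n_steps
    x = np.array(points, dtype=float)
    for k in range(n_steps):
        x = x + dt * v(k * dt, x)
    return x
```

Plugging in a sum of module-generated fields (with descriptors updated along the way) yields the combined modular deformation of the talk.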
Optimization on Manifold (chaired by Pierre-Antoine Absil, Rodolphe Sepulchre)
The Riemannian trust-region algorithm (RTR) is designed to optimize differentiable cost functions on Riemannian manifolds. It proceeds by iteratively optimizing local models of the cost function. When these models are exact up to second order, RTR boasts a quadratic convergence rate to critical points. In practice, building such models requires computing the Riemannian Hessian, which may be challenging. A simple idea to alleviate this difficulty is to approximate the Hessian using finite differences of the gradient. Unfortunately, this is a nonlinear approximation, which breaks the known convergence results for RTR. We propose RTR-FD: a modification of RTR which retains global convergence when the Hessian is approximated using finite differences. Importantly, RTR-FD reduces gracefully to RTR if a linear approximation is used. This algorithm is available in the Manopt toolbox.
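The finite-difference Hessian approximation can be sketched in the Euclidean case as follows (our illustration, not the Manopt implementation; on a manifold, the step uses a retraction and the two gradients must be compared in the same tangent space via transport):

```python
import numpy as np

def approx_hess_vec(grad, x, v, h=1e-6):
    # Finite-difference approximation of the Hessian-vector product:
    #   Hf(x)[v] ~ (grad(x + t v) - grad(x)) / t,
    # with the step t scaled by 1/||v|| so the probe point stays close to x.
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(v)
    t = h / nv
    return (grad(x + t * v) - grad(x)) / t
```

The map v → approx_hess_vec(grad, x, v) is nonlinear in v (because of the 1/||v|| scaling and the nonlinearity of grad), which is exactly what breaks the standard RTR analysis and motivates RTR-FD.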

Ditch the Hessian Hassle with Riemannian Trust Regions
Nicolas Boumal, Inria & ENS Paris
Geometric Science of Information, GSI 2015, Oct. 30, 2015, Paris

The goal is to optimize a smooth function on a smooth manifold. The trust-region method is like Newton's method with a safeguard: on the tangent space, optimize a model of the cost function within a trust region.
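A cheap, standard way to "optimize the model in a trust region" is the Cauchy point: minimize the quadratic model m(s) = gᵀs + ½ sᵀHs along the negative gradient, clipped to the region ‖s‖ ≤ Δ. A minimal numpy sketch (ours, not from the talk; full solvers use truncated conjugate gradients instead):

```python
import numpy as np

def cauchy_point(g, hess_vec, delta):
    # Minimizer of the model m(s) = g.s + 0.5 * s.H s restricted to the
    # ray s = -tau * (delta/||g||) * g, tau in [0, 1] (classical Cauchy point).
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    curv = g @ hess_vec(g)  # curvature of the model along -g
    if curv <= 0.0:
        tau = 1.0  # non-positive curvature: go all the way to the boundary
    else:
        tau = min(gnorm ** 3 / (delta * curv), 1.0)
    return -tau * (delta / gnorm) * g
```

Passing the finite-difference Hessian-vector product as hess_vec is precisely the Hessian-free mode that RTR-FD makes rigorous.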