GSI2015

About

LIX Colloquium 2015 conferences

As with GSI’13, the objective of this SEE conference, GSI’15, hosted by Ecole Polytechnique, is to bring together pure and applied mathematicians and engineers with a common interest in geometric tools and their applications to information analysis.
It emphasizes the active participation of young researchers in discussing emerging areas of collaborative research on “Information Geometry Manifolds and Their Advanced Applications”.
Current and ongoing applications of information geometry manifolds in applied mathematics include advanced signal/image/video processing, complex data modeling and analysis, information ranking and retrieval, coding, cognitive systems, optimal control, statistics on manifolds, machine learning, speech/sound recognition and natural language processing, all of which are also highly relevant to industry.
The conference will therefore be organized around priority themes and topics of mutual interest, with the aim to:
  • Provide an overview of the most recent state of the art
  • Exchange mathematical information/knowledge/expertise in the area
  • Identify research areas/applications for future collaboration
  • Identify academic & industry labs expertise for further collaboration
This conference will be an interdisciplinary event and will unify skills from Geometry, Probability and Information Theory. The conference proceedings are published in Springer's Lecture Notes in Computer Science (LNCS) series.

Authors will be invited to submit a paper to the special issue “Differential Geometrical Theory of Statistics” of the journal Entropy, an international and interdisciplinary open-access journal of entropy and information studies published monthly online by MDPI.

Provisional Topics of Special Sessions:

  • Manifold/Topology Learning
  • Riemannian Geometry in Manifold Learning
  • Optimal Transport theory and applications in Imagery/Statistics
  • Shape Space & Diffeomorphic mappings
  • Geometry of distributed optimization
  • Random Geometry/Homology
  • Hessian Information Geometry
  • Topology and Information
  • Information Geometry Optimization
  • Divergence Geometry
  • Optimization on Manifolds
  • Lie Groups and Geometric Mechanics/Thermodynamics
  • Quantum Information Geometry
  • Infinite Dimensional Shape spaces
  • Geometry on Graphs
  • Bayesian and Information geometry for inverse problems
  • Geometry of Time Series and Linear Dynamical Systems
  • Geometric structure of Audio Processing  
  • Lie groups in Structural Biology
  • Computational Information Geometry

Committees

Secretary

Webmaster

Program chairs

Scientific committee

Sponsors and Organizers

Documents

XLS

Keynote speech: Matilde Marcolli (chaired by Daniel Bennequin)

Creative Commons Attribution-ShareAlike 4.0 International
View the video
I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic structure.
 

Slides: “From Geometry and Physics to Computational Linguistics”, Matilde Marcolli, Geometric Science of Information, Paris, October 2015. A mathematical physicist’s adventures in linguistics, based on:
  • Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
  • Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
  • Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
  • Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation
Main points of the talk:
  • Language is studied as structure at several levels (phonology, morphology, syntax, semantics), each calling for its own mathematical models; the talk focuses on syntax in the Principles and Parameters framework (Chomsky, 1981), where languages are described by binary syntactic parameters such as head-directionality, subject-side, pro-drop and null-subject, with known interdependencies (e.g. pro-drop controls null-subject) and diachronic changes (e.g. word-order switches in Ancient Greek, Sanskrit and English).
  • Persistent topology of syntax: 252 languages with 115 parameters from the SSWL database (see also TerraLing and WALS) are analysed family by family via Vietoris–Rips complexes and barcode diagrams, after dimensionality reduction by principal component analysis. Persistent H0 captures clustering into subfamilies and persistent H1 captures loops: the Indo-European family shows two persistent H0 components (Indo-Iranian and European) and one persistent H1 generator, which turns out to be due to the Hellenic branch rather than to the Anglo-Norman bridge; the Niger–Congo family shows three persistent H0 components (Mande, Atlantic-Congo, Kordofanian) and no persistent H1.
  • Spin glass models of language evolution: syntactic parameters are treated as spin variables attached to the vertices of a graph of interacting languages, with interaction strengths taken from the MIT MediaLab Global Language Network (book translations, Wikipedia edits); alignment is favoured as in a ferromagnetic Ising (or Potts) model, temperature accounts for fluctuations in bilingual speakers (“code-switching” in linguistics), entailment relations between parameters are encoded by coupled spin variables, and the dynamics is simulated with the Metropolis–Hastings algorithm.
  • Syntactic parameters in Kanerva networks (sparse distributed memories, after P. Kanerva, 1988): 165 languages described by 21 SSWL parameters are stored in a Kanerva network; corrupting one parameter at a time and measuring the Hamming distance after read-out gives a recoverability score for each parameter, which is largely explained by the parameter’s prevalence, with residual effects pointing to possible dependencies between parameters and to neuroscience models of memory.
  • Phylogenetic linguistics (work in progress): building language family trees (or graphs with loops) from shared innovations and lexical data, coding cognate information from Swadesh lists as binary strings, measuring lexical Hamming distances between languages and applying distance-matrix methods of phylogenetic inference such as neighbor joining; results are compared with trees obtained by traditional historical linguistics, e.g. the Mayan language tree.
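To make the persistent-H0 construction concrete, here is a minimal illustrative sketch (not the authors' code): it builds the Vietoris–Rips filtration on binary parameter vectors with the Hamming distance and reads off the H0 barcode as the scales at which clusters merge (single linkage, via union-find). The five 8-parameter “languages” below are toy data, not the SSWL database.

```python
# Minimal sketch: persistent H0 barcode of a Vietoris-Rips filtration on
# binary syntactic-parameter vectors, using the Hamming distance.
import numpy as np
from itertools import combinations

def h0_barcode(points):
    """Return the finite death times of H0 classes (component merges).

    Every point is born at scale 0; when two components first meet at
    scale eps, one H0 class dies at eps.  The surviving class has an
    infinite bar and is not listed.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All edges of the complete graph, sorted by Hamming distance.
    edges = sorted(
        (np.sum(points[i] != points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for eps, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # two clusters merge at scale eps
            parent[ri] = rj
            deaths.append(int(eps))
    return deaths

# Toy example: 5 "languages" described by 8 binary parameters.
langs = np.array([[1, 1, 0, 0, 1, 0, 1, 0],
                  [1, 1, 0, 0, 1, 0, 0, 0],
                  [1, 0, 0, 0, 1, 0, 0, 0],
                  [0, 1, 1, 1, 0, 1, 0, 1],
                  [0, 1, 1, 1, 0, 1, 1, 1]])
print(h0_barcode(langs))   # small death times = tight subfamilies
```

Short bars correspond to tightly clustered subfamilies and the one infinite bar to the whole family; the persistent H1 used in the talk requires a full simplicial-homology computation, typically done with standard TDA software.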

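The spin-glass part of the talk can likewise be illustrated with a minimal single-parameter simulation. The sketch below runs single-spin-flip Metropolis–Hastings dynamics for one binary syntactic parameter on a toy five-language interaction graph; the graph, the couplings J_e and the inverse temperature are made-up values standing in for the MediaLab interaction strengths and the SSWL initial data used in the talk.

```python
# Illustrative sketch (not the paper's code): Metropolis-Hastings dynamics
# of one syntactic parameter, modelled as an Ising spin, on a small
# language-interaction graph with ferromagnetic couplings.
import numpy as np

rng = np.random.default_rng(0)

# edges (i, j, J_ij) of a toy 5-language graph; weights are invented here
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 0.8), (3, 4, 0.3), (0, 4, 0.2)]
n = 5
spins = rng.choice([-1, +1], size=n)      # parameter value per language
beta = 2.0                                 # inverse temperature

def local_field(i, spins):
    """Sum of J_ij * s_j over the neighbours j of language i."""
    h = 0.0
    for a, b, J in edges:
        if a == i:
            h += J * spins[b]
        elif b == i:
            h += J * spins[a]
    return h

for step in range(10_000):
    i = rng.integers(n)
    # Energy change of flipping spin i for H(s) = -sum_e J_e s_v s_v':
    # dH = 2 * s_i * local_field(i)
    dH = 2.0 * spins[i] * local_field(i, spins)
    if dH <= 0 or rng.random() < np.exp(-beta * dH):
        spins[i] = -spins[i]

print("final configuration:", spins)
print("average magnetization:", spins.mean())
```

At low temperature (large beta) the parameter aligns across the whole graph; at high temperature the configuration keeps fluctuating, the regime the talk relates to code-switching.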
Random Geometry/Homology (chaired by Laurent Decreusefond/Frédéric Chazal)

Creative Commons Attribution-ShareAlike 4.0 International
View the video
Let m be a random tessellation in R^d, d ≥ 1, observed in the window W_ρ = ρ^{1/d}[0, 1]^d, ρ > 0, and let f be a geometrical characteristic. We investigate the asymptotic behaviour of the maximum of f(C) over all cells C ∈ m with nucleus in W_ρ as ρ goes to infinity. When the normalized maximum converges, we show that its asymptotic distribution depends on the so-called extremal index. Two examples of extremal indices are provided for Poisson-Voronoi and Poisson-Delaunay tessellations.
 

Slides: “The extremal index for a random tessellation”, Nicolas Chenavier (Université du Littoral Côte d’Opale), October 28, 2015.
  • Random tessellations: a (convex) random tessellation m in R^d is a partition of the space into random polytopes (cells); two cases are considered, the Poisson-Voronoi tessellation (cells C_X(x) with nucleus x in a Poisson point process X, z(C_X(x)) := x) and the Poisson-Delaunay tessellation (cells built on points whose Voronoi cells are adjacent, with z(C) the circumcenter of C). The typical cell C is defined by averaging any bounded translation-invariant functional over the cells with nucleus in a Borel set, normalized by the mean number of such cells.
  • Main problem: for a geometrical characteristic g, describe the asymptotic behaviour as ρ → ∞ of M_{g,ρ} = max g(C) over the cells C with nucleus z(C) in the window W_ρ = [0, ρ]^d, i.e. find a_{g,ρ} > 0 and b_{g,ρ} such that P(M_{g,ρ} ≤ a_{g,ρ} t + b_{g,ρ}) converges for each t. Applications: regularity of the tessellation, discrimination of point processes and tessellations, Poisson-Voronoi approximation.
  • If v_ρ is a threshold with ρ^d · P(g(C) > v_ρ) → τ, then under a local correlation condition (controlling pairs of cells in boxes of side log ρ that both exceed the threshold) P(M_{g,ρ} ≤ v_ρ) → e^{−τ}. In general there exists an extremal index θ ∈ [0, 1], in the sense of Leadbetter, such that ρ^d · P(g(C) > v_ρ^{(τ)}) → τ and P(M_{g,ρ} ≤ v_ρ^{(τ)}) → e^{−θτ}.
  • Example 1 (Poisson-Voronoi, minimum inradius): for g(C) = r(C), the inradius of the cell, and r_min,PVT(ρ) the minimum over nuclei in the window, the extremal index is θ = 1/2 for every d ≥ 1.
  • Example 2 (Poisson-Delaunay, maximum circumradius): for g(C) = R(C), the circumradius, and R_max,PDT(ρ) the maximum over the window, the extremal index is θ = 1, 1/2 and 35/128 for d = 1, 2 and 3.
  • Work in progress, joint with C. Robert (ISFA, Lyon 1): a new characterization of the extremal index (not based on the classical block and run estimators of extreme value theory), and simulation and estimation of the extremal index and of the cluster-size distribution for Poisson-Voronoi and Poisson-Delaunay tessellations.
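A minimal simulation sketch of the quantity r_min,PVT(ρ) from Example 1 (not the speaker's code): it relies on the elementary fact that the inradius of a Voronoi cell equals half the distance from its nucleus to the nearest other point, so no tessellation has to be constructed explicitly. The intensity, dimension and boundary margin below are arbitrary illustrative choices.

```python
# Illustrative sketch: minimum inradius of Poisson-Voronoi cells with
# nucleus in the window [0, rho]^d, via nearest-neighbour distances.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

def min_inradius(rho, d=2, intensity=1.0, margin=5.0):
    """Minimum inradius over cells with nucleus in [0, rho]^d.

    Points are simulated on the slightly larger box [-margin, rho+margin]^d
    to reduce boundary effects on nearest-neighbour distances.
    """
    lo, hi = -margin, rho + margin
    n = rng.poisson(intensity * (hi - lo) ** d)
    pts = rng.uniform(lo, hi, size=(n, d))
    tree = cKDTree(pts)
    # distance to the nearest *other* point: take the 2nd neighbour
    dist, _ = tree.query(pts, k=2)
    nn = dist[:, 1]
    in_window = np.all((pts >= 0) & (pts <= rho), axis=1)
    return (nn[in_window] / 2.0).min()

for rho in (10, 20, 40):
    print(rho, min_inradius(rho))
```

As ρ grows the minimum decreases slowly, in line with the extreme-value behaviour studied in the talk.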

Creative Commons Attribution-ShareAlike 4.0 International
View the video
A model of two-type (or two-color) interacting random balls is introduced. Each colored random set is a union of random balls, and the interaction relies on the volume of the intersection between the two random sets. This model is motivated by the detection and quantification of co-localization between two proteins. Simulation and inference are discussed. Since the individual balls cannot all be identified (e.g. a ball may contain another one), standard methods of inference such as likelihood or pseudo-likelihood are not available, and we apply the Takacs-Fiksel method with a specific choice of test functions.
 

Slides: “A two-color interacting random balls model for co-localization analysis of proteins”, Frédéric Lavancier (Laboratoire de Mathématiques Jean Leray, Nantes, and INRIA Rennes, Serpico team), joint work with C. Kervrann (INRIA Rennes, Serpico team), GSI’15, 28-30 October 2015.
  • Motivation: vesicular trafficking analysis and co-localization quantification by TIRF microscopy (1 px = 100 nm), comparing Langerin proteins and Rab11 GTPase proteins. After segmentation, the data consist of two binary images on a domain Ω, realizations of random sets Γ1 ∩ Ω and Γ2 ∩ Ω; co-localization means spatial dependence between Γ1 and Γ2.
  • Testing procedure: with p1 = P(o ∈ Γ1), p2 = P(o ∈ Γ2) and p12 = P(o ∈ Γ1 ∩ Γ2), independence implies p12 = p1·p2. The statistic T normalizes the empirical departure p̂12 − p̂1·p̂2 by the empirical covariance functions of the two sets; for m-dependent stationary independent sets, T converges to N(0, 1) as |Ω| grows, giving the p-value 2(1 − Φ(|T|)). Simulations with unions of random balls confirm the nominal level (4-5 rejections out of 100 under independence) and high power in dependent cases; on the real data, T = 9.9 or T = 17 depending on the pre-processing, with p-value ≈ 0.
  • Model for co-localization: the superposition Γ1 ∪ Γ2 is modeled by a density f(x) ∝ z1^{n1} z2^{n2} exp(θ |Γ1 ∩ Γ2|) with respect to a two-type Boolean model with equiprobable marks and radii drawn from a distribution µ on [Rmin, Rmax]; z1 and z2 rule the mean numbers of green and red balls and θ is the interaction parameter (θ > 0: attraction/co-localization; θ = 0: independence up to the intensities). Realizations can be generated by a standard birth-death Metropolis-Hastings algorithm.
  • Estimation: the numbers of balls n1 and n2 are not observed, so likelihood and pseudo-likelihood inference are not feasible. The Takacs-Fiksel estimator minimizes the sum of squared contrasts C(z1, z2, θ; h_k) = S(h_k) − z1·I1(θ; h_k) − z2·I2(θ; h_k) over K ≥ 3 test functions, a criterion derived from the Georgii-Nguyen-Zessin equilibrium equation (consistency and asymptotic normality as in Coeurjolly et al., 2012). Choosing h1, h2 and h3 as the lengths of each ball’s sphere not covered by Γ1, Γ2 and Γ1 ∪ Γ2 makes S(h1), S(h2), S(h3) equal to the observable perimeters P(Γ1), P(Γ2), P(Γ1 ∪ Γ2). On the real data, θ̂ = 0.45 for radii uniform on [0.5, 2.5] and θ̂ = 0.03 for radii uniform on [0.5, 10].
  • Conclusion: the testing procedure detects co-localization between two binary images, is easy and fast to implement, and does not depend too much on the image pre-processing; the model relies on a geometric feature (the area of the intersection), can be fitted by the Takacs-Fiksel method, and allows the degree of co-localization θ to be compared between pairs of images when the laws of the radii are similar.
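The following is a simplified sketch of the testing idea (not the authors' implementation): the paper normalizes p̂12 − p̂1·p̂2 by the empirical covariance functions of the two random sets to obtain an asymptotically N(0, 1) statistic, whereas this toy version approximates the null distribution by random toroidal shifts of one image, which preserves each set's spatial structure while breaking the dependence between the two colors.

```python
# Simplified sketch of a co-localization test on two binary images.
import numpy as np

rng = np.random.default_rng(2)

def colocalization_pvalue(img1, img2, n_shifts=999):
    """img1, img2: boolean arrays of the same shape (segmented images)."""
    obs = (img1 & img2).mean() - img1.mean() * img2.mean()
    null = np.empty(n_shifts)
    for k in range(n_shifts):
        dx, dy = rng.integers(img2.shape[0]), rng.integers(img2.shape[1])
        shifted = np.roll(np.roll(img2, dx, axis=0), dy, axis=1)
        null[k] = (img1 & shifted).mean() - img1.mean() * shifted.mean()
    # two-sided Monte-Carlo p-value
    return (1 + np.sum(np.abs(null) >= abs(obs))) / (n_shifts + 1)

# Toy example: two images built from overlapping random discs.
H, W = 200, 200
yy, xx = np.mgrid[0:H, 0:W]

def disc_union(centers, r):
    out = np.zeros((H, W), dtype=bool)
    for cy, cx in centers:
        out |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    return out

centers = rng.uniform(0, 200, size=(60, 2))
green = disc_union(centers, 5)
red = disc_union(centers + rng.normal(0, 2, size=centers.shape), 5)  # correlated
print(colocalization_pvalue(green, red))   # small p-value: co-localized
```

The toroidal-shift null is only a stand-in for the paper's asymptotic normalization, but it illustrates the quantities p̂1, p̂2 and p̂12 on which the test is built.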

Creative Commons Attribution-ShareAlike 4.0 International
View the video
The characteristic independence property of Poisson point processes gives an intuitive way to explain why a sequence of point processes becoming less and less repulsive can converge to a Poisson point process. The aim of this paper is to show this convergence for sequences built by superposing, thinning or rescaling determinantal processes. We use Papangelou intensities and Stein’s method to prove this result with a topology based on total variation distance.
 

Slides: “Asymptotics of some Point Processes Transformations”, Aurélien Vasseur (Télécom ParisTech), 2nd conference on Geometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28, 2015.
  • Motivation: base-station locations in a Paris mobile network; all stations together look quite irregular, while the stations of a single frequency band look close to Poisson.
  • Generalities on point processes: on a locally compact metric space Y with a diffuse, locally finite reference measure µ, a point process Φ is described by its correlation functions ρ (ρ(α) ≈ probability of finding a point at each point of α) and its Papangelou intensity c(x, ξ) (≈ conditional probability of finding a point at x given the configuration ξ); Φ is repulsive if c(x, ξ) ≤ c(x, φ) whenever φ ⊂ ξ, and weakly repulsive if c(x, ξ) ≤ c(x, ∅). For a Poisson process with intensity m, ρ(α) = Π_{x∈α} m(x) and c(x, ξ) = m(x). For a determinantal point process DPP(K, µ), ρ({x1, ..., xk}) = det(K(xi, xj)) and the Papangelou intensity is a ratio of determinants of the kernel J = (I − K)^{-1}K; the Ginibre and β-Ginibre processes are the guiding examples.
  • Kantorovich-Rubinstein distance: d_KR is the transport distance built on the total variation distance between configurations; convergence in d_KR is strictly stronger than convergence in law. Upper-bound theorem (L. Decreusefond, A. Vasseur): for a finite point process Φ on Y and a Poisson process ζ_M with finite control measure M(dy) = m(y)µ(dy), d_KR(P_Φ, P_{ζ_M}) ≤ ∫_Y ∫_{N_Y} |m(y) − c(y, φ)| P_Φ(dφ) µ(dy).
  • Applications: superposition Φ_n of n independent, finite, weakly repulsive processes Φ_{n,1}, ..., Φ_{n,n}, with the bound d_KR(P_{Φ_n}, P_{ζ_M}) ≤ R_n + max_i ∫ ρ_{n,i}(x)µ(dx); as a corollary, rescaled i.i.d. samples with density f_n = (1/n) f(·/n) restricted to a compact set converge to a Poisson process with intensity f(0+); the β_n-Ginibre process reduced to a compact set Λ satisfies d_KR(P_{Φ_n}, P_ζ) ≤ C β_n, where ζ is the Poisson process with intensity 1/π on Λ; for thinning, a quantitative counterpart of Kallenberg's theorem bounds a Kantorovich-Rubinstein distance (for a Polish metric d*) between p_n-thinned processes and a Cox limit in terms of E[Σ_{x∈Φ_n} p_n(x)^2].
  • References: L. Decreusefond and A. Vasseur, Asymptotics of superposition of point processes, 2015; H.O. Georgii and H.J. Yoo, Conditional intensity and gibbsianness of determinantal point processes, J. Statist. Phys. 118, 2004; J.S. Gomez, A. Vasseur, A. Vergne, L. Decreusefond, P. Martins and Wei Chen, A Case Study on Regularity in Cellular Network Deployment, IEEE Wireless Communications Letters, 2015; A.F. Karr, Point Processes and their Statistical Inference.
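As a purely illustrative sketch of the superposition result (not the paper's proof machinery, which controls the full configuration-space Kantorovich-Rubinstein distance): superpose n independent sparse processes, each contributing at most one point — a trivially weakly repulsive situation — and measure how far the total number of points in the window is from Poisson in total variation.

```python
# Illustrative sketch: counts of a superposition of n sparse processes
# versus the Poisson law, compared in total variation distance.
import numpy as np
from scipy.stats import binom, poisson

lam = 4.0   # target Poisson intensity (mean number of points in the window)

for n in (5, 20, 100, 1000):
    # superposition of n independent processes, each having one uniform
    # point in the window with probability lam / n; the total count is
    # Binomial(n, lam / n)
    k = np.arange(0, 60)
    tv = 0.5 * np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)).sum()
    print(f"n = {n:5d}   TV(counts, Poisson) ~ {tv:.4f}")
```

The printed total variation distance decreases as n grows (Le Cam's inequality bounds it by λ²/n), matching the intuition that superposing many weak, nearly independent sources produces Poisson-like behaviour.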

Creative Commons Attribution-ShareAlike 4.0 International
View the video
Random polytopes have been among the central objects of stochastic geometry for more than 150 years. They are in general generated as convex hulls of a random set of points in Euclidean space. The study of such models requires ingredients from both convex geometry and probability theory. In the last decades, the study has focused on their asymptotic properties, in particular expectation and variance estimates. In several joint works with Tomasz Schreiber and J. E. Yukich, we have investigated the scaling limits of several models (uniform model in the unit ball, uniform model in a smooth convex body, Gaussian model) and have deduced from them limiting variances for several geometric characteristics, including the number of k-dimensional faces and the volume. In this paper, we survey the most recent advances on these questions and we emphasize the particular cases of random polytopes in the unit ball and Gaussian polytopes.
 

Slides: “Asymptotic properties of random polytopes”, Pierre Calka, 2nd conference on Geometric Science of Information, Ecole Polytechnique, Paris-Saclay, 28 October 2015; joint work with Joseph Yukich (Lehigh University, USA) and Tomasz Schreiber (Toruń University, Poland).
  • Models: the uniform polytope K_n, convex hull of n points independent and uniformly distributed in a convex body K (binomial model), and its Poissonian counterpart K_λ built on a Poisson point process of intensity λ in K; the Gaussian polytope, convex hull of standard Gaussian points in R^d (binomial or Poissonian). The Gaussian polytope is asymptotically spherical: d_H(K_n, B(0, √(2 log n))) → 0 a.s. (Geffroy, 1961).
  • Functionals and expectation asymptotics: f_k(·), the number of k-dimensional faces, and the volume, related through Efron's relation (1965) E f_0(K_n) = n (1 − E Vol(K_{n−1})/Vol(K)). For smooth K, E[f_k(K_λ)] ~ c_{d,k} (∫_{∂K} κ_s^{1/(d+1)} ds) λ^{(d−1)/(d+1)}; for a polytope K, E[f_k(K_λ)] ~ c'_{d,k} F(K) log^{d−1}(λ), with F(K) the number of flags of K; for the Gaussian model, E[f_k(K_λ)] ~ c''_{d,k} log^{(d−1)/2}(λ) (Rényi and Sulanke 1963, Raynaud 1970, Schneider and Wieacker 1978, Affentranger and Schneider 1992).
  • Main results, variance asymptotics: for smooth K of volume 1 with C^3 boundary, λ^{−(d−1)/(d+1)} Var[f_k(K_λ)] → c_{k,d} ∫_{∂K} κ(z)^{1/(d+1)} dz and λ^{(d+3)/(d+1)} Var[Vol(K_λ)] → c'_d ∫_{∂K} κ(z)^{1/(d+1)} dz (matching the order Θ(λ^{(d−1)/(d+1)}) of Reitzner, 2005); for a simple polytope K of volume 1, log^{−(d−1)}(λ) Var[f_k(K_λ)] → c_{d,k} f_0(K) and λ^2 log^{−(d−1)}(λ) Var[Vol(K_λ)] → c'_{d,k} f_0(K) (cf. Bárány and Reitzner, 2010); for the Gaussian model, log^{−(d−1)/2}(λ) Var[f_k(K_λ)] → c_{k,d}, with an analogous power of log λ for the volume, and E Vol(K_λ)/Vol(B(0, √(2 log λ))) = 1 − d log log λ / (4 log λ) + O(1/log λ) (cf. Hug and Reitzner 2005, Bárány and Vu 2007).
  • Sketch of proof in the Gaussian case: write f_k(K_λ) as a sum of scores ξ(x, P_λ) over the points of the Poisson process, compute expectation and variance with the Mecke-Slivnyak formula, and take limits of the scores and of their pair correlations after a scaling transform around the critical radius R_λ = √(2 log λ − log(2(2π)^d log λ)); the limit picture is a parabolic growth process on R^{d−1} × R with intensity e^h dv dh, in which extreme points and k-faces become parabolic extreme points and parabolic k-faces defined through the parabolic half-spaces Π↑ and Π↓.
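A minimal Monte-Carlo sketch of the uniform model in the unit ball (illustrative only): it estimates E[f_0(K_n)], the expected number of hull vertices, for growing n and compares the growth with the n^{(d−1)/(d+1)} rate from the expectation asymptotics above, ignoring the multiplicative constant.

```python
# Illustrative sketch: number of vertices of the convex hull of n uniform
# points in the unit ball, compared with the n^((d-1)/(d+1)) growth rate.
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(3)

def uniform_ball(n, d):
    """n points uniformly distributed in the unit ball of R^d."""
    x = rng.normal(size=(n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    r = rng.uniform(size=(n, 1)) ** (1.0 / d)
    return x * r

d, reps = 2, 50
for n in (100, 1000, 10_000):
    f0 = np.mean([len(ConvexHull(uniform_ball(n, d)).vertices)
                  for _ in range(reps)])
    rate = n ** ((d - 1) / (d + 1))
    print(f"n = {n:6d}   E[f0] ~ {f0:7.1f}   n^((d-1)/(d+1)) = {rate:7.1f}")
```

In the plane this is the classical n^{1/3} growth; scipy's ConvexHull also handles higher dimensions if d and the sample sizes are adjusted.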

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Asymmetric information distances are used to define asymmetric norms and quasimetrics on the statistical manifold and its dual space of random variables. The quasimetric topology generated by the Kullback-Leibler (KL) divergence is considered as the main example, and some of its topological properties are investigated.
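As a minimal numerical illustration of the asymmetry underlying this abstract (a sketch, assuming finite discrete distributions; the two probability vectors are arbitrary), the KL divergence between two distributions changes when its arguments are swapped:

```python
import numpy as np

def kl(p, q):
    # D[p, q] = sum_i p_i log(p_i / q_i) on a finite alphabet (0 log 0 := 0)
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.4, 0.4, 0.2])
print(kl(p, q), kl(q, p))   # two different values: D[p, q] != D[q, p]
```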
 

Asymmetric Topologies on Statistical Manifolds
Roman V. Belavkin, School of Science and Technology, Middlesex University, London NW4 4BT, UK. GSI2015, October 28, 2015

Outline: Sources and Consequences of Asymmetry; Method: Symmetric Sandwich; Results

Asymmetric information distances. Kullback-Leibler divergence D[p, q] = E_p{ln(p/q)}; additivity over products, D[p_1 ⊗ p_2, q_1 ⊗ q_2] = D[p_1, q_1] + D[p_2, q_2], comes from ln : (R_+, ×) → (R, +). Asymmetry of the KL-divergence: D[p, q] ≠ D[q, p], and D[q + (p − q), q] ≠ D[q − (p − q), q]. This induces an asymmetric gauge
  ‖p − q| := inf{α⁻¹ > 0 : D[q + α(p − q), q] ≤ 1} = sup{E_{p−q}{x} : E_q{eˣ − 1 − x} ≤ 1}
and its symmetrized counterpart
  ‖p − q‖ := inf{α⁻¹ > 0 : D[q + α|p − q|, q] ≤ 1} = sup{E_{p−q}{x} : E_q{e^{|x|} − 1 − |x|} ≤ 1}.

Functional analysis in asymmetric spaces.
Theorem (e.g. Theorem 1.5 in Fletcher and Lindgren (1982)): Every topological space with a countable base is quasi-pseudometrizable.
• An asymmetric seminormed space can be T0, but not T1 (and hence not Hausdorff T2).
• Dual quasimetrics ρ(x, y) and ρ⁻¹(x, y) = ρ(y, x) induce two different topologies.
• There are 7 notions of Cauchy sequences: left (right) Cauchy, left (right) K-Cauchy, weakly left (right) K-Cauchy, Cauchy.
• This gives 14 notions of completeness (with respect to ρ or ρ⁻¹).
• Compactness is related to outer precompactness or precompactness, which are strictly weaker properties than total boundedness.
• An asymmetric seminormed space may fail to be a topological vector space, because y ↦ αy can be discontinuous (Borodin, 2001).
• Practically all other results have to be reconsidered (e.g. Baire category theorem, Alaoglu-Bourbaki, etc.) (Cobzas, 2013).

Random variables as the source of asymmetry. With the polar set M° := {x : ⟨x, y⟩ ≤ 1, ∀y ∈ M}, the Minkowski functional of M° equals the support function of M: μ_{M°}(x) = inf{α > 0 : x/α ∈ M°} = s_M(x) = sup{⟨x, y⟩ : y ∈ M}. Take M = {u : D[(1 + u)z, z] ≤ 1} with D = ⟨(1 + u) ln(1 + u) − u, z⟩; then M° = {x : D*[x, 0] ≤ 1} with D*[x, 0] = ⟨eˣ − 1 − x, z⟩.

Example (St. Petersburg lottery): x = 2ⁿ, q = 2⁻ⁿ, n ∈ N. E_q{x} = Σ_{n≥1} 2ⁿ/2ⁿ → ∞, while E_p{x} < ∞ for every biased p = 2^{−(1+α)n}, α > 0. Here 2ⁿ ∉ dom E_q{eˣ}, −2ⁿ ∈ dom E_q{eˣ}, and 0 ∉ Int(dom E_q{eˣ}).
Example (error minimization): minimize x = ½‖a − b‖² subject to D_KL[w, q ⊗ p] ≤ λ, a, b ∈ Rⁿ. E_w{x} < ∞ is minimized at w ∝ e^{−βx} q ⊗ p, but maximization of x has no solution. Here ½‖a − b‖² ∉ dom E_{q⊗p}{eˣ}, −½‖a − b‖² ∈ dom E_{q⊗p}{eˣ}, and 0 ∉ Int(dom E_{q⊗p}{eˣ}).

Method: symmetric sandwich. For a set A,
  s[−A ∩ A] ≤ s_A ≤ s[−A ∪ A] and μ_co[−A° ∪ A°] ≤ μ_{A°} ≤ μ[−A° ∩ A°],
where s[−A ∩ A] = s(−A) ∧_co s_A = inf{s_A(z) + s_A(z − y) : z ∈ Y} and s[−A ∪ A] = s(−A) ∨ s_A. In particular, μ(−M°) ∧_co μ_{M°} ≤ μ_{M°} ≤ μ(−M°) ∨ μ_{M°} and μ(−M) ∧_co μ_M ≤ μ_M ≤ μ(−M) ∨ μ_M.

Lower and upper Luxemburg (Orlicz) norms. With φ*(x) = eˣ − 1 − x and φ(u) = (1 + u) ln(1 + u) − u, define φ*₊(x) = φ*(|x|) ∉ Δ₂, φ*₋(x) = φ*(−|x|) ∈ Δ₂, φ₊(u) = φ(|u|) ∈ Δ₂, φ₋(u) = φ(−|u|) ∉ Δ₂, and the asymmetric gauges ‖x|*_φ = μ{x : ⟨φ*(x), z⟩ ≤ 1} and ‖u|_φ = μ{u : ⟨φ(u), z⟩ ≤ 1}.
Proposition: ‖·‖_{φ*₊} and ‖·‖_{φ*₋} are Luxemburg norms and ‖x‖_{φ*₋} ≤ ‖x|*_φ ≤ ‖x‖_{φ*₊}; ‖·‖_{φ₊} and ‖·‖_{φ₋} are Luxemburg norms and ‖u‖_{φ₊} ≤ ‖u|_φ ≤ ‖u‖_{φ₋}.

Results.
• KL induces a Hausdorff (T2) asymmetric topology. Theorem: (Y, ‖·|_φ) (resp. (X, ‖·|*_φ)) is Hausdorff. Proof: ‖u‖_{φ₊} ≤ ‖u|_φ (resp. ‖x‖_{φ*₋} ≤ ‖x|*_φ) implies that (Y, ‖·|_φ) (resp. (X, ‖·|*_φ)) is finer than the normed space (Y, ‖·‖_{φ₊}) (resp. (X, ‖·‖_{φ*₋})).
• Separable subspaces. Theorem: (Y, ‖·‖_{φ₊}) (resp. (X, ‖·‖_{φ*₋})) is a separable Orlicz subspace of (Y, ‖·|_φ) (resp. (X, ‖·|*_φ)). Proof: φ₊(u) = (1 + |u|) ln(1 + |u|) − |u| ∈ Δ₂ (resp. φ*₋(x) = e^{−|x|} − 1 + |x| ∈ Δ₂); note that φ₋ ∉ Δ₂ and φ*₊ ∉ Δ₂.
• Completeness. Theorem: (Y, ‖·|_φ) (resp. (X, ‖·|*_φ)) is (1) bi-complete: ρˢ-Cauchy sequences converge in ρˢ; (2) ρ-sequentially complete: ρˢ-Cauchy sequences converge in ρ; (3) right K-sequentially complete: right K-Cauchy sequences converge in ρ. Proof: ρˢ(y, z) = ‖z − y|_φ ∨ ‖y − z|_φ ≤ ‖y − z‖_{φ₋}, where (Y, ‖·‖_{φ₋}) is Banach; then use theorems of Reilly et al. (1982) and Chen et al. (2007).

Summary and further questions. Topologies induced by asymmetric information divergences may not have the same properties as their symmetrized counterparts (e.g. Banach spaces), and therefore many properties have to be re-examined. We have proved that topologies induced by the KL-divergence are Hausdorff; bi-complete, ρ-sequentially complete and right K-sequentially complete; and contain a separable Orlicz subspace. Open questions: total boundedness, compactness? Other asymmetric information distances (e.g. Rényi divergence).

References
Borodin, P. A. (2001). The Banach-Mazur theorem for spaces with asymmetric norm. Mathematical Notes, 69(3-4), 298-305.
Chen, S.-A., Li, W., Zou, D., & Chen, S.-B. (2007). Fixed point theorems in quasi-metric spaces. In Machine Learning and Cybernetics, 2007 International Conference on (Vol. 5, pp. 2499-2504). IEEE.
Cobzas, S. (2013). Functional analysis in asymmetric normed spaces. Birkhäuser.
Fletcher, P., & Lindgren, W. F. (1982). Quasi-uniform spaces (Vol. 77). New York: Marcel Dekker.
Reilly, I. L., Subrahmanyam, P. V., & Vamanamurthy, M. K. (1982). Cauchy sequences in quasi-pseudo-metric spaces. Monatshefte für Mathematik, 93, 127-140.
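The asymmetric gauge ‖u|_q = inf{α⁻¹ > 0 : D[q + αu, q] ≤ 1} from the slides can be evaluated numerically. The sketch below is an illustration under added assumptions (finite alphabet, bisection on the largest feasible α); it shows that the gauge differs in the directions u = p − q and −u, which is exactly the source of the asymmetric topology.

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

def gauge(u, q, tol=1e-10):
    """Asymmetric gauge |u|_q = inf{1/alpha : D[q + alpha*u, q] <= 1},
    computed by bisection on the largest feasible alpha (finite alphabet)."""
    def feasible(a):
        r = q + a * u
        return bool(np.all(r >= 0.0)) and kl(r, q) <= 1.0
    lo, hi = 0.0, 1.0
    while feasible(hi):              # grow the bracket until infeasible
        lo, hi = hi, 2.0 * hi
        if hi > 1e12:
            return 0.0               # unbounded direction: gauge is 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if feasible(mid) else (lo, mid)
    return 1.0 / lo

q = np.array([0.5, 0.3, 0.2])
p = np.array([0.2, 0.3, 0.5])
u = p - q
print(gauge(u, q), gauge(-u, q))     # different values: the gauge is asymmetric
```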

Computational Information Geometry (chaired by Frank Nielsen, Paul Marriott)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We introduce a new approach to goodness-of-fit testing in the high dimensional, sparse extended multinomial context. The paper takes a computational information geometric approach, extending classical higher order asymptotic theory. We show why the Wald statistic – equivalently, the Pearson χ² and score statistics – is unworkable in this context, but that the deviance has a simple, accurate and tractable sampling distribution even for moderate sample sizes. Issues of uniformity of asymptotic approximations across model space are discussed. A variety of important applications and extensions are noted.
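For concreteness, here is a small Python sketch (illustrative only; N = 50, k = 200 and the exponential decay of the cell probabilities are chosen to mimic the sparse regime discussed in the talk) that draws one sparse multinomial sample and evaluates the Wald/Pearson-type statistic W and the deviance D as defined in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

N, k = 50, 200                                  # sparse regime: N << k
pi = np.exp(-0.05 * np.arange(k + 1))
pi /= pi.sum()                                  # exponentially decreasing cell probabilities

n = rng.multinomial(N, pi)

W = np.sum((pi - n / N) ** 2 / pi)              # Wald / Pearson-type statistic
pos = n > 0
D = 2.0 * np.sum(n[pos] * np.log(n[pos] / (N * pi[pos])))   # deviance

print(f"W = {W:.3f},  deviance D = {D:.3f}")
```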
 

Geometry of Goodness-of-Fit Testing in High Dimensional Low Sample Size Modelling
R. Sabolová¹, P. Marriott², G. Van Bever¹ & F. Critchley¹; ¹The Open University (EPSRC grant EP/L010429/1), United Kingdom; ²University of Waterloo, Canada. GSI 2015, October 28th 2015

Key points. In computational information geometry (CIG), the extended multinomial model Δ_k = {(π_0, ..., π_k) : π_i ≥ 0, Σ_i π_i = 1} provides a universal model.
1. Goodness-of-fit testing in large sparse extended multinomial contexts; the Cressie-Read power divergence λ-family is equivalent to Amari's α-family.
2. Asymptotic properties of two test statistics, Pearson's χ² and the deviance; simulation study for other statistics within the power divergence family.
3. k-asymptotics instead of N-asymptotics.

Outline: Introduction; Pearson's χ² versus the deviance; Other test statistics from the power divergence family; Summary

Big data. From the Statistical Theory and Methods for Complex, High-Dimensional Data programme, Isaac Newton Institute (2008): ". . . the practical environment has changed dramatically over the last twenty years, with the spectacular evolution of computing facilities and the emergence of applications in which the number of experimental units is relatively small but the underlying dimension is massive. . . . Areas of application include image analysis, microarray analysis, finance, document classification, astronomy and atmospheric science." Continuous data: high dimensional low sample size (HDLSS) data. Discrete data: databases, image analysis. Sparsity (N << k) changes everything!
Image analysis example: for binary images with m_1 = 10, m_2 = 10, the dimension of the state space is k = 2^{m_1 m_2} − 1.
S. Fienberg, A. Rinaldo (2012), Maximum Likelihood Estimation in Log-Linear Models: "Despite the widespread usage of these [log-linear] models, the applicability and statistical properties of log-linear models under sparse settings are still very poorly understood. As a result, even though high-dimensional sparse contingency tables constitute a type of data that is common in practice, their analysis remains exceptionally difficult."

Pearson's χ² versus the deviance.
Extended multinomial distribution: n = (n_i) ∼ Mult(N, (π_i)), i = 0, 1, ..., k, where each π_i ≥ 0. Goodness-of-fit test H_0 : π = π*. Pearson's χ² test (Wald, score statistic):
  W := Σ_{i=0}^k (π*_i − n_i/N)² / π*_i ≡ (1/N²) Σ_{i=0}^k n_i²/π*_i − 1.
Rule of thumb (for accuracy of the χ² asymptotic approximation): N π_i ≥ 5.
Performance of Pearson's χ² test on the boundary, example: N = 50, k = 200, exponentially decreasing π_i; the null distribution of W is highly unstable.
Theorem. For k > 1 and N ≥ 6, the first three moments of W are
  E(W) = k/N,
  var(W) = [π^{(−1)} − (k + 1)² + 2k(N − 1)] / N³,
and E[{W − E(W)}³] is given by
  [π^{(−2)} − (k + 1)³ − (3k + 25 − 22N){π^{(−1)} − (k + 1)²} + g(k, N)] / N⁵,
where g(k, N) = 4(N − 1)k(k + 2N − 5) > 0 and π^{(a)} := Σ_i π_i^a. In particular, for fixed k and N, as π_min → 0, var(W) → ∞ and γ(W) → +∞, where γ(W) := E[{W − E(W)}³]/{var(W)}^{3/2}.

The deviance statistic:
  D/2 = Σ_{0≤i≤k : n_i>0} n_i {log(n_i/N) − log(π_i)} = Σ_{n_i>0} n_i log(n_i/μ_i), where μ_i := E(n_i) = N π_i.
Distribution of the deviance: let {n*_i, i = 0, ..., k} be mutually independent with n*_i ∼ Po(μ_i); then N* := Σ_{i=0}^k n*_i ∼ Po(N) and n_i = (n*_i | N* = N) ∼ Mult(N, π_i). Define S* := (N*, D*/2) with D*/2 = Σ_{i=0}^k n*_i log(n*_i/μ_i), and define ν, τ and ρ via
  E(S*) = (N, ν), with ν := Σ_{i=0}^k E(n*_i log{n*_i/μ_i}),
  cov(S*) = [[N, ρτ√N], [ρτ√N, τ²]], with ρτ√N = Σ_{i=0}^k C_i and τ² = Σ_{i=0}^k V_i,
where C_i := Cov(n*_i, n*_i log(n*_i/μ_i)) and V_i := Var(n*_i log(n*_i/μ_i)). Then, under equicontinuity, D/2 →_D N_1(ν, τ²(1 − ρ²)) as k → ∞.
Uniformity near the boundary: comparison of the stability of the sampling distributions of Pearson's χ² and of the deviance, N = 50, k = 200, exponentially decreasing π_i.
Asymptotic approximations: the normal approximation can be improved, either by a χ² approximation with correction for skewness or by symmetrised deviance statistics (quality of the k-asymptotic approximations near the boundary).
Uniformity and higher moments: does the k-asymptotic approximation hold uniformly across the simplex? Rewrite the deviance as D*/2 = Σ_{n*_i>0} n*_i log(n*_i/μ_i) = Γ* + Δ*, with Γ* := Σ_i α_i n*_i.
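The theorem above gives E(W) = k/N but predicts that the variance and skewness of W blow up as π_min → 0, while the deviance remains stable. The simulation sketch below (parameters N = 50 and k = 200 as in the talk's figures; the decay rates and number of replicates are arbitrary choices) illustrates this contrast:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k, reps = 50, 200, 5000

def null_samples(pi):
    counts = rng.multinomial(N, pi, size=reps)
    W = np.sum((pi - counts / N) ** 2 / pi, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = counts * np.log(counts / (N * pi))
    D = 2.0 * np.sum(np.where(counts > 0, terms, 0.0), axis=1)
    return W, D

def skew(x):
    return float(np.mean((x - x.mean()) ** 3) / x.std() ** 3)

for decay in (0.01, 0.05, 0.10):                 # larger decay -> smaller pi_min
    pi = np.exp(-decay * np.arange(k + 1))
    pi /= pi.sum()
    W, D = null_samples(pi)
    print(f"decay={decay:.2f}  mean(W)={W.mean():.2f}  var(W)={W.var():10.1f} "
          f"skew(W)={skew(W):7.2f} | var(D)={D.var():7.1f} skew(D)={skew(D):5.2f}")
```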

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Local mixture models give an inferentially tractable but still flexible alternative to general mixture models. Their parameter space naturally includes boundaries; near these the behaviour of the likelihood is not standard. This paper shows how convex and differential geometries help in characterising these boundaries. In particular the geometry of polytopes, ruled and developable surfaces is exploited to develop efficient inferential algorithms.
 

Computing Boundaries in Local Mixture Models
Vahed Maroufy & Paul Marriott, Department of Statistics and Actuarial Science, University of Waterloo. GSI 2015, Paris, October 28

Outline: 1. Influence of boundaries on parameter inference; 2. Local mixture models (LMM); 3. Parameter space and boundaries (hard boundaries and soft boundaries); 4. Computing the boundaries for LMMs; 5. Summary and future direction

Boundary influence. When a boundary exists, either the MLE does not exist (so one has to find the extended MLE), or the MLE exists but does not satisfy the regular properties. Examples: binomial distribution, logistic regression, contingency tables, log-linear and graphical models; Geyer (2009), Rinaldo et al. (2009), Anaya-Izquierdo et al. (2013). Computing the boundary is a hard problem, Fukuda (2004). Many mathematical results exist in the literature: polytope approximation, Boroczky and Fodor (2008), Barvinok (2013); smooth surface approximation, Batyrev (1992), Ghomi (2001, 2004).

Local mixture models. Definition (Marriott, 2002):
  g(x; μ, λ) = f(x; μ) + Σ_{j=2}^k λ_j f^{(j)}(x; μ), λ ∈ Λ_μ ⊂ R^{k−1}.
Properties (Anaya-Izquierdo and Marriott, 2007): g is identifiable in all parameters and the parametrization (μ, λ) is orthogonal at λ = 0; the log likelihood of g is a concave function of λ at a fixed μ_0; Λ_μ is convex; LMMs approximate continuous mixture models ∫_M f(x, μ) dQ(μ) when mixing is "small"; the family of LMMs is richer than the family of mixtures.

Example: LMM of the normal, f(x; μ) = φ(x; μ, σ²) with σ² known:
  g(x; μ, λ) = φ(x; μ, σ²) (1 + Σ_{j=2}^k λ_j p_j(x)), λ ∈ Λ_μ,
with p_j(x) a polynomial of degree j. Why care about λ and Λ_μ? They are interpretable: μ_g^{(2)} = σ² + 2λ_2, μ_g^{(3)} = 6λ_3, μ_g^{(4)} = μ_φ^{(4)} + 12σ²λ_2 + 24λ_4, and λ represents the mixing distribution Q via its moments in ∫_M f(x, μ) dQ(μ).

The costs for all these good properties and flexibility are a hard boundary (positivity, the boundary of Λ_μ) and a soft boundary (mixture behaviour). We compute them here for two models, Poisson and normal, with k = 4 fixed.

Hard boundary: Λ_μ = {λ | 1 + Σ_{j=2}^k λ_j q_j(x; μ) ≥ 0, ∀x ∈ S}. Λ_μ is an intersection of half-spaces, hence convex, and the hard boundary is constructed by a set of (hyper-)planes.
Soft boundary. Definition: for a density function f(x; μ) with k finite moments let M_k(f) := (E_f(X), E_f(X²), ..., E_f(X^k)), and for compact M define C = convhull{M_k(f) | μ ∈ M}. The boundary of C is called the soft boundary.

Computing the hard boundary: Poisson model. Λ_μ = {λ | A_2(x)λ_2 + A_3(x)λ_3 + A_4(x)λ_4 + 1 ≥ 0, ∀x ∈ Z_+} (figures: slices through λ_2 = −0.1 and through λ_3 = 0.3).
Theorem: For a LMM of a Poisson distribution, for each μ, the space Λ_μ can be arbitrarily well approximated, as measured by volume for example, by a finite polytope.

Computing the hard boundary: normal model. Let y = (x − μ)/σ; then
  Λ_μ = {λ | (y² − 1)λ_2 + (y³ − 3y)λ_3 + (y⁴ − 6y² + 3)λ_4 + 1 ≥ 0, ∀y ∈ R}.
We need more geometric tools to compute this boundary.

Ruled and developable surfaces. Definition: a ruled surface is Γ(x, γ) = α(x) + γ·β(x), x ∈ I ⊂ R, γ ∈ R; it is developable when β(x), α′(x) and β′(x) are coplanar for all x ∈ I. Definition: the family of planes A = {λ ∈ R³ | a(x)·λ + d(x) = 0, x ∈ R}, each determined by an x ∈ R, is called a one-parameter infinite family of planes. Each element of the set {λ ∈ R³ | a(x)·λ + d(x) = 0, a′(x)·λ + d′(x) = 0, x ∈ R} is called a characteristic line of the surface at x, and their union is called the envelope of the family. A characteristic line is the intersection of two consecutive planes, and the envelope is a developable surface.

Boundaries for the normal LMM. The hard boundary is determined by the family of planes (y² − 1)λ_2 + (y³ − 3y)λ_3 + (y⁴ − 6y² + 3)λ_4 + 1 = 0, y ∈ R (figures: the hard boundary, shaded, as a subset of a self-intersecting ruled surface; slice through λ_4 = 0.2). For the soft boundary, with k = 3 for visualization purposes (μ ∈ M, fixed σ):
  M_3(f) = (μ, μ² + σ², μ³ + 3μσ²), M_3(g) = (μ, μ² + σ² + 2λ_2, μ³ + 3μσ² + 6μλ_2 + 6λ_3)
(figures: the 3-D curve φ(μ); the bounding ruled surface γ_a(μ, u); the convex subspace restricted to the soft boundary). Ruled surface parametrization: two boundary surfaces, each constructed by a curve and a set of lines attached to it, γ_a(μ, u) = φ(μ) + u L_a(μ) and γ_b(μ, u) = φ(μ) + u L_b(μ), where for M = [a, b] and φ(μ) = M_3(f), L_a(μ) are the lines between φ(a) and φ(μ), and L_b(μ) are the lines between φ(μ) and φ(b).

Summary. Understanding these boundaries is important if we want to exploit the nice statistical properties of LMMs. The boundaries described in this paper have both discrete aspects and smooth aspects. The two examples discussed represent the structure for almost all exponential family models. It is an interesting problem to design optimization algorithms on these boundaries for finding boundary maximizers of the likelihood.

References
Anaya-Izquierdo, K., Critchley, F., and Marriott, P. (2013). When are first order asymptotics adequate? A diagnostic. Stat, 3(1):17-22.
Anaya-Izquierdo, K. and Marriott, P. (2007). Local mixture models of exponential families. Bernoulli, 13:623-640.
Barvinok, A. (2013). Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, rnt078.
Batyrev, V. V. (1992). Toric varieties and smooth convex approximations of a polytope. RIMS Kokyuroku, 776:20.
Boroczky, K. and Fodor, F. (2008). Approximating 3-dimensional convex bodies by polytopes with a restricted number of edges. Contributions to Algebra and Geometry, 49(1):177-193.
Fukuda, K. (2004). From the zonotope construction to the Minkowski addition of convex polytopes. Journal of Symbolic Computation, 38(4):1261-1272.
Geyer, C. J. (2009). Likelihood inference in exponential families and directions of recession. Electronic Journal of Statistics, 3:259-289.
Ghomi, M. (2001). Strictly convex submanifolds and hypersurfaces of positive curvature. Journal of Differential Geometry, 57(2):239-271.
Ghomi, M. (2004). Optimal smoothing for convex polytopes. Bulletin of the London Mathematical Society, 36(4):483-492.
Marriott, P. (2002). On the local geometry of mixture models. Biometrika, 89:77-93.
Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics, 3:446-484.

Thank You
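As a small illustration of the hard boundary for the normal LMM with k = 4 (a sketch, not from the paper; truncating y to a finite grid is an approximation of the intersection of half-spaces over all y ∈ R, in the spirit of the polytope approximation above), one can test whether a given λ = (λ_2, λ_3, λ_4) satisfies the positivity constraint:

```python
import numpy as np

def in_hard_region(lam2, lam3, lam4, y=np.linspace(-12, 12, 4001)):
    """Approximate membership of lambda in Lambda_mu for the normal LMM (k = 4):
    1 + lam2*(y^2 - 1) + lam3*(y^3 - 3y) + lam4*(y^4 - 6y^2 + 3) >= 0 for all y,
    tested on a finite grid, i.e. a finite intersection of half-spaces."""
    g = 1.0 + lam2 * (y**2 - 1) + lam3 * (y**3 - 3 * y) + lam4 * (y**4 - 6 * y**2 + 3)
    return bool(np.all(g >= 0.0))

print(in_hard_region(0.10, 0.0, 0.05))    # True: inside the hard boundary
print(in_hard_region(0.10, 0.0, -0.05))   # False: lam4 < 0 violates positivity for large |y|
```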

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We generalize the O(dn/ε²)-time (1 + ε)-approximation algorithm for the smallest enclosing Euclidean ball [2,10] to point sets in hyperbolic geometry of arbitrary dimension. We guarantee a O(1/ε²) convergence time by using a closed-form formula to compute the geodesic α-midpoint between any two points. Those results allow us to apply hyperbolic k-center clustering to statistical location-scale families or to multivariate spherical normal distributions by using their Fisher information matrix as the underlying Riemannian hyperbolic metric.
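The closed-form geodesic α-midpoint is the computational core of the method. The sketch below (an implementation of the formulas recalled in the talk: the Poincaré-ball distance, the Möbius translation T_p, and the α-midpoint obtained by translating to the origin and back; the test points are arbitrary) checks numerically that ρ(p, p#_α q) = α ρ(p, q):

```python
import numpy as np

def rho(p, q):
    # hyperbolic distance in the Poincare ball model
    p, q = np.asarray(p, float), np.asarray(q, float)
    num = 2.0 * np.sum((p - q) ** 2)
    den = (1.0 - np.sum(p ** 2)) * (1.0 - np.sum(q ** 2))
    return float(np.arccosh(1.0 + num / den))

def translate(p, x):
    # Moebius translation T_p (an isometry of the ball mapping the origin to p)
    p, x = np.asarray(p, float), np.asarray(x, float)
    px, xx, pp = np.dot(p, x), np.sum(x ** 2), np.sum(p ** 2)
    return ((1.0 + 2.0 * px + xx) * p + (1.0 - pp) * x) / (1.0 + 2.0 * px + pp * xx)

def alpha_midpoint(p, q, alpha):
    # p #_alpha q: translate p to the origin, move a fraction alpha along the ray, translate back
    v = translate(-np.asarray(p, float), q)
    xq = np.linalg.norm(v)
    c = ((1.0 + xq) / (1.0 - xq)) ** alpha
    return translate(p, (c - 1.0) / (c + 1.0) * v / xq)

p, q, a = np.array([0.1, -0.3]), np.array([0.6, 0.2]), 0.35
m = alpha_midpoint(p, q, a)
print(rho(p, m), a * rho(p, q))    # the two values coincide up to rounding
```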
 

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry
Frank Nielsen¹, Gaëtan Hadjeres²; ¹École Polytechnique, ²Sony Computer Science Laboratories, Inc. Conference on Geometric Science of Information, 2015

The minimum enclosing ball problem. Finding the minimum enclosing ball (or 1-center) of a finite point set P = {p_1, ..., p_n} in a metric space (X, d_X) consists in finding c ∈ X such that c = argmin_{c ∈ X} max_{p ∈ P} d_X(c, p).

The approximate minimum enclosing ball problem. In a Euclidean setting, this problem is well-defined (uniqueness of the center c* and radius R* of the MEB) but computationally intractable in high dimensions. We fix an ε > 0 and focus on finding an ε-approximation c ∈ X of MEB(P) such that d_X(c, p) ≤ (1 + ε)R* for all p ∈ P.

Prior work. An approximate solution in the Euclidean case is given by Badoiu and Clarkson's algorithm [Badoiu and Clarkson, 2008]: initialize the center c_1 ∈ P, then repeat 1/ε² times the update c_{i+1} = c_i + (f_i − c_i)/(i + 1), where f_i ∈ P is the farthest point from c_i. How do we deal with point sets whose underlying geometry is not Euclidean? The algorithm has been generalized to dually flat manifolds [Nock and Nielsen, 2005] and to Riemannian manifolds [Arnaudon and Nielsen, 2013]. Applying these results to hyperbolic geometry gives the existence and uniqueness of MEB(P), but gives no explicit bounds on the number of iterations and assumes that we are able to precisely cut geodesics.

Our contribution. We analyze the case of point sets whose underlying geometry is hyperbolic. Using a closed-form formula to compute geodesic α-midpoints, we obtain an intrinsic (1 + ε)-approximation algorithm for the approximate minimum enclosing ball problem, a O(1/ε²) convergence-time guarantee, and a one-class clustering algorithm for specific subfamilies of normal distributions using their Fisher information metric.

The Poincaré ball model. The Poincaré ball model (B^d, ρ) consists of the open unit ball B^d = {x ∈ R^d : ‖x‖ < 1} together with the hyperbolic distance
  ρ(p, q) = arcosh(1 + 2‖p − q‖² / ((1 − ‖p‖²)(1 − ‖q‖²))), ∀p, q ∈ B^d,
which induces a Riemannian structure on (B^d, ρ). Geodesics ("straight" lines) are exactly the Euclidean straight lines passing through the origin and the circle arcs orthogonal to the unit sphere. Circles in the Poincaré ball look like Euclidean circles, but with a different center: the hyperbolic center of a circle differs from its Euclidean center (difference between the Euclidean MEB and the hyperbolic MEB of the same point set in the Poincaré disk). Hyperbolic translations are given by
  T_p(x) = [(1 − ‖p‖²)x + (‖x‖² + 2⟨x, p⟩ + 1)p] / (‖p‖²‖x‖² + 2⟨x, p⟩ + 1)
(tiling of the hyperbolic plane by squares).

Closed-form formula for computing α-midpoints. A point m = p#_α q is the α-midpoint of p and q, for α ∈ [0, 1], if m belongs to the geodesic joining p and q and ρ(p, m_α) = α ρ(p, q). For the special case p = (0, ..., 0), q = (x_q, 0, ..., 0), we have p#_α q = (x_α, 0, ..., 0) with
  x_α = (c_{α,q} − 1)/(c_{α,q} + 1), where c_{α,q} := e^{α ρ(p,q)} = ((1 + x_q)/(1 − x_q))^α.
Noting that p#_α q = T_p(T_{−p}(p) #_α T_{−p}(q)) for all p, q ∈ B^d, we obtain a closed-form formula for computing p#_α q, in linear time O(d), and these transformations are exact.

(1 + ε)-approximation of a hyperbolic enclosing ball of fixed radius. For a fixed radius r > R*, we can find c ∈ B^d such that ρ(c, p) ≤ (1 + ε)r for all p ∈ P, with
Algorithm 1: (1 + ε)-approximation of EHB(P, r)
1: c_0 := p_1; t := 0
2: while ∃p ∈ P such that p ∉ B(c_t, (1 + ε)r) do
3:   let p ∈ P be such a point
4:   α := (ρ(c_t, p) − r)/ρ(c_t, p)
5:   c_{t+1} := c_t #_α p; t := t + 1
6: end while
7: return c_t
Idea of the proof: by the hyperbolic law of cosines, ch(ρ_t) ≥ ch(h) ch(ρ_{t+1}) and ch(ρ_1) ≥ ch(h) ≥ ch(εr). The EHB(P, r) algorithm is a O(1/ε²)-time algorithm which returns the center of a hyperbolic enclosing ball with radius (1 + ε)r in less than 4/ε² iterations. The error with respect to the true MEHB center c* verifies ρ(c, c*) ≤ arcosh(ch((1 + ε)r)/ch(R*)).

(1 + ε + ε²/4)-approximation of MEHB(P). As R* is unknown in general, the EHB algorithm returns, for any r, either a (1 + ε)-approximation of EHB(P) if r ≥ R*, or the fact that r < R* if the result obtained after more than 4/ε² iterations is not good enough. This suggests implementing a dichotomic search on the radius (Algorithm 2) in order to compute an approximation of the minimal hyperbolic enclosing ball: we obtain a (1 + ε + ε²/4)-approximation of MEHB(P) in O(N/ε² log(1/ε)) iterations.

Experimental results. The number of iterations does not depend on d (number of α-midpoint calculations as a function of ε, in logarithmic scale, for different values of d). The running time is approximately O(dn/ε²) (vertical translation in logarithmic scale for different values of d).

Applications. Hyperbolic geometry arises when considering certain subfamilies of multivariate normal distributions. For instance, the subfamilies N(μ, σ²I_n) of n-variate normal distributions with scalar covariance matrix (I_n the n × n identity matrix), N(μ, diag(σ_1², ..., σ_n²)) with diagonal covariance matrix, and N(μ_0, Σ) of d-variate normal distributions with fixed mean μ_0 and arbitrary positive definite covariance matrix Σ, are statistical manifolds whose Fisher information metric is hyperbolic. In particular, our results apply to the two-dimensional location-scale subfamily (MEHB of probability density functions in the (μ, σ) upper half-plane).

Openings. Plugging the EHB and MEHB algorithms in to compute cluster centers in the farthest-first traversal approximation algorithm by [Gonzalez, 1985] (Algorithm 3), we obtain approximate algorithms for covering in hyperbolic spaces and for the k-center problem in O(kNd/ε² log(1/ε)). The computation of the minimum enclosing hyperbolic ball does not necessarily involve all points p ∈ P: the MEHB obtained by the algorithm is an ε-core-set, a difference with the Euclidean setting where core-sets are of size at most 1/ε [Badoiu and Clarkson, 2008].

Thank you!

Bibliography
Arnaudon, M. and Nielsen, F. (2013). On approximating the Riemannian 1-center. Computational Geometry, 46(1):93-104.
Badoiu, M. and Clarkson, K. L. (2008). Optimal core-sets for balls. Comput. Geom., 40(1):14-22.
Gonzalez, T. F. (1985). Clustering to minimize the maximum intercluster distance. Theoretical Computer Science, 38:293-306.
Nock, R. and Nielsen, F. (2005). Fitting the smallest enclosing Bregman ball. In Machine Learning: ECML 2005, pages 649-656. Springer.
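Putting the pieces together, here is a compact sketch of Algorithm 1 specialized to the two-dimensional Poincaré disk, using complex arithmetic for the Möbius translation (a simplification of the d-dimensional formulas; the point set, the radius r = 2.0 and ε = 0.1 are arbitrary test values with r chosen larger than R*):

```python
import numpy as np

def rho(z, w):
    # hyperbolic distance in the Poincare disk (complex coordinates)
    return float(np.arccosh(1.0 + 2.0 * abs(z - w) ** 2 /
                            ((1.0 - abs(z) ** 2) * (1.0 - abs(w) ** 2))))

def alpha_midpoint(z, w, alpha):
    # closed-form alpha-midpoint: Moebius-translate z to the origin, rescale, translate back
    v = (w - z) / (1.0 - np.conj(z) * w)                   # T_{-z}(w)
    c = ((1.0 + abs(v)) / (1.0 - abs(v))) ** alpha
    u = (c - 1.0) / (c + 1.0) * v / abs(v)
    return (u + z) / (1.0 + np.conj(z) * u)                # T_z(u)

def ehb(points, r, eps):
    """Sketch of Algorithm 1: center of a hyperbolic enclosing ball of radius (1+eps)*r,
    assuming r >= R*; at most 4/eps^2 update steps according to the talk."""
    c = points[0]
    for _ in range(int(np.ceil(4.0 / eps ** 2))):
        d = np.array([rho(c, p) for p in points])
        far = int(np.argmax(d))
        if d[far] <= (1.0 + eps) * r:
            break
        alpha = (d[far] - r) / d[far]
        c = alpha_midpoint(c, points[far], alpha)          # move so that rho(c', p_far) = r
    return c

rng = np.random.default_rng(3)
pts = (rng.random(30) - 0.5) + 1j * (rng.random(30) - 0.5)   # points well inside the unit disk
c = ehb(pts, r=2.0, eps=0.1)
print(c, max(rho(c, p) for p in pts))                        # covering radius <= (1 + eps) * r
```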

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Brain Computer Interfaces (BCI) based on electroencephalography (EEG) rely on multichannel brain signal processing. Most of the state-of-the-art approaches deal with covariance matrices, and indeed Riemannian geometry has provided a substantial framework for developing new algorithms. Most notably, a straightforward algorithm such as Minimum Distance to Mean yields competitive results when applied with a Riemannian distance. This applicative contribution aims at assessing the impact of several distances on a real EEG dataset, as the invariances embedded in those distances have an influence on the classification accuracy. Euclidean and Riemannian distances and means are compared both in terms of quality of results and of computational load.
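A minimal sketch of the Minimum Distance to Mean classifier mentioned in the abstract, under added assumptions: trials are summarized by SPD spatial covariance matrices, the affine-invariant Riemannian distance is used for assignment, and the class centers are computed with the log-Euclidean mean as a simple stand-in for the Fréchet mean used in the paper. The random SPD matrices only stand in for real EEG trial covariances.

```python
import numpy as np
from scipy.linalg import logm, expm, fractional_matrix_power

def airm_distance(A, B):
    # affine-invariant Riemannian distance ||log(A^{-1/2} B A^{-1/2})||_F
    A_isqrt = fractional_matrix_power(A, -0.5)
    return float(np.linalg.norm(np.real(logm(A_isqrt @ B @ A_isqrt)), "fro"))

def log_euclidean_mean(mats):
    # exp of the arithmetic mean of matrix logarithms (simple surrogate for the Frechet mean)
    return expm(np.mean([np.real(logm(S)) for S in mats], axis=0))

def mdm_fit(trials, labels):
    # one center (mean covariance matrix) per class
    return {k: log_euclidean_mean([S for S, y in zip(trials, labels) if y == k])
            for k in set(labels)}

def mdm_predict(centers, S):
    # assign the unlabelled covariance matrix to the closest class center
    return min(centers, key=lambda k: airm_distance(centers[k], S))

# toy usage with random SPD matrices standing in for EEG trial covariances
rng = np.random.default_rng(4)
def spd(c): A = rng.standard_normal((8, 8)); return A @ A.T + c * np.eye(8)
trials = [spd(1.0) for _ in range(10)] + [spd(5.0) for _ in range(10)]
labels = ["rest"] * 10 + ["13Hz"] * 10
centers = mdm_fit(trials, labels)
print(mdm_predict(centers, spd(5.0)))
```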
 

From Euclidean to Riemannian Means: Information Geometry for SSVEP Classification
Emmanuel K. Kalunga, Sylvain Chevallier, Quentin Barthélemy et al.
F'SATI, Tshwane University of Technology (South Africa); LISV, Université de Versailles Saint-Quentin (France); Mensia Technologies (France)
28 October 2015

Outline: Brain-Computer Interfaces; spatial covariance matrices for BCI; experimental assessment of distances.

Brain-Computer Interfaces
• Context: rehabilitation and disability compensation call for out-of-the-lab solutions open to a wider population. Intra-subject variability requires online, adaptive methods; inter-subject variability requires good generalization and fast convergence. Opportunities: a new generation of BCI (Congedo & Barachant), a growing EEG community, available datasets, and challenging problems.
• A BCI provides non-muscular communication, with medical applications and possible uses for a wider population. Recordings can target the neuron, the neuronal group, or the whole brain (LFP, ECoG, SEEG, EEG, MEG, fMRI, PET).
• The BCI loop: (1) acquisition, (2) preprocessing, (3) translation, (4) user feedback; the first systems date back to the early '70s.
• Most BCI rely on EEG, which captures brainwaves efficiently: lightweight, low-cost, mature technology with high temporal resolution and no trepanation. EEG originates in local field potentials (electric potential differences between dendrites and soma); under Maxwell's equations in the quasi-static approximation, volume conduction makes the signal sensitive to the conductivity of brain and skull and to tissue anisotropies.
• Experimental paradigms: motor imagery ((de)synchronization in the premotor cortex) and evoked responses (low-amplitude potentials induced by a stimulus). Steady-State Visually Evoked Potentials (SSVEP): 8 electrodes over the occipital region, stimulation LEDs flickering at 13, 17 and 21 Hz; neural synchronization with the visual stimulation, no learning required (based on visual attention), strong induced activation.
• Challenges: data scarcity (a few sources non-linearly mixed on all electrodes), individual variability (mental fatigue), inter-session variability (electrode impedances and positions), inter-individual variability (state-of-the-art approaches fail on about 20% of subjects). Desired properties: online systems that continuously adapt to the user, no calibration phase (a non-negligible cognitive load that raises fatigue), and generic classifiers with transfer learning (use data from one subject to enhance the results for another).

Spatial covariance matrices for BCI
• The common approach, spatial filtering, is efficient on clean datasets but specific to each user and session (requiring calibration) and relies on a two-step training with feature selection (overfitting risk, curse of dimensionality). Working directly with covariance matrices offers good generalization across subjects, fast convergence, existing online algorithms and efficient implementations.
• An EEG trial is $X \in \mathbb{R}^{C \times N}$, with $C$ electrodes and $N$ time samples; assuming $X \sim \mathcal{N}(0, \Sigma)$, the covariance matrices belong to $\mathcal{M}_C = \{\Sigma \in \mathbb{R}^{C\times C} : \Sigma = \Sigma^{\top},\ x^{\top}\Sigma x > 0\ \forall x \in \mathbb{R}^C \setminus \{0\}\}$. The mean of a set $\{\Sigma_i\}_{i=1,\dots,I}$ is $\bar{\Sigma} = \arg\min_{\Sigma \in \mathcal{M}_C} \sum_{i=1}^{I} d^m(\Sigma_i, \Sigma)$. Each EEG class is represented by its mean and classification is based on those means (Congedo, 2013); the question is how to obtain a robust and efficient algorithm.
• Minimum distance to Riemannian mean: compute the center $\Sigma_E^{(k)}$ of each of the $K$ classes and assign an unlabelled $\hat{\Sigma}$ to the closest class, $k^* = \arg\min_k d(\hat{\Sigma}, \Sigma_E^{(k)})$. [Figure: trajectories of the resting, 13 Hz, 17 Hz and 21 Hz classes on the tangent space at the mean of all trials $\bar{\Sigma}_\mu$.]
• Riemannian potato (removing outliers and artifacts): reject any $\Sigma_i$ lying too far from the mean of all trials, i.e. such that $z_i = (d(\Sigma_i, \bar{\Sigma}) - \mu)/\sigma > z_{th}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the distances. [Figure: raw matrices versus Riemannian-potato filtering.]
• Riemannian approaches achieve state-of-the-art results in BCI (on par with spatial filtering or sensor-space methods) with simpler, less error-prone and computationally efficient algorithms. This success is attributed to the invariances embedded in Riemannian distances (invariance to rescaling, normalization and whitening, and to electrode permutation or positioning) and to their being equivalent to working in an optimal source space (spatial filters are sensitive to outliers and user-specific; the "sensors or sources" question disappears). Which invariances are the most desirable for EEG?

Experimental assessment of distances
• Considered distances and divergences:
  Euclidean: $d_E(\Sigma_1, \Sigma_2) = \|\Sigma_1 - \Sigma_2\|_F$
  Log-Euclidean: $d_{LE}(\Sigma_1, \Sigma_2) = \|\log(\Sigma_1) - \log(\Sigma_2)\|_F$ (Arsigny et al., 2006, 2007)
  Affine-invariant: $d_{AI}(\Sigma_1, \Sigma_2) = \|\log(\Sigma_1^{-1}\Sigma_2)\|_F$ (Fletcher & Joshi, 2004; Moakher, 2005)
  $\alpha$-divergence, $-1 < \alpha < 1$: $d_{\alpha D}(\Sigma_1, \Sigma_2) = \dfrac{4}{1-\alpha^2}\log\dfrac{\det\!\big(\frac{1-\alpha}{2}\Sigma_1 + \frac{1+\alpha}{2}\Sigma_2\big)}{\det(\Sigma_1)^{\frac{1-\alpha}{2}}\det(\Sigma_2)^{\frac{1+\alpha}{2}}}$ (Chebbi & Moakher, 2012)
  Bhattacharyya: $d_B(\Sigma_1, \Sigma_2) = \Big[\log\dfrac{\det\frac{1}{2}(\Sigma_1+\Sigma_2)}{(\det\Sigma_1\,\det\Sigma_2)^{1/2}}\Big]^{1/2}$ (Chebbi & Moakher, 2012)
• Experimental results: Euclidean distances yield the lowest results, usually attributed to the lack of invariance under inversion and to the swelling effect. Riemannian approaches outperform the state-of-the-art method (CCA+SVM). The $\alpha$-divergence shows the best performance but requires a costly optimisation to find the best $\alpha$; Bhattacharyya has the lowest computational cost together with good accuracy. [Figure: accuracy and CPU time as a function of $\alpha$.]
• Conclusion: working with covariance matrices in BCI achieves very good results, and simple algorithms (MDM, Riemannian potato) work well; robust and online methods are still needed. The field offers interesting applications for information geometry: many freely available datasets, several competitions, and open-source toolboxes for manipulating EEG. Open questions: handling electrode misplacement and other artifacts, missing data and lower-rank covariance matrices, and inter- and intra-individual variability.
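To make the comparison of distances concrete, here is a small self-contained sketch (plain NumPy, illustrative names, not the authors' code) of three of the dissimilarities listed above; the $\alpha$-divergence follows the reconstruction given above.

```python
import numpy as np

def _logm_spd(S):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(S1, S2):
    """d_LE = || log(S1) - log(S2) ||_F."""
    return np.linalg.norm(_logm_spd(S1) - _logm_spd(S2))

def bhattacharyya_distance(S1, S2):
    """d_B = sqrt( log det((S1+S2)/2) - 0.5 * (log det S1 + log det S2) )."""
    _, ld_mid = np.linalg.slogdet(0.5 * (S1 + S2))
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    return np.sqrt(ld_mid - 0.5 * (ld1 + ld2))

def alpha_divergence(S1, S2, alpha):
    """Log-det alpha-divergence, -1 < alpha < 1, as reconstructed from the slide."""
    _, ld_mix = np.linalg.slogdet(0.5 * (1 - alpha) * S1 + 0.5 * (1 + alpha) * S2)
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    return 4.0 / (1.0 - alpha ** 2) * (ld_mix - 0.5 * (1 - alpha) * ld1 - 0.5 * (1 + alpha) * ld2)

# toy covariance matrices standing in for two EEG trials (8 "electrodes", 256 samples)
rng = np.random.default_rng(0)
A, B = rng.standard_normal((8, 256)), rng.standard_normal((8, 256))
S1, S2 = A @ A.T / 256, B @ B.T / 256
print(log_euclidean_distance(S1, S2), bhattacharyya_distance(S1, S2), alpha_divergence(S1, S2, 0.0))
```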

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We consider the geodesic equation on the elliptical model, which is a generalization of the normal model. More precisely, we characterize this manifold from the group-theoretical viewpoint, formulate Eriksen's procedure for obtaining geodesics on the normal model, and give an alternative proof of it.
 

Group Theoretical Study on Geodesics for the Elliptical Models
Hiroto Inoue, Kyushu University, Japan
GSI2015, École Polytechnique, Paris-Saclay, France, October 28, 2015

Overview: (1) Eriksen's construction of geodesics on the normal model and the problem it raises; (2) reconsideration of Eriksen's argument via the embedding $N_n \to \mathrm{Sym}^+_{n+1}(\mathbb{R})$; (3) the geodesic equation on the elliptical model; (4) future work.

Eriksen's construction of geodesics on the normal model. Let $\mathrm{Sym}^+_n(\mathbb{R})$ be the set of $n$-dimensional positive-definite matrices. The normal model $N_n = (M, ds^2)$ is the Riemannian manifold with
$M = \{(\mu, \Sigma) \in \mathbb{R}^n \times \mathrm{Sym}^+_n(\mathbb{R})\}$, $\quad ds^2 = (d\mu)^{\top}\Sigma^{-1}(d\mu) + \tfrac{1}{2}\,\mathrm{tr}\big((\Sigma^{-1}d\Sigma)^2\big)$.
The geodesic equation on $N_n$ is
$\ddot{\mu} - \dot{\Sigma}\Sigma^{-1}\dot{\mu} = 0, \qquad \ddot{\Sigma} + \dot{\mu}\dot{\mu}^{\top} - \dot{\Sigma}\Sigma^{-1}\dot{\Sigma} = 0, \qquad (1)$
and its solution has been obtained by Eriksen.

Theorem [Eriksen 1987]. For any $x \in \mathbb{R}^n$ and $B \in \mathrm{Sym}_n(\mathbb{R})$, define the matrix exponential
$\Lambda(t) = \begin{pmatrix} \Delta & \delta & \Phi \\ \delta^{\top} & \gamma & * \\ \Phi^{\top} & * & \Gamma \end{pmatrix} := \exp(-tA), \qquad A := \begin{pmatrix} B & x & 0 \\ x^{\top} & 0 & -x^{\top} \\ 0 & -x & -B \end{pmatrix} \in \mathrm{Mat}_{2n+1}. \qquad (2)$
Then the curve $(\mu(t), \Sigma(t)) := (-\Delta^{-1}\delta,\ \Delta^{-1})$ is the geodesic on $N_n$ satisfying the initial condition $(\mu(0), \Sigma(0)) = (0, I_n)$, $(\dot{\mu}(0), \dot{\Sigma}(0)) = (x, B)$. (Proof: one checks directly from the definition that $(\mu(t), \Sigma(t))$ satisfies the geodesic equation.)

Problem. (1) Explain Eriksen's theorem so as to clarify the relation between the normal model and symmetric spaces. (2) Extend Eriksen's theorem to the elliptical model.

Reconsideration of Eriksen's argument. The positive-definite symmetric matrices $\mathrm{Sym}^+_{n+1}(\mathbb{R})$ form a symmetric space $G/K \simeq \mathrm{Sym}^+_{n+1}(\mathbb{R})$, $gK \mapsto g\,g^{\top}$, with $G = GL_{n+1}(\mathbb{R})$ and $K = O(n+1)$; this space carries the $G$-invariant Riemannian metric $ds^2 = \tfrac{1}{2}\,\mathrm{tr}\big((S^{-1}dS)^2\big)$.

Embedding $N_n \to \mathrm{Sym}^+_{n+1}(\mathbb{R})$. Put the affine subgroup $G_A := \Big\{\begin{pmatrix} P & \mu \\ 0 & 1\end{pmatrix} : P \in GL_n(\mathbb{R}),\ \mu \in \mathbb{R}^n\Big\} \subset GL_{n+1}(\mathbb{R})$ and define the Riemannian submanifold given by the orbit $G_A \cdot I_{n+1} = \{g\,g^{\top} \mid g \in G_A\} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R})$.
Theorem (ref. [Calvo, Oller 2002]). There is an isometry
$N_n \xrightarrow{\ \sim\ } G_A \cdot I_{n+1} \subset \mathrm{Sym}^+_{n+1}(\mathbb{R}), \qquad (\Sigma, \mu) \mapsto \begin{pmatrix} \Sigma + \mu\mu^{\top} & \mu \\ \mu^{\top} & 1 \end{pmatrix}. \qquad (3)$
In the embedded coordinate $S$, the metric becomes $ds^2 = \tfrac{1}{2}\,\mathrm{tr}\big((S^{-1}dS)^2\big)$ and the geodesic equation takes a simpler form expressed through $S^{-1}\dot{S}$ and the initial data $(B, x)$ at $(I_n, 0)$. Eriksen's argument can then be read as solving the linear differential equation $\Lambda^{-1}\dot{\Lambda} = -A$ in $\mathrm{Sym}^+_{2n+1}(\mathbb{R})$ and projecting the solution onto $\mathrm{Sym}^+_{n+1}(\mathbb{R})$; the essential point is that this projection carries the solution to a geodesic of $N_n \cong G_A \cdot I_{n+1}$.

Geodesic equation on the elliptical model. Define the Riemannian manifold $E_n(\alpha) = (M, ds^2)$ with $M = \{(\mu, \Sigma) \in \mathbb{R}^n \times \mathrm{Sym}^+_n(\mathbb{R})\}$ and
$ds^2 = (d\mu)^{\top}\Sigma^{-1}(d\mu) + \tfrac{1}{2}\,\mathrm{tr}\big((\Sigma^{-1}d\Sigma)^2\big) + \tfrac{d_\alpha}{2}\big(\mathrm{tr}(\Sigma^{-1}d\Sigma)\big)^2, \qquad (4)$
where $d_\alpha = (n+1)\alpha^2 + 2\alpha$; in particular $E_n(0) = N_n$. The geodesic equation on $E_n(\alpha)$ is
$\ddot{\mu} - \dot{\Sigma}\Sigma^{-1}\dot{\mu} = 0, \qquad \ddot{\Sigma} + \dot{\mu}\dot{\mu}^{\top} - \dot{\Sigma}\Sigma^{-1}\dot{\Sigma} - \tfrac{d_\alpha}{n d_\alpha + 1}\,(\dot{\mu}^{\top}\Sigma^{-1}\dot{\mu})\,\Sigma = 0, \qquad (5)$
which is equivalent to the geodesic equation on the elliptical model. The manifold $E_n(\alpha)$ is also embedded into $\mathrm{Sym}^+_{n+1}(\mathbb{R})$ (ref. [Calvo, Oller 2002]) via $S = |\Sigma|^{\alpha}\begin{pmatrix}\Sigma + \mu\mu^{\top} & \mu \\ \mu^{\top} & 1\end{pmatrix}$ (with $|A| = \det A$), which again yields $ds^2 = \tfrac{1}{2}\,\mathrm{tr}\big((S^{-1}dS)^2\big)$ and a simpler form of the geodesic equation, now involving an extra term $-\alpha(\log|S|)^{\boldsymbol{\cdot}}$. However, no submanifold $N \subset \mathrm{Sym}^+_{2n+1}(\mathbb{R})$ whose projection is $E_n(\alpha)$ has been constructed so far, and the geodesic equation on the elliptical model has not been solved.

Future work. (1) Extend Eriksen's theorem to elliptical models (ongoing). (2) Find an Eriksen-type theorem for general symmetric spaces $G/K$: for a projection $p : \tilde{G}/\tilde{K} \to G/K$, find a geodesic submanifold $N \subset \tilde{G}/\tilde{K}$ such that $p|_N$ maps geodesics to geodesics.

References:
Calvo, M., Oller, J.M.: A distance between elliptical distributions based in an embedding into the Siegel group. J. Comput. Appl. Math. 145, 319-334 (2002).
Eriksen, P.S.: Geodesics connected with the Fisher metric on the multivariate normal manifold. In: Proceedings of the GST Workshop, Lancaster, pp. 225-229 (1987).
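Eriksen's theorem above is directly computable. The following minimal sketch (NumPy/SciPy, illustrative function name, not from the talk) builds the block matrix $A$ from $(x, B)$, takes the matrix exponential, and reads off $(\mu(t), \Sigma(t)) = (-\Delta^{-1}\delta, \Delta^{-1})$:

```python
import numpy as np
from scipy.linalg import expm

def eriksen_geodesic(x, B, t):
    """Geodesic on the normal model through (mu, Sigma) = (0, I_n) with initial
    velocity (x, B), via Eriksen's matrix-exponential construction (as stated above)."""
    n = x.shape[0]
    A = np.zeros((2 * n + 1, 2 * n + 1))
    A[:n, :n] = B            # upper-left block  B
    A[:n, n] = x             # upper-middle      x
    A[n, :n] = x             # middle-left       x^T
    A[n, n + 1:] = -x        # middle-right     -x^T
    A[n + 1:, n] = -x        # lower-middle     -x
    A[n + 1:, n + 1:] = -B   # lower-right      -B
    Lam = expm(-t * A)
    Delta = Lam[:n, :n]
    delta = Lam[:n, n]
    Sigma = np.linalg.inv(Delta)
    mu = -Sigma @ delta
    return mu, Sigma

# sanity check: at t = 0 the curve passes through (0, I_n)
mu0, Sigma0 = eriksen_geodesic(np.array([1.0, -0.5]),
                               np.array([[0.3, 0.1], [0.1, -0.2]]), 0.0)
```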

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We introduce a class of paths, or one-parameter models, connecting two arbitrary probability density functions (pdf's). The class is derived by employing the Kolmogorov-Nagumo average between the two pdf's. There is a variety of such path connectedness on the space of pdf's, since the Kolmogorov-Nagumo average is applicable to any convex and strictly increasing function. Information-geometric insight is provided for understanding the probabilistic properties of statistical methods associated with this path connectedness. The one-parameter model is extended to a multidimensional model, on which statistical inference is characterized by sufficient statistics.
 

Path connectedness on a space of probability density functions
Osamu Komori (University of Fukui, Japan), Shinto Eguchi (The Institute of Statistical Mathematics, Japan)
École Polytechnique, Paris-Saclay (France), October 28, 2015

Contents: (1) the Kolmogorov-Nagumo (K-N) average; (2) the parallel displacement $A_t^{(\varphi)}$ characterizing the $\varphi$-path; (3) $U$-divergence and its associated geodesic.

Setting. $\mathcal{X}$ is the data space, $P$ a probability measure on $\mathcal{X}$, and $\mathcal{F}_P$ the space of probability density functions associated with $P$. We consider a path connecting $f$ and $g$, with $f, g \in \mathcal{F}_P$, and investigate its properties from the viewpoint of information geometry.

Kolmogorov-Nagumo average. Let $\varphi : (0,\infty) \to \mathbb{R}$ be a monotone increasing, concave, continuous function. For $f$ and $g$ in $\mathcal{F}_P$, the K-N average is $\varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x))\big)$ for $0 \le t \le 1$. (Remark: $\varphi^{-1}$ is then monotone increasing, convex and continuous on $(0,\infty)$.)

$\varphi$-path. Based on the K-N average, the $\varphi$-path connecting $f$ and $g$ in $\mathcal{F}_P$ is
$f_t(x, \varphi) = \varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x)) - \kappa_t\big)$,
where $\kappa_t \le 0$ is a normalizing factor, with equality when $t = 0$ or $t = 1$.

Theorem 1 (existence of $\kappa_t$). There uniquely exists $\kappa_t$ such that $\int_{\mathcal{X}} \varphi^{-1}\big((1-t)\varphi(f(x)) + t\,\varphi(g(x)) - \kappa_t\big)\, dP(x) = 1$. Proof sketch: by convexity of $\varphi^{-1}$, $0 \le \int \varphi^{-1}\big((1-t)\varphi(f) + t\,\varphi(g)\big)\, dP \le \int \{(1-t)f + t g\}\, dP \le 1$; moreover $\lim_{c\to\infty}\varphi^{-1}(c) = +\infty$ since $\varphi^{-1}$ is monotone increasing, so the continuity of $\varphi^{-1}$ yields the existence of $\kappa_t$ satisfying the equation.

Examples of $\varphi$-paths.
1. $\varphi_0(x) = \log x$: $f_t(x, \varphi_0) = \exp\big((1-t)\log f(x) + t\log g(x) - \kappa_t\big)$, with $\kappa_t = \log \int \exp\big((1-t)\log f + t\log g\big)\, dP$.
2. $\varphi_\eta(x) = \log(x + \eta)$ with $\eta \ge 0$: $f_t(x, \varphi_\eta) = \exp\big[(1-t)\log\{f(x)+\eta\} + t\log\{g(x)+\eta\} - \kappa_t\big] - \eta$, with $\kappa_t$ determined by the normalization.
3. $\varphi_\beta(x) = (x^\beta - 1)/\beta$ with $\beta \le 1$: $f_t(x, \varphi_\beta) = \{(1-t)f(x)^\beta + t\,g(x)^\beta - \kappa_t\}^{1/\beta}$, where $\kappa_t$ does not have an explicit form.
[Figure: illustration of $\varphi$-paths.]

Extended expectation. For $a : \mathcal{X} \to \mathbb{R}$ and a generator function $\varphi : (0,\infty) \to \mathbb{R}$, define
$E_f^{(\varphi)}\{a(X)\} = \dfrac{\int_{\mathcal{X}} a(x)\,\varphi'(f(x))^{-1}\, dP(x)}{\int_{\mathcal{X}} \varphi'(f(x))^{-1}\, dP(x)}$.
If $\varphi(t) = \log t$, this reduces to the usual expectation. It satisfies: $E_f^{(\varphi)}(c) = c$ for any constant $c$; $E_f^{(\varphi)}\{c\,a(X)\} = c\,E_f^{(\varphi)}\{a(X)\}$; $E_f^{(\varphi)}\{a(X)+b(X)\} = E_f^{(\varphi)}\{a(X)\} + E_f^{(\varphi)}\{b(X)\}$; and $E_f^{(\varphi)}\{a(X)^2\} \ge 0$ with equality if and only if $a(x) = 0$ for $P$-almost every $x$. Moreover, defining $f^{(\varphi)}(x) = \varphi'(f(x))^{-1}\big/\int_{\mathcal{X}} \varphi'(f(x))^{-1}\, dP(x)$, one has $E_f^{(\varphi)}\{a(X)\} = E_{f^{(\varphi)}}\{a(X)\}$.

Tangent space of $\mathcal{F}_P$. Let $H_f$ be the Hilbert space with inner product $\langle a, b\rangle_f = E_f^{(\varphi)}\{a(X)b(X)\}$ and let $T_f = \{a \in H_f : \langle a, 1\rangle_f = 0\}$ be the tangent space. For a statistical model $M = \{f_\theta\}_{\theta\in\Theta}$ one has $E_{f_\theta}^{(\varphi)}\{\partial_i \varphi(f_\theta(X))\} = 0$ for all $\theta$ (with $\partial_i = \partial/\partial\theta_i$), together with a companion identity expressing $E_{f_\theta}^{(\varphi)}\{\partial_i\partial_j \varphi(f_\theta(X))\}$ in terms of $\varphi''(f_\theta)$, $\varphi'(f_\theta)$ and the first derivatives $\partial_i\varphi(f_\theta)$.

Parallel displacement $A_t^{(\varphi)}$. Define $A_t^{(\varphi)}(x) \in T_{f_t}$ as the solution of the differential equation
$\dot{A}_t^{(\varphi)}(x) - E_{f_t}^{(\varphi)}\Big\{A_t^{(\varphi)}\, \dot{f}_t\, \dfrac{\varphi''(f_t)}{\varphi'(f_t)}\Big\} = 0$,
where $f_t$ is a path connecting $f$ and $g$ with $f_0 = f$, $f_1 = g$, and $\dot{A}_t^{(\varphi)}$ denotes the derivative with respect to $t$.
Theorem 2. The geodesic curve $\{f_t\}_{0\le t\le 1}$ determined by the parallel displacement $A_t^{(\varphi)}$ is the $\varphi$-path.

$U$-divergence. Assume $U(s)$ is a convex and increasing function of a scalar $s$ and let $\xi(t) = \mathrm{argmax}_s\{st - U(s)\}$. The $U$-divergence is
$D_U(f, g) = \int\{U(\xi(g)) - f\,\xi(g)\}\, dP - \int\{U(\xi(f)) - f\,\xi(f)\}\, dP$,
i.e. the difference between the cross entropy $C_U(f, g) = \int\{U(\xi(g)) - f\,\xi(g)\}\, dP$ and the diagonal entropy $C_U(f, f)$.

Connections based on $U$-divergence. For a finite-dimensional manifold $M = \{f_\theta : \theta\in\Theta\}$ and vector fields $X, Y$ on $M$, the Riemannian metric is $G^{(U)}(X, Y)(f) = \int Xf\; Y\xi(f)\, dP$ for $f \in M$, and the linear connections $\nabla^{(U)}$ and $\nabla^{*(U)}$ satisfy $G^{(U)}(\nabla^{(U)}_X Y, Z)(f) = \int XY f\; Z\xi(f)\, dP$ and $G^{(U)}(\nabla^{*(U)}_X Y, Z)(f) = \int Z f\; XY\xi(f)\, dP$. See Eguchi (1992) for details.

Theorem 3 (equivalence between the $\nabla^*$-geodesic and the $\xi$-path). Let $\nabla^{(U)}$ and $\nabla^{*(U)}$ be the linear connections associated with $D_U$, and let $C^{(\varphi)} = \{f_t(x, \varphi) : 0\le t\le 1\}$ be the $\varphi$-path connecting $f$ and $g$ in $\mathcal{F}_P$. Then the $\nabla^{(U)}$-geodesic curve connecting $f$ and $g$ equals $C^{(\mathrm{id})}$, where id denotes the identity function, while the $\nabla^{*(U)}$-geodesic curve connecting $f$ and $g$ equals $C^{(\xi)}$, with $\xi(t) = \mathrm{argmax}_s\{st - U(s)\}$.

Summary. (1) We consider the $\varphi$-path based on the Kolmogorov-Nagumo average. (2) The relation between $U$-divergence and the $\varphi$-path is investigated ($\varphi$ corresponds to $\xi$). (3) The idea of the $\varphi$-path can be applied to probability density estimation as well as to classification problems. (4) Divergences associated with the $\varphi$-path can be considered; a special case would be the Bhattacharyya divergence.
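As a quick numerical illustration of the $\varphi$-path and of Theorem 1, the sketch below (plain NumPy/SciPy on a grid, illustrative names, not the authors' code) computes a $\varphi_\beta$-path between two densities, finding the normalizing constant $\kappa_t$ by one-dimensional root finding:

```python
import numpy as np
from scipy.optimize import brentq

def phi_path(f, g, t, phi, phi_inv, dx, kappa_bracket=(-50.0, 0.0)):
    """phi-path f_t = phi_inv((1-t) phi(f) + t phi(g) - kappa_t) on a regular grid,
    with kappa_t <= 0 chosen so that f_t integrates to one (simple Riemann sum)."""
    mix = (1 - t) * phi(f) + t * phi(g)

    def mass(kappa):
        return dx * np.sum(phi_inv(mix - kappa)) - 1.0

    kappa_t = brentq(mass, *kappa_bracket)   # mass > 0 at the left end, <= 0 at kappa = 0
    return phi_inv(mix - kappa_t)

# example: phi_beta(x) = (x**beta - 1)/beta, a power-type path between two Gaussians
beta = 0.5
phi = lambda x: (x ** beta - 1.0) / beta
phi_inv = lambda y: np.maximum(beta * y + 1.0, 0.0) ** (1.0 / beta)

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
f = np.exp(-0.5 * (x + 2) ** 2) / np.sqrt(2 * np.pi)
g = np.exp(-0.5 * (x - 2) ** 2) / np.sqrt(2 * np.pi)
ft = phi_path(f, g, 0.3, phi, phi_inv, dx)   # an intermediate density on the path
```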

Creative Commons Attribution-ShareAlike 4.0 International
See the video

Computational Information Geometry ... in mixture modelling
Germain Van Bever¹, R. Sabolová¹, F. Critchley¹ & P. Marriott²
¹The Open University (EPSRC grant EP/L010429/1), United Kingdom; ²University of Waterloo, Canada
GSI15, 28-30 October 2015, Paris

Outline: (1) Computational Information Geometry: Information Geometry; CIG. (2) Mixture modelling: introduction; Lindsay's convex geometry; (C)IG for mixture distributions.

Information Geometry. The use of geometry in statistics gave birth to many different approaches. Traditionally, Information Geometry refers to the application of differential geometry to statistical theory and practice. The main ingredients of IG in exponential families (Amari, 1985) are (1) the manifold of parameters $M$, (2) the Riemannian (Fisher information) metric $g$, and (3) the pair of affine connections $\{\nabla^{-1}, \nabla^{+1}\}$ (mixture and exponential connections). These allow one to define notions of curvature, dimension reduction or information loss, and invariant higher-order expansions. Two affine structures (maps on $M$) are used simultaneously:
• $-1$: mixture affine geometry on probability measures: $\lambda f(x) + (1-\lambda) g(x)$;
• $+1$: exponential affine geometry on probability measures: $C(\lambda)\, f(x)^{\lambda} g(x)^{1-\lambda}$.

Computational Information Geometry (CIG; Critchley and Marriott, 2014). (1) In CIG, the multinomial model provides, modulo discretization, a universal model; one therefore moves from manifold-based systems to simplex-based geometries, which also allows for different supports in the extended simplex. (2) It provides a unifying framework for different geometries. (3) Tractability of the geometry allows for efficient algorithms in a computational framework. The approach is inherently finite and discrete; the impact of discretization is studied, and a working model is a subset of the simplex.

Multinomial distributions. $X \sim \mathrm{Mult}(\pi_0, \dots, \pi_k)$ with $\pi = (\pi_0, \dots, \pi_k) \in \mathrm{int}(\Delta^k)$, where $\Delta^k := \{\pi : \pi_i \ge 0,\ \sum_{i=0}^{k}\pi_i = 1\}$. Here $\pi^{(0)} = (\pi_1, \dots, \pi_k)$ is the mean parameter, while $\eta = \log(\pi^{(0)}/\pi_0)$ is the natural parameter. Studying limits gives extended exponential families on the closed simplex (Csiszár and Matúš, 2005). [Figure: mixed geodesics drawn both in the mean parameters $(\pi_1, \pi_2)$ and in the natural parameters $(\eta_1, \eta_2)$.]

Restricting to the multinomial families. For regular exponential families with compact support, the cost of discretization on the components of Information Geometry is bounded; the same holds true for the MLE and the log-likelihood function.
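The two affine structures recalled above are easy to experiment with numerically. The sketch below (plain NumPy, illustrative names, not the speakers' code) evaluates a point on the $-1$ (mixture) geodesic and on the $+1$ (exponential) geodesic between two points of the simplex:

```python
import numpy as np

def mixture_geodesic(p, q, lam):
    """-1 (mixture) geodesic in the simplex: pointwise convex combination."""
    return (1 - lam) * p + lam * q

def exponential_geodesic(p, q, lam):
    """+1 (exponential) geodesic: normalized geometric interpolation C(lam) p^(1-lam) q^lam."""
    r = p ** (1 - lam) * q ** lam
    return r / r.sum()

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
print(mixture_geodesic(p, q, 0.5))
print(exponential_geodesic(p, q, 0.5))
```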

Bayesian and Information Geometry for Inverse Problems (chaired by Ali Mohammad-Djafari, Olivier Swander)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We review the manifold projection method for stochastic nonlinear filtering in a more general setting than in our previous paper in Geometric Science of Information 2013. We still use a Hilbert space structure on a space of probability densities to project the infinite dimensional stochastic partial differential equation for the optimal filter onto a finite dimensional exponential or mixture family, respectively, with two different metrics, the Hellinger distance and the L2 direct metric. This reduces the problem to finite dimensional stochastic differential equations. In this paper we summarize a previous equivalence result between Assumed Density Filters (ADF) and Hellinger/Exponential projection filters, and introduce a new equivalence between Galerkin method based filters and Direct metric/Mixture projection filters. This result allows us to give a rigorous geometric interpretation to ADF and Galerkin filters. We also discuss the different finite-dimensional filters obtained when projecting the stochastic partial differential equation for either the normalized (Kushner-Stratonovich) or a specific unnormalized (Zakai) density of the optimal filter.
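The basic operation used throughout the paper is the L2-orthogonal projection of a density perturbation onto the tangent space of a finite-dimensional family. The sketch below illustrates only that single step on a grid (plain NumPy; the Gaussian family, the finite-difference tangent vectors and all names are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def l2_projection_coeffs(v, dp_dtheta, dx):
    """Solve gamma c = b, with gamma_ij = <d_i p, d_j p>_{L2} and b_j = <v, d_j p>_{L2};
    then Pi[v] = sum_i c_i d_i p is the L2-orthogonal projection of v onto the tangent space."""
    G = dx * dp_dtheta @ dp_dtheta.T     # (m, m) Gram / metric matrix gamma(theta)
    b = dx * dp_dtheta @ v               # (m,)  inner products <v, d_j p>
    return np.linalg.solve(G, b)

# illustrative family: 1D Gaussian p(x; mu, s); tangent vectors by central finite differences
x = np.linspace(-10, 10, 4001)
dx = x[1] - x[0]
gauss = lambda mu, s: np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))
mu, s, eps = 0.0, 1.0, 1e-5
dp = np.vstack([(gauss(mu + eps, s) - gauss(mu - eps, s)) / (2 * eps),
                (gauss(mu, s + eps) - gauss(mu, s - eps)) / (2 * eps)])
v = gauss(0.3, 1.1) - gauss(mu, s)       # some perturbation of the current density
c = l2_projection_coeffs(v, dp, dx)
proj_v = c @ dp                          # projected perturbation, a curve in the family
```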
 

Stochastic PDE projection on manifolds: Assumed-Density and Galerkin Filters
Damiano Brigo, Dept. of Mathematics, Imperial College London (www.damianobrigo.it); joint work with John Armstrong, Dept. of Mathematics, King's College London
GSI 2015, Oct 28, 2015, Paris. Full paper to appear in MCSS, see also arXiv.org.

Spaces of probability densities. Consider a parametric family $S = \{p(\cdot,\theta),\ \theta \in \Theta \subset \mathbb{R}^m\}$ and its square-root version $S^{1/2} = \{\sqrt{p(\cdot,\theta)},\ \theta \in \Theta \subset \mathbb{R}^m\}$. If $S$ (or $S^{1/2}$) is a subset of a function space with an $L^2$ structure (inner product, norm, metric), one may ask whether $p(\cdot,\theta) \mapsto \theta$ (respectively $\sqrt{p(\cdot,\theta)} \mapsto \theta$) is a chart of an $m$-dimensional manifold. The topology and differential structure in the chart are the $L^2$ structure, with two possibilities:
• on $S$: the direct $L^2$ distance $d_2(p_1, p_2) = \|p_1 - p_2\|$, $p_{1,2} \in L^2$;
• on $S^{1/2}$: the Hellinger distance $d_H(\sqrt{p_1}, \sqrt{p_2}) = \|\sqrt{p_1} - \sqrt{p_2}\|$, $p_{1,2} \in L^1$;
where $\|\cdot\|$ is the norm of the Hilbert space $L^2$.

Tangent vectors, metrics and projection. If $\varphi : \theta \mapsto p(\cdot,\theta)$ (or $\theta \mapsto \sqrt{p(\cdot,\theta)}$) is the inverse of a chart, then $\{\partial\varphi(\cdot,\theta)/\partial\theta_1, \dots, \partial\varphi(\cdot,\theta)/\partial\theta_m\}$ are linearly independent $L^2(\lambda)$ vectors spanning the tangent space at $\theta$. The inner products of the basis vectors give, up to a constant factor, the direct $L^2$ matrix $\gamma(\theta)$, with $\gamma_{ij}(\theta) = \int \frac{\partial p(x,\theta)}{\partial\theta_i}\frac{\partial p(x,\theta)}{\partial\theta_j}\, dx$, in the $d_2$ case, and the famous Fisher-Rao matrix $g(\theta)$, with $g_{ij}(\theta) = \int \frac{1}{p(x,\theta)}\frac{\partial p(x,\theta)}{\partial\theta_i}\frac{\partial p(x,\theta)}{\partial\theta_j}\, dx$, in the Hellinger case. The $d_2$ orthogonal projection is
$\Pi^{\gamma}_{\theta}[v] = \sum_{i=1}^{m}\sum_{j=1}^{m} \gamma^{ij}(\theta)\,\Big\langle v,\ \frac{\partial p(\cdot,\theta)}{\partial\theta_j}\Big\rangle\, \frac{\partial p(\cdot,\theta)}{\partial\theta_i}$,
and the $d_H$ projection is analogous with square roots and with $g$ in place of $\gamma$.

The nonlinear filtering problem. The signal and observation are the Itô SDEs
$dX_t = f_t(X_t)\, dt + \sigma_t(X_t)\, dW_t, \qquad dY_t = b_t(X_t)\, dt + dV_t,\ \ Y_0 = 0. \qquad (1)$
Both Itô and Stratonovich (Str) SDEs are used; Str SDEs are necessary to deal with manifolds, since second-order Itô terms are not clear in terms of manifolds [16] (although a direct projection of Itô equations with good optimality properties is under development by John Armstrong). The nonlinear filtering problem consists in finding the conditional probability distribution $\pi_t$ of the state $X_t$ given the observations up to time $t$, i.e. $\pi_t(dx) := P[X_t \in dx \mid \mathcal{Y}_t]$ with $\mathcal{Y}_t := \sigma(Y_s,\ 0 \le s \le t)$. If $\pi_t$ has a density $p_t$, then $p_t$ satisfies the Stratonovich SPDE
$dp_t = \mathcal{L}^{*}_t p_t\, dt - \tfrac{1}{2}\, p_t\big[|b_t|^2 - E_{p_t}\{|b_t|^2\}\big]\, dt + \sum_{k=1}^{d} p_t\big[b_t^k - E_{p_t}\{b_t^k\}\big] \circ dY_t^k$,
with the forward operator $\mathcal{L}^{*}_t\phi = -\sum_{i=1}^{n}\frac{\partial}{\partial x_i}\big[f_t^i\phi\big] + \tfrac{1}{2}\sum_{i,j=1}^{n}\frac{\partial^2}{\partial x_i \partial x_j}\big[a_t^{ij}\phi\big]$.
This is an infinite-dimensional SPDE: even for toy systems like the cubic sensor ($f = 0$, $\sigma = 1$, $b = x^3$) the solution does not belong to any finite-dimensional family $p(\cdot,\theta)$ [19]. Finite-dimensional approximations are therefore needed: the SPDE can be projected either in the direct $L^2$ metric ($\gamma(\theta)$) or, after deriving the analogous equation for $\sqrt{p_t}$, in the Hellinger metric ($g(\theta)$). The projection transforms the SPDE into a finite-dimensional SDE for $\theta$ via the chain rule (hence the Stratonovich calculus): $dp(\cdot,\theta_t) = \sum_{j=1}^{m} \frac{\partial p(\cdot,\theta)}{\partial\theta_j} \circ d\theta_j(t)$; with Itô calculus there would be terms $\frac{\partial^2 p(\cdot,\theta)}{\partial\theta_i\partial\theta_j}\, d\langle\theta_i,\theta_j\rangle$, which are not tangent vectors.

Projection filters in the two metrics. Projecting the right-hand side of the SPDE with $\Pi^{\gamma}_{\theta}$ yields a Stratonovich SDE for $\theta_t$, driven by $dt$ and $dY_t^k$, whose coefficients are built from $\gamma^{-1}(\theta_t)$ and from integrals of $\mathcal{L}^{*}_t p(\cdot,\theta_t)$, $|b_t|^2$ and $b_t^k$ against the tangent vectors $\partial p(\cdot,\theta)/\partial\theta_j$. Using the Hellinger distance and the Fisher metric with $\Pi^{g}_{\theta}$ gives the analogous equation with $g^{-1}(\theta_t)$ and with the drift integrand weighted by $1/p(\cdot,\theta_t)$.

Choosing the family: exponential families. In past literature (several papers in Bernoulli, IEEE Transactions on Automatic Control, etc.), Brigo, Hanzon and LeGland developed the projection filter using the Fisher metric $g$ and exponential families $p(x,\theta) := \exp[\theta^{\top}c(x) - \psi(\theta)]$. This is a good combination: the tangent space has a simple structure (square roots do not complicate matters thanks to the exponential form); the Fisher matrix is simply $g_{ij}(\theta) = \partial^2_{\theta_i\theta_j}\psi(\theta)$; the projection $\Pi^g$ is simple for exponential families; a special exponential family with the observation function $b$ among the exponents $c(x)$ makes the filter correction step (the projection of the $dY$ term) exact; both a local and a global filtering error can be defined through $d_H$; and in the alternative expectation coordinates $\eta = E_\theta[c] = \partial_\theta\psi(\theta)$ the projection filter coincides with a classical approximate filter, the assumed density filter (based on generalized "moment matching").
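Two of the exponential-family facts used above (the Fisher matrix as the Hessian of $\psi$, and its interpretation as the covariance of the sufficient statistics) can be checked numerically. The sketch below does so for the one-dimensional Gaussian written in natural parameters; the closed form of $\psi$ is a standard computation, not taken from the slides, and all names are illustrative:

```python
import numpy as np

# 1D Gaussian as an exponential family p(x, th) = exp(th1*x + th2*x^2 - psi(th)), th2 < 0.
def psi(th):
    th1, th2 = th
    return -th1 ** 2 / (4.0 * th2) + 0.5 * np.log(np.pi / (-th2))

def fisher_from_psi(th, eps=1e-4):
    """g(theta) = Hessian of psi, approximated by central finite differences."""
    th = np.asarray(th, dtype=float)
    g = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            ei, ej = np.eye(2)[i] * eps, np.eye(2)[j] * eps
            g[i, j] = (psi(th + ei + ej) - psi(th + ei - ej)
                       - psi(th - ei + ej) + psi(th - ei - ej)) / (4 * eps ** 2)
    return g

def fisher_from_moments(th, n=200_000, seed=0):
    """Same matrix, estimated as the covariance of the sufficient statistics c(x) = (x, x^2)."""
    th1, th2 = th
    sigma2 = -1.0 / (2.0 * th2)
    mu = th1 * sigma2
    x = np.random.default_rng(seed).normal(mu, np.sqrt(sigma2), size=n)
    return np.cov(np.vstack([x, x ** 2]))

th = (1.0, -0.25)          # corresponds to mu = 2, sigma^2 = 2
print(fisher_from_psi(th))
print(fisher_from_moments(th))
```

Both estimates should agree (approximately [[2, 8], [8, 40]] for the parameters chosen here).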
Mixture families. Exponential families, however, do not couple as well with the metric $\gamma(\theta)$. Is there an important family for which the metric $\gamma(\theta)$ is preferable to the classical Fisher metric $g(\theta)$, in that the metric, the tangent space and the filter equations are simpler? The answer is affirmative: the mixture family. Given $m+1$ fixed square-integrable probability densities $q = [q_1, q_2, \dots, q_{m+1}]^{\top}$, set $\hat{\theta}(\theta) := [\theta_1, \theta_2, \dots, \theta_m,\ 1 - \theta_1 - \theta_2 - \dots - \theta_m]^{\top}$ for all $\theta \in \mathbb{R}^m$, and define the simple mixture family (a simplex)
$S^{M}(q) = \{\hat{\theta}(\theta)^{\top} q,\ \ \theta_i \ge 0\ \text{for all } i,\ \ \theta_1 + \dots + \theta_m < 1\}$.
With the $L^2$ / $\gamma(\theta)$ distance, the metric $\gamma(\theta)$ itself and the related projection become very simple: indeed $\partial p(\cdot,\theta)/\partial\theta_i = q_i - q_{m+1}$ and $\gamma_{ij}(\theta) = \int \big(q_i(x) - q_{m+1}(x)\big)\big(q_j(x) - q_{m+1}(x)\big)\, dx$, with no inline numerical integration. The $L^2$ metric does not depend on the specific point $\theta$ of the manifold, and the same holds for the tangent space at $p(\cdot,\theta)$, which is $\mathrm{span}\{q_1 - q_{m+1}, q_2 - q_{m+1}, \dots, q_m - q_{m+1}\}$; the $L^2$ projection also becomes particularly simple.

Mixture projection filter. Armstrong and Brigo (MCSS 2016 [3]) show that the mixture family together with the metric $\gamma(\theta)$ leads to a projection filter that is the same as approximate filtering via Galerkin methods [5]. Summing up:
• Hellinger $d_H$ / Fisher $g(\theta)$ with an exponential family: good (equivalent to the assumed density filter, i.e. local moment matching); with the basic mixture family: nothing special.
• Direct $L^2$ $d_2$ / $\gamma(\theta)$ with the basic mixture family: good (equivalent to Galerkin); with the exponential family: nothing special.
Despite its simplicity, the simple mixture family has an important drawback: for all $\theta$, the filter mean is constrained between the smallest and the largest of the component means, $\min_i \text{mean of } q_i \le \text{mean of } p(\cdot,\theta) \le \max_i \text{mean of } q_i$. The family is therefore enriched to a mixture in which some of the parameters also enter the core densities $q$: a mixture of Gaussian densities with free means and variances in each component. For example, a mixture of two Gaussians, $\theta\, p_{\mathcal{N}(\mu_1,v_1)}(x) + (1-\theta)\, p_{\mathcal{N}(\mu_2,v_2)}(x)$, has the five parameters $\theta, \mu_1, v_1, \mu_2, v_2$. This Gaussian mixture projection filter (GMPF) is illustrated on a fundamental example.

The quadratic sensor. Consider $dX_t = \sigma\, dW_t$, $dY_t = X_t^2\, dt + \sigma\, dV_t$. The measurements tell us nothing about the sign of $X$; once it seems likely that the state has moved past the origin, the conditional distribution becomes nearly symmetrical, so a bimodal distribution is expected. Four filters are compared: the GMPF $\theta\, p_{\mathcal{N}(\mu_1,v_1)}(x) + (1-\theta)\, p_{\mathcal{N}(\mu_2,v_2)}(x)$; the exponential projection filter with density $\exp(\theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 - \psi(\theta))$; the extended Kalman filter (a single Gaussian); and the exact filter computed by a finite-difference method (grid of 1000 state points and 5000 time steps). [Figures: conditional densities of the four filters at times 0 through 10.] Local approximation errors are compared through the $L^2$ residuals $\varepsilon_t^2 = \int \big(p_{\mathrm{exact},t}(x) - p_{\mathrm{approx},t}(x)\big)^2\, dx$ [figure: $L^2$ residuals of the GMPF, the Hellinger/exponential projection and the extended Kalman filter], and through the Lévy-Prokhorov residuals $\varepsilon_t = \inf\{\varepsilon : F_{\mathrm{exact},t}(x-\varepsilon) - \varepsilon \le F_{\mathrm{approx},t}(x) \le F_{\mathrm{exact},t}(x+\varepsilon) + \varepsilon\ \ \forall x\}$, where $F$ denotes the CDF; the Lévy-Prokhorov metric works well with singular densities such as particles, for which the $L^2$ metric is not ideal. [Figure: Prokhorov residuals of the L2NM and Hellinger/exponential projections, compared with the best possible three-particle residual.]

Cubic sensors. The results are qualitatively similar up to a stopping time; as one approaches the boundary, $\gamma_{ij}$ becomes singular, and the solution is to dynamically change the parameterization and even the dimension of the manifold. [Figure: $L^2$ residuals for the cubic sensor.]

Conclusions. Approximate finite-dimensional filtering is obtained by rigorous projection onto a chosen manifold of densities, using an overarching $L^2$ structure and two different metrics: the direct $L^2$ metric and the Hellinger/Fisher metric ($L^2$ on square roots). Fisher works well with exponential families: multimodality, an exact correction step, simplicity of implementation, and equivalence with assumed-density ("moment matching") filters. The direct $L^2$ metric works well with mixture families: even simpler filter equations with no inline numerical integration, a basic version equivalent to Galerkin methods, suitability for multimodality (quadratic sensor tests, $L^2$ global error), and results comparable with particle methods. Further investigation: convergence, and more on optimality; new projections with optimality properties are forthcoming (J. Armstrong). With thanks to the organizing committee.

References
[1] Aggrawal, J.: Sur l'information de Fisher. In: Théories de l'Information (J. Kampé de Fériet, ed.), Springer-Verlag, Berlin-New York, 1974, pp. 111-117.
[2] Amari, S.: Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, Springer-Verlag, Berlin, 1985.
[3] Armstrong, J., Brigo, D.: Nonlinear filtering via stochastic PDE projection on mixture manifolds in L2 direct metric. Mathematics of Control, Signals and Systems, 2016, accepted.
[4] Beard, R., Kenney, J., Gunther, J., Lawton, J., Stirling, W.: Nonlinear projection filter based on Galerkin approximation. AIAA Journal of Guidance, Control and Dynamics 22(2), 258-266 (1999).
[5] Beard, R., Gunther, J.: Galerkin Approximations of the Kushner Equation in Nonlinear Estimation. Working paper, Brigham Young University, 1997.
[6] Barndorff-Nielsen, O.E.: Information and Exponential Families. John Wiley and Sons, New York, 1978.
[7] Brigo, D.: Diffusion processes, manifolds of exponential densities, and nonlinear filtering. In: O.E. Barndorff-Nielsen and E.B. Vedel Jensen (eds.), Geometry in Present Day Science, World Scientific, 1999.
[8] Brigo, D.: On SDEs with marginal laws evolving in finite-dimensional exponential families. Statistics & Probability Letters 49, 127-134 (2000).
[9] Brigo, D.: The direct L2 geometric structure on a manifold of probability densities with applications to filtering. 2011. Available on arXiv.org and damianobrigo.it.
[10] Brigo, D., Hanzon, B., LeGland, F.: A differential geometric approach to nonlinear filtering: the projection filter. IEEE Transactions on Automatic Control 43, 247-252 (1998).
[11] Brigo, D., Hanzon, B., Le Gland, F.: Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli 5, 495-534 (1999).
[12] Brigo, D.: Filtering by Projection on the Manifold of Exponential Densities. PhD Thesis, Free University of Amsterdam, 1996.
[13] Brigo, D., Pistone, G.: Projecting the Fokker-Planck equation onto a finite dimensional exponential family. 1996. Available at arXiv.org.
[14] Crisan, D., Rozovskii, B. (eds.): The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011.
[15] Davis, M.H.A., Marcus, S.I.: An introduction to nonlinear filtering. In: M. Hazewinkel, J.C. Willems (eds.), Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Reidel, Dordrecht, 1981, pp. 53-75.
[16] Elworthy, D.: Stochastic Differential Equations on Manifolds. LMS Lecture Notes, 1982.
[17] Hanzon, B.: A differential-geometric approach to approximate nonlinear filtering. In: C.T.J. Dodson (ed.), Geometrization of Statistical Theory, ULMD Publications, University of Lancaster, 1987, pp. 219-223.
[18] Hanzon, B.: Identifiability, Recursive Identification and Spaces of Linear Dynamical Systems. CWI Tracts 63 and 64, CWI, Amsterdam, 1989.
[19] Hazewinkel, M., Marcus, S.I., Sussmann, H.J.: Nonexistence of finite dimensional filters for conditional statistics of the cubic sensor problem. Systems and Control Letters 3, 331-340 (1983).
[20] Jacod, J., Shiryaev, A.N.: Limit Theorems for Stochastic Processes. Grundlehren der Mathematischen Wissenschaften, vol. 288, Springer-Verlag, Berlin, 1987.
[21] Jazwinski, A.H.: Stochastic Processes and Filtering Theory. Academic Press, New York, 1970.
[22] Fujisaki, M., Kallianpur, G., Kunita, H.: Stochastic differential equations for the non linear filtering problem. Osaka J. Math. 9(1), 19-40 (1972).
[23] Kenney, J., Stirling, W.: Nonlinear filtering of convex sets of probability distributions. Presented at the 1st International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 29 June - 2 July 1999.
[24] Khasminskii, R.Z.: Stochastic Stability of Differential Equations. Alphen aan den Rijn, 1980.
[25] Liptser, R.S., Shiryayev, A.N.: Statistics of Random Processes I, General Theory. Springer-Verlag, Berlin, 1978.
[26] Murray, M., Rice, J.: Differential Geometry and Statistics. Monographs on Statistics and Applied Probability 48, Chapman and Hall, 1993.
[27] Ocone, D., Pardoux, E.: A Lie algebraic criterion for non-existence of finite dimensionally computable filters. Lecture Notes in Mathematics 1390, 197-204, Springer-Verlag, 1989.
[28] Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. The Annals of Statistics 23(5), 1995.

Creative Commons Attribution-ShareAlike 4.0 International
View the video
Clustering, classification and pattern recognition in a data set are among the most important tasks in statistical research and in many applications. In this paper, we propose to use a mixture of Student-t distributions to model the data, via a hierarchical graphical model and the Bayesian framework, to carry out these tasks. The main advantages of this model are that it accounts for the uncertainties of variances and covariances, and that Variational Bayesian Approximation (VBA) methods can be used to obtain fast algorithms able to handle large data sets.
 

Slides: Variational Bayesian Approximation method for Classification and Clustering with a mixture of Student-t model. Ali Mohammad-Djafari, Laboratoire des Signaux et Systèmes (L2S), UMR 8506 CNRS-CentraleSupélec-Univ. Paris-Sud, Gif-sur-Yvette, France (http://lss.centralesupelec.fr, djafari@lss.supelec.fr, http://djafari.free.fr, http://publicationslist.org/djafari). GSI2015, October 28-30, 2015, Polytechnique, France.
• Contents: 1. Mixture models; 2. Problems related to classification and clustering (training, supervised, semi-supervised, clustering); 3. Mixture of Student-t; 4. Variational Bayesian Approximation (VBA); 5. VBA for a mixture of Student-t; 6. Conclusion.
• General mixture model: $p(\mathbf{x}|\mathbf{a},\Theta,K)=\sum_{k=1}^{K}a_k\,p_k(\mathbf{x}|\theta_k)$ with $0<a_k<1$, usually with the same family $p_k=p$, e.g. Gaussian $p(\mathbf{x}|\theta_k)=\mathcal{N}(\mathbf{x}|\mu_k,\Sigma_k)$. Data $\mathbf{X}=\{\mathbf{x}_n\}_{n=1}^{N}$ with class labels $c_n$, $a_k=p(c_n=k)$, $\mathbf{a}=\{a_k\}$, $\Theta=\{\theta_k\}$.
• Training: given $\mathbf{X}$ and $\mathbf{c}$, estimate $\mathbf{a}$ and $\Theta$ either by maximum likelihood or, in the Bayesian approach, by assigning priors $p(\mathbf{a}|K)$ and $p(\Theta|K)=\prod_k p(\theta_k)$ and inferring from the joint posterior $p(\mathbf{a},\Theta|\mathbf{X},\mathbf{c},K)$ (MAP or posterior mean).
• Supervised classification: $k^*=\arg\max_k p(c_m=k|\mathbf{x}_m,\mathbf{a},\Theta,K)$ with $p(c_m=k|\mathbf{x}_m,\mathbf{a},\Theta,K)=a_k\,p(\mathbf{x}_m|\theta_k)\big/\sum_{k}a_k\,p(\mathbf{x}_m|\theta_k)$.
• Semi-supervised classification (proportions unknown): integrate $\mathbf{a}$ out against $p(\mathbf{a}|K)$ and take $k^*=\arg\max_k p(c_m=k|\mathbf{x}_m,\Theta,K)$.
• Clustering (number of classes unknown): compare $p(K=L|\mathbf{X})\propto p(\mathbf{X}|K=L)\,p(K=L)$ for $L=1,\dots,L_0$, where $L_0$ is the a priori maximum number of classes; once $K$ and $\mathbf{c}$ are determined, the class characteristics $\mathbf{a}$ and $\Theta$ follow.
• Student-t as an infinite Gaussian scale mixture (IGSM): $\mathcal{T}(\mathbf{x}|\nu,\mu,\Sigma)=\int_0^{\infty}\mathcal{N}(\mathbf{x}|\mu,z^{-1}\Sigma)\,\mathcal{G}(z|\tfrac{\nu}{2},\tfrac{\nu}{2})\,dz$, with $\mathcal{G}(z|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}z^{\alpha-1}e^{-\beta z}$. Mixture of Student-t: $p(\mathbf{x}|\{\nu_k,a_k,\mu_k,\Sigma_k\},K)=\sum_k a_k\,\mathcal{T}(\mathbf{x}|\nu_k,\mu_k,\Sigma_k)$.
• Introducing the latent scales $z_{nk}$ and labels $c_n$, with priors $p(\Theta)=\prod_k p(\theta_k)$, gives the complete-data model $p(\mathbf{X},\mathbf{c},\mathbf{Z},\Theta|K)=\prod_n\prod_k a_k\,\mathcal{N}(\mathbf{x}_n|\mu_k,z_{nk}^{-1}\Sigma_k)\,\mathcal{G}(z_{nk}|\tfrac{\nu_k}{2},\tfrac{\nu_k}{2})\,p(\theta_k)$ and the joint posterior $p(\mathbf{c},\mathbf{Z},\Theta|\mathbf{X},K)$, which must be approximated to be usable in the tasks above.
• VBA: approximate $p(\mathbf{c},\mathbf{Z},\Theta|\mathbf{X},K)$ by a tractable $q(\mathbf{c},\mathbf{Z},\Theta)$ minimizing $\mathrm{KL}(q:p)$. With the free energy $\mathcal{F}(q)=\big\langle\ln\frac{p(\mathbf{X},\mathbf{c},\mathbf{Z},\Theta|K)}{q}\big\rangle_q$ one has $\mathrm{KL}(q:p)=-\mathcal{F}(q)+\ln p(\mathbf{X}|K)$, so maximizing $\mathcal{F}$ and minimizing the KL divergence are equivalent, $\mathcal{F}(q)$ is a lower bound on the log-evidence $\ln p(\mathbf{X}|K)$, and $\mathcal{F}(q^*)$ can be used as a model-selection criterion.
• Choosing the families: with $\mathrm{KL}(q:p)$ the means computed under $q$ coincide with those under $p$ (conservation of the means), but not the variances or higher moments; if $p$ belongs to the exponential family and conjugate priors are chosen, $q$ keeps the same structure and fast update algorithms follow.
• A hierarchical graphical model links the hyperparameters, the class parameters $(a_k,\alpha_k,\beta_k,\mu_k,\Sigma_k)$, the latent scales $z_{nk}$ and the data $\mathbf{x}_n$ (figure in the slides).
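The slides write the Student-t density as an infinite Gaussian scale mixture. The short Monte Carlo sketch below (an added illustration, not part of the talk; the test point and parameters are arbitrary) checks this identity against scipy's multivariate Student-t density.

```python
import numpy as np
from scipy.stats import multivariate_t

rng = np.random.default_rng(0)
nu, d = 5.0, 2
mu = np.zeros(d)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
x0 = np.array([0.7, -0.4])                      # arbitrary test point

# z ~ Gamma(shape = nu/2, rate = nu/2), i.e. scale = 2/nu
z = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=500_000)

# N(x0 | mu, Sigma/z) evaluated for all sampled z at once (closed-form Gaussian pdf)
quad = (x0 - mu) @ np.linalg.solve(Sigma, x0 - mu)
norm_const = (2.0 * np.pi) ** (-d / 2.0) / np.sqrt(np.linalg.det(Sigma))
mixture_estimate = np.mean(z ** (d / 2.0) * norm_const * np.exp(-0.5 * z * quad))

print("scale-mixture Monte Carlo estimate:", mixture_estimate)
print("multivariate_t.pdf                :", multivariate_t.pdf(x0, loc=mu, shape=Sigma, df=nu))
```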
• VBA for the mixture of Student-t: since $p(\mathbf{X},\mathbf{c},\mathbf{Z},\Theta|K)=\prod_n\prod_k p(\mathbf{x}_n,c_n,z_{nk}|a_k,\mu_k,\Sigma_k,\nu_k)\prod_k\big[p(\alpha_k)p(\beta_k)p(\mu_k|\Sigma_k)p(\Sigma_k)\big]$ separates $[\mathbf{c},\mathbf{Z}]$ from the components of $\Theta$, the approximation $q(\mathbf{c},\mathbf{Z},\Theta)=q_1(\mathbf{c},\mathbf{Z})\,q_2(\Theta)$ is used. The KL divergence and the free energy then become $\mathrm{KL}(q_1q_2:p)=\sum_{\mathbf{c}}\int\!\!\int q_1q_2\ln\frac{q_1q_2}{p(\mathbf{c},\mathbf{Z},\Theta|\mathbf{X},K)}\,d\Theta\,d\mathbf{Z}$ and $\mathcal{F}(q_1q_2)=\sum_{\mathbf{c}}\int\!\!\int q_1q_2\ln\frac{p(\mathbf{X},\mathbf{c},\mathbf{Z}|\Theta,K)\,p(\Theta|K)}{q_1q_2}\,d\Theta\,d\mathbf{Z}$.
• Generalized Student-t: replacing $\mathcal{G}(z_{nk}|\tfrac{\nu_k}{2},\tfrac{\nu_k}{2})$ by $\mathcal{G}(z_{nk}|\alpha_k,\beta_k)$ makes it easier to assign conjugate priors to $(\alpha_k,\beta_k)$ than to $\nu_k$, so that $p(\mathbf{x}_n,c_n=k,z_{nk}|a_k,\mu_k,\Sigma_k,\alpha_k,\beta_k,K)=a_k\,\mathcal{N}(\mathbf{x}_n|\mu_k,z_{nk}^{-1}\Sigma_k)\,\mathcal{G}(z_{nk}|\alpha_k,\beta_k)$.
• Factorized prior laws: $p(\Theta)=p(\mathbf{a})\prod_k\big[p(\alpha_k)p(\beta_k)p(\mu_k|\Sigma_k)p(\Sigma_k)\big]$ with $p(\mathbf{a})=\mathcal{D}(\mathbf{a}|k_0\mathbf{1})$ (Dirichlet), $p(\alpha_k)=\mathcal{E}(\alpha_k|\zeta_0)=\mathcal{G}(\alpha_k|1,\zeta_0)$, $p(\beta_k)=\mathcal{E}(\beta_k|\zeta_0)=\mathcal{G}(\beta_k|1,\zeta_0)$ (Exponential), $p(\mu_k|\Sigma_k)=\mathcal{N}(\mu_k|\mu_0\mathbf{1},\eta_0^{-1}\Sigma_k)$, and $p(\Sigma_k)=\mathcal{IW}(\Sigma_k|\gamma_0,\gamma_0\Sigma_0)$ (inverse Wishart); $\mathcal{D}$, $\mathcal{E}$, $\mathcal{G}$, $\mathcal{IW}$ denote the Dirichlet, Exponential, Gamma and inverse-Wishart pdfs.
• Structure of the approximation: $q(\mathbf{c},\mathbf{Z},\Theta)=\prod_n\prod_k\big[q(c_n=k|z_{nk})\,q(z_{nk})\big]\prod_k\big[q(\alpha_k)q(\beta_k)q(\mu_k|\Sigma_k)q(\Sigma_k)\big]\,q(\mathbf{a})$, where $q(\mathbf{a})$ is Dirichlet, $q(\alpha_k)$ and $q(\beta_k)$ are Gamma, $q(\mu_k|\Sigma_k)$ is Normal and $q(\Sigma_k)$ is inverse Wishart, each with "tilded" parameters, and the free energy splits into sums of expectations of $\ln p(\mathbf{x}_n,c_n,z_{nk},\theta_k)$ under the factors of $q$.
• VBA algorithm: E step, optimize $\mathcal{F}$ with respect to $q(\mathbf{c},\mathbf{Z})$ with $q(\Theta)$ fixed, giving $q(c_n=k|z_{nk})$ and $q(z_{nk})=\mathcal{G}(z_{nk}|\tilde\alpha_k,\tilde\beta_k)$; M step, optimize with respect to $q(\Theta)$ with $q(\mathbf{c},\mathbf{Z})$ fixed, giving the updates of the tilded parameters of $q(\mathbf{a})$, $q(\alpha_k)$, $q(\beta_k)$, $q(\mu_k|\Sigma_k)$, $q(\Sigma_k)$; $\mathcal{F}$ evaluation, after each E and M step $\mathcal{F}(q)$ can be evaluated and used as a stopping rule, and the final value $\mathcal{F}_K$ for each $K$ serves as a model-selection criterion (determination of the number of clusters).
• Conclusions: clustering and classification are among the most important tasks of statistical research for applications such as data mining in biology; mixture models, in particular mixtures of Gaussians, are the classical tools; here a mixture of generalized Student-t distributions is used via a hierarchical graphical model; conjugate priors are used wherever possible to obtain fast algorithms able to handle large data sets; the algorithm has been applied to clustering, classification and discriminant analysis of biological data (cancer research), but only the main algorithm is presented in the paper.
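As a concrete reading of the hierarchical model, the sketch below (illustrative only; the hyperparameter values and problem sizes are my own choices, not the paper's) draws one synthetic data set from the prior structure listed on the slides: $\mathbf{a}\sim\mathcal{D}(k_0\mathbf{1})$, $\alpha_k,\beta_k\sim\mathcal{E}(\zeta_0)$, $\Sigma_k\sim\mathcal{IW}(\gamma_0,\gamma_0\Sigma_0)$, $\mu_k|\Sigma_k\sim\mathcal{N}(\mu_0\mathbf{1},\eta_0^{-1}\Sigma_k)$, then $z_{n}\sim\mathcal{G}(\alpha_k,\beta_k)$ and $\mathbf{x}_n\sim\mathcal{N}(\mu_k,z_n^{-1}\Sigma_k)$.

```python
import numpy as np
from scipy.stats import dirichlet, expon, gamma, invwishart, multivariate_normal

rng = np.random.default_rng(1)
K, N, D = 3, 500, 2                                   # illustrative sizes
k0, zeta0, mu0, eta0, gamma0 = 1.0, 1.0, 0.0, 1.0, D + 2.0
Sigma0 = np.eye(D)

# Class-level parameters drawn from the conjugate priors listed on the slides
a = dirichlet.rvs(k0 * np.ones(K), random_state=rng)[0]
alpha = expon.rvs(scale=1.0 / zeta0, size=K, random_state=rng)
beta = expon.rvs(scale=1.0 / zeta0, size=K, random_state=rng)
Sigma = [invwishart.rvs(df=gamma0, scale=gamma0 * Sigma0, random_state=rng) for _ in range(K)]
mu = [multivariate_normal.rvs(mean=mu0 * np.ones(D), cov=S / eta0, random_state=rng) for S in Sigma]

# Data: class labels c_n, latent scales z_n, observations x_n ~ N(mu_k, Sigma_k / z_n)
c = rng.choice(K, size=N, p=a)
z = gamma.rvs(a=alpha[c], scale=1.0 / beta[c], random_state=rng)   # Gamma with rate beta_k
X = np.stack([multivariate_normal.rvs(mean=mu[k], cov=Sigma[k] / zn, random_state=rng)
              for k, zn in zip(c, z)])

print("data shape:", X.shape, " class counts:", np.bincount(c, minlength=K))
```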

Creative Commons Attribution-ShareAlike 4.0 International
View the video
The textile plot proposed by Kumasaka and Shibata (2008) is a method for data visualization. The method transforms a data matrix in order to draw a parallel coordinate plot. In this paper, we investigate a set of matrices induced by the textile plot, which we call the textile set, from a geometrical viewpoint. It is shown that the textile set is written as the union of two differentiable manifolds if data matrices are restricted to be full-rank.
 

Slides: Geometric Properties of the textile plot. Tomonari Sei and Ushio Tanaka (University of Tokyo and Osaka Prefecture University), at École Polytechnique, Oct 28, 2015.
• Introduction: the textile plot (Kumasaka and Shibata, 2008) transforms a data matrix $X\in\mathbb{R}^{n\times p}$ into another matrix $Y\in\mathbb{R}^{n\times p}$ in order to draw a parallel coordinate plot, a standard 2-dimensional tool for visualizing multivariate data at a glance. The talk studies the set of matrices induced by the transformation, the textile set, from a differential geometric point of view; for "generic" data matrices it is the union of two differentiable manifolds.
• Outline: 1. What is the textile plot? 2. Textile set; 3. Main result; 4. Other results; 5. Summary.
• Example: textile plot of the iris data (150 cases, 5 attributes); each variate is transformed by a location-scale transformation, categorical data are quantified, missing data are admitted, and the order of the axes can be maintained (figure in the slides).
• The method (assuming no categorical variates and no missing values): let $X=(x_1,\dots,x_p)\in\mathbb{R}^{n\times p}$ and assume each $x_j$ has sample mean 0 and sample variance 1. The data are transformed into $Y=(y_1,\dots,y_p)$ with $y_j=a_j+b_jx_j$, $a_j,b_j\in\mathbb{R}$, $j=1,\dots,p$.
• The coefficients $a=(a_j)$ and $b=(b_j)$ solve the minimization of $\sum_{t=1}^{n}\sum_{j=1}^{p}(y_{tj}-\bar y_{t\cdot})^2$ subject to $y_j=a_j+b_jx_j$ and $\sum_{j=1}^{p}\|y_j\|^2=1$; intuitively, the profiles are made as horizontal as possible. Solution: $a=0$ and $b$ is the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of $X$.
• Example ($n=100$, $p=4$): rows drawn from $N(0,\Sigma)$ for a given correlation matrix $\Sigma$; raw data versus textile plot (figure in the slides).
• Motivation: write the transformation as $Y=\tau(X)$. What is the image $\tau(\mathbb{R}^{n\times p})$? One can show that any $Y\in\tau(\mathbb{R}^{n\times p})$ satisfies
$$\exists\,\lambda\ge 0,\ \forall i=1,\dots,p:\ \sum_{j=1}^{p}\langle y_i,y_j\rangle=\lambda\,\|y_i\|^2,\qquad \sum_{j=1}^{p}\|y_j\|^2=1,$$
which motivates the definition of the textile set.
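A small numpy sketch of this recipe (my own illustration on synthetic data, not the authors' implementation): standardize the columns, take $a=0$ and $b$ proportional to the leading eigenvector of the correlation matrix, rescale so that $\sum_j\|y_j\|^2=1$, and check the two conditions that motivate the textile set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))   # synthetic correlated data

# Standardize each column (sample mean 0, sample variance 1), as assumed on the slides
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Textile plot: a = 0, b = eigenvector of the covariance/correlation matrix for its
# largest eigenvalue; here b is rescaled so that sum_j ||y_j||^2 = 1.
C = Xs.T @ Xs / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)
lam, b = eigvals[-1], eigvecs[:, -1]
Y = Xs * b / np.sqrt(n - 1)

# Conditions defining the textile set: a common lambda and unit total norm
ratios = (Y.T @ Y).sum(axis=1) / (Y ** 2).sum(axis=0)
print("sum_j ||y_j||^2        :", (Y ** 2).sum())        # ~ 1
print("lambda from each column:", np.round(ratios, 6))   # all equal
print("largest eigenvalue     :", round(lam, 6))
```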
• Textile set (definition):
$$T_{n,p}=\Big\{\,Y\in\mathbb{R}^{n\times p}\ \Big|\ \exists\,\lambda\ge0,\ \forall i:\ \sum_j\langle y_i,y_j\rangle=\lambda\|y_i\|^2,\ \sum_j\|y_j\|^2=1\,\Big\},$$
and the unnormalized textile set $U_{n,p}$ is defined by the first condition alone. The aim is to study mathematical properties of $T_{n,p}$ and $U_{n,p}$; statistical implications are left to future work.
• Small $p$: $T_{n,1}=S^{n-1}$, the unit sphere. For $p=2$, $T_{n,2}=A\cup B$ with $A=\{(y_1,y_2)\mid\|y_1\|=\|y_2\|=1/\sqrt2\}$ and $B=\{(y_1,y_2)\mid\|y_1-y_2\|=\|y_1+y_2\|=1\}$, each diffeomorphic to $S^{n-1}\times S^{n-1}$; the intersection $A\cap B$ is diffeomorphic to the Stiefel manifold $V_{n,2}$. For $n=p=2$, $T_{2,2}\subset\mathbb{R}^4$ is the union of two tori glued along $O(2)$.
• Noncompact Stiefel manifold (e.g. Absil et al., 2008): for $n\ge p$ let $V^*=\{Y\in\mathbb{R}^{n\times p}\mid\mathrm{rank}(Y)=p\}$; $\dim V^*=np$ and $V^*$ is open and dense in $\mathbb{R}^{n\times p}$. The orthogonal group $O(n)$ acts on $V^*$, and by Gram-Schmidt orthonormalization the quotient $V^*/O(n)$ is identified with the set $V^{**}$ of canonical forms: matrices whose top $p\times p$ block is upper triangular with positive diagonal entries $y_{ii}>0$ and whose remaining rows vanish; $V^{**}\subset V^*$.
• Restriction: $U^*_{n,p}=U_{n,p}\cap V^*$ and $U^{**}_{n,p}=U_{n,p}\cap V^{**}$; $O(n)$ acts on $U^*_{n,p}$ and $U^*_{n,p}/O(n)$ is identified with $U^{**}_{n,p}$, so it suffices to study $U^{**}_{n,p}$.
• Small cases: $U^{**}_{1,1}=\{(1)\}$. For $n=p=2$ with $Y=\begin{pmatrix}y_{11}&y_{12}\\0&y_{22}\end{pmatrix}$, $y_{11},y_{22}>0$: $U^{**}_{2,2}=\{y_{12}=0\}\cup\{y_{11}^2=y_{12}^2+y_{22}^2\}$, the union of a plane and a cone.
• Main theorem: for $n\ge p\ge3$, $U^{**}_{n,p}=M_1\cup M_2$, where each $M_i$ is a differentiable manifold with $\dim M_1=\tfrac{p(p+1)}{2}-(p-1)$ and $\dim M_2=\tfrac{p(p+1)}{2}-p$; $M_2$ is connected while $M_1$ may not be. Example: $U^{**}_{3,3}$ is the union of a 4-dimensional and a 3-dimensional manifold; a cross section with $y_{11}=y_{22}=1$ is the union of a surface and a vertical line (figure in the slides).
• Corollary: for $n\ge p\ge3$, $U^*_{n,p}=\pi^{-1}(M_1)\cup\pi^{-1}(M_2)$, where $\pi$ is the Gram-Schmidt orthonormalization map, with $\dim\pi^{-1}(M_1)=np-(p-1)$ and $\dim\pi^{-1}(M_2)=np-p$.
• Other results ($n=1$): $T_{1,p}$ is the union of a $(p-2)$-dimensional manifold and $2(2^p-1)$ isolated points. For $p=3$ the set consists of a circle, $S^2\cap\{y_1+y_2+y_3=0\}$, together with the 14 points $\pm(\tfrac1{\sqrt3},\tfrac1{\sqrt3},\tfrac1{\sqrt3})$, $\pm(\tfrac1{\sqrt2},\tfrac1{\sqrt2},0)$, $\pm(\tfrac1{\sqrt2},0,\tfrac1{\sqrt2})$, $\pm(0,\tfrac1{\sqrt2},\tfrac1{\sqrt2})$, $\pm(1,0,0)$, $\pm(0,1,0)$, $\pm(0,0,1)$.
• Characterization via $f_\lambda$: for fixed $\lambda\ge0$ define $f_\lambda:\mathbb{R}^{n\times p}\to\mathbb{R}^{p+1}$ by
$$f_\lambda(y_1,\dots,y_p):=\Big(\sum_j\langle y_1,y_j\rangle-\lambda\|y_1\|^2,\ \dots,\ \sum_j\langle y_p,y_j\rangle-\lambda\|y_p\|^2,\ \sum_j\|y_j\|^2-1\Big).$$
Then $T_{n,p}=\bigcup_{\lambda\ge0}f_\lambda^{-1}(O)=\bigcup_{0\le\lambda\le n}f_\lambda^{-1}(O)$. Theorem: for $\lambda\ge0$, $f_\lambda^{-1}(O)$ is a regular submanifold of $\mathbb{R}^{n\times p}$ of codimension $p+1$ under explicit non-degeneracy conditions on $\lambda$ and $Y$ (stated in the paper).
• Summary and future work: the textile set $T_{n,p}$ was defined and its geometric properties studied. Ongoing work: characterize $f_\lambda^{-1}(O)$ with the Riemannian metric induced from $\mathbb{R}^{np}$ (geodesics, curvature); investigate differential geometric and topological properties of $T_{n,p}$ and $f_\lambda^{-1}(O)$, including the group action; look for statistical implications such as sample distribution theory.
• References: Absil, P.-A., Mahony, R., and Sepulchre, R. (2008), Optimization Algorithms on Matrix Manifolds, Princeton University Press. Honda, K. and Nakano, J. (2007), 3-dimensional parallel coordinate plot, Proceedings of the Institute of Statistical Mathematics, 55, 69-83. Inselberg, A. (2009), Parallel Coordinates: Visual Multidimensional Geometry and its Applications, Springer. Kumasaka, N. and Shibata, R. (2008), High-dimensional data visualisation: The textile plot, Computational Statistics and Data Analysis, 52, 3616-3644.

Creative Commons Attribution-ShareAlike 4.0 International
View the video
In anomalous statistical physics, deformed algebraic structures are important objects. Heavy-tailed probability distributions, such as Student's t-distributions, are characterized by deformed algebras. In addition, deformed algebras lead to deformed expectations and deformed notions of independence of random variables. Hence, a generalization of independence for the multivariate Student's t-distribution is studied in this paper. Even if two random variables that each follow a univariate Student's t-distribution are independent, their joint probability distribution is not a bivariate Student's t-distribution. It is shown that a bivariate Student's t-distribution is obtained from two univariate Student's t-distributions under q-deformed independence.
 

Slides: A generalization of independence and multivariate Student's t-distributions. MATSUZOE Hiroshi (Nagoya Institute of Technology), joint work with SAKAMOTO Monta (Efrei, Paris).
• Outline: 1. Deformed exponential family; 2. Non-additive differentials and expectation functionals; 3. Geometry of deformed exponential families; 4. Generalization of independence; 5. q-independence and Student's t-distributions; 6. Appendix.
• Notions of expectation and independence are determined by the choice of statistical model.
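For background on the deformed algebra the outline refers to, here is a minimal sketch of the standard Tsallis $q$-logarithm, $q$-exponential and $q$-product (standard definitions, offered as background rather than the paper's specific construction), checking the identity $\exp_q(x)\otimes_q\exp_q(y)=\exp_q(x+y)$ that replaces the ordinary factorization of exponentials.

```python
import numpy as np

def ln_q(x, q):
    """q-deformed logarithm; tends to log(x) as q -> 1."""
    return np.log(x) if q == 1.0 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(u, q):
    """q-deformed exponential; tends to exp(u) as q -> 1."""
    if q == 1.0:
        return np.exp(u)
    return np.maximum(1.0 + (1.0 - q) * u, 0.0) ** (1.0 / (1.0 - q))

def q_product(x, y, q):
    """q-product: exp_q(ln_q x + ln_q y), defined where the bracket is positive."""
    return np.maximum(x ** (1.0 - q) + y ** (1.0 - q) - 1.0, 0.0) ** (1.0 / (1.0 - q))

q, x, y = 1.5, 0.8, 0.4
print(q_product(exp_q(x, q), exp_q(y, q), q))   # 6.25
print(exp_q(x + y, q))                          # 6.25: the q-exponential turns sums into q-products
```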

Hessian Information Geometry (chaired by Shun-Ichi Amari, Michel Boyom)

Creative Commons Attribution-ShareAlike 4.0 International
View the video
We define a metric and a family of α-connections in statistical manifolds, based on ϕ-divergence, which emerges in the framework of ϕ-families of probability distributions. This metric and α-connections generalize the Fisher information metric and Amari’s α-connections. We also investigate the parallel transport associated with the α-connection for α = 1.
 

Slides: New Metric and Connections in Statistical Manifolds. Rui F. Vigelis (Federal University of Ceará), David C. de Souza (Federal Institute of Ceará) and Charles C. Cavalcante (Federal University of Ceará). 2nd Conference on Geometric Science of Information, GSI2015, October 28-30, 2015, École Polytechnique, Paris-Saclay; session "Hessian Information Geometry", October 28.
• Outline: Introduction; ϕ-functions; ϕ-divergence; generalized statistical manifold; connections; ϕ-families; discussion.
• Introduction: in R. F. Vigelis and C. C. Cavalcante, "On ϕ-families of probability distributions", J. Theor. Probab. 26(3):870-884, 2013, the authors proposed the ϕ-divergence $D_\varphi(p\,\|\,q)$ for $p,q\in\mathcal{P}_\mu$, defined in terms of a ϕ-function; the metric and connections proposed here are derived from $D_\varphi(\cdot\,\|\,\cdot)$. New geometric structures on statistical manifolds are a recurrent research topic; see e.g. Zhang (Neural Computation, 2004), Naudts (JIPAM, 2004), Amari, Ohara and Matsuzoe (Physica A, 2012), Matsuzoe (Differential Geom. Appl., 2014).
• Setting: $(T,\Sigma,\mu)$ a measure space; all probability distributions are taken in $\mathcal{P}_\mu=\{p\in L^0: p>0,\ \int_T p\,d\mu=1\}$, where $L^0$ is the set of real-valued measurable functions on $T$, with equality $\mu$-a.e.
• ϕ-functions: $\varphi:\mathbb{R}\to(0,\infty)$ is a ϕ-function if (a1) $\varphi$ is convex; (a2) $\lim_{u\to-\infty}\varphi(u)=0$ and $\lim_{u\to\infty}\varphi(u)=\infty$; (a3) there exists a measurable $u_0:T\to(0,\infty)$ such that $\int_T\varphi(c(t)+\lambda u_0(t))\,d\mu<\infty$ for all $\lambda>0$ and every measurable $c:T\to\mathbb{R}$ with $\varphi(c)\in\mathcal{P}_\mu$. Not every function satisfying (a1)-(a2) admits such a $u_0$; condition (a3) makes ϕ-families parametrizations of $\mathcal{P}_\mu$ in the same way as exponential families.
• Examples: the κ-exponential $\exp_\kappa(u)=(\kappa u+\sqrt{1+\kappa^2u^2})^{1/\kappa}$ for $\kappa\ne0$ (and $\exp(u)$ for $\kappa=0$), $\kappa\in[-1,1]$, is a ϕ-function. The q-exponential $\exp_q(u)=[1+(1-q)u]_+^{1/(1-q)}$ ($q>0$, $q\ne1$) is not, since $\exp_q(u)=0$ for $u<-1/(1-q)$. A ϕ-function need not be a φ-exponential $\exp_\phi(\cdot)$, defined as the inverse of $\ln_\phi(u)=\int_1^u dx/\phi(x)$ for an increasing $\phi:[0,\infty)\to[0,\infty)$.
• ϕ-divergence: for $p,q\in\mathcal{P}_\mu$,
$$D_\varphi(p\,\|\,q)=\frac{\displaystyle\int_T\frac{\varphi^{-1}(p)-\varphi^{-1}(q)}{(\varphi^{-1})'(p)}\,d\mu}{\displaystyle\int_T\frac{u_0}{(\varphi^{-1})'(p)}\,d\mu}.$$
If $\varphi=\exp$ and $u_0=1$, then $D_\varphi(p\,\|\,q)$ coincides with the Kullback-Leibler divergence $D_{KL}(p\,\|\,q)=\int_T p\log\frac{p}{q}\,d\mu$.
• Metric from the ϕ-divergence: with $f_\theta=\varphi^{-1}(p_\theta)$,
$$g_{ij}=-\frac{\partial}{\partial\theta^i}\Big|_p\frac{\partial}{\partial\theta^j}\Big|_q D_\varphi(p\,\|\,q)\Big|_{q=p}=-E_\theta\Big[\frac{\partial^2 f_\theta}{\partial\theta^i\partial\theta^j}\Big],\qquad E_\theta[\,\cdot\,]=\frac{\int_T(\,\cdot\,)\,\varphi'(f_\theta)\,d\mu}{\int_T u_0\,\varphi'(f_\theta)\,d\mu}.$$
Taking the log-likelihood $l_\theta=\log p_\theta$ in place of $f_\theta=\varphi^{-1}(p_\theta)$ recovers the Fisher information matrix.
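For the classical case $\varphi=\exp$, $u_0=1$ this construction reduces to the Kullback-Leibler divergence and the Fisher metric. The short sympy check below (added here as an illustration of that special case) computes $g_{ij}=-\partial_{\theta^i}|_p\,\partial_{\theta^j}|_q\,D_{KL}(p\,\|\,q)\big|_{q=p}$ for the univariate Gaussian family and recovers the Fisher information matrix $\mathrm{diag}(1/\sigma^2,\,2/\sigma^2)$.

```python
import sympy as sp

m1, m2 = sp.symbols('mu1 mu2', real=True)
s1, s2 = sp.symbols('sigma1 sigma2', positive=True)

# KL divergence between N(mu1, sigma1^2) (first argument) and N(mu2, sigma2^2) (second argument)
kl = sp.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - sp.Rational(1, 2)

params_p, params_q = (m1, s1), (m2, s2)
g = sp.zeros(2, 2)
for i in range(2):
    for j in range(2):
        # one derivative in the p-slot, one in the q-slot, evaluated on the diagonal q = p
        g[i, j] = sp.simplify(-sp.diff(kl, params_p[i], params_q[j]).subs({m2: m1, s2: s1}))

print(g)   # Matrix([[1/sigma1**2, 0], [0, 2/sigma1**2]]): the Fisher information metric
```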
• Generalized statistical manifold: a family of probability distributions $\mathcal{P}=\{p_\theta:\theta\in\Theta\}\subseteq\mathcal{P}_\mu$ such that (P1) $\Theta$ is a domain (open and connected) in $\mathbb{R}^n$; (P2) $p(t;\theta)$ is differentiable with respect to $\theta$; (P3) integration with respect to $\mu$ and differentiation with respect to $\theta^i$ commute; (P4) the matrix $g=(g_{ij})$, $g_{ij}=-E_\theta[\partial^2 f_\theta/\partial\theta^i\partial\theta^j]$, is positive definite at each $\theta\in\Theta$.
• The matrix $(g_{ij})$ can also be written in terms of the products $\frac{\partial f_\theta}{\partial\theta^i}\frac{\partial f_\theta}{\partial\theta^j}$ under an appropriately weighted expectation; as a consequence, the map $X=\sum_i a^i\frac{\partial}{\partial\theta^i}\mapsto\widetilde X=\sum_i a^i\frac{\partial f_\theta}{\partial\theta^i}$ is an isometry between the tangent space $T_\theta\mathcal{P}$ at $p_\theta$ and $\widetilde T_\theta\mathcal{P}=\mathrm{span}\{\partial f_\theta/\partial\theta^i:i=1,\dots,n\}$ equipped with the corresponding inner product $\langle\widetilde X,\widetilde Y\rangle_\theta$.
• Connections: the ϕ-divergence defines a pair of mutually dual connections $D^{(1)}$ and $D^{(-1)}$ through Eguchi-type formulas, the two Christoffel symbols being obtained from the third-order derivative patterns $-\partial^2_{ij}\big|_p\partial_k\big|_q D_\varphi(p\,\|\,q)\big|_{q=p}$ and $-\partial_k\big|_p\partial^2_{ij}\big|_q D_\varphi(p\,\|\,q)\big|_{q=p}$ (one pattern for each connection); they correspond to the exponential and mixture connections. Their explicit expressions contain additional terms that vanish when $\varphi=\exp$ and $u_0=1$.
• α-connections: $\Gamma^{(\alpha)}_{ijk}=\frac{1+\alpha}{2}\Gamma^{(1)}_{ijk}+\frac{1-\alpha}{2}\Gamma^{(-1)}_{ijk}$; $D^{(\alpha)}$ and $D^{(-\alpha)}$ are mutually dual, and $D^{(0)}$, which is clearly self-dual, corresponds to the Levi-Civita connection.
• ϕ-families: a parametric ϕ-family centered at $p=\varphi(c)$ is $\mathcal{F}_p=\{p_\theta:\theta\in\Theta\}$ with
$$p_\theta(t)=\varphi\Big(c(t)+\sum_{i=1}^n\theta^iu_i(t)-\psi(\theta)u_0(t)\Big),$$
where $\psi:\Theta\to[0,\infty)$ is a normalizing function (the defining conditions on the $u_i$ imply $\psi\ge0$) and $\Theta$ can be chosen maximal. If $\varphi=\exp$ and $u_0=1$, $\mathcal{F}_p$ is an exponential family.
• In ϕ-families the normalizing function and the ϕ-divergence are related by $\psi(\theta)=D_\varphi(p\,\|\,p_\theta)$; the metric is the Hessian of the normalizing function, $g_{ij}=\frac{\partial^2\psi}{\partial\theta^i\partial\theta^j}$, so that $\Gamma^{(0)}_{ijk}=\frac12\frac{\partial g_{ij}}{\partial\theta^k}=\frac12\frac{\partial^3\psi}{\partial\theta^i\partial\theta^j\partial\theta^k}$.
• In ϕ-families the Christoffel symbols $\Gamma^{(1)}_{ijk}$ vanish identically, so $(\theta^i)$ is an affine coordinate system and $D^{(1)}$ is flat (and so is $D^{(-1)}$). Hence $\mathcal{F}_p$ admits a coordinate system $(\eta_j)$ dual to $(\theta^i)$ and potential functions $\psi$, $\psi^*$ with $\theta^i=\frac{\partial\psi^*}{\partial\eta_i}$, $\eta_j=\frac{\partial\psi}{\partial\theta^j}$ and $\psi(p)+\psi^*(p)=\sum_i\theta^i(p)\eta_i(p)$.
• Discussion: advantages of $(g_{ij})$, $\Gamma^{(1)}_{ijk}$, $\Gamma^{(-1)}_{ijk}$ being derived from $D_\varphi(\cdot\,\|\,\cdot)$: duality, Pythagorean relation, projection theorem. Open questions: an example of a generalized statistical manifold whose coordinate system is $D^{(-1)}$-flat; parallel transport with respect to $D^{(-1)}$; a divergence or ϕ-function associated with the α-connections.
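To make the Hessian relation $g_{ij}=\partial^2\psi/\partial\theta^i\partial\theta^j$ concrete in the classical case $\varphi=\exp$, $u_0=1$ (an ordinary exponential family), the following sympy sketch (illustrative, not from the talk) takes a three-outcome categorical family in natural coordinates and checks that the Hessian of the normalizing function equals the covariance of the sufficient statistics, i.e. the Fisher metric.

```python
import sympy as sp

t1, t2 = sp.symbols('theta1 theta2', real=True)

# Normalizing function of a 3-outcome categorical family in natural coordinates
psi = sp.log(1 + sp.exp(t1) + sp.exp(t2))

# Metric as the Hessian of psi: g_ij = d^2 psi / d theta_i d theta_j
g = sp.hessian(psi, (t1, t2))

# Fisher information = covariance of the sufficient statistics (indicator variables)
p1 = sp.exp(t1) / (1 + sp.exp(t1) + sp.exp(t2))
p2 = sp.exp(t2) / (1 + sp.exp(t1) + sp.exp(t2))
fisher = sp.Matrix([[p1 * (1 - p1), -p1 * p2],
                    [-p1 * p2, p2 * (1 - p2)]])

print(sp.simplify(g - fisher))   # zero matrix
```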

Creative Commons Attribution-ShareAlike 4.0 International
View the video
Curvature properties of statistical structures are studied. The study deals with the curvature tensors of statistical connections and their duals, as well as the Ricci tensors of these connections, Laplacians and the curvature operator. Two concepts of sectional curvature are introduced. The meaning of these notions is illustrated by a few exemplary theorems.
 

Slides: Curvatures of statistical structures. Barbara Opozda, Paris, October 2015.
• Statistical setting: $M$ an open subset of $\mathbb{R}^n$, $\Lambda$ a probability space, $p:M\times\Lambda\to\mathbb{R}$ smooth in $x$ with each $p_x(\cdot)=p(x,\cdot)$ a probability measure; $\ell(x,\lambda)=\log p(x,\lambda)$. The Fisher information metric is $g_{ij}(x)=E_x[(\partial_i\ell)(\partial_j\ell)]$ and the cubic form is $C_{ijk}(x)=E_x[(\partial_i\ell)(\partial_j\ell)(\partial_k\ell)]$; $(g,C)$ is a statistical structure on $M$.
• Geometric setting, three equivalent definitions of a statistical (Codazzi) structure on an $n$-manifold $M$: (I) $(g,C)$ with $C$ a totally symmetric $(0,3)$-tensor field (the cubic form); (II) $(g,K)$ with $K$ a symmetric $(1,2)$-tensor field which is symmetric relative to $g$, i.e. $g(X,K(Y,Z))=g(Y,K(X,Z))$, related to $C$ by $C(X,Y,Z)=g(X,K(Y,Z))$; (III) $(g,\nabla)$ with $\nabla$ a torsion-free connection such that $(\nabla_Xg)(Y,Z)=(\nabla_Yg)(X,Z)$ (a statistical connection). With $\hat\nabla$ the Levi-Civita connection of $g$, the difference tensor is $K(X,Y)=\nabla_XY-\hat\nabla_XY$ and $\nabla g(X,Y,Z)=-2g(X,K(Y,Z))=-2C(X,Y,Z)$.
• A statistical structure is trivial iff $K=0$, equivalently $C=0$, equivalently $\nabla=\hat\nabla$. With $K_XY:=K(X,Y)$ and the mean difference vector field $E:=\mathrm{tr}_gK=\sum_iK(e_i,e_i)$, the structure is trace-free iff $\mathrm{tr}\,K_X=0$ for all $X$, iff $\mathrm{tr}_gC(X,\cdot,\cdot)=0$ for all $X$, iff $\nabla\nu_g=0$ for the volume form $\nu_g$ of $g$.
• Examples. (1) Riemannian geometry of the second fundamental form: for a locally strongly convex hypersurface $M\subset\mathbb{R}^{n+1}$, the second fundamental form $h$ satisfies the Codazzi equation $(\nabla_Xh)(Y,Z)=(\nabla_Yh)(X,Z)$ for the induced connection $\nabla$, so $(h,\nabla)$ is a statistical structure; similarly for hypersurfaces in space forms. (2) Equiaffine geometry of hypersurfaces in affine space: with a transversal vector field $\xi$, the Gauss and Weingarten formulas $D_XY=\nabla_XY+h(X,Y)\xi$ and $D_X\xi=-SX+\tau(X)\xi$ define the induced connection $\nabla$ and second fundamental form $h$; if $\tau=0$ ($\xi$ equiaffine), the Codazzi equation holds and $(h,\nabla)$ is a statistical structure. (3) Lagrangian submanifolds of Kaehler manifolds: for a Lagrangian submanifold $M$ of a Kaehler manifold $N$ with complex structure $J$ and Kaehler connection $D$, writing $D_XY=\nabla_XY+JK(X,Y)$ yields a statistical structure $(g,K)$ for the induced metric $g$; it is trace-free iff $M$ is minimal in $N$. Most statistical structures lie outside these three classes: for instance, local realizability on an equiaffine hypersurface requires $\nabla$ to be projectively flat.
• Dual connections and curvature: $\nabla^*$ is defined by $Xg(Y,Z)=g(\nabla_XY,Z)+g(Y,\nabla^*_XZ)$; $(g,\nabla)$ is a statistical structure iff $(g,\nabla^*)$ is. Let $R$ and $R^*$ be the curvature tensors of $\nabla$ and $\nabla^*$; the structure is called Hessian if $R=0$. One has $g(R(X,Y)Z,W)=-g(R^*(X,Y)W,Z)$, so $R=0\iff R^*=0$. With $\nabla=\hat\nabla+K$ and $\nabla^*=\hat\nabla-K$:
$$R(X,Y)=\hat R(X,Y)+(\hat\nabla_XK)_Y-(\hat\nabla_YK)_X+[K_X,K_Y],\qquad R^*(X,Y)=\hat R(X,Y)-(\hat\nabla_XK)_Y+(\hat\nabla_YK)_X+[K_X,K_Y],$$
hence $R(X,Y)+R^*(X,Y)=2\hat R(X,Y)+2[K_X,K_Y]$, where $[K_X,K_Y]=K_XK_Y-K_YK_X$.
• Sectional curvatures: $R$ need not be skew-symmetric relative to $g$. Lemma (*): the conditions (1) $g(R(X,Y)Z,W)=-g(R(X,Y)W,Z)$ for all $X,Y,Z,W$, (2) $R=R^*$, (3) $\hat\nabla K$ symmetric, are equivalent; for hypersurfaces in $\mathbb{R}^{n+1}$ each of them characterizes affine spheres. Set $\bar R:=(R+R^*)/2$ and $[K,K](X,Y)Z:=[K_X,K_Y]Z$; both are Riemann-curvature-like tensors. For a plane $\pi\subset T_xM$ with orthonormal basis $X,Y$: the sectional curvature of $g$ is $\hat k(\pi)=g(\hat R(X,Y)Y,X)$, the sectional $K$-curvature is $k(\pi)=g([K,K](X,Y)Y,X)$, and the sectional $\nabla$-curvature is $k^\nabla(\pi)=g(\bar R(X,Y)Y,X)$.
• In general Schur's lemma does not hold for $k^\nabla$ and $k$; however, if $M$ is connected, $\dim M>2$, and the sectional $\nabla$-curvature (resp. the sectional $K$-curvature) is pointwise constant, then under any of the equivalent conditions of Lemma (*) it is constant on $M$.
• Constant sectional $K$-curvature: if the sectional $K$-curvature equals $A$ for all planes in $T_xM$, there is an orthonormal basis $e_1,\dots,e_n$ of $T_xM$ and numbers $\lambda_1,\dots,\lambda_n$, $\mu_1,\dots,\mu_{n-1}$ giving an explicit normal form for the matrices $K_{e_1},\dots,K_{e_n}$, with $\mu_i=\big(\lambda_i-\sqrt{\lambda_i^2-4A_{i-1}}\big)/2$, $A_i=A_{i-1}-\mu_i^2$, $A_0=A$; the representation is not unique in general. If moreover $\mathrm{tr}_gK=0$, then $A\le0$, $\lambda_n=0$ and $\lambda_i=(n-i)\sqrt{\tfrac{-A_{i-1}}{n-i+1}}$, $\mu_i=-\sqrt{\tfrac{-A_{i-1}}{n-i+1}}$ for $i=1,\dots,n-1$, so the numbers $\lambda_i,\mu_i$ depend only on $A$ and $\dim M$.
• Example 1: $K_{e_1}=\mathrm{diag}(\lambda,\tfrac\lambda2,\dots,\tfrac\lambda2)$ and, for $i\ge2$, $K_{e_i}=\tfrac\lambda2(E_{1i}+E_{i1})$ in an orthonormal frame; the sectional $K$-curvature is constant and equal to $\lambda^2/4$ (see the numerical sketch after these slides). Example 2: vanishing sectional $K$-curvature, $[K,K]=0$; there is an orthonormal frame in which each $K_{e_i}$ has $\lambda_i$ as its only nonzero entry, in position $(i,i)$.
• Theorems on the sectional $K$-curvature: if $(g,K)$ is trace-free with $\hat\nabla K$ symmetric and the sectional $K$-curvature is constant, then either $K=0$ (the structure is trivial) or $\hat R=0$ and $\hat\nabla K=0$. If $\hat\nabla K=0$, then $\hat R=0$ follows from either of: (1) the sectional $K$-curvature is negative; (2) $[K,K]=0$ and $K$ is non-degenerate (i.e. $X\mapsto K_X$ is a monomorphism).
• If $K$ is as in Example 1 at each point of $M$, $\hat\nabla K$ is symmetric and $\mathrm{div}\,E$ is constant on $M$ (with $E=\mathrm{tr}_gK$), then the sectional curvature of $g$ on any plane containing $E$ is non-positive, and constant if $M$ is connected; if $\hat\nabla E=0$, then $\hat\nabla K=0$ and that sectional curvature vanishes.
• If the sectional $K$-curvature is non-positive on $M$ and $[K,K]\cdot K=0$, then the sectional $K$-curvature vanishes on $M$. Corollary: if $(g,K)$ is a Hessian structure with non-negative sectional curvature of $g$ and $\hat R\cdot K=0$, then $\hat R=0$. Theorem: if the sectional $K$-curvature is negative on $M$ and $\hat R\cdot K=0$, then $\hat R=0$. Theorem: if $M$ is a Lagrangian submanifold of a Kaehler manifold $N$ of constant holomorphic curvature $4c$, the sectional curvature of the first fundamental form $g$ on $M$ is smaller than $c$ on $M$, and $\hat R\cdot K=0$ for the second fundamental tensor $K$ of $M\subset N$, then $\hat R=0$.
• $\nabla$-sectional curvature and the curvature operator: all affine spheres are statistical manifolds of constant sectional $\nabla$-curvature. A Riemann-curvature-like tensor defines a curvature operator; for $\bar R=(R+R^*)/2$ it is $\bar{\mathcal R}:\Lambda^2TM\to\Lambda^2TM$ with $g(\bar{\mathcal R}(X\wedge Y),Z\wedge W)=g(\bar R(Z,W)Y,X)$. A curvature operator is symmetric for the canonical extension of $g$ to $\Lambda^2TM$, hence diagonalizable; positivity of the operator is stronger than positivity of the sectional $\nabla$-curvature.
• Theorem: if $M$ is connected, compact and oriented with a trace-free statistical structure $(g,\nabla)$, $R=R^*$ and the curvature operator determined by $\bar R$ is positive definite on $M$, then the sectional $\nabla$-curvature is constant. Theorem: if $M$ is connected, compact and oriented with a trace-free statistical structure and the curvature operator of $\bar R=(R+R^*)/2$ is positive on $M$, then the Betti numbers $b_1(M)=\dots=b_{n-1}(M)=0$.
• Sectional curvature of $g$: if $M$ is compact with a trace-free statistical structure $(g,\nabla)$ such that $R=R^*$ and the sectional curvature $\hat k$ of $g$ is positive on $M$, then the structure is trivial, i.e. $\nabla=\hat\nabla$. In dimension 2: if $M$ is a compact surface of genus 0 with a trace-free statistical structure and $R=R^*$, then the structure is trivial.
• References: B. Opozda, Bochner's technique for statistical manifolds, Annals of Global Analysis and Geometry, DOI 10.1007/s10455-015-9475-z; B. Opozda, A sectional curvature for statistical structures, arXiv:1504.01279 [math.DG].
• Hessian structures: $(g,\nabla)$ is Hessian iff $R=0$; then $R^*=0$ and $\hat R=-[K,K]$. Equivalently, $(g,\nabla)$ is Hessian iff $\hat\nabla K$ is symmetric and $\hat R=-[K,K]$. All Hessian structures are locally realizable on affine hypersurfaces in $\mathbb{R}^{n+1}$ equipped with Calabi's structure; trace-free ones are locally realizable on improper affine spheres. If the difference tensor is as in Example 1 and the structure is Hessian, then $K=0$.
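A quick numerical check of Example 1 (using my reconstruction of the difference tensor described there, offered as an illustration): with $K_{e_1}=\mathrm{diag}(\lambda,\lambda/2,\dots,\lambda/2)$ and $K_{e_i}=\tfrac\lambda2(E_{1i}+E_{i1})$ for $i\ge2$, the sectional $K$-curvature $k(\pi)=g([K_X,K_Y]Y,X)$ comes out as $\lambda^2/4$ on random orthonormal planes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 4, 2.0

# Difference tensor of Example 1 in an orthonormal frame: K[i] is the matrix of K_{e_i}
K = np.zeros((n, n, n))
K[0] = np.diag([lam] + [lam / 2.0] * (n - 1))
for i in range(1, n):
    K[i, 0, i] = K[i, i, 0] = lam / 2.0

def K_of(v):
    """Matrix of K_v = sum_i v_i K_{e_i}."""
    return np.einsum('i,ijk->jk', v, K)

def sectional_K_curvature(X, Y):
    """k(pi) = g([K_X, K_Y] Y, X) for an orthonormal pair X, Y (g = identity)."""
    KX, KY = K_of(X), K_of(Y)
    return X @ (KX @ KY - KY @ KX) @ Y

for _ in range(3):
    Q, _ = np.linalg.qr(rng.standard_normal((n, 2)))   # random orthonormal plane
    print(sectional_K_curvature(Q[:, 0], Q[:, 1]))     # ~ lam**2 / 4 = 1.0
```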

Creative Commons Attribution-ShareAlike 4.0 International
View the video
We show that Hessian manifolds of dimensions 4 and above must have vanishing Pontryagin forms. This gives a topological obstruction to the existence of Hessian metrics. We find an additional explicit curvature identity for Hessian 4-manifolds. By contrast, we show that all analytic Riemannian 2-manifolds are Hessian.
 

Slides: The Pontryagin Forms of Hessian Manifolds. J. Armstrong, S. Amari, October 27, 2015.
• Question: given a Riemannian metric $g$, when is it locally a Hessian metric, i.e. when can one locally find a function $f$ and coordinates $x$ such that $g_{ij}=\partial_i\partial_jf$? Partial answer: in dimension 2 all analytic metrics are Hessian; in dimension 3 the general metric is not Hessian; in dimension 4 and above there are restrictions on the curvature tensor, in particular the Pontrjagin forms vanish.
• Solving unusual PDEs: local solvability can be obstructed algebraically. For example, $g_{ij}=(\partial_if)(\partial_jf)$ forces $g\in\mathrm{Im}\,\varphi$ for $\varphi:T\to S^2T$, $\varphi(x)=xx$, so a solution may fail already at a point; $df=\eta$ requires $d\eta=0$, so a solution may exist at a point but not extend to first order around it. In general, for a differential operator $D:\Gamma(E)\to\Gamma(F)$ one considers the prolongations $D_i:J_{k+i}(E)\to J_i(F)$; the equation $De=f$ can only be solved if each $D_ie=j_i(f)$ has an algebraic solution at the point $x$. The fact that derivatives commute can yield obstructions to the existence of even local solutions.
• Dimension counting: the space of $k$-jets of one function of $n$ variables has dimension $\dim J_k=\sum_{i=0}^{k}\binom{n+i-1}{i}$. The $(k+2)$-jets of the unknowns $(f,x)$, i.e. $n+1$ functions of $n$ variables, form a space of dimension $d^1_k=(n+1)\sum_{i=0}^{k+2}\binom{n+i-1}{i}$, while the $k$-jets of $g$ form a space of dimension $d^2_k=\frac{n(n+1)}{2}\sum_{i=0}^{k}\binom{n+i-1}{i}$. For $n>2$, $d^1_k$ grows more slowly than $d^2_k$, so most Riemannian metrics are not Hessian. Informally: a Riemannian metric depends on $\frac{n(n+1)}{2}$ functions of $n$ variables, a Hessian metric on $n+1$ functions of $n$ variables; this computation is suggestive but slightly wrong because it ignores the diffeomorphism group (in dimension 1 it would suggest there are more Hessian metrics than Riemannian metrics).
• Curvature: Hessian metrics locally correspond to $g$-dually flat structures, and vice versa; $g$-dually flat means $\nabla$ is flat and its dual with respect to $g$, defined by $Zg(X,Y)=g(\nabla_ZX,Y)+g(X,\nabla^*_ZY)$, is flat.
• Proposition: let $(M,g)$ be Riemannian with Levi-Civita connection $\hat\nabla$ and let $\nabla=\hat\nabla+A$ be a $g$-dually flat connection. Then (i) the tensor $A_{ijk}$ lies in $S^3T^*$ (the $S^3$-tensor of $\nabla$); (ii) it determines the Riemann curvature tensor by
$$R_{ijkl}=-g^{ab}A_{ika}A_{jlb}+g^{ab}A_{ila}A_{jkb}.$$
Proof sketch: torsion-freeness gives $A\in S^2T^*\otimes T$ and, identifying $T^*$ with $T$ via $g$, $A\in S^3T^*$ because both $\nabla$ and $\nabla^*$ are torsion-free. From $R^\nabla=0$ and $R^\nabla_{XY}Z=R_{XY}Z+2(\hat\nabla_{[X}A)_{Y]}Z+2A_{[X}A_{Y]}Z$, and the fact that $R\in\Lambda^2T\otimes\Lambda^2T$ while $(\hat\nabla_{[\cdot}A)_{\cdot]}\in\Lambda^2T\otimes S^2T$, projecting onto $\Lambda^2T\otimes\Lambda^2T$ gives the formula.
• Curvature obstruction: define the equivariant quadratic map $\rho:S^3T^*\to\Lambda^2T^*\otimes\Lambda^2T^*$, $\rho(A)_{ijkl}=-g^{ab}A_{ika}A_{jlb}+g^{ab}A_{ila}A_{jkb}$. If $g$ is Hessian, its curvature tensor $R$ lies in $\mathrm{Im}\,\rho$. Corollary: for $n\ge5$ the space of algebraic curvature tensors has dimension $\frac1{12}n^2(n^2-1)$, strictly greater than $\dim S^3T^*=\frac16n(n+1)(n+2)$, so $\rho$ is not onto and $R\in\mathrm{Im}\,\rho$ is an obstruction to the metric being Hessian.
• Dimension 4: numerically, $\rho$ is not onto even though $\dim S^3T^*=20$ equals the dimension of the space of algebraic curvature tensors. Pick a random $A\in S^3T^*$ and compute the rank of the differential $(\rho_*)_A$: it is 18, whereas the space of algebraic curvature tensors is 20-dimensional (a proof "with probability 1").
• What are the conditions on the curvature tensor for it to lie in $\mathrm{Im}\,\rho$? This is an implicitization problem; over the reals one should expect inequalities, so the question is interpreted over the complexified spaces, where equalities suffice. Gröbner basis algorithms solve such problems in principle but not in practice (doubly exponential time is common), and real algorithms are even less practical.
• Strategy: the space of algebraic curvature tensors is an $SO(n)$-representation. Decompose it, and $S^2$ of it, into irreducible components; any invariant linear or quadratic condition on $R$ is a linear combination of these components, and the coefficients $\alpha_i$ can be found by solving $\sum_i\alpha_i\rho_i(R)=0$ for sufficiently many random tensors $A$. This is feasible in dimension 4, where the representation theory of $SU(2)\times SU(2)$ is simple.
• Theorem (dimension 4): the space of possible curvature tensors of a Hessian 4-manifold is 18-dimensional; in particular the curvature tensor satisfies the identities
$$\alpha\big(R_{ij}{}^{ab}R_{klb}{}^{a}\big)=0,\qquad \alpha\big(R_{iajb}\,R_k{}^{bcd}R_l{}^{dac}-2R_{iajb}\,R_{kc}{}^{ad}R_l{}^{dbc}\big)=0,$$
where $\alpha$ denotes antisymmetrization over the indices $i,j,k,l$. (Proof: write the general $A\in S^3T^*$ with respect to an orthonormal basis in terms of its 20 components, compute $R=\rho(A)$ with a symbolic algebra package, and check the identities directly.) Both expressions define 4-forms on a general Riemannian manifold; the first is the well-known 4-form defining the first Pontrjagin class of the manifold.
In this case the integral of scalarcurvature is related to the Euler class.The theory of characteristic classes generalizes this.To a complex vector bundle V over a manifold M one canassociate topological invariants, the Chern classesci (V ) ∈ H 2i (M).The Pontrjagin classes of a real vector bundle V R are definedto be the Chern classes of the complexificationpi (V R ) ∈ H 4i (M).The Pontrjagin classes of a manifold are defined to be thePontrjagin classes of its tangent bundle.It is possible to find explicit representatives for the De Rhamcohomology classes of a bundle by computing appropriatepolynomial expressions if a curvature tensor for the bundle.We call these explicit representatives Pontrjagin forms. Relationship between Pontrjagin forms and curvatureTheoremFor each p, the form Qp (R) defined by:Qip i2 ...i2p =1sgn(σ)Riσ(1) iσ(2) a1 a2 Riσ(3) iσ(4) a2 a3 Riσ(5) iσ(6) a3 a4 . . . Riσ(2p−1) iσ(2p) ap a1σ∈S2pis closed. The Pontrjagin forms can all be written as algebraicexpressions in these Qp (R) using the ring structure of Λ∗ andvice-versa.This is a standard result from the theory of characteristic classes. Main resultTheoremThe forms Qp (R) vanish on Hessian manifolds, hence thePontrjagin forms vanish on Hessian manifolds.CorollaryIf a manifold M admits a metric that is everywhere locally Hessianthen its Pontrjagin classes all vanish.Note that we’re being clear to distinguish this from the case of amanifold which is globally dually flat, where the vanishing of thePontrjagin classes is a trivially corollary of the existence of flatconnections. Graphical notationρ(Aijk ) = −g ab Aika Ajlb + g ab Aila AjkbijRijkl = −ij+k.lklTrivalent graphEach vertex represents the tensor AConnecting vertices represents contraction with the metricPicture naturally incorporates symmetries of Aiσ(1)iσ(2)− sgn(σ)Ri1 i2 ab =σ∈S2.ab Proofiσ(1)iσ(2)− sgn(σ)Ri1 i2 ab =σ∈S2.abBy definition:Qip i2 ...i2p =1sgn(σ)Riσ(1) iσ(2) a1 a2 Riσ(3) iσ(4) a2 a3 Riσ(5) iσ(6) a3 a4 . . . Riσ(2p−1) iσ(2p) ap a1σ∈S2pWe can replace each R with an H:Qip i2 ...i2p =1(−1)psgn(σ)iσ(1)iσ(2)iσ(3)iσ(4)iσ(5)iσ(6)iσ(2p−1) iσ(2p)...σ∈S2pSince the cycle 1 → 2 → 3 . . . → 2p → 1 is an odd permutation,one sees that Q p = 0. SummaryIn dimension 2 all metrics are locally Hessian (UseCartan–K¨hler theory. Proved independently by RobertaBryant)In dimensions3 not all metrics are locally HessianIn dimensions4 there are conditions on the curvatureIn dimension 4 we have identified two conditions explicitly.These are necessary conditions and, working over the complexnumbers, they characterize Im ρ.In dimension n 4 we have identified a number of explicitcurvature conditions in terms of the Pontrjagin forms.Dimension counting tells us that other curvature conditionsexist, but we do not know them explicitly.
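The "proof with probability 1" outlined in the slides is easy to reproduce numerically. The sketch below is only an illustration (flat metric g = δ in an orthonormal frame, helper names of my own): it builds a random A ∈ S³T* in dimension 4 and checks that the differential of ρ(A)_ijkl = −A_ika A_jla + A_ila A_jka has rank 18, two less than the 20-dimensional space of algebraic curvature tensors, as stated in the slides.

```python
import itertools
import numpy as np

n = 4
rng = np.random.default_rng(0)

# A random symmetric 3-tensor A in S^3 T* (20 independent components when n = 4),
# obtained by symmetrizing a random tensor.
A = rng.standard_normal((n, n, n))
A = sum(np.transpose(A, p) for p in itertools.permutations(range(3))) / 6.0

def d_rho(A, B):
    """Differential of rho at A applied to B, for the flat metric g = delta:
    d rho_A(B)_ijkl = -(B_ika A_jla + A_ika B_jla) + (B_ila A_jka + A_ila B_jka)."""
    t1 = np.einsum('ika,jla->ijkl', B, A) + np.einsum('ika,jla->ijkl', A, B)
    t2 = np.einsum('ila,jka->ijkl', B, A) + np.einsum('ila,jka->ijkl', A, B)
    return t2 - t1

# Basis of S^3 T*: symmetrized elementary tensors indexed by i <= j <= k.
basis = []
for i in range(n):
    for j in range(i, n):
        for k in range(j, n):
            E = np.zeros((n, n, n))
            for perm in itertools.permutations((i, j, k)):
                E[perm] = 1.0
            basis.append(E)

jacobian = np.stack([d_rho(A, B).ravel() for B in basis])   # 20 x 256 matrix
print(len(basis), np.linalg.matrix_rank(jacobian))          # expected: 20 18
```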

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Based on the theory of compact normal left-symmetric algebras (clans), we realize every homogeneous cone as a set of positive definite real symmetric matrices, on which homogeneous Hessian metrics, as well as a transitive group action on the cone, are described efficiently.
 

Matrix realization of a homogeneous coneHideyuki ISHI(Nagoya University)1 §1. Introduction§2. Matrix realization and left-symmetric algebra§3. Homogeneous Hessian metrics2 V : real vector spaceΩ : regular open convex cone in V , that is,• V ⊃ Ω : open subset,• x ∈ Ω, c > 0 ⇒ cx ∈ Ω,• x, y ∈ Ω, 0 ≤ t ≤ 1 ⇒ (1 − t)x + ty ∈ Ω,• Ω ∩ (−Ω) = {0}.Ω : homogeneous coneif ∃G: Lie group acting on Ω transitively as linear transforms3 Example 1.V = Sym(n, R)Ω = Pn := { X ∈ Sym(n, R) | X is positive definite }G = GL(n, R) acts on Pn transitively byρ(A)X := AX tA (A ∈ GL(n, R), X ∈ Pn).{Hn := T ∈ GL(n, R) | Tij = 0 (i < j), Tii > 0 (i = 1, . . . , n)Hn acts on Pn simply transitively by ρbecause of the Cholesky decomposition:∀X ∈ Pn ∃1T ∈ Hn s.t. X = T tT .4} Example 2.V := { X ∈ Sym(3, R) | X12 = X21 = 0}x1 0 x4= X =  0 x2 x5 | x1, . . . , x5 ∈ Rx4 x5 x3Ω := V ∩ P3t1 0 0H := T =  0 t2 0  | t1, t2, t3 > 0, t4, t5 ∈ R ⊂ H3t4 t5 t3Then H acts on Ω simply transitively by ρ.5 Example 3.n ≥ 3V := X = x1...x3. . . | x , . . . , xn ∈ Rxn 1x2}> 0, x1x2 − x2 − · · · − x2 > 0n3x1x3 {. . xn.Ω := Z ∩ Pn−1 = X | x1t1...H := T =  | t1, t2 > 0, t3, . . . , tn ∈ Rt1t3 . . . t n t2H acts on Ω simply transitively.6 The cone Ω ⊂ Pn−1 is linearly isomorphic to the circular{}coney ∈ Rn | y1 >y1 − y2√22y2 + · · · + ynin Rny3......because ∈Ωy1 − y2yn y3...yny1 + y2√22iff y1 > y2 + · · · + yn .Roughly speaking, every homogeneous cone is realizedsimilarly.7 §2. Matrix realization and left-symmetric algebra{Put hn := T ∈ Mat(n, R) | Tij = 0 (i < j)For X ∈ Sym(n, R), define X ∈ hn by}= Lie(Hn).∨Xij(X )ij := Xii/2∨(i > j)(i = j)0(i < j)Then X = X + t(X ).∨∨For X, Y ∈ Sym(n, R), defineX△Y := X Y + Y t(X ) ∈ Sym(N, R).∨∨Then △ gives a bilinear product on the vector spaceSym(n, R), encoding the action ρ of Hn on Pn.8 Main Theorem. (i) Let Z be a subspace of Sym(n, R)such that Z△Z ⊂ Z and En ∈ Z. Then PZ { Z ∩ Pn is:=}a homogeneous cone. The set HZ := Hn ∩ X | X ∈ Z∨forms a subgroup of Hn and acts simply transitively onPZ .(ii) Every homogeneous cone is linearly isomorphic tosuch PZ .Examples 2 and 3 are special cases. x1 0 x4(Recall Z =  0 x2 x5 | x1, . . . , x5 ∈ R xx5 x34in Example 2).9 The algebra (Sym(n, R), △) has the following properties:(C1) X△(Y △Z)−(X△Y )△Z = Y △(X△Z)−(Y △X)△Zfor all X, Y, Z(left-symmetry )(C2) there exists a linear form ξ such that ξ(X△X) > 0for all non-zero X (compactness)(C3) For each X, the left-multiplication operator LX :Y → X△Y has only real eigenvalues (normality )(C4) En△X = X△En = X for all X (∃unit element).An R-algebra (V, △) satisfying (C1) is called a leftsymmetric algebra (or Koszul-Vinberg algebra), whilea left-symmetric algebra satisfying (C2) and (C3) iscalled a clan (a compact normal left-symmetric algebra).10 Vinberg obtained a one-to-one correspondence betweena homogeneous cone and a clan with unit element upto natural isomorphisms.Theorem. Every clan with a unit element is isomorphic to a subalgebra of (Sym, △).Theorem. A subalgebra Z of (Sym, △) with En ∈ Z admits a specific block decomposition after an appropriatepermutation of rows and columns (see Proceedings).11 §3. Homogeneous Hessian metricsFor X ∈ Sym(n, R) and k = 1, . . . , n,let X [k] := (Xij )1≤i,j≤k ∈ Sym(k, R).For X ∈ Pn and s = (s1, . . . 
, sn) ∈ Rn , define>0∏n[k] )sk −sk+1 , where s∆s(X) := k=1(det Xn+1 := 0.∏If X = T tT with T ∈ Hn, then ∆s(X) = n (Tkk )2skk=1Therefore,∏∆s(ρ(T )Y ) = ( n (Tkk )2sk )∆s(Y ) (Y ∈ Pn).k=1Let gs be the Hessian metric on Pn whose potentialis − log ∆s(Y ). Then gs is Hn-invariant.For X ∈ Pn and A, B ∈ TX Pn ≡ Sym(n, R), we have()∑ngs(A, B)X := k=1(sk −sk+1)Tr A[k](X [k])−1B [k](X [k])−1 .12 A Hessian metric g on a domain D ⊂ Rn is said to behomogeneous if ∃G : Lie group acting on D transitivelyas affine isometries.Clearly the Hessian metric gs on Pn is homogeneous.Moreover, Every homogeneous Hessian metric on Pnis equivalent to some gs. Namely, for a homogeneousHessian metric g on Pn, there exists a linear transformf : Pn → Pn such that g = f ∗gs.13 Theorem. Let Z be a subalgebra of (Sym(n, R), △) withEn ∈ Z. Then every homogeneous Hessian metric onthe homogeneous cone PZ is equivalent to the restriction gs|PZ with some s ∈ Rn .>0This parametrization is redundant because different smay give the same metric on PZ .See Proceedings for a precise parametrization.14
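The triangle product underlying the matrix realization is concrete enough to check by machine. The sketch below (illustrative only; the helper names are mine) implements X△Y = X∨Y + Y(X∨)ᵀ on Sym(n, ℝ), where X∨ is the lower-triangular half of X with the diagonal halved, and verifies numerically the left-symmetry axiom (C1) and the unit axiom (C4) on random symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def vee(X):
    """X∨: strict lower-triangular part of X plus half its diagonal,
    so that X = X∨ + (X∨)ᵀ for symmetric X."""
    return np.tril(X, -1) + np.diag(np.diag(X)) / 2.0

def tri(X, Y):
    """Clan product on Sym(n, R):  X△Y = X∨ Y + Y (X∨)ᵀ."""
    V = vee(X)
    return V @ Y + Y @ V.T

def rand_sym():
    M = rng.standard_normal((n, n))
    return M + M.T

X, Y, Z = rand_sym(), rand_sym(), rand_sym()
E = np.eye(n)

# (C1) left-symmetry: X△(Y△Z) − (X△Y)△Z = Y△(X△Z) − (Y△X)△Z
lhs = tri(X, tri(Y, Z)) - tri(tri(X, Y), Z)
rhs = tri(Y, tri(X, Z)) - tri(tri(Y, X), Z)
print(np.allclose(lhs, rhs))                                   # True

# (C4) unit element: En△X = X△En = X
print(np.allclose(tri(E, X), X), np.allclose(tri(X, E), X))    # True True
```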

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this article, we derive an inequality satisfied by the squared norm of the imbedding curvature tensor of multiply CR-warped product statistical submanifolds N of holomorphic statistical space forms M. Furthermore, we prove that, under certain geometric conditions, N and M become Einstein.
 

Multiply CR- Warped Product StatisticalSubmanifolds of a HolomorphicStatistical Space FromByPROF. MOHAMMAD HASAN SHAHIDDepartment of MathematicsFaculty of Natural SciencesJamia Millia Islamia(Central University)New Delhi, India(with Prof. Michal Boyom and M. Jamali) CR-submanifold and CR-warped ProductSubmanifoldLet B and F be two Riemannian Manifolds with Riemannian metric gBand g F , respectively , and f a positive differentiable function on B. Thewarped product manifold B ´ F equipped with the Riemannian metricg = gB + f 2gFThe function f is called the warping function. It is well known that thenotion of warped product plays some important roles id differentialgeometry as well as in physics. Let M be a Kaehler manifold with complex structure J and N aRiemannian manifold isometrically immersed in M . For eachxÎ N , we denote by D x the maximal holomorphic subspace ofthe tangent space T x N of N. If the dimension of D x is the samefor all xÎN , the space D x define a holomorphic distribution Don N, which is called the holomorphic distribution of N. Asubmanifold N is a Kaehler manifold M is called aCR-submanifold if there exists a holomorphic distribution D on Nwhose orthogonal complement D^ is totally real distribution,^^i.e., JD Ì T N . A CR-submanifold is called a totally realsubmanifold if dim D x =0. Statistical manifolds introduced, in 1985, by Amari have beenstudied in term of information geometry. Since the geometry ofsuch manifolds includes the notion of dual connections, alsocalled conjugate connection in affine geometry, it is closelyrelated to the affine differential geometry. Further, a statisticalstructure being a generalization of a Hessian geometry.Let ( M , g ) be Riemannian manifold and M a submanifold of M .If (M,Ñ, g) is a statistical manifold, then we call (M,Ñ, g) astatistical submanifold of ( M , g ) , where Ñ is an affineconnection on M and g is the metric tensor on M induced fromthe Riemannian metric g on M . Let Ñ be an affine connectionon M . If ( M , g , Ñ ) is astatistical manifold and M a submanifold ofM , then (M,Ñ, g) is also a statistical manifold by inducedconnection Ñ and metric g. In the case ( M , g ) is a semi-Riemannian manifold, theinduced metric connection g has to be non-degenerated.In the geometry of submanifolds, Gauss formula, Weingartenformula and the equation of Gauss, Codazzi and Ricci areknow as fundamental equations. Corresponding fundamentalequations on statistical submanifolds were obtained .Let M be an n-dimensional submanifold of M . Then, for anyX , Y Î G(TM ) , Gauss formula isÑ X Y = Ñ X Y + h( X , Y )*XÑ Y = Ñ* Y + h* ( X , Y )X Where h and h * are symmetric and bilinear, called theimbedding curvature tensor of M in M for Ñ and the*imbedding curvature tensor of M in M for Ñ , respectively.it is also proved that (Ñ, g ) and (Ñ* , g) are dual statisticalstructure on M, where g is induced metric on G(TM) from theRiemannian metric g on M .Let us denote the normal bundle on M by G(TM^ ) . Since hand h * are bilinear, we have the linear transformation A xand A x* defined byg ( Ax X , Y ) = g ( h ( X , Y ), x )g ( Ax* X , Y ) = g ( h * ( X , Y ), x ) Definition. Let N1, N2 ,....,Nk be Riemannian manifold of thedimensions n1, n2 ,....,nk respectively and let N = N1, N2 ,....,Nk bethe Cartesian product of N1, N2 ,....,Nk . For each a, denote bypa : N®Na the canonical projection N and Na . We denote theshorizontal lift of Na in N via p a by Na itself. 
If s2,....., k : N1 ®R+ arepositive valued functions, thenk(2.1)g( X ,Y ) = p1* X ,p1*Y + å(s a o p1)2 p a* X ,p a*Ya=1define a metric g on N. The product manifold N endowed withthis metric is denoted by N1 ´s 2 N2 ´.....´sk Nk . This product manifoldN is known as multiply warped product manifold. Definition. If N1, N2 ,....,Nk be k statistical manifolds, thenN= N1 ´s 2 N2 ´.....´sk Nk is again a statistical manifold with metricgiven by equation (2.1). This manifold N is called multiplywarped product statistical manifold.Now let us denote the part s2 N2 ´...´sk Nk by N^ and N1 by NT .Then N can be represented as N = NT ´N^. We denote byX,Y....ÎG(M) as the vector field on M and X, Y…. the inducedvector field on N.Definition. A multiply warped product statistical submanifoldN = NT ´N^ in an almost complex manifold M is called a multiplyCR-warped product statistical submanifold if NT is an invariantsubmanifold and N^ is an anti-invariant submanifold of M. We denote by, m ≥ 1 the Euclidean 2m space with thestandard metric. Then the canonical complex structure ofis defined byJ (x1, y1,..., xm , ym ) = (- y1, x1,...,- ym , xm )Example. Consider inthe submanifold is given by theequations [B. Sahin, Geom. Dedicata 2006](*) From (*) one can obtain that TM is spanned bywhere,Using (*) one gets thatis invariant with respectto J. Moreover,are orthogonal to TM. Hence,is anti-invariant with respect to J. Thus M is aCR-submanifold of . Furthermore, we can derive thatand are integralable. Denoting the integral manifold of D andbyrespectively, then the induced metric tensor isThus M is a CR-warped product submanifold ofwarping function.,with • A. Bejancu, CR- submanifold of a Kaehler a manifold I, Proc.Amer. Math. Soc. 69 (1978), 135-142.• A. Bejancu, CR-submanifold of a Kaehler manifold II, Trans.Amer. Math. Soc. 69 (1979), 333-345.• Chen BY (1981) CR-submanifolds in Kaehler manifolds. I. JDiff Geometry 16: 305-322; CR-submanifolds in Kaehlermanifolds. II. Ibid 16: 493-509.• Chen BY (2001) Geometry of warped product CRsubmanifolds in Kaehler manifolds I. Monatsh Math 133:177-195; Geometry of warped product CR-submanifolds inKaehler manifolds. II. Ibid 134: 103-119.• S . Amari, Differential-Geometrical methods in Statistics,Springer-Verlag, 1985.• Yano K. and Kon, M.: CR-submanifolds of Kaehlerian andSasakian Manifolds, Birkhauser, Basel, 1983. From the decomposition of TN = D Å D^ and T ^ N = JD ^ Å lwe may writeh ( X , Y ) = h JD ^ ( X , Y ) + h l ( x , y )Also for multiply CR- warped product statistical submanifolds N of astatistical manifold [L. Tod., Diff. Geom. – Dynamical system ,2006]z =kå ( X (log sa=2a)) z a and Ñ * Z =Xkå ( X (loga=2s a )) Zfor any vector fields X Î D and Z ÎD^ , where Zdenotes the N a- component of Z .aa(4) Lemma 1. Letbe a multiply CRwarped product statistical submanifold of a holomorphicstatistical space form M. then we havek(i) hJD ( JX , Y ) = å ( X (logs a )) JZ a + JPz JX^a =2(ii)(iii)g ( PZ JX , W ) = g (Q Z JX , JW )g (h( JX , Z ), Jh( X , Z )) = hl ( Z , X ) + g (QZ X , Jhl ( X , Z ))2For any vector field X in D and Z , W in Ddenotes the N a - component of Z.^Za, where Proof. 
From Gauss formulawe can writeÑ Z JX + h( JX , Y ) = PZ X + QZ X + JÑ Z X + Jh( X , Z )kh( JX , Z ) = PZ X + QZ X + J (å ( X (log s a ) Z a ) + Jh ( Z , X )a =2k- å ( JX (log s a )) JZ a(5)a=2where P and Q denotes the tangential and normal projection.Comparing the tangential part in the above equation and thentaking inner product with W Î D ^ , we getkhJD^ (JX, Z) = å( X (logsa ))Z + JP JX, "X ÎD, Z ÎDZa=2a^ Now comparing normal parts of (5) and taking inner productwith JW for WÎD^kg (hJD^ ( JX , Z ), JW ) g (QZ X , JW ) + å ( X (logs a ) g ( JZ a , JW ))a =2Using part(i) of the lemma 1 we arrive atg ( PZ JX ,W ) = g (QZ X , JW ). Comparing normal part of h(JX, Z) - Jh (Z, X) = QZ X +lon both the sides and taking inner product withfindk(X(log a )JZaå sa=2weTheorem 2. Letbe multiplyCR-warped product statistical submanifold of holomorphicstatistical space form M with P ^ DÎD , then the square norm ofDimbedding curvature tensor of N in M satisfies the followinginequalities : Proof. Let { X 1 , X 2 ,...., X p , X p +1 = JX 1 ,..., JX 2 p = JX p } be localorthonormal frame of vector field frame of the vector field on N Tand {Z1, Z2 ,...,Zq}be such that Z D a is a basis for some N a ,a= 2,…..,k whereD 2 = {1, 2,..., n 2 } ,…., D k = {n2 + n3 + .... + nk -1 + 1,..., n1 + n2 + ... + nk }andn 2 + n 3 + ..... + n k = q The above equation impliesNow using part (i) of the Lemma 1 we getIn the view of the above assumption PD D Î D , the above inequalitytakes the form^ By the Cuachy-Schwartz inequality the above equationbecomesTherefore Theorem 3. Letbe a compactorientable multiply CR- warped product statistical submanifoldwithout boundary of holomorphic statistical space form M ofconstant curvature k. If PD D Î D andThen^And the equality holds if and if^Proof. Let XÎD , Z ÎD , then form holomorphic statisticalspace form of constant curvature k, we have Which implies.On the other hand from Codazzi equation, we may write(7)(8) Now, we calculate each term of (8) as(10)Similarly we replace X By JX in the last equation, we get(11) (12)(13)(14) .kR( X , JX , Z , JZ ) = å [{X ( X log s a ) + ( X log s a ) 2 }g ( Z a , Z a )]a=2k*X- g(h(JX, Z ),Ñ JZ) + å[{JX(JX logs a ) + (JX logs a )2}g(Z a , Z a )]a=2*JXk- g(h(JX, Z),Ñ JZ) - å(ÑX X logsa )g(Z a , Z a )a=2k- å{(ÑJX JX logs a ) g (Z a , Z a )}a =2(16) Combining (7) and (16) and taking summation over the rangefrom 1 to p, we havekpk()å Z4 a=2-kåa = 22agrad=kåa=2DD (log s a ) Z(logsa)a22p^+ å [ g (h( Jei , Z ), Ñ ei * ) JZ - g (h(ei , Z ), Ñ ^*i JZ )]Jei =1(18) Integrating both the sides, Green’s and the hypothesis leads tokk=- 4å Za 2a=2ò { gradpå Za =2på Z aSincea=2k2D£0Nkk(log s a ) }dv22a 2ò dvNò dv > 0Nå2 Z ò { grad D (log s a ) }dv ³ 0Anda=N2{ grad D (log s a ) }dv = 0Further the equality holds if and only if òa2NWhich implies that the equality holds ifproves the theorem.. This Theorem 4. Letbe acompact orientable anti-invariant multiply warped productstatistical submanifold without boundary of holomorphicstatistical space form M of constant curvature k. If PD D Î Dand AÑ^* JZJZ = AÑJX*JX X , then^XR( X ,Y , X ,Y ) ³ g (H , H * )and the equality holds if and only ifgradD (logs a ) = 0Proof. From the previous theorem we have. Since Nis anti-invariant , we have N T = 0 and N = N ^ . This implies that N becomes completely totally umbilicalsubmanifold of M. 
Furthermore, from the expression of theambient curvature we have, for two orthonormal vector X , Y Î TNkThen.R ( X ,Y , X ,Y ) = 4Furthermore, from Gauss equation and totally umbilicity of N, weobtainkR( X ,Y, X ,Y) = (- + g(H, H*))4R( X , Y , X , Y ) ³ g ( H , H * )and the equality holds if Theorem 5. Letbe acompact orientable anti-invariant multiply warped productstatistical submanifold without boundary of holomorphicstatistical space form M of constant curvature k. If P D D Î Dand AÑ^*JZ JX = AÑ^ * JZ X , then M is Einstein and N is Einstein ifXJXand only if^k+ g (H , H4*) is constant.Proof. The proof is straight from the last theorem and theGauss equation which combinely givekRic(Y , Z ) = (n - 1){ + g ( H , H * )}g (Y , Z )4 References:[1]. S. AMARI, “Differential Geometric methods in statistics”, Springer-Verlag,1985.[2]. S. AMARI and H. NAGAOKA, “Methods of Information Geometry”, Transl.Math. Monogr., Vol-191, Amer. Math. Soc., 2000.[3]. M. E. AYDIN, A. MIHAI and I. MIHAI, “Some inequalities on submanifolds instatistical manifolds of constant curvature”, Filomat (To appear).[4]. R.L. BISHOP and B. O’NEILL, “Manifolds of negative curvature”, Trans. ofAmer. Math. Soc., Vol-145(1969), 1-49.[5]. B. Y. CHEN, “Geometry of warped product CR-submanifolds in Kaehlermanifold”, Monatsh. Math., 133(2001), 177-195.[6]. B. Y. CHEN, “Geometry of warped product CR-submanifolds in Kaehlermanifold II”, Monatsh. Math., 134(2001), 103-119.[7]. B. Y. CHEN and FRANKI DILLEN, “Optimal inequalities fir multiply warpedproduct submanifolds”, Int. Elect. J. of Geometry, Vol-1 (2008), No-1, 1-11.[8]. H. FURUHATA, “Hypersurfaces in statistical manifolds”, Diff. Geom. Appl., 27,(2009), 420-429.[9]. L. Todgihounde, “Dualistic structures on warped product manifolds”,Differential Geometry-Dynamical Systems, Vol-8 (2006), 278-284.[10]. P. W. VOS, “Fundamental equations for statistical submanifolds withapplications to the Bartlett connection”, Ann. Inst. Statist. Math., 41(3) (1989),429-450.
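As a small aid to the definition of the multiply warped product metric (equation (2.1) in the slides), the sketch below builds that metric as a block matrix for toy flat factors. The factor dimensions and the warping functions sigma2, sigma3 are arbitrary illustrative choices of mine, not taken from the talk.

```python
import numpy as np
from scipy.linalg import block_diag

# Toy factors: N1 = R^2 (base), N2 = R^2, N3 = R^1, each with the flat metric.
g1, g2, g3 = np.eye(2), np.eye(2), np.eye(1)

# Warping functions sigma_2, sigma_3 : N1 -> R_+ (illustrative choices).
sigma2 = lambda p: 1.0 + p[0] ** 2
sigma3 = lambda p: np.exp(p[1])

def warped_metric(p):
    """Metric (2.1) of N1 x_{sigma2} N2 x_{sigma3} N3 at a point with
    N1-coordinates p: block-diagonal with blocks g1, sigma2(p)^2 g2, sigma3(p)^2 g3."""
    return block_diag(g1, sigma2(p) ** 2 * g2, sigma3(p) ** 2 * g3)

p = np.array([0.5, -1.0])
G = warped_metric(p)
X = np.ones(5)               # a tangent vector split as (X1, X2, X3)
print(G.shape, X @ G @ X)    # (5, 5) and the value g(X, X) at p
```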

Topological forms and Information (chaired by Daniel Bennequin, Pierre Baudot)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this lecture we will present joint work with Ryan Thorngren on thermodynamic semirings and entropy operads, with Nicolas Tedeschi on Birkhoff factorization in thermodynamic semirings, ongoing work with Marcus Bintz on tropicalization of Feynman graph hypersurfaces and Potts model hypersurfaces, and their thermodynamic deformations, and ongoing work by the author on applications of thermodynamic semirings to models of morphology and syntax in Computational Linguistics.
 

Information Algebras and their ApplicationsMatilde MarcolliGeometric Science of Information, Paris, October 2015Matilde MarcolliInformation Algebras Based on:M. Marcolli, R. Thorngren, Thermodynamic semirings, J.Noncommut. Geom. 8 (2014), no. 2, 337–392M. Marcolli, N. Tedeschi, Entropy algebras and Birkhofffactorization, J. Geom. Phys. 97 (2015) 243–265Matilde MarcolliInformation Algebras Min-Plus Algebra (Tropical Semiring)min-plus (or tropical) semiring T = R ∪ {∞}• operations ⊕ andx ⊕ y = min{x, y }xy =x +y• operations ⊕ andwith identity ∞with identity 0satisfy:associativitycommutativityleft/right identitydistributivity of productMatilde Marcolliover sum ⊕Information Algebras Thermodynamic semiringsTβ,S = (R ∪ {∞}, ⊕β,S , )• deformation of the tropical addition ⊕β,Sx ⊕β,S y = min{px + (1 − p)y −p1S(p)}ββ thermodynamic inverse temperature parameterS(p) = S(p, 1 − p) binary information measure, p ∈ [0, 1]• for β → ∞ (zero temperature) recovers unperturbed idempotentaddition ⊕• multiplication= + is undeformed• for S = Shannon entropy considered first in relation toF1 -geometry inA. Connes, C. Consani, From monoids to hyperstructures: insearch of an absolute arithmetic, arXiv:1006.4810Matilde MarcolliInformation Algebras Khinchin axiomsSh(p) = −C (p log p + (1 − p) log(1 − p))• Axiomatic characterization of Shannon entropy S(p) = Sh(p)1symmetry S(p) = S(1 − p)2minima S(0) = S(1) = 03extensivityS(pq) + (1 − pq)S(p(1 − q)/(1 − pq)) = S(p) + pS(q)• correspond to algebraic properties of semiring Tβ,S1commutativity of ⊕β,S2left and right identity for ⊕β,S3associativity of ⊕β,S⇒ Tβ,S commutative, unital, associative iff S(p) = Sh(p)Matilde MarcolliInformation Algebras Khinchin axioms n-ary formGiven S as above, define Sn : ∆n−1 → RSn (p1 , . . . , pn ) =(1 −1 j n−10bypi )S(1 i
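For S = Shannon entropy, the deformed addition can be evaluated numerically and compared with a closed form: a standard Legendre-transform computation (my remark, not a claim from the slides) gives x ⊕_{β,S} y = −(1/β) log(e^{−βx} + e^{−βy}), a "softmin". The sketch below, an illustration only, checks this, checks that the tropical sum min(x, y) is recovered as β → ∞, and checks associativity, which by the slides corresponds to the extensivity axiom for S.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def shannon(p):
    """Binary Shannon entropy, with the convention 0 log 0 = 0."""
    q = 1.0 - p
    return -(p * np.log(p) if p > 0 else 0.0) - (q * np.log(q) if q > 0 else 0.0)

def oplus(x, y, beta):
    """Deformed tropical sum: min over p in [0, 1] of p x + (1 - p) y - S(p)/beta."""
    res = minimize_scalar(lambda p: p * x + (1 - p) * y - shannon(p) / beta,
                          bounds=(0.0, 1.0), method='bounded')
    return res.fun

x, y, z, beta = 1.3, 0.4, 2.1, 5.0

# Closed form for S = Shannon: a log-sum-exp ("softmin") of x and y.
lse = -np.log(np.exp(-beta * x) + np.exp(-beta * y)) / beta
print(abs(oplus(x, y, beta) - lse) < 1e-6)                       # True

# beta -> infinity recovers the idempotent tropical sum min(x, y).
print(oplus(x, y, 1e4), min(x, y))

# Associativity of the deformed sum (extensivity axiom for the Shannon entropy).
print(abs(oplus(oplus(x, y, beta), z, beta) - oplus(x, oplus(y, z, beta), beta)) < 1e-5)
```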

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We show that the entropy function, and hence the finite 1-logarithm, behaves much like certain derivations. We recall its cohomological interpretation as a 2-cocycle and also deduce 2n-cocycles for any n. Finally, we give some identities for finite multiple polylogarithms, together with number-theoretic applications.
 

Finite polylogarithms, their multiple analogues andthe Shannon entropyGeometric Sciences of Information 2015Session “Topological Forms and Information”École Polytechnique (France), 28 October 2015Philippe Elbaz-Vincent(Université Grenoble Alpes) & HerbertGangl (Durham University) Content of this talkInformation theory, Entropy and Polylogarithms (review of pastworks),Algebraic interpretation of the entropy function,Cohomological interpretation of formal entropy functions,Finite multiple polylogarithms, applications and open problems.2 / 13 Information theory, Entropy and Polylogarithms (1/4)The Shannon entropy can be characterised in the framework ofinformation theory, assuming that the propagation of informationfollows a Markovian model (Shannon, 1948).If H is the Shannon entropy, it fulfills the equation, often called theFundamental Equation of Information Theory (FEITH)H(x) + (1 − x)Hy1−x− H(y ) − (1 − y )Hx1−y= 0.(FEITH)It is known (Aczel and Dhombres, 1989), that if g is a real functionlocally integrable on ]0, 1[ and if, moreover, g fulfills FEITH, thenthere exists c ∈ R such that g = cH (we can also restrict thehypothesis to Lebesgue measurable).3 / 13 Information theory, Entropy and Polylogarithms (2/4)It turns out that FEITH can be derived, in a precise formal sense(Elbaz-Vincent and Gangl, 2002), from the 5-term equation of theclassical (or p-adic) dilogarithm.Cathelineau (1996) found that an appropriate derivative of theBloch–Wigner dilogarithm coincides with the classical entropyfunction, and that the five term relation satisfied by the formerimplies the four term relation of the latter.zn|z| < 1, theMore precisely, we define Lim (z) = ∞ nm ,n=1m-logarithm. We setD2 (z) = i Im Li2 (z) + log(1 − z) log |z| ,Then D2 satisfies the following 5-term equationD2 (a) − D2 (b) + D2ba− D21−b1−a+ D21 − b−11 − a−1= 0,whenever such an expression makes sense. The relation is thefamous five term equation for the dilogarithm (first stated by Abel).4 / 13 Information theory, Entropy and Polylogarithms (3/4)It can be shown formarly (see Cathelineau, Elbaz-Vincent andGangl) that FEITH is an infinitesimal version of this 5-termequation.Kontsevich (1995) discovered that the truncated finite logarithmover a finite field Fp , with p prime, defined byp−1£1 (x) =k=1xk,ksatisfies FEITH.In our previous work, we showed how one can expand thisrelationship for “higher analogues" in order to produce and provesimilar functional identities for finite polylogarithms from those forclassical polylogarithms (using mod p reduction of p-adicpolylogarithms and their infinitesimal version). It was also shownthat functional equations for finite polylogarithms often hold evenas polynomial identities over finite fields.5 / 13 Information theory, Entropy and Polylogarithms (4/4) Entropy and FEITH arise from the infinitesimal picture (forboth archimedean and non-archimedean structure) and their finiteanalogs associated to the dilogarithm. Does their exist higheranalogue of the Shannon entropy associated to m-logarithms ? 
Itcould be connected to the higher degrees of the informationcohomology space of Baudot and Bennequin (Entropy 2015).6 / 13 Algebraic interpretation of the entropy function (1/2)Let R be a (commutative) ring and let D be a map from R to R.We will say that D is a unitary derivation over R if the followingaxioms hold :1 “Leibniz’s rule” : for all x, y ∈ R, we haveD(xy ) = xD(y ) + yD(x).2 “Additivity on partitions of unity” : for all x ∈ R, we haveD(x) + D(1 − x) = 0.We will denote by Der u (R) the set of unitary derivations over R.We will say that a map f : R → R is an abstract symmetricinformation function of degree 1 if the two following conditionshold : for all x, y ∈ R such that x, y , 1 − x, 1 − y ∈ R × , thefunctional equation FEITH holds and for all x ∈ R, we havef (x) = f (1 − x). Denote by IF 1 (R) the set of abstract symmetricinformation functions of degree 1 over R. Then IF 1 (R) is anR-module. Let Leib(R) be the set of Leibniz functions over R (i.e.which fulfill the “Leibniz rule”).7 / 13 Algebraic interpretation of the entropy function (2/2)Proposition : We have a morphism of R-modulesh : Leib(R) → IF 1 (R), defined by h(ϕ) = ϕ + ϕ ◦ τ , withτ (x) = 1 − x. Furthermore, Ker (h) = Der u (R). Hence, if h is onto, abstract information function are naturallyassociated to formal derivations. Nevertheless, h can be also 0.Indeed, if R = Fq , is a finite field, then Leib(Fq ) = 0, butIF 1 (Fq ) = 0 (it is generated by £1 ).8 / 13 Cohomological interpretation of formal entropy functionsThe following results are classical in origin (Cathelineau, 1988 andKontsevich, 1995)Proposition : Let F be a finite prime field and H : F → F afunction which fulfills the following conditions : H(x) = H(1 − x),the functional equation (FEITH) holds for H and H(0) = 0. Thenxthe function ϕ : F × F → F defined by ϕ(x, y ) = (x + y )H( x+y ) ifx + y = 0 and 0 otherwise, is a non-trivial 2-cocycle.sketch of proof : Suppose that ϕ is a 2-coboundary. Then, there exists a mapQ : F → F , such that ϕ(x, y ) = Q(x + y ) − Q(x) − Q(y ). The functionψλ (x) = Q(λx) − λQ(x) is an additive morphism F → F , hence entirelydetermined by ψλ (1). The map ψλ (1) fulfills the Leibniz chain rule on F × . Wededuce from it that ϕ = 0 (which is not possible, so it is not a coboundary !) We deduce that £1 is unique (up to a constant). In the real orcomplex we use other type of cohomological arguments (see alsothe relationship with Baudot and Bennequin, 2015).9 / 13 Finite multiple polylogarithms (1/3)While classical polylogarithms play an important role in the theoryof mixed Tate motives over a field, it turns out that it is oftenpreferable to also consider the larger class of multiplepolylogarithms (cf. Goncharov’s work). In a similar way it is usefulto investigate their finite analogues. We are mainly concerned withfinite double polylogarithms which are given as functionsZ/p × Z/p → Z/p by£a,b (x, y ) =0 0 be divisible by 3, and put ω = n/3 − 1. Thenωj=0ωj2ωj111£n−(j+1),j+1 [a, b]−[ , a b]−ap bp [b,]+bp [a b, ] = 0.aabbQuestions : what is the interpretation in term of informationtheory for the multiple polylogs ?12 / 13 Finite polylogarithms and Fermat’s last theoremSeveral classical criteria used by Kummer, Mirimanoff andWieferich to prove certain cases of Fermat’s Last Theorem can berephrased in terms of functional equations and evaluations of finite(multiple) polylogarithms. For example, Mirimanoff was led to thestudy of (nowadays called) Mirimanoff polynomials (cf. 
Ribenboimp−1book on FLT) ϕj (T ) = j=1 k j−1 T k , which are nothing else butfinite polylogarithms...The Mirimanoff congruences (op.cit) can be reformulated asfollows : for any solution (x, y , z) of x p + y p + z p = 0 in pairwiseprime integers not divisible by p (i.e. a Fermat triple) and forxt = − y we have£1 (t) = 0 ,£j (t)£p−j (t) = 0(j = 2, . . . ,p−1).2One can prove these congruences using an identity expressing£p−j−1,j+1 (1, T ) in terms of £n (T ).13 / 13
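Kontsevich's observation recalled in the slides, namely that the truncated logarithm £1(x) = Σ_{k=1}^{p−1} x^k/k satisfies FEITH as a function F_p → F_p, can be checked exhaustively for a small prime. The sketch below does this for p = 13; it is illustrative code of my own (and needs Python 3.8+ for modular inverses via pow(k, -1, p)), not code from the talk.

```python
p = 13

def L1(x):
    """Truncated finite logarithm over F_p."""
    return sum(pow(x, k, p) * pow(k, -1, p) for k in range(1, p)) % p

def feith(x, y):
    """H(x) + (1-x) H(y/(1-x)) - H(y) - (1-y) H(x/(1-y))  (mod p), for H = L1."""
    a = L1(y * pow(1 - x, -1, p) % p)
    b = L1(x * pow(1 - y, -1, p) % p)
    return (L1(x) + (1 - x) * a - L1(y) - (1 - y) * b) % p

# x, y must avoid 0 and 1 so that 1-x and 1-y are invertible mod p.
print(all(feith(x, y) == 0 for x in range(2, p) for y in range(2, p)))   # True
```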

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We present a dictionary between arithmetic geometry of toric varieties and convex analysis. This correspondence allows for effective computations of arithmetic invariants of these varieties. In particular, combined with a closed formula for the integration of a class of functions over polytopes, it gives a number of new values for the height (arithmetic analog of the degree) of toric varieties, with respect to interesting metrics arising from polytopes. In some cases these heights are interpreted as the average entropy of a family of random processes.
 

”GSI’15”´Ecole Polytechnique, October 28, 2015Heights of toric varieties, entropyand integration over polytopesJos´ Ignacio Burgos Gil, Patrice Philippon & Mart´ SombraeınPatrice Philippon, IMJ-PRGUMR 7586 - CNRS1 Toric varietiesToric varieties form a remarkable class of algebraic varieties,endowed with an action of a torus having one Zariski dense openorbit. Toric divisors are those invariant by the action of the torus.Together with their toric divisors, they can be describedin terms of combinatorial objects such as lattice fans, supportfunctions or lattice polytopes(u1 ,u2 )→0(u1 ,u2 )→−u1(u1 ,u2 )→−u22 Each cone corresponds to an affine toric variety and the fanencodes how they glue together. If the fan is complete then thetoric variety is proper.The support function determines a toric divisor D on eachaffine toric chart. By duality, the stability set of the supportfunction is a polytope ∆, which may be empty but which is ofdimension n as soon as D is nef, which is equivalent to thesupport function being concave.One fundamental result is: if D is a toric nef divisor thendegD (X) = n!voln(∆).3 HeightsA height measures the complexity of objects over the field ofrational numbers, say. For a/b ∈ Q× and d = gcd(a, b):h(a/b) = log max(|a/d|, |b/d|) =log max(|a|v , |b|v ),vthanks to the product formula:|d|v = 1vfor any d ∈ Q× and where v runs over all the (normalised)absolute values on Q (usual and p-adic).4 HeightsA height measures the complexity of objects over the field ofrational numbers, say. For a/b ∈ Q× and d = gcd(a, b):h(a/b) = log max(|a/d|, |b/d|) =log max(|a|v , |b|v ),vthanks to the product formula:|d|v = 1, d ∈ Q×.vFor points of a projective space x = (x0 : . . . : xN ) ∈ PN (Q):h(x) =log xvv=−log(x) v ,vwhere · v is a norm on QN +1 compatible with the absolute value|·|v on Q (usual or p-adic). Metrics on OPN (1): (x) v = | (x)|v .x v5 On an abstract variety equipped with a divisor (X, D),defined over Q, the suitable arithmetic setting amounts to acollection of metrics on the space of rational sections of thedivisor, compatible with the absolute values on Q (the collectionis in bijection with the set of absolute values on Q). We denoteD the resulting metrised divisor.Arithmetic intersection theory allows to define the height ofX relative to D analogously to the degree degD (X):hD (X) =hv (X)vwhere the local heights hv are defined through an arithmeticanalogue of B´zout formula. Local heights depend on the choiceeof auxiliary sections but the global height does not.6 Metrics on toric varietiesOn toric divisors, a metric is said toric if it is invariant bythe action of the compact sub-torus of the principal orbit.There exists a bijection between toric metrics and continuousfunctions on the fan, whose difference with the support functionis bounded. The metric is semipositive iff the correspondingfunction is concave.By Legendre duality, the semipositive toric metrics are also inbijection with the continuous, concave functions on the polytopeassociated to the toric divisor, dubbed roof function.7 The roof function is the concave enveloppe of the graphof the function s → − log s v,sup, for s running over the toricsections of the divisor and its multiples.1Roof function of the pull-back of the canonical metric of P2 on P1 by t→( 1 : 2 :t)tv=2v=∞v=otherThe support function itself corresponds to the so-calledcanonical metric. 
Its roof function is the zero function on thepolytope.8 Heights on toric varietiesLet (X, D) be a toric varieties with a toric divisor (overQ), equipped with a collection of toric metrics (a toric metriseddivisor).The (local) roof functions attached to the toric metriseddivisor sum up in the so-called global roof function:ϑ :=ϑv .vWe have the analogue of the formula seen for the degree:hD (X) = (n + 1)!ϑ.∆9 Metrics from polytopesLet F (x) = x, uF + F (0) be the linear forms defining apolytope Γ ⊂ Rn, with F running over its facets and uF =voln−1(F )nvoln(Γ) . Let ∆ ⊂ Γ be another polytope, the restriction of1ϑ := −F log( F )cFto ∆, is the roof function of some (archimedean) metric on thetoric variety X and divisor D defined by ∆, hence D.Example: the roof function of the Fubini-Study metric on Pn is−(1/2)(x0 log(x0) + . . . + xn log(xn))1where x0 = 1 − x1 − . . . − xn (dual to − 2 log 1 +ne−2uii=1).10 Height as average entropyLet x ∈ Γ and βx be the (discrete) random variable thatmaps y ∈ Γ to the face F of Γ such that y ∈ Cone(x, F ):voln−1(F )P (βx = F ) = dist(x, F ).nvoln(Γ)x•∆ΓF11 Height as average entropyLet x ∈ Γ and βx be the (discrete) random variable thatmaps y ∈ Γ to the face F of Γ such that y ∈ Cone(x, F ):voln−1(F )P (βx = F ) = dist(x, F ).nvoln(Γ)The entropyE(βx) = −P (βx = F ) log(P (βx = F ))Fsatisfies1·voln(∆)hD (X)cE(βx)dvoln(x) =·.n + 1 degD (X)∆12 Integration over polytopesAn aggregate of ∆ in a direction u ∈ Rn is the union ofall the faces of ∆ contained in {x ∈ Rn | x, u = λ} for someλ ∈ R.Definition – Let V be an aggregate in the direction of u ∈ Rn,we set recursively: If u = 0, then Cn(∆, 0, V ) = voln(V ) andCk (∆, 0, V ) = 0 for k = n. If u = 0, thenCk (∆, u, V ) = −FuF , uCk (F, πF (u), V ∩ F ),2uwhere the sum is over the facets F of ∆. This recursive formulaimplies that Ck (∆, u, V ) = 0 for all k > dim(V ).13 Proposition [2, Prop.6.1.4] – Let ∆ ⊂ Rn be a polytope ofdimension n and u ∈ Rn. Then, for any f ∈ C n(R),dim(V )f (n)( x, u )dvoln(x) =∆Ck (∆, u, V )f (k)( V, u ).V ∈∆(u)k=0The coefficients Ck (∆, u, V ) are determined by this identity.nExample: If ∆ = Conv(ν0, . . . , νn) = i=0{x; x, ui ≥ λi} is asimplex and u ∈ Rn \ {0}, then C0(∆, u, ν0) equalsn!voln(∆)
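The product-formula expression of the height of a rational number quoted in the slides is easy to verify directly. The sketch below (illustrative, with naive helper functions of my own) compares log max(|a/d|, |b/d|), where d = gcd(a, b), with the sum of the local contributions log max(|a|_v, |b|_v) over the archimedean and p-adic places.

```python
from math import gcd, log

def padic_abs(x, q):
    """|x|_q for a nonzero integer x and a prime q."""
    v = 0
    while x % q == 0:
        x //= q
        v += 1
    return q ** (-v)

def primes_up_to(n):
    return [q for q in range(2, n + 1) if all(q % r for r in range(2, int(q ** 0.5) + 1))]

def height_direct(a, b):
    d = gcd(a, b)
    return log(max(a // d, b // d))

def height_by_places(a, b):
    h = log(max(a, b))                    # archimedean place
    for q in primes_up_to(max(a, b)):     # only finitely many primes contribute
        h += log(max(padic_abs(a, q), padic_abs(b, q)))
    return h

a, b = 840, 234   # the rational number a/b
print(abs(height_direct(a, b) - height_by_places(a, b)) < 1e-12)   # True
```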

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this paper we propose a method to characterize and estimate the variations of a random convex set Ξ0 in terms of shape, size and direction. The mean n-variogram γ^(n)_Ξ0 : (u1, …, un) ↦ E[ν_d(Ξ0 ∩ (Ξ0 − u1) ∩ ⋯ ∩ (Ξ0 − un))] of a random convex set Ξ0 on ℝ^d reveals information on the n-th order structure of Ξ0. In particular, we show that by considering the mean n-variograms of the dilated random sets Ξ0 ⊕ rK by a homothetic convex family (rK)_{r>0}, it is possible to estimate some characteristics of the n-th order structure of Ξ0; a judicious choice of K provides relevant measures of Ξ0. The germ-grain model is stable under convex dilations, and the mean n-variogram of the primary grain can be estimated in several types of stationary germ-grain models through the so-called n-point probability function. Here we focus on the Boolean model: in the planar case we show how to estimate the n-th order structure of the random vector of mixed volumes ᵗ(A(Ξ0), W(Ξ0, K)) of the primary grain, and we describe a procedure to do so from a realization of the Boolean model in a bounded window. We prove that this knowledge for all convex bodies K is sufficient to fully characterize the so-called difference body of the grain, Ξ0 ⊕ (−Ξ0). We then discuss the choice of the structuring element K: by choosing a ball, the mixed volumes coincide with the Minkowski functionals of Ξ0, so we obtain the moments of the random vector of area and perimeter ᵗ(A(Ξ0), U(Ξ0)); by choosing a segment oriented by θ, we obtain estimates of the moments of the random vector of area and Feret diameter in direction θ, ᵗ(A(Ξ0), H_Ξ0(θ)). Finally, we evaluate the performance of the method on a Boolean model with rectangular grains for the estimation of the second-order moments of the random vectors ᵗ(A(Ξ0), U(Ξ0)) and ᵗ(A(Ξ0), H_Ξ0(θ)).
 

Characterization and Estimation of the Variations of aRandom Convex Set by its Mean n-Variogram :Application to the Boolean ModelS.Rahmani, J-C.Pinoli & J.DebayleEcole Nationale Sup´rieure des Mines de Saint-Etienne,FRANCEeSPIN, PROPICE / LGF, UMR CNRS 530728/10/2015SR (ENSM-SE / LGF-PMDM)GSI 201528/10/20151 / 22 Geometric Stochastic Modeling and objectivesSection 1Geometric Stochastic Modeling and objectivesSR (ENSM-SE / LGF-PMDM)GSI 201528/10/20152 / 22 Geometric Stochastic Modeling and objectivesStochastic materialsMaterial modellingMaterial characterizationSR (ENSM-SE / LGF-PMDM)GSI 201528/10/20153 / 22 Geometric Stochastic Modeling and objectivesGerm-Grain model [Matheron 1967]DefinitionΞ=xi + Ξi(1)xi ∈ΦThe Ξi are i.i.d.Φ a point processLaw of ΦLaw of Ξ0⇔⇔Spatial distributiongranulometryBoolean model ⇒ Φ Poisson point process of intensity λSR (ENSM-SE / LGF-PMDM)GSI 201528/10/20154 / 22 Geometric Stochastic Modeling and objectivesObjectives and state of the artGeometrical characterization of Ξ0from measurements in a bounded window Ξ ∩ MNo assumption on Ξ0 ’s shape.Describing Ξ0 .State of the artMiles formulae [Miles 1967]Tangent points method [Molchanov 1995]Minimum contrast method[ Dupac & Digle 1980]⇒ Mean geometric parameter λ, E[A(Ξ0 )], E[U(Ξ0 )]Formula for distribution for model of disk [Emery 2012]SR (ENSM-SE / LGF-PMDM)GSI 201528/10/20155 / 22 Geometric Stochastic Modeling and objectivesCharacterization and description of the grainFor homothetic grains:E[U(Ξ0 )]2πE[U(Ξ0 )]=4E[A(Ξ0 )]πDisk of radius r : E[r ] =& E[r 2 ] =Square of side x :E[x]& E[r 2 ] = E[A(Ξ0 )]⇒ Parametric distribution of homothetic factor!For non homothetic grains: rectangle, ellipse...Same mean for area and perimeter (Minkowski densities)⇒ insufficient to fully characterize Ξ0 ! 
What about the variations ofthese geometrical characteristics?SR (ENSM-SE / LGF-PMDM)GSI 201528/10/20156 / 22 Theoretical aspectsSection 2Theoretical aspectsSR (ENSM-SE / LGF-PMDM)GSI 201528/10/20157 / 22 Theoretical aspectsFrom covariance of Ξ to variation of Ξ0Mean covariogram:γΞ0 (u) = E[A(Ξ0 ∩ Ξ0 + u)]¯Covariance:CΞ (u) = P(x ∈ (Ξ ∩ Ξ + u))Relationship:γΞ0 (u) =¯2CΞ (u) − pΞ1log 1 +γ(1 − pΞ )2(2)In addition:R2SR (ENSM-SE / LGF-PMDM)γΞ0 (u)du = E[A(Ξ0 )2 ]¯GSI 201528/10/20158 / 22 Theoretical aspectsStability by convex dilationsΞΞ⊕K(a) grain Ξ0 , intensity λ(b) grain Ξ0 ⊕ K , intensity λWhere X ⊕ Y = {x + y |x ∈ X , y ∈ Y }⇒ The Boolean model is stable under convex dilationsSR (ENSM-SE / LGF-PMDM)GSI 201528/10/20159 / 22 Theoretical aspectsThe proposed methodConsequently, for all r ≥ 0 we can estimate:ζ0,K (r ) = E[A(Ξ0 ⊕ rK )2 ] =SR (ENSM-SE / LGF-PMDM)GSI 2015R2E[γΞ0 ⊕rK (u)]du28/10/201510 / 22 Theoretical aspectsThe proposed methodConsequently, for all r ≥ 0 we can estimate:ζ0,K (r ) = E[A(Ξ0 ⊕ rK )2 ] =R2E[γΞ0 ⊕rK (u)]duSteiner’s formula (mixed volumes)A(Ξ0 ⊕ rK ) = A(Ξ0 ) + 2rW (Ξ0 , K ) + r 2 A(K )The polynomial ζ0,Kζ0,K (r ) = E[A2 ] + 4r E[A0 W (Ξ0 , K )] + r 2 (4E[W (Ξ0 , K )2 ] +0+ 2A(K )E[A0 ]) + 4r 3 A(K )E[W (Ξ0 , K )] + r 4 A(K )2SR (ENSM-SE / LGF-PMDM)GSI 201528/10/201510 / 22 Theoretical aspectsThe proposed methodConsequently, for all r ≥ 0 we can estimate:ζ0,K (r ) = E[A(Ξ0 ⊕ rK )2 ] =R2E[γΞ0 ⊕rK (u)]duSteiner’s formula (mixed volumes)A(Ξ0 ⊕ rK ) = A(Ξ0 ) + 2rW (Ξ0 , K ) + r 2 A(K )The polynomial ζ0,Kζ0,K (r ) = E[A2 ] + 4r E[A0 W (Ξ0 , K )] + r 2 (4E[W (Ξ0 , K )2 ] +0+ 2A(K )E[A0 ]) + 4r 3 A(K )E[W (Ξ0 , K )] + r 4 A(K )2⇒ Estimation of E[A2 ], E[A0 W (Ξ0 , K )] and E[W (Ξ0 , K )2 ]0SR (ENSM-SE / LGF-PMDM)GSI 201528/10/201510 / 22 Theoretical aspectsGeneralization to nth order momentsThe mean n-variogram(n)For n ≤ 2, γΞ0 (u1 , · · · un−1 ) = E[A(n−1i=1 (Ξ0− ui ) ∩ Ξ0 )]Relation n-variogram → n point probability function (see proceding)(n)Of course R2 · · · R2 γΞ0 (u1 , · · · un−1 )du1 · · · dun−1 = E[A(Ξ0 )n ]Then the development of E[A(Ξ0 ⊕ K )n ] by Steiner’s formula gives:∀K convex, nth order moments of (A0 , W (Ξ0 , K ))SR (ENSM-SE / LGF-PMDM)GSI 201528/10/201511 / 22 Theoretical aspectsThe interpretation of the mixed areaDefinitionFor Ξ0 and K convex, W (Ξ0 , K ) = 1 (A(Ξ0 ⊕ K ) − A(K ))2For unit ball :W (Ξ0 , B) = U(Ξ) the perimetereFor a segment: W (Ξ0 , Sθ ) = HΞ0 (θ) the F´ret’s diameterHΞ0 (θ)Ξ0OxθFor a polygon W (Ξ,Ni=1 αi Sθi )
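The algebraic core of the method, reading second-order moments of (A(Ξ0), W(Ξ0, K)) off the polynomial r ↦ E[A(Ξ0 ⊕ rK)²], can be illustrated with a small Monte-Carlo experiment. The sketch below is only a toy check of that polynomial structure: the grains are disks of random radius, K is the unit disk, and ζ(r) is computed directly from simulated grains rather than estimated from a Boolean-model realization through the variogram.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative grain: a disk of random radius R; structuring element K = unit disk.
R = rng.uniform(0.5, 1.5, size=100_000)
A = np.pi * R ** 2          # area A(Xi_0)
W = np.pi * R               # mixed area W(Xi_0, K): A(Xi_0 + rK) = A + 2 r W + r^2 A(K)
AK = np.pi                  # A(K)

# zeta(r) = E[A(Xi_0 + rK)^2], evaluated from the same sample at several radii r.
rs = np.linspace(0.0, 2.0, 9)
zeta = np.array([np.mean((A + 2 * r * W + r ** 2 * AK) ** 2) for r in rs])

# Fit the quartic and compare with the coefficient structure from the slides:
# zeta(r) = E[A^2] + 4 E[AW] r + (4 E[W^2] + 2 A(K) E[A]) r^2 + 4 A(K) E[W] r^3 + A(K)^2 r^4
c4, c3, c2, c1, c0 = np.polyfit(rs, zeta, 4)
print(np.allclose(
    [c0, c1, c2, c3, c4],
    [np.mean(A ** 2), 4 * np.mean(A * W),
     4 * np.mean(W ** 2) + 2 * AK * np.mean(A),
     4 * AK * np.mean(W), AK ** 2]))
# True: knowing zeta therefore yields the second-order moments of (A(Xi_0), W(Xi_0, K)).
```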

Short course (chaired by Roger Balian)

Creative Commons None (All Rights Reserved)
See the video

ifINSTITUTFOURIERGeometry on the set of quantum states andquantum correlationsDominique SpehnerInstitut Fourieret Laboratoire de Physique et Mod´lisation des Milieux Condens´s,eeGrenoble´Short course, GSI’2015, Ecole Polytechnique, Paris, 28/10/2015 Quantum Correlations & Quantum Information™ Quantum Information Theory (QIT) studies quantum systemsthat can perform information-processing tasks more efficientlythan one can do with classical systems:- computational tasks (e.g. factorizing into prime numbers)- quantum communication (e.g. quantum cryptography, ...) A quantum computer works with qubits,i.e. two-level quantum systems inlinear combinations of |0y and |1y. Entanglement is a resource for quantum computation and communication[Bennett et al. ’96, Josza & Linden ’03]However, other kinds of “quantum correlations” differingfrom entanglement could also explain the quantum efficiencies. Outlines Entangled and non-classical states Contractive distances on the set of quantum states Geometrical measures of quantum correlations
Basic mathematical objects in quantummechanics(1) A Hilbert space H (in this talk: n  dim H   V).(2) States ρ are non-negative operators on H with trace one.(3) Observables A are self-adjoint operators on H(in this talk: A € MatpC, nq finite Hermitian matrices)(4) An evolution is given by a linear map Φ : MatpC, nq Ñ MatpC, nqwhich is(TP) trace preserving (so that trpΦpρqq  trpρq  1)(CP) Completely Positive, i.e. for any integer d ¥ 1 and anyd ¢ d matrix pAij qd 1 ¥ 0 with elements Aij € MatpC, nq,i,jone has pΦpAij qqd 1 ¥ 0.i,jSpecial case: unitary evolution Φpρq  U ρ U ¦ with U unitary. Pure and mixed quantum states A pure state is a rank-one projector ρψ  |ψyxψ| with |ψy € H,}ψ}  1 (actually, |ψy belongs to the projective space P H).The set E pHq of all quantum states is a convex cone. Its extremalelements are the pure states. A mixed state is a non-pure state.It has infinitely many purestate decompositionsρwith pi ¥ 0,°¸pi|ψiyxψi|,ii pi 1 and |ψiy € P H.Statistical interpretation: the pure states |ψiy have beenprepared with probability pi. Quantum-classical analogyHilbert space Hstate ρobservableset of quantum statesE pHqCPTP map ΦØØØØfinite sample space Ωprobability p on pΩ, P pΩqqrandom variable on pΩ, P pΩqqprobability simplex2Eclass  p € R ;°@1Ø stochastic matrices pΦklqk,l1,...,n°(Φkl ¥ 0, k Φkl  1 d l)nk pk Separable statesA bipartite system AB is composed of two subsystems A and B withHilbert spaces HA and HB. It has Hilbert space HAB  HA ˜ HB.For instance, A and B can be the polarizations of two photonslocalized far from each other ñ HAB C2 ˜ C2 (2 qubits): A pure state |Ψy of AB is separable if it is a product state|Ψy  |ψy ˜ |φy with |ψy € P HA and |φy € P HB. A mixed state ρ is separable if it admits a pure state°decomposition ρ i pi|ΨiyxΨi| with |Ψiy  |ψiy ˜ |φiyseparable for all i. Entangled states Nonseparable states are called entangled. Entanglement isãÑ the most specific feature of Quantum Mechanics.ãÑ used as a resource in Quantum Information (e.g. quantumcryptography, teleportation, high precision interferometry...). Examples of entangled & separable states: let HA HBC2 (qubits) with canonical basis t|0y, |1yu. The pure states¡©|Ψ¨ y  c12 |0 ˜ 0y ¨ |1 ˜ 1y are maximally entangled.BellãÑ lead to the maximal violation of the Bell inequalitiesobserved experimentally [Aspect et al ’82] ñ nonlocality of QM1    |  1 |Ψ¡ yxΨ¡ |In contrast, the mixed state ρ  |ΨBellyxΨBellBell22 Bellis separable !(indeed, ρ 11|0 ˜ 0yx0 ˜ 0|   2 |1 ˜ 1yx1 ˜ 1|).2 Classical states
 A state ρ of AB is classical if it has a spectral decomposition°ρ  k pk |Ψk yxΨk | with product u states |Ψk y  |αk y ˜ |βk y.Classicality is equivalent to separability for pure states only.° A state ρ is A-classical if ρ  i qi|αiyxαi| ˜ ρB|i witht|αiyu orthonormal basis of HA and ρB|i arbitrary states of B. The set CAB (resp. CA) of all (A-)classical states is not convex.Its convex hull is the set of separable states SAB. Some tasks impossible to do clas-ρSABCACBCABsically can be realized using separable non-classical mixed states. Such states are easier to produce andpresumably more robust to a couplingwith an environment. Quantum vs classical correlations Central question in Quantum Information theory: identify(and try to protect) the Quantum Correlations responsiblefor the exponential speedup of quantum algorithms.classical correlations For mixed states,two (at least)kinds of QCsquantum correlationsÕ×entanglement [Schr¨dinger ’36]ononclassicality (quantum discord)[Ollivier, Zurek ’01, Henderson, Vedral ’01] OutlinesEntangled and non-classical states Contractive distances on the set of quantum states
Contractive distancesσφ(σ)ρφ(ρ)CONTRACTIVE DISTANCE
 The set EAB of all quantum states ofa bipartite system AB (i.e. , operatorsρ ¥ 0 on HAB with tr ρ  1) canbe equipped with many distances d. From a QI point of view, interesting distances must be contractiveunder CPTP maps, i.e. for any such map Φ on EAB, d ρ, σ € EAB,dpΦpρq, Φpσ qq ¤ dpρ, σ qPhysically: irreversible evolutions can only decrease thedistance between two states. A contractive distance is in particular unitarily invariant, i.e.dpU ρU ¦, U σU ¦q  dpρ, σ q for any unitary U on HAB The Lp-distances dppρ, σq  }ρ ¡ σ}p  ptr |ρ ¡ σ|pq1{p arenot contractive excepted for p  1 (trace distance) [Ruskai ’94]. Petz’s characterization of contractive distances
 Classical setting:there exists a unique (up to a multiplicativefactor) contractive Riemannian distance dclas on the probability°2simplex Eclas, with Fisher metric ds  k dp2 {pk [Cencov ’82]k Quantum generalization: any Riemannian contractive distanceon the set of states E pHq with n  dim H   V has metricds2  gρpdρ, dρq n¸k,l1cppk , plq|xk |dρ|ly|2where pk and |k y are the eigenvalues and eigenvectors of ρ,pf pq {pq   qf pp{q qcpp, q q 2pqf pp{q qf pq {pqand f : R  Ñ R  is an arbitary operator-monotone functionsuch that f pxq  xf p1{xq[Morozova & Chentsov ’90, Petz ’96] Distance associated to the von Neumann entropy™ Quantum analog of the Shannon entropy: von Neumann entropyS pρq  ¡ trpρ ln ρq™ Since S is concave, the physically most natural metric is§§§ρds2  gS pdρ, dρq  ¡ d S pdt tdρq § d F pX  sdX q §§s0dst02222[Bogoliubov; Kubo & Mori; Balian, Alhassid & Reinhardt, ’86, Balian ’14].with F pX q  ln trpeX q and ρ  eX ¡F pX q  eX { trpeX q.1™ ds2 has the Petz form with f pxq  x¡xlnãÑ the corresponding distance is contractive.™ Loss of information when mixing the neighboring equiprobablestates ρ¨  ρ ¨ 1 dρ: ds2{8  S pρq ¡ 1 S pρ q ¡ 1 S pρ¡q222 Bures distance and Uhlmann fidelity™ Fidelity (generalizes F |xψ|φy|2 for mixed states) [Uhlmann ’76]2 c c 1{2@2F pρ, σ q  trr σρ σ s F pσ, ρq —¨1[Bures ’69]™ Bures distance: dBupρ, σq  2 ¡ 2 F pρ, σqãÑ has metric of the Petz form with f pxq  x 12ãÑ smallest contractive Riemannian distance[Petz ’96]ãÑ coincides with the Fubiny-Study metric on P H for pure statesãÑ dBupρ, σ q2 is jointly convex in pρ, σ q™ dBupρ, σq  sup dclaspp, qq with sup over all measurements2giving outcome k with proba pk (for state ρ) and qk (for state σ)[Fuchs ’96] Bures distance and Fisher information
In quantum metrology, the goal is to estimate an unknown parameter φ by measuring the output states ρ_out(φ) = e^{−iφH} ρ e^{iφH} and using a statistical estimator depending on the measurement results (e.g. in quantum interferometry: estimate the phase shift φ_1 − φ_2). The precision is ∆φ = |∂⟨φ_est⟩_φ/∂φ|^{−1} ⟨(φ_est − φ)²⟩^{1/2}. The smallest precision is given by the quantum Cramér-Rao bound
(∆φ)_best = 1/(√N √F(ρ, H)),  F(ρ, H) = 4 d_Bu(ρ, ρ + dρ)², dρ = −i[H, ρ] dφ,
where N is the number of measurements and F(ρ, H) is the quantum Fisher information [Braunstein & Caves '94].
Summary (contractive Riemannian metrics): classically, the Fisher metric ds²_clas = Σ_k dp_k²/p_k is the unique contractive metric; in the quantum case there are several, among them the Bures metric ds²_Bu (quantum Fisher information, quantum metrology) and the Hellinger metric ds²_Hel (quantum state discrimination with many copies). All of them quantify the loss of information when merging two states.
Outline: entangled and non-classical states; contractive distances on the set of quantum states; geometrical measures of quantum correlations.
Geometric approach to quantum correlations. Geometric entanglement: E(ρ) = min_{σ_sep ∈ S_AB} d(ρ, σ_sep)². Geometric quantum discord: D_A(ρ) = min_{σ_A-cl ∈ C_A} d(ρ, σ_A-cl)². Properties: E(ρ_Ψ) = D_A(ρ_Ψ) for pure states ρ_Ψ (for the Bures distance); E is convex if d² is jointly convex; entanglement monotonicity E(Φ_A ⊗ Φ_B(ρ)) ≤ E(ρ) for any TPCP maps Φ_A and Φ_B acting on A and B if d is contractive (also true for D_A, but only when Φ_A(ρ_A) = U_A ρ_A U_A†).
Bures geometric measure of entanglement: E_Bu(ρ) = d_Bu(ρ, S_AB)² = 2 − 2√F(ρ, S_AB), with F(ρ, S_AB) = max_{σ_sep ∈ S_AB} F(ρ, σ_sep) the maximal fidelity between ρ and a separable state. Main physical question: determine F(ρ, S_AB) explicitly; the difficulty is that it is not easy to find the geodesics for the Bures distance. The closest separable state to a pure state ρ_Ψ is a pure product state, so that F(ρ_Ψ, S_AB) = max_{|φ⟩,|χ⟩} |⟨φ ⊗ χ|Ψ⟩|² (easy). For mixed states ρ, F(ρ, S_AB) coincides with the convex roof F(ρ, S_AB) = max_{{|Ψ_i⟩, p_i}} Σ_i p_i F(ρ_{Ψ_i}, S_AB) [Streltsov, Kampermann and Bruß '10], the maximum being over all pure-state decompositions ρ = Σ_i p_i |Ψ_i⟩⟨Ψ_i| of ρ (not easy).
The two-qubit case. Assume that both subsystems A and B are qubits, H_A ≅ H_B ≅ C². Concurrence [Wootters '98]: C(ρ) = max{0, λ_1 − λ_2 − λ_3 − λ_4}, with λ_1² ≥ λ_2² ≥ λ_3² ≥ λ_4² the eigenvalues of ρ (σ_y ⊗ σ_y) ρ̄ (σ_y ⊗ σ_y), where σ_y = ((0, −i), (i, 0)) is the Pauli matrix and ρ̄ is the complex conjugate of ρ in the canonical (product) basis. Then [Wei and Goldbart '03, Streltsov, Kampermann and Bruß '10]
F(ρ, S_AB) = ½ (1 + √(1 − C(ρ)²)).
Quantum state discrimination. (Figure: states |ψ_1⟩, |ψ_2⟩, |ψ_3⟩, possibly coupled to an ancilla |0⟩, sent into a measurement apparatus with outcomes Π_1, Π_2.) A receiver gets a state ρ_i randomly chosen with probability η_i among a known set of states {ρ_1, …, ρ_m}; to determine the state he has in hand, he performs a measurement on it. Applications: quantum communication, cryptography, … If the ρ_i are mutually orthogonal, one can discriminate them unambiguously; otherwise one succeeds with probability P_S = Σ_i η_i tr(M_i ρ_i), where the M_i are non-negative operators describing the measurement, Σ_i M_i = 1. Open problem (for more than two states): find the optimal measurement {M_i^opt} and the highest success probability P_S^opt.
Bures geometric quantum discord. The square Bures distance D_A(ρ) = d_Bu(ρ, C_A)² to the set C_A of A-classical states is a geometric analog of the quantum discord, characterizing the "quantumness" of states (actually, the A-classical states are the states with zero discord). The geometric quantum discord is given by solving a state discrimination problem [Spehner and Orszag '13]:
D_A(ρ) = 2 − 2 max_{{|α_i⟩}} √(P_S^opt({|α_i⟩})),
where P_S^opt({|α_i⟩}) is the optimal success probability in discriminating the states ρ_i = η_i^{−1} √ρ (|α_i⟩⟨α_i| ⊗ 1) √ρ with probabilities η_i = ⟨α_i| tr_B(ρ) |α_i⟩, {|α_i⟩} being an orthonormal basis of H_A.
Closest A-classical states to a state ρ [Spehner and Orszag '13]: the closest A-classical states to ρ are
σ_ρ = (1/F(ρ, C_A)) Σ_i |α_i^opt⟩⟨α_i^opt| ⊗ ⟨α_i^opt| √ρ Π_i^opt √ρ |α_i^opt⟩,
where {Π_i^opt} is the optimal von Neumann measurement and {|α_i^opt⟩} the orthonormal basis of H_A maximizing P_S, i.e. F(ρ, C_A) = Σ_{i=1}^{n_A} η_i tr(M_i^opt ρ_i^opt). ρ can have either a unique or an infinity of closest A-classical states.
The qubit case. If A is a qubit, H_A ≅ C², and dim H_B = n_B, then [Spehner and Orszag '14]
F(ρ, C_A) = ½ max_{u ∈ R³, ‖u‖=1} { 1 − tr Λ(u) + 2 Σ_{l=1}^{n_B} λ_l(u) },
with λ_1(u) ≥ … ≥ λ_{2n_B}(u) the eigenvalues of the 2n_B × 2n_B matrix Λ(u) = √ρ (σ_u ⊗ 1) √ρ, where σ_u = u_1σ_1 + u_2σ_2 + u_3σ_3 and the σ_i are the Pauli matrices.
Conclusions and perspectives. Contractive Riemannian distances on the set of quantum states provide useful tools for measuring quantum correlations in bipartite systems. Major challenges are to compute the geometric measures for simple systems, and to compare the measures obtained from different distances and look for universal properties.
References:
• Review article: D. Spehner, J. Math. Phys. 55, 075211 (2014)
• D. Spehner, M. Orszag, New J. Phys. 15, 103001 (2013)
• D. Spehner, M. Orszag, J. Phys. A 47, 035302 (2014)
• W. Roga, D. Spehner, F. Illuminati, arXiv:1510.06995
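The two-qubit formulas above are easy to check numerically. Below is a minimal sketch (not from the talk, numpy assumed; the Werner-state example is ours) computing the Wootters concurrence and the resulting Bures geometric measure of entanglement E_Bu(ρ) = 2 − 2√F with F = ½(1 + √(1 − C²)).

```python
import numpy as np

sy = np.array([[0.0, -1.0j], [1.0j, 0.0]])
YY = np.kron(sy, sy)  # sigma_y (x) sigma_y

def concurrence(rho):
    """Wootters concurrence of a two-qubit density matrix (canonical product basis)."""
    R = rho @ YY @ rho.conj() @ YY                       # rho (sy x sy) conj(rho) (sy x sy)
    lam = np.sort(np.sqrt(np.abs(np.linalg.eigvals(R))))[::-1]
    return max(0.0, lam[0] - lam[1] - lam[2] - lam[3])

def bures_entanglement(rho):
    """E_Bu(rho) = 2 - 2*sqrt(F), with F(rho, S_AB) = (1 + sqrt(1 - C^2)) / 2 for two qubits."""
    C = concurrence(rho)
    F = 0.5 * (1.0 + np.sqrt(max(0.0, 1.0 - C ** 2)))
    return 2.0 - 2.0 * np.sqrt(F)

# Werner state p |Psi-><Psi-| + (1-p) I/4: concurrence (3p-1)/2, entangled for p > 1/3.
psi_minus = np.array([0.0, 1.0, -1.0, 0.0]) / np.sqrt(2.0)
for p in (0.2, 0.5, 0.9):
    rho = p * np.outer(psi_minus, psi_minus) + (1.0 - p) * np.eye(4) / 4.0
    print(p, round(concurrence(rho), 4), round(bures_entanglement(rho), 4))
```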

Keynote speech Marc Arnaudon (chaired by Frank Nielsen)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We will prove an Euler-Poincaré reduction theorem for stochastic processes taking values in a Lie group, which is a generalization of the Lagrangian version of reduction and of its associated variational principles. We will also show examples of its application to the rigid body and to the group of diffeomorphisms, which includes the Navier-Stokes equation on a bounded domain and the Camassa-Holm equation.
 

Stochastic Euler-Poincaré reduction
Marc Arnaudon, Université de Bordeaux, France. GSI, École Polytechnique, 29 October 2015.

References:
• Arnaudon, M., Chen, X., Cruzeiro, A.B.: Stochastic Euler-Poincaré reduction. J. Math. Phys. 55 (2014), no. 8, 17 pp.
• Chen, X., Cruzeiro, A.B., Ratiu, T.S.: Constrained and stochastic variational principles for dissipative equations with advected quantities. arXiv:1506.05024

Outline. 1. Deterministic framework: Euler-Poincaré equations; diffeomorphism group of a compact Riemannian manifold; volume preserving diffeomorphism group; Lagrangian paths; characterization of the geodesics of (G^s_V, ⟨·,·⟩_0); Euler-Poincaré equation on G^s_V. 2. Stochastic framework: semimartingales in a Lie group; stochastic Euler-Poincaré reduction; group of volume preserving diffeomorphisms; Navier-Stokes and Camassa-Holm equations.

Euler-Poincaré equations. Let M be a Riemannian manifold and L : TM × [0, T] → R a Lagrangian on M. Let q ∈ C¹_{a,b}([0, T]; M) := {q ∈ C¹([0, T], M), q(0) = a, q(T) = b}. The action functional C : C¹_{a,b}([0, T]; M) → R is defined by C(q(·)) := ∫₀ᵀ L(q(t), q̇(t), t) dt. The critical points of C satisfy the Euler-Lagrange equation d/dt (∂L/∂q̇) − ∂L/∂q = 0.
Suppose that the configuration space M = G is a Lie group and L : TG → R is a left invariant Lagrangian: ℓ(ξ) := L(e, ξ) = L(g, g·ξ) for all ξ ∈ T_eG, g ∈ G (here and in the sequel, g·ξ = T_eL_g ξ). The action functional C : C¹_{a,b}([0, T]; G) → R is C(g(·)) := ∫₀ᵀ L(g(t), ġ(t)) dt = ∫₀ᵀ ℓ(ξ(t)) dt, where ξ(t) := g(t)^{−1}·ġ(t). [J.E. Marsden, T. Ratiu 1994] [J.E. Marsden, J. Scheurle 1993]: g(·) is a critical point of C if and only if it satisfies the Euler-Poincaré equation on T_e*G:
d/dt (dℓ/dξ) − ad*_{ξ(t)} (dℓ/dξ) = 0,
where ad*_ξ : T_e*G → T_e*G is the dual of ad_ξ : T_eG → T_eG: ⟨ad*_ξ η, θ⟩ = ⟨η, ad_ξ θ⟩, η ∈ T_e*G, θ ∈ T_eG.
We consider variations of ξ(·) of the form δξ(t) = ν̇(t) + ad_{ξ(t)} ν(t) for some ν ∈ C¹([0, T], T_eG), which is equivalent to varying g(·) through the perturbation g^ε(t) = g(t) e_{ε,ν}(t), where e_{ε,ν}(t) is the unique solution of the ODE on G: d/dt e_{ε,ν}(t) = ε e_{ε,ν}(t)·ν̇(t), e_{ε,ν}(0) = e.

Diffeomorphism group of a compact Riemannian manifold. Let M be an n-dimensional compact Riemannian manifold. Define G^s := {g : M → M bijection, g, g^{−1} ∈ H^s(M, M)}, where H^s(M, M) denotes the manifold of Sobolev maps of class s from M to itself. If s > 1 + n/2, then G^s is a C^∞ Hilbert manifold. G^s is a group under composition of maps; right translation is smooth, left translation and inversion are only continuous. G^s is also a topological group (but not an infinite dimensional Lie group).
The tangent space at η ∈ G^s is T_ηG^s = {U : M → TM of class H^s, U(m) ∈ T_{η(m)}M}. The Riemannian structure on M induces the weak L², or hydrodynamic, metric ⟨·,·⟩_0 on G^s: ⟨U, V⟩_0^η := ∫_M ⟨U_η(m), V_η(m)⟩_m dμ_g(m) for η ∈ G^s and U, V ∈ T_ηG^s, where U_η := U ∘ η^{−1} ∈ T_eG^s and μ_g denotes the Riemannian volume associated with (M, g). Obviously ⟨·,·⟩_0 is a right invariant metric on G^s.
Let ∇ be the Levi-Civita connection of (M, g). A right invariant connection ∇⁰ on G^s is defined by ∇⁰_X̃ Ỹ(η) := ∂/∂t|_{t=0} Ỹ(η_t) ∘ η_t^{−1} ∘ η + (∇_{X_η} Y_η) ∘ η, where X̃, Ỹ ∈ L(G^s), X_η := X̃ ∘ η^{−1}, Y_η := Ỹ ∘ η^{−1}, and η_t is a C¹ curve in G^s with η_0 = η and d/dt|_{t=0} η_t = X̃(η); here L(G^s) denotes the set of smooth vector fields on G^s. ∇⁰ is the Levi-Civita connection associated to (G^s, ⟨·,·⟩_0).

Volume preserving diffeomorphism group. For s > 1 + n/2, let G^s_V := {g ∈ G^s, g volume preserving}. G^s_V is still a topological group, with tangent space at the identity 𝒢^s_V = T_eG^s_V = {U ∈ T_eG^s, div(U) = 0}. The L²-metric ⟨·,·⟩_0 and its Levi-Civita connection ∇^{0,V} are defined on G^s_V by orthogonal projection: ∇^{0,V}_X Y = P_e(∇⁰_X Y), with P_e the orthogonal projection onto 𝒢^s_V given by the decomposition H^s(TM) = 𝒢^s_V ⊕ dH^{s+1}(M).

Lagrangian paths. Consider the ODE on M: d/dt g_t(x) = u(t, g_t(x)), g_0(x) = x, where u(t, ·) ∈ T_eG^s for every t > 0. For every fixed t > 0, g_t(·) ∈ G^s, so g ∈ C¹([0, T], G^s); if div(u(t)) = 0 for every t, then g ∈ C¹([0, T], G^s_V).
[V.I. Arnold 1966] [D.G. Ebin, J.E. Marsden 1970] A Lagrangian path g ∈ C²([0, T], G^s_V) satisfying the equation above is a geodesic of (G^s_V, ⟨·,·⟩_0) (i.e. ∇^{0,V}_{ġ(t)} ġ(t) = 0) if and only if the velocity field u satisfies the Euler equation for incompressible inviscid fluids
(E)  ∂u/∂t = −∇_u u − ∇p, div u = 0.
Notice that the term −∇p corresponds to the use of ∇⁰ instead of ∇^{0,V}: the system rewrites as ∂u/∂t = −∇^{0,V}_u u, div u = 0.

Euler-Poincaré equation on G^s_V. If we take ℓ : T_eG^s_V → R as ℓ(X) := ⟨X, X⟩ and define the action functional C : C¹_{e,e}([0, T], G^s_V) → R by C(g(·)) := ∫₀ᵀ ℓ(ġ(t)·g(t)^{−1}) dt, then a Lagrangian path g ∈ C²([0, T], G^s_V), integral path of u, is a critical point of C if and only if u satisfies the Euler equation (E) [J.E. Marsden, T. Ratiu 1994] [J.E. Marsden, J. Scheurle 1993].
[S. Shkoller 1998] If we take ℓ : T_eG^s_V → R as the H¹ metric ℓ(X) := ∫_M ⟨X, X⟩_m dμ_g(m) + α² ∫_M ⟨∇X, ∇X⟩_m dμ_g(m) and define the action functional in the same way as before, then a Lagrangian path g ∈ C²([0, T], G^s_V), integral path of u, is a critical point of C if and only if u satisfies the Camassa-Holm equation
∂ν/∂t + ∇_u ν + α² (∇u)* · ∆ν = −∇p, ν = (1 + α²∆)u, div(u) = 0.

Stochastic framework. Aim: to establish a stochastic Euler-Poincaré reduction theorem in a general Lie group, and to apply it to volume preserving diffeomorphisms of a compact symmetric space. For the Euler equation, the stochastic term corresponds to introducing viscosity.

Semimartingales in a Lie group. An Rⁿ-valued semimartingale ξ_t has a decomposition ξ_t(ω) = N_t(ω) + A_t(ω), where (N_t) is a local martingale and (A_t) has finite variation. If (N_t) is a martingale, then E[N_t | F_s] = N_s for t ≥ s. We are interested in semimartingales which furthermore satisfy A_t(ω) = ∫₀ᵗ a_s(ω) ds. Defining Dξ_t/dt := lim_{ε→0} E[(ξ_{t+ε} − ξ_t)/ε | F_t], we have Dξ_t/dt = a_t.
Itô formula: f(ξ_t) = f(ξ_0) + ∫₀ᵗ ⟨df(ξ_s), dN_s⟩ + ∫₀ᵗ ⟨df(ξ_s), dA_s⟩ + ½ ∫₀ᵗ Hess f(dξ_s ⊗ dξ_s). From this, ξ_t is a local martingale if and only if for all f ∈ C²(Rⁿ), f(ξ_t) − f(ξ_0) − ½ ∫₀ᵗ Hess f(dξ_s ⊗ dξ_s) is a real valued local martingale. This property becomes a definition for manifold-valued martingales. Definition: let a_t ∈ T_{ξ_t}M be an adapted process; if for all f ∈ C²(M), f(ξ_t) − f(ξ_0) − ∫₀ᵗ ⟨df(ξ_s), a_s⟩ ds − ½ ∫₀ᵗ Hess f(dξ_s ⊗ dξ_s) is a real valued local martingale, then Dξ_t/dt = a_t.
Let G be a Lie group with right invariant metric ⟨·,·⟩ and right invariant connection ∇, and let 𝒢 := T_eG be the Lie algebra of G. Consider a countable family H_i, i ≥ 1, of elements of 𝒢, and u ∈ C¹([0, T], 𝒢). Consider the Stratonovich equation
dg_t = (Σ_{i≥1} (H_i ∘ dW_t^i − ½ ∇_{H_i} H_i dt) + u(t) dt) · g_t, g_0 = e,
where the (W_t^i) are independent real valued Brownian motions. The Itô formula gives f(g_t) = f(g_0) + Σ_{i≥1} ∫₀ᵗ ⟨df(g_s), H_i dW_s^i⟩ + ∫₀ᵗ ⟨df(g_s), u(s)g_s⟩ ds + ½ Σ_{i≥1} ∫₀ᵗ Hess f(H_i(g_s), H_i(g_s)) ds, which implies Dg_t/dt = u(t)g_t. Particular case: if (H_i) is an orthonormal basis, ∇_{H_i} H_i = 0, ∇ is the Levi-Civita connection of the metric and u ≡ 0, then g_t is a Brownian motion in G.

Stochastic Euler-Poincaré reduction. On the space S(G) of G-valued semimartingales, define J(ξ) = ½ E[∫₀ᵀ ‖Dξ/dt‖² dt]. Perturbation: for v ∈ C¹([0, T], 𝒢) satisfying v(0) = v(T) = 0 and ε > 0, let e_{ε,v}(·) ∈ C¹([0, T], G) be the flow generated by εv̇: d/dt e_{ε,v}(t) = ε v̇(t)·e_{ε,v}(t), e_{ε,v}(0) = e. Definition: g ∈ S(G) is a critical point of J if for all such v, dJ(g_{ε,v})/dε|_{ε=0} = 0, where g_{ε,v}(t) = e_{ε,v}(t) g(t).
Theorem. g is a critical point of J if and only if
du(t)/dt = −ad*_{ũ(t)} u(t) − K(u(t)),
with ũ(t) = u(t) − ½ Σ_{i≥1} ∇_{H_i} H_i, ⟨ad*_u v, w⟩ = ⟨v, ad_u w⟩, and K : 𝒢 → 𝒢 defined by ⟨K(u), v⟩ = −⟨u, ½ Σ_{i≥1} (∇_{ad_v H_i} H_i + ∇_{H_i}(ad_v(H_i)))⟩.
Remark 1. If H_i = 0 for all i ≥ 1, or ∇_u v = 0 for all u, v ∈ 𝒢, then K(u) = 0 and we recover the standard Euler-Poincaré equation.
Proposition. If ∇_{H_i} H_i = 0 for all i ≥ 1, then K(u) = −½ Σ_{i≥1} (∇_{H_i} ∇_{H_i} u + R(u, H_i)H_i). In particular, if (H_i) is an orthonormal basis of 𝒢, then K(u) = −½ □u = −½ ∆u + ½ Ric u, with □ the Hodge Laplacian.

Group of volume preserving diffeomorphisms. Let G^s_V = {g : M → M volume preserving bijection, such that g, g^{−1} ∈ H^s} and assume s > 1 + dim M/2. Then G^s_V is a C^∞ smooth manifold, with Lie algebra 𝒢^s_V = T_eG^s_V = {X ∈ H^s(M, TM), π(X) = e, div(X) = 0} (π(X) = e means that X is a vector field on M: X(x) ∈ T_xM). On 𝒢^s_V consider the two scalar products ⟨X, Y⟩_0 = ∫_M ⟨X(x), Y(x)⟩ dx and ⟨X, Y⟩_1 = ∫_M ⟨X(x), Y(x)⟩ dx + ∫_M ⟨∇X(x), ∇Y(x)⟩ dx. The Levi-Civita connection on G^s_V is ∇^{0,V}_X Y = P_e(∇⁰_X Y), with ∇⁰ the Levi-Civita connection of ⟨·,·⟩_0 on G^s and P_e the orthogonal projection onto 𝒢^s_V: H^s(TM) = 𝒢^s_V ⊕ dH^{s+1}(M). One can find (H_i)_{i≥1} such that for all i ≥ 1, ∇_{H_i} H_i = 0, div(H_i) = 0, and Σ_{i≥1} (∇_{H_i})² f = ν∆f for all f ∈ C²(M).

Navier-Stokes and Camassa-Holm equations. Corollary. (1) g is a critical point of J_{⟨·,·⟩_0} if and only if u solves the Navier-Stokes equation
∂u/∂t = −∇_u u + (ν/2)∆u − ∇p, div u = 0.
(2) Assume M = T², the 2-dimensional torus. Then g is a critical point of J_{⟨·,·⟩_1} if and only if u solves the Camassa-Holm equation
∂v/∂t = −∇_u v − Σ_{j=1}^2 v_j ∇u_j + (ν/2)∆v − ∇p, v = u − ∆u, div u = 0.
For the proof, use the Itô formula and compute ad*_v(u) and K(u) in the different situations.

Information Geometry Optimization (chaired by Giovanni Pistone, Yann Ollivier)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
When observing data x_1, …, x_t modelled by a probabilistic distribution p_θ(x), the maximum likelihood (ML) estimator θ_ML = arg max_θ Σ_{i=1}^t ln p_θ(x_i) cannot, in general, safely be used to predict x_{t+1}. For instance, for a Bernoulli process, if only “tails” have been observed so far, the probability of “heads” is estimated to be 0. (Thus, for the standard log-loss scoring rule, this results in infinite loss the first time “heads” appears.)
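A minimal sketch of the Bernoulli example above (our own illustration, not the speaker's code): with only tails observed, the ML estimate assigns probability 0 to heads and hence infinite log-loss on the first head, whereas Laplace's rule of succession (add one fictitious head and one fictitious tail) keeps the loss finite.

```python
import math

def ml_prob_heads(tosses):
    """Maximum-likelihood estimate of P(heads) after observing `tosses` (1 = heads)."""
    return sum(tosses) / len(tosses) if tosses else 0.5

def laplace_prob_heads(tosses):
    """Laplace's rule of succession: (#heads + 1) / (#tosses + 2)."""
    return (sum(tosses) + 1) / (len(tosses) + 2)

observed = [0, 0, 0, 0, 0]                 # only tails so far; next toss turns out to be heads
p_ml, p_lap = ml_prob_heads(observed), laplace_prob_heads(observed)
loss_ml = -math.log(p_ml) if p_ml > 0 else math.inf   # infinite log-loss for the ML plug-in
loss_lap = -math.log(p_lap)                           # finite log-loss for Laplace's rule
print(p_ml, p_lap, loss_ml, loss_lap)
```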
 

Laplace's Rule of Succession in Information Geometry
Yann Ollivier, CNRS & Paris-Saclay University, France

Sequential prediction. Sequential prediction problem: given observations x_1, …, x_t, build a probabilistic model p^{t+1} for x_{t+1}, iteratively. Example: given that w women and m men entered this room, what is the probability that the next person who enters is a woman/man? Common performance criterion for prediction: the cumulated log-loss
L_T := −Σ_{t=0}^{T−1} log p^{t+1}(x_{t+1} | x_{1…t}),
to be minimized. This corresponds to compression cost, and is also equal to square loss for Gaussian models.

Maximum likelihood estimator. Maximum likelihood strategy: fix a parametric model p

Creative Commons Attribution-ShareAlike 4.0 International
See the video
A divergence function defines a Riemannian metric G and dually coupled affine connections (∇, ∇*) with respect to it in a manifold M. When M is dually flat, a canonical divergence is known, which is uniquely determined from {G, ∇, ∇*}. We search for a standard divergence for a general non-flat M. It is introduced by the magnitude of the inverse exponential map, where the α = −1/3 connection plays a fundamental role. The standard divergence is different from the canonical divergence.
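For orientation (standard relations, not quoted from the talk), the metric induced by a divergence and the canonical divergence of a dually flat manifold read:

\[
D[p : q] \;=\; \tfrac{1}{2}\, g_{ij}(\xi)\, d\xi^i d\xi^j + O(\|d\xi\|^3), \qquad q = p + dp,
\]
\[
D_{\mathrm{can}}[p : q] \;=\; \psi(\theta_p) + \varphi(\eta_q) - \theta_p \cdot \eta_q ,
\]

where \(\psi\) and \(\varphi\) are the dual convex potentials of the flat structure and \((\theta, \eta)\) the corresponding affine coordinates.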
 

Standard Divergence in Manifold of Dual Affine Connections (GSI 2015, Paris)
Shun-ichi Amari (RIKEN Brain Science Institute), Nihat Ay (Max-Planck Institute for Mathematics in the Sciences)

Divergence and metric: a divergence D[p : q] ≥ 0 expands to second order as D[p : p + dp] = ½ g_ij dξ^i dξ^j + O(|dξ|³), where G = (g_ij) is a positive-definite Riemannian metric.
Divergence and dual affine connections: the third-order derivatives of D define a pair of affine connections, Γ_ijk = −D[∂_i∂_j ; ∂_k] and Γ*_ijk = −D[∂_k ; ∂_i∂_j] (Eguchi's notation, derivatives taken with respect to the first and second arguments and evaluated at p = q).
Dual geometry: (M, g, ∇, ∇*) with X⟨Y, Z⟩ = ⟨∇_X Y, Z⟩ + ⟨Y, ∇*_X Z⟩; equivalently (M, g, T) with the totally symmetric cubic tensor T_ijk, Γ_ijk = Γ⁰_ijk − ½T_ijk, Γ*_ijk = Γ⁰_ijk + ½T_ijk, where ∇⁰ is the Levi-Civita connection. When M is dually flat, a canonical divergence (a Bregman divergence) is uniquely determined.
Exponential map: the geodesic from p with initial velocity X reaches q at time t = 1; X = exp_p^{−1}(q) ("log_p q").
Exponential map divergence: D[p : q] = ‖X(p : q)‖², the squared magnitude of the inverse exponential map along the connecting geodesic (one can base the geodesic at q or at p).
Theorem 1 describes the dual geometry induced by an exponential map divergence. Theorem 2: the exponential map divergence built from the α = −1/3 connection recovers the original geometry. Standard divergence: D_stan[p : q] = ½ ‖X_{−1/3}(p, q)‖² = D*_stan[q : p] = ½ ‖X_{1/3}(q, p)‖².
Remark (dually flat case): the canonical divergence can be written as D[p : q] = ∫₀¹ t ‖ξ̇(t)‖² dt along the geodesic, and the standard divergence differs from it: D_stan ≠ D_can.
Divergence and projection: p̂ = argmin_{q ∈ S} D[p : q]; projection theorem: the minimizer p̂ is characterized by grad_q D[p : q] being orthogonal to S.

Creative Commons Attribution-ShareAlike 4.0 International
See the video
The statistical structure on a manifold M is predicated upon a special kind of coupling between the Riemannian metric g and a torsion-free affine connection ∇ on TM, such that ∇g is totally symmetric, forming, by definition, a “Codazzi pair” {∇, g}. In this paper, we first investigate various transformations of affine connections, including additive translation (by an arbitrary (1,2)-tensor K), multiplicative perturbation (through an arbitrary invertible operator L on TM), and conjugation (through a non-degenerate two-form h). We then study the Codazzi coupling of ∇ with h and its coupling with L, and the link between these two couplings. We introduce, as special cases of K-translations, various transformations that generalize traditional projective and dual-projective transformations, and study their commutativity with L-perturbation and h-conjugation transformations. Our derivations allow affine connections to carry torsion, and we investigate conditions under which torsions are preserved by the transformations mentioned above. Our systematic approach establishes a general setting for the study of Information Geometry based on transformations and coupling relations of affine connections; in particular, we provide a generalization of the conformal-projective transformation.
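For reference (a standard definition, not taken from the paper), the total symmetry of ∇g that defines a Codazzi pair can be written as:

\[
(\nabla_X g)(Y, Z) \;=\; (\nabla_Y g)(X, Z) \qquad \text{for all vector fields } X, Y, Z,
\]

i.e. the cubic form \(C(X, Y, Z) := (\nabla_X g)(Y, Z)\) is totally symmetric; a statistical structure is exactly a Codazzi pair \(\{\nabla, g\}\) with \(\nabla\) torsion-free.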
 

Creative Commons Attribution-ShareAlike 4.0 International
See the video
This paper addresses the problem of online learning of finite statistical mixtures of exponential families. A short review of the Expectation-Maximization (EM) algorithm and of its online extensions is given. Based on these extensions and on the description of the k-Maximum Likelihood Estimator (k-MLE), three online extensions of the latter are proposed. To illustrate them, we consider the case of mixtures of Wishart distributions, giving details and providing some experiments.
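Before the mixture case, the exact online MLE for a single exponential family is a one-line recursion on the expectation parameter, η^(N) = η^(N−1) + (s(x_N) − η^(N−1))/N (Algorithm 1 in the slides below). A minimal sketch for a univariate Gaussian with sufficient statistic s(x) = (x, x²) (our simplification; the slides treat the multivariate and Wishart cases):

```python
import random

def online_mle_gaussian(stream):
    """Recursive MLE for a univariate Gaussian, tracking the expectation parameters
    eta = E[s(x)] with sufficient statistic s(x) = (x, x^2)."""
    eta1 = eta2 = 0.0
    for n, x in enumerate(stream, start=1):
        eta1 += (x - eta1) / n          # running mean of x
        eta2 += (x * x - eta2) / n      # running mean of x^2
        mu, var = eta1, eta2 - eta1 ** 2
        yield mu, var

random.seed(0)
data = (random.gauss(2.0, 3.0) for _ in range(10000))   # true mean 2, true variance 9
for i, (mu, var) in enumerate(online_mle_gaussian(data), start=1):
    if i % 2500 == 0:
        print(i, round(mu, 3), round(var, 3))
```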
 

Online k-MLE for mixture modelling with exponential families
Christophe Saint-Jean, Frank Nielsen. Geometric Science of Information 2015, Oct 28-30, 2015, Ecole Polytechnique, Paris-Saclay.

Application context. We are interested in building a system (a model) which evolves when new data is available: x_1, x_2, …, x_N, … The time needed for processing a new observation must be constant w.r.t. the number of observations, and the memory required by the system is bounded. Denote π the unknown distribution of X.

Outline. 1. Online learning of exponential families. 2. Online learning of mixtures of exponential families (introduction, EM, k-MLE; recursive EM, online EM; stochastic approximations of k-MLE; experiments). 3. Conclusions.

Reminder: (regular) exponential family (EF). EF = {f(x; θ) = exp(⟨s(x), θ⟩ + k(x) − F(θ)) | θ ∈ Θ}. Terminology: λ source parameters; θ natural parameters; η expectation parameters; s(x) sufficient statistic; k(x) auxiliary carrier measure; F(θ) the log-normalizer (differentiable, strictly convex); Θ = {θ ∈ R^D | F(θ) < ∞} is an open convex set. Almost all common distributions are EF members, except e.g. the uniform and Cauchy distributions.

Reminder: maximum likelihood estimate (MLE). For a general p.d.f., θ̂^(N) = argmax_θ Π_{i=1}^N f(x_i; θ) = argmin_θ −(1/N) Σ_i log f(x_i; θ), assuming a sample χ = {x_1, …, x_N} of i.i.d. observations. For an EF, θ̂^(N) = argmin_θ −(1/N) Σ_i ⟨s(x_i), θ⟩ − cst(χ) + F(θ), which is exactly solved in H, the space of expectation parameters: η̂^(N) = ∇F(θ̂^(N)) = (1/N) Σ_i s(x_i), i.e. θ̂^(N) = (∇F)^{−1}((1/N) Σ_i s(x_i)).

Exact online MLE for an EF (Algorithm 1). A recursive formulation is easily obtained: initialize η̂^(0) = 0; for each new observation x_N, set η̂^(N) = η̂^(N−1) + N^{−1}(s(x_N) − η̂^(N−1)) and output η̂^(N) or (∇F)^{−1}(η̂^(N)). Analytical expressions of (∇F)^{−1} exist for most EFs (but not all).

Case of the multivariate normal distribution (MVN). N(x; μ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp(−½ (x−μ)^T Σ^{−1} (x−μ)). One possible decomposition gives s(x) = (x, −xx^T) and (∇F)^{−1}(η_1, η_2) = ((−η_1η_1^T − η_2)^{−1} η_1, ½ (−η_1η_1^T − η_2)^{−1}). Case of the Wishart distribution: see details in the paper.

Finite (parametric) mixture models. Now π is approximated by a finite (parametric) mixture f(·; θ): π(x) ≈ f(x; θ) = Σ_{j=1}^K w_j f_j(x; θ_j), 0 ≤ w_j ≤ 1, Σ_j w_j = 1, where the w_j are the mixing proportions and the f_j the component distributions. When all f_j are EFs, it is called a mixture of EFs (MEF).

Incompleteness in mixture models. The observable, incomplete data χ = {x_1, …, x_N} come from the unobservable complete data χ_c = {y_i = (x_i, z_i)}, with Z_i ~ cat_K(w) and X_i | Z_i = j ~ f_j(·; θ_j). For a MEF, the joint density p(x, z; θ) is itself an EF: log p(x, z; θ) = Σ_{j=1}^K [z = j] {log w_j + ⟨θ_j, s_j(x)⟩ + k_j(x) − F_j(θ_j)}.

Expectation-Maximization (EM) [1]. The EM algorithm maximizes iteratively Q(θ; θ̂^(t), χ) := E_{θ̂^(t)}[log p(χ_c; θ) | χ] (E-step), then θ̂^(t+1) = argmax_θ Q(θ; θ̂^(t), χ) (M-step), until convergence of the complete log-likelihood.

EM for a MEF. The E-step is always explicit: ẑ_{i,j}^(t) = ŵ_j^(t) f(x_i; θ̂_j^(t)) / Σ_{j'} ŵ_{j'}^(t) f(x_i; θ̂_{j'}^(t)). The M-step then reduces to ŵ_j^(t+1) = Σ_i ẑ_{i,j}^(t) / N and η̂_j^(t+1) = ∇F(θ̂_j^(t+1)) = Σ_i ẑ_{i,j}^(t) s_j(x_i) / Σ_i ẑ_{i,j}^(t) (weighted average of the sufficient statistics).

k-Maximum Likelihood Estimator (k-MLE) [2]. The k-MLE introduces a geometric split χ = ∪_{j=1}^K χ̂_j^(t) to accelerate EM: z̃_{i,j}^(t) = [argmax_{j'} ŵ_{j'}^(t) f(x_i; θ̂_{j'}^(t)) = j]. Equivalently, it amounts to maximizing Q over the partition Z [3]. For a MEF, the M-step of the k-MLE then reduces to ŵ_j^(t+1) = |χ̂_j^(t)| / N and η̂_j^(t+1) = ∇F(θ̂_j^(t+1)) = Σ_{x_i ∈ χ̂_j^(t)} s_j(x_i) / |χ̂_j^(t)| (cluster-wise unweighted average of the sufficient statistics).

Online learning of mixtures. Consider now the online setting x_1, x_2, …, x_N, …; denote θ̂^(N) or η̂^(N) the parameter estimate after dealing with N observations, and θ̂^(0), η̂^(0) their initial values. Remark: for a fixed-size dataset χ, one may apply multiple passes (with shuffle) over χ; the increase of the likelihood function is no longer guaranteed after an iteration.

Stochastic approximations of EM. Two main approaches to online EM-like estimation:
• Stochastic M-step: recursive EM (1984) [5]: θ̂^(N) = θ̂^(N−1) + {N I_c(θ̂^(N−1))}^{−1} ∇_θ log f(x_N; θ̂^(N−1)), where I_c is the Fisher information matrix for the complete data. A justification for this formula comes from Fisher's identity: ∇ log f(x; θ) = E_θ[∇ log p(x, z; θ) | x]. One recognizes a second-order stochastic gradient ascent, which requires updating and inverting I_c after each iteration.
• Stochastic E-step: online EM (2009) [7]: Q̂^(N)(θ) = Q̂^(N−1)(θ) + α^(N) (E_{θ̂^(N−1)}[log p(x_N, z_N; θ) | x_N] − Q̂^(N−1)(θ)). In the case of a MEF, the algorithm works only with the conditional expectation of the sufficient statistics for the complete data: ẑ_{N,j} = E_{θ̂^(N−1)}[z_{N,j} | x_N], Ŝ_{w_j}^(N) = Ŝ_{w_j}^(N−1) + α^(N)(ẑ_{N,j} − Ŝ_{w_j}^(N−1)), Ŝ_{θ_j}^(N) = Ŝ_{θ_j}^(N−1) + α^(N)(ẑ_{N,j} s_j(x_N) − Ŝ_{θ_j}^(N−1)). The M-step is unchanged: ŵ_j^(N) = Ŝ_{w_j}^(N), θ̂_j^(N) = (∇F_j)^{−1}(Ŝ_{θ_j}^(N) / Ŝ_{w_j}^(N)).
Some properties of online EM: the initial values Ŝ^(0) may be used for introducing a "prior" (Ŝ_{w_j}^(0) = w_j^(0), Ŝ_{θ_j}^(0) = w_j^(0) η_j^(0)); parameter constraints are automatically respected; no matrix to invert; a policy for α^(N) has to be chosen (see [7]); consistent, asymptotically equivalent to the recursive EM.

Stochastic approximations of k-MLE. To keep the previous advantages of online EM in an online k-MLE, the only choice concerns the way to assign x_N to a cluster:
• Strategy 1: maximize the likelihood of the complete data (x_N, z_N): z̃_{N,j} = [argmax_{j'} ŵ_{j'}^(N−1) f(x_N; θ̂_{j'}^(N−1)) = j]. Equivalent to online CEM and similar to MacQueen's iterative k-means.
• Strategy 2: maximize the likelihood of the complete data (x_N, z_N) after the M-step: z̃_{N,j} = [argmax_{j'} ŵ_{j'}^(N) f(x_N; θ̂_{j'}^(N)) = j]. Similar to Hartigan's method for k-means; additional cost: pre-compute all possible M-steps for the stochastic E-step.
• Strategy 3: draw z̃_N from the categorical distribution Cat_K with weights built from ŵ_j^(N−1) f_j(x_N; θ̂_j^(N−1)). Similar to sampling in stochastic EM [3]; the motivation is to try to break the inconsistency of k-MLE.
For strategies 1 and 3, the M-step reduces to updating the parameters of a single component.

Experiments. True distribution π = 0.5 N(0, 1) + 0.5 N(μ_2, σ_2²), with different values of μ_2, σ_2 for more or less overlap between the components. A small subset of observations is used for initialization (k-MLE++ / k-MLE). A video illustrates the inconsistency of online k-MLE. Experiments on Wishart mixtures are also reported.

Conclusions and future works. On consistency: EM and online EM are consistent; k-MLE and online k-MLE (strategies 1, 2) are inconsistent (due to the Bayes error in maximizing the classification likelihood); online stochastic k-MLE (strategy 3): consistency is an open question. So, when components overlap, online EM > k-MLE > online k-MLE for parameter learning. It remains to study how the dimension influences the inconsistency/convergence rate of online k-MLE. The convergence rate is lower for online methods (sub-linear convergence of SGD). Time for an update vs sample size: online k-MLE (1, 3) < online EM < online k-MLE (2) << k-MLE. Online EM appears to be the best compromise.

References:
[1] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pp. 1–38, 1977.
[2] Nielsen, F.: On learning statistical mixtures maximizing the complete likelihood. Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), AIP Conference Proceedings, 1641, pp. 238–245, 2014.
[3] Celeux, G., Govaert, G.: A classification EM algorithm for clustering and two stochastic versions. Computational Statistics and Data Analysis, 14(3), pp. 315–332, 1992.
[4] Samé, A., Ambroise, C., Govaert, G.: An online classification EM algorithm based on the mixture model. Statistics and Computing, 17(3), pp. 209–218, 2007.
[5] Titterington, D.M.: Recursive parameter estimation using incomplete data. Journal of the Royal Statistical Society, Series B (Methodological), 46(2), pp. 257–267, 1984.
[6] Amari, S.-I.: Natural gradient works efficiently in learning. Neural Computation, 10(2), pp. 251–276, 1998.
[7] Cappé, O., Moulines, E.: On-line expectation-maximization algorithm for latent data models. Journal of the Royal Statistical Society, Series B (Methodological), 71(3), pp. 593–613, 2009.
[8] Neal, R.M., Hinton, G.E.: A view of the EM algorithm that justifies incremental, sparse, and other variants. In Jordan, M.I. (ed.), Learning in Graphical Models, pp. 355–368, MIT Press, Cambridge, 1999.
[9] Bottou, L.: Online algorithms and stochastic approximations. In Saad, D. (ed.), Online Learning and Neural Networks, Cambridge University Press, 1998.

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We discuss the optimization of the stochastic relaxation of a real-valued function, i.e., we introduce a new search space given by a statistical model and we optimize the expected value of the original function with respect to a distribution in the model. From the point of view of Information Geometry, statistical models are Riemannian manifolds of distributions endowed with the Fisher information metric, thus the stochastic relaxation can be seen as a continuous optimization problem defined over a differentiable manifold. In this paper we explore the second-order geometry of the exponential family, with applications to the multivariate Gaussian distributions, to generalize second-order optimization methods. Besides the Riemannian Hessian, we introduce the exponential and the mixture Hessians, which come from the dually flat structure of an exponential family. This allows us to obtain different Taylor formulæ according to the choice of the Hessian and of the geodesic used, and thus different approaches to the design of second-order methods, such as the Newton method.
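A minimal first-order sketch of the stochastic relaxation idea (our illustration; the paper's second-order/Newton machinery is not reproduced): minimize E_{N(μ,σ²)}[f(x)] by natural gradient descent, using the fact that the Fisher information of N(μ, σ²) in the (μ, σ²) chart is diag(1/σ², 1/(2σ⁴)), so the natural gradient rescales the vanilla gradient by (σ², 2σ⁴).

```python
import random

def natural_gradient_relaxation(f, mu=0.0, var=1.0, steps=200, pop=200, lr=0.1, seed=0):
    """Stochastic relaxation of f: minimize E_{N(mu, var)}[f(x)] by natural gradient descent.
    Gradients of the expected value are estimated by the log-likelihood (score) trick."""
    rng = random.Random(seed)
    for _ in range(steps):
        xs = [mu + rng.gauss(0.0, 1.0) * var ** 0.5 for _ in range(pop)]
        fs = [f(x) for x in xs]
        base = sum(fs) / pop                                   # baseline for variance reduction
        g_mu = sum((fx - base) * (x - mu) / var for x, fx in zip(xs, fs)) / pop
        g_var = sum((fx - base) * ((x - mu) ** 2 - var) / (2 * var ** 2)
                    for x, fx in zip(xs, fs)) / pop
        mu -= lr * var * g_mu                                  # natural gradient step in mu
        var -= lr * 2 * var ** 2 * g_var                       # natural gradient step in var
        var = max(var, 1e-8)
    return mu, var

# The relaxed optimum concentrates near the minimizer of f (here x = 3).
print(natural_gradient_relaxation(lambda x: (x - 3.0) ** 2))
```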
 

GSI2015, 2nd conference on Geometric Science of Information, 28-30 Oct 2015, Ecole Polytechnique Paris-Saclay

Second-order Optimization over the Multivariate Gaussian Distribution
Luigi Malagò (Shinshu University, JP & INRIA Saclay, FR) and Giovanni Pistone (de Castro Statistics, Collegio Carlo Alberto, Moncalieri, IT)

Introduction
• This is the presentation, by Giovanni, of the paper with the same title in the Proceedings. Giovanni's own field of expertise is non-parametric Information Geometry and its applications in Probability and Statistical Physics; Luigi, who is responsible for the idea of using gradient methods and, later, Newton methods in black-box optimization, is currently working in Japan and could not attend.
• The collaboration started with the FOGA 2011 paper: L. Malagò, M. Matteucci, G. Pistone, "Towards the geometry of estimation of distribution algorithms based on the exponential family", Proceedings of the 11th workshop on Foundations of Genetic Algorithms (FOGA '11), pp. 230-242, ACM, New York, 2011.

Summary
1. Geometry of the exponential family
2. Second-order optimization: the Newton method
3. Applications to the Gaussian distribution
4. Discussion and future work
• A short introduction to Taylor formulæ on Gaussian exponential families is provided; the binary case was previously discussed in L. Malagò and G. Pistone, "Combinatorial optimization with information geometry: the Newton method", Entropy 16:4260-4289, 2014.
• Riemannian Newton methods are discussed in another session of this conference, cf. P.-A. Absil, R. Mahony, R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008 (with a foreword by Paul Van Dooren).
• The focus of this short presentation is a specific framework for Information Geometry called the statistical bundle (Hilbert vs. tangent vs. statistical bundle; cf. S. Amari, "Dual connections on the Hilbert bundles of statistical models", in Geometrization of Statistical Theory, Lancaster, 1987, pp. 123-151; R. E. Kass and P. W. Vos, Geometrical Foundations of Asymptotic Inference, Wiley, 1997).

Statistical bundle: Gaussian case
• H_α(x), x ∈ R^m, are Hermite polynomials of order 1 and 2; e.g. for m = 3, H_010(x) = x_2, H_011(x) = x_2 x_3, H_020(x) = x_2^2 − 1.
• The Gaussian model with sufficient statistics B = {X_1, ..., X_n} ⊂ {H_α : |α| = 1, 2} is N = { p(x; θ) = exp( Σ_j θ_j X_j − ψ(θ) ) }.
• The fibers are V_p = Span(X_j − E_p[X_j] : j = 1, ..., n), and the statistical bundle is SN = {(p, U) : p ∈ N, U ∈ V_p}.
• Each U ∈ V_p is a polynomial of degree up to 2, the map t ↦ E_q[e^{tU}] is finite around 0 for every q ∈ N, and every such polynomial belongs to ∩_{q ∈ N} L²(q).

Parallel transports
• e-transport from V_p to V_q: U ↦ U − E_q[U]; m-transport: defined by duality, ⟨U, m-transport of V⟩_p = ⟨e-transport of U, V⟩_q. The transports compose along the model, the two transports are dual to each other, and the m-transport of V ∈ V_q into V_p is its orthogonal projection onto V_p.
• In coordinates, a system of moving frames is defined on the statistical bundle. The exponential frame of the fiber V_p is B_p = {X_j − E_p[X_j] : j = 1, ..., n}, and each U ∈ V_p is uniquely written as U = Σ_j α_j(U)(X_j − E_p[X_j]) = α(U)ᵀ(X − E_p[X]).
• In the exponential frame the scalar product is expressed by the Fisher information matrix I_ij(p) = ⟨X_i − E_p[X_i], X_j − E_p[X_j]⟩_p = Cov_p(X_i, X_j) = ∂²ψ(θ)/∂θ_i∂θ_j, and the coordinates are α(U) = I(p)⁻¹ Cov_p(X, U).
• The mixture frame of the fiber is obtained by applying I(p)⁻¹ to the exponential frame, and each V ∈ V_p is uniquely written in this dual frame.
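To make the stochastic-relaxation idea concrete, here is a minimal Monte Carlo sketch of a first-order (natural-gradient) step over a Gaussian family with fixed isotropic covariance. It is not the authors' code, and the paper's Newton-type methods would additionally use the exponential and mixture Hessians discussed above; the function name `natural_gradient_step` and all parameter values are illustrative choices.

```python
import numpy as np

def natural_gradient_step(f, mu, sigma, n_samples=200, lr=0.1, rng=None):
    """One natural-gradient step on F(mu) = E_{N(mu, sigma^2 I)}[f(x)].

    Log-likelihood trick: grad_mu F = E[f(x) (x - mu)] / sigma^2.
    The Fisher information for mu in this family is (1/sigma^2) * Id,
    so the natural gradient is simply E[f(x) (x - mu)].
    """
    rng = np.random.default_rng() if rng is None else rng
    x = mu + sigma * rng.standard_normal((n_samples, mu.size))
    fx = np.apply_along_axis(f, 1, x)
    baseline = fx.mean()                      # variance reduction, unbiased since E[x - mu] = 0
    nat_grad = ((fx - baseline)[:, None] * (x - mu)).mean(axis=0)
    return mu - lr * nat_grad                 # descend the relaxed objective

# toy usage: minimise the sphere function through its Gaussian relaxation
if __name__ == "__main__":
    f = lambda x: float(np.sum(x ** 2))
    mu = np.array([2.0, -1.5])
    for _ in range(100):
        mu = natural_gradient_step(f, mu, sigma=0.5)
    print(mu)   # should approach the minimiser [0, 0], up to Monte Carlo noise
```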

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We prove the equivalence of two online learning algorithms, mirror descent and natural gradient descent. Both are generalizations of online gradient descent when the parameter of interest lies on a non-Euclidean manifold. Natural gradient descent selects the steepest descent direction along a Riemannian manifold by multiplying the standard gradient by the inverse of the metric tensor. Mirror descent induces non-Euclidean structure by solving iterative optimization problems with different proximity functions. In this paper, we prove that mirror descent induced by a Bregman divergence proximity function is equivalent to the natural gradient descent algorithm on the Riemannian manifold in the dual coordinate system. We use techniques from convex analysis and connections between Riemannian manifolds, Bregman divergences and convexity to prove this result. This equivalence between natural gradient descent and mirror descent implies that (1) mirror descent is the steepest descent direction along the Riemannian manifold corresponding to the choice of Bregman divergence and (2) mirror descent with log-likelihood loss applied to parameter estimation in exponential families asymptotically achieves the classical Cramér-Rao lower bound.
 

Information geometry of mirror descent
Anthea Monod (Department of Statistical Science, Duke University), joint work with G. Raskutti (UW Madison) and S. Mukherjee (Duke). 29 Oct 2015.

Optimization of large-scale problems
• Optimize f(θ), θ ∈ R^p. Standard subgradient descent has an O(√p) dependence of the convergence rate on the dimension; mirror descent [Nemirovski 1979; Beck & Teboulle 2003] improves this to O(log p) and is a widely used tool in optimization and machine learning.

Differential geometry in statistics
(1) Cramér-Rao lower bound (Rao 1945): the lower bound on the variance of an estimator is a function of curvature (sometimes called the Cramér-Rao-Fréchet-Darmois lower bound).
(2) Invariant (non-informative) priors (Jeffreys 1946): an uninformative prior distribution for a parameter space is based on a differential form.
(3) Information geometry (Amari 1985): differential geometry of probability distributions.

Gradient, natural gradient and mirror descent
• Online gradient descent: θ_{t+1} = θ_t − α_t ∇f(θ_t) for a sequence of step sizes (α_t), chosen to minimize regret at time T.
• Natural gradient: for certain cost functions (log-likelihoods of exponential family models) the parameters are supported on a p-dimensional Riemannian manifold (M, H), where the metric tensor H is typically the Fisher information matrix; the step is θ_{t+1} = θ_t − α_t H⁻¹(θ_t) ∇f(θ_t). It moves in the direction of steepest descent along (M, H) and requires a matrix inversion.
• Mirror descent: θ_{t+1} = argmin_θ { ⟨θ, ∇f(θ_t)⟩ + (1/α_t) Ψ(θ, θ_t) } for a strictly convex proximity function Ψ; gradient descent corresponds to Ψ(θ, θ') = ||θ − θ'||²/2.

Bregman divergences and duality
• For a strictly convex, twice-differentiable G: Θ → R, the Bregman divergence is B_G(θ, θ') = G(θ) − G(θ') − ⟨∇G(θ'), θ − θ'⟩; mirror descent is run with Ψ = B_G.
• Examples for exponential families: Gaussian G(θ) = ||θ||²/2, Poisson G(θ) = exp(θ), Bernoulli G(θ) = log(1 + exp(θ)).
• The convex conjugate H(μ) = sup_θ { ⟨θ, μ⟩ − G(θ) }, with dual coordinate μ = ∇G(θ), defines the dual Bregman divergence B_H(μ, μ') = H(μ) − H(μ') − ⟨∇H(μ'), μ − μ'⟩ (dual potentials: ||μ||²/2 for the Gaussian, μ log μ − μ for the Poisson, and the binary entropy for the Bernoulli case).
• B_G induces a Riemannian manifold (Θ, ∇²G) in primal coordinates; B_H induces the same Riemannian manifold (Φ, ∇²H) in the dual coordinates Φ, the image of Θ under the continuous map g = ∇G.

Main results (Raskutti & Mukherjee)
• Theorem: the mirror descent step with Bregman divergence defined by G, applied to a function f in the space Θ, is equivalent to the natural gradient step along the Riemannian manifold (Φ, ∇²H) in dual coordinates.
• Consequence: for an exponential family p(y | θ) = h(y) exp(⟨θ, y⟩ − G(θ)) with log loss f_t(μ; y_t) = −log p(y_t | μ) = B_H(y_t, μ), the natural gradient step reduces to μ_{t+1} = μ_t − α_t (μ_t − y_t); the curvature of the loss matches the metric tensor ∇²H(μ).
• Theorem: with step sizes α_t = 1/t, mirror descent applied to the log loss is Fisher efficient, i.e. it asymptotically achieves the Cramér-Rao lower bound.

Challenges: (1) information geometry on mixtures of manifolds; (2) proximity functions for functions over the Grassmannian; (3) EM algorithms for mixtures.
Funding: Center for Systems Biology at Duke, NSF DMS and CCF, DARPA, AFOSR, NIH.
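As a sanity check of the equivalence theorem, the following sketch (mine, not from the talk) compares one mirror descent step with one natural gradient step in dual coordinates, for the one-dimensional Poisson potential G(θ) = exp(θ) listed in the table above; the cost f and the step size are arbitrary illustrative choices.

```python
import numpy as np

# Poisson family: G(theta) = exp(theta), dual coordinate mu = G'(theta) = exp(theta),
# dual potential H(mu) = mu*log(mu) - mu, hence H''(mu) = 1/mu.

def f_prime(theta):                 # derivative of an arbitrary smooth cost f(theta) = (theta - 1.2)^2
    return 2.0 * (theta - 1.2)

def mirror_descent_step(theta, alpha):
    # argmin_t <t, f'(theta)> + (1/alpha) * B_G(t, theta)
    # first-order condition: G'(theta_next) = G'(theta) - alpha * f'(theta)
    return np.log(np.exp(theta) - alpha * f_prime(theta))

def natural_gradient_step_dual(mu, alpha):
    theta = np.log(mu)                           # dual-to-primal map
    grad_mu = f_prime(theta) / mu                # chain rule: d f / d mu
    return mu - alpha * mu * grad_mu             # [H''(mu)]^{-1} = mu

theta0, alpha = 0.3, 0.05
theta_md = mirror_descent_step(theta0, alpha)
mu_ng = natural_gradient_step_dual(np.exp(theta0), alpha)
print(np.exp(theta_md), mu_ng)   # identical: the two updates agree in dual coordinates
```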

Geometry of Time Series and Linear Dynamical Systems (chaired by Bijan Afsari, Arshia Cont)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We present in this paper a novel non-parametric approach for clustering independent identically distributed stochastic processes. We introduce a pre-processing step that maps multivariate independent and identically distributed samples of random variables to a generic non-parametric representation which factorizes the dependency and the marginal distributions apart without losing any information. An associated metric is defined in which the balance between dependency information and marginal-distribution information is controlled by a single parameter. This mixing parameter can be learned or tuned by a practitioner; such use is illustrated on the case of clustering financial time series. Experiments, implementation and results obtained on public financial time series are available online on the web portal http://www.datagrapple.com.
 

Clustering Random Walk Time Series
Gautier Marti, Frank Nielsen, Philippe Very, Philippe Donnat. GSI 2015, Geometric Science of Information, 29 October 2015. (Data from www.datagrapple.com.)

Outline: 1. Introduction; 2. Geometry of Random Walk Time Series; 3. The Hierarchical Block Model; 4. Conclusion.

Introduction
• Clustering is the task of grouping a set of objects so that objects in the same group (cluster) are more similar to each other than to those in different groups; e.g. find k centers {c_1, ..., c_k} so that the data points minimize Σ_i min_j d(x_i, c_j)². But what is the distance d between two random walk time series? Example clusters: French banks and building-materials companies, CDS series over 2006-2015.

Geometry of random walk time series
• The geometry of random walk time series reduces to a geometry on random variables observed through i.i.d. samples: which distance d(X_i, X_j) between dependent random variables?
• Pitfall of a basic distance: for a bivariate Gaussian (X, Y) with correlation ρ, E[(X − Y)²] = (μ_X − μ_Y)² + (σ_X − σ_Y)² + 2σ_X σ_Y (1 − ρ(X, Y)). For two Gaussians with the same mean and the same large variance but ρ = 0, this quantity is large although the two marginal distributions are identical; and the pairs N(−5, 1)/N(5, 1), N(−5, 3)/N(5, 3), N(−5, 10)/N(5, 10) are equidistant in the L² geometry on the parameter space (μ, σ) although their overlaps differ greatly.
• Sklar's theorem (1959): any random vector X = (X_1, ..., X_N) with continuous marginal cdfs P_i has joint cdf P(X_1, ..., X_N) = C(P_1(X_1), ..., P_N(X_N)), where C, the multivariate distribution with uniform marginals, is the copula of X.
• The copula transform U = (P_1(X_1), ..., P_N(X_N)) has uniform marginals on [0, 1] (probability integral transform) and is invariant under strictly increasing transformations.
• Deheuvels' empirical copula transform: estimate the margins by the empirical cdfs P_i^T(x) = (1/T) Σ_t 1(X_i^t ≤ x); equivalently Ũ_i^t = R_i^t / T, where R_i^t is the rank of observation X_i^t, i.e. the normalized rank transform. In practice: x_transform = rankdata(x)/len(x).
• Generic non-parametric distance: d_θ²(X_i, X_j) = θ · 3 E[|P_i(X_i) − P_j(X_j)|²] + (1 − θ) · (1/2) ∫_R ( √(dP_i/dλ) − √(dP_j/dλ) )² dλ, with (i) 0 ≤ d_θ ≤ 1, (ii) d_θ a metric for 0 < θ < 1, (iii) invariance under diffeomorphisms. (A numerical sketch is given after this summary.)
• Special cases: d_0² is the squared Hellinger distance between the margins, and d_1² = 3 E[|P_i(X_i) − P_j(X_j)|²] = (1 − ρ_S)/2, a function of Spearman's correlation ρ_S. For a density with a Gaussian copula the metric splits as ds² = ds²_GaussCopula + Σ_i ds²_margins.

The Hierarchical Block Model
• A model of nested partitions: the partitions defined by the model are visible in the distance matrix for a proper distance and the right permutation of the data points; in practice one observes and works with the matrix up to an unknown permutation.
• On data simulated from the model, only the generic non-parametric representation (GNPR) with θ = 0.5 recovers the distribution-only, correlation-only and combined structures (Adjusted Rand Index close to 1 with both average-linkage hierarchical clustering and affinity propagation), while correlation-based and raw L² distances each miss part of the structure.
• On Credit Default Swap time series, distance matrices exhibit a hierarchical block structure; clusters are stable with the proposed distance but unstable with the L² distance (Marti, Very, Donnat, Nielsen, IEEE ICMLA 2015).
• Consistency: a clustering algorithm A is consistent with respect to the Hierarchical Block Model defining a set of nested partitions P if the probability that A recovers all partitions in P converges to 1 as T → ∞. A space-conserving algorithm does not distort the space: the distance D_ij between two clusters C_i and C_j lies between min_{x ∈ C_i, y ∈ C_j} d(x, y) and max_{x ∈ C_i, y ∈ C_j} d(x, y).
• Theorem (Andler, Marti, Nielsen, Donnat, 2015): space-conserving algorithms (e.g. Single, Average, Complete Linkage) are consistent with respect to the Hierarchical Block Model.

Conclusion and avenues for research: distances on (copula, margins); clustering using multivariate and multi-wise dependence information; Optimal Copula Transport for Clustering Multivariate Time Series (Marti, Nielsen, Donnat, 2015).
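A small self-contained sketch of the ingredients above, using the normalized-rank (empirical copula) transform and a histogram-based Hellinger term. The function name `gnpr_distance`, the bin count and the toy data are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.stats import rankdata

def gnpr_distance(x, y, theta=0.5, bins=50):
    """Sketch of the generic non-parametric distance d_theta^2 between two samples.

    dependence part  : 3 * E[|P_i(X_i) - P_j(X_j)|^2] via normalised ranks
                       (the empirical copula transform rankdata(x)/len(x))
    distribution part: squared Hellinger distance between histogram estimates
                       of the two marginal densities.
    """
    assert len(x) == len(y)
    u = rankdata(x) / len(x)                 # empirical copula transform of x
    v = rankdata(y) / len(y)                 # empirical copula transform of y
    dep = 3.0 * np.mean((u - v) ** 2)        # in [0, 1]; close to (1 - rho_S)/2 for large samples

    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    px, edges = np.histogram(x, bins=bins, range=(lo, hi), density=True)
    py, _ = np.histogram(y, bins=bins, range=(lo, hi), density=True)
    width = edges[1] - edges[0]
    hell = 0.5 * np.sum((np.sqrt(px) - np.sqrt(py)) ** 2) * width   # squared Hellinger

    return theta * dep + (1.0 - theta) * hell   # this is d_theta^2

# usage: two dependent Gaussian samples with the same dependence but shifted means
rng = np.random.default_rng(0)
a = rng.standard_normal(5000)
b = 0.8 * a + 0.6 * rng.standard_normal(5000) + 2.0
print(gnpr_distance(a, b, theta=1.0))   # dependence only: small (high rank correlation)
print(gnpr_distance(a, b, theta=0.0))   # distribution only: larger (the means differ)
```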

Creative Commons Attribution-ShareAlike 4.0 International
See the video
This paper highlights some further examples of maps that follow a recently introduced "symmetrization" structure behind the average consensus algorithm. We review, among others, some generalized consensus settings and coordinate descent optimization.
 

Operational viewpoint on consensus, inspired by the quantum consensus objective: it covers some more linear algorithms
Limits on accelerating consensus algorithms, with information-theoretic links
Alain Sarlette, INRIA/QUANTIC & Ghent University/SYSTeMS

Part 1: Consensus as "symmetrization" (L. Mazzarella, F. Ticozzi, A. Sarlette, arXiv:1311.3364 and arXiv:1303.4077)

• Classical consensus algorithm: x_k(t+1) = x_k(t) + α(t) (x_j(t) − x_k(t)). Reaching agreement x_1 = x_2 = ... = x_N is the basis for many distributed computing tasks. Convergence is very flexible and robust: it holds as long as the network, integrated over some finite window T, forms a connected graph and α(t) ∈ [α_m, α_M] ⊂ (0, 1). Proof idea: a shrinking convex hull, since the highest value can only decrease and the lowest can only increase.
• Initial goal: bring consensus into the quantum regime. How should consensus be defined in a tensor-product space, with respect to correlations and entanglement? In standard consensus the states x_k are directly accessible for computation and can be linearly combined, copied and communicated; a quantum state or probability distribution cannot be measured, so agents must physically exchange "things".
• Consensus viewed as partial swapping: a pairwise interaction between agents (j, k) is a mixture of two unitary operations, "stay in place" and "swap j with k", which can easily be implemented physically in quantum systems, or in other information structures.
• Consensus as a discrete group action: a linear action a(g, x) of a finite group G on a vector space X of objects of interest. Target: symmetrization, i.e. reach a state x̄ ∈ X with a(g, x̄) = x̄ for all g ∈ G; the projection onto the symmetrization set can be written explicitly. Dynamics: at each t, a convex combination over G, usually with s_g(t) ≠ 0 only for g in a very restricted subset of G.
• Lift from actions to the group: the state x(t) can at any time be written as a convex combination with weights p(t) independent of x(0), and the dynamics lift to p(t), yielding consensus on group weights with starting point p_g(0) = δ(g, e) and target p̄_g = 1/|G| for all g (possibly a large number of nodes, e.g. |G| = N! for the permutation group). The exact values of s_h(t), and even the interactions selected at each step, need not be exactly controlled, hence strong robustness. Convergence proofs: by analogy with classical consensus, or using the entropy of p(t) as a strict Lyapunov function.
• Various applications (a sketch of the classical pairwise case follows below):
  - G = permutations: random consensus acting on classical state values (standard consensus) or on classical/quantum probability distributions; G = cyclic group: random Fourier transform; G = decoupling group: links to quantum dynamical decoupling; G = operational gates: uniform random gate generation.
  - Consensus with antagonistic interactions: G = permutation matrices with arbitrary signs ±1 on the entries; Birkhoff decomposition on |a_jk| as for standard consensus, with the weight moved to the non-positive permutation when a_jk < 0. This yields a non-trivial weight assignment and convergence result, and settles previously uncovered cases distinguishing {x_k} = 0 from {x_k} = −{x_j}.
  - Consensus towards a leader value, and other algorithms with substochastic (a_jk): a non-trivial weight assignment (iterative procedure, see the paper) and operator conclusions about which components of x converge to zero, slightly more general than standard convergence to x = 0.
  - Gradient and coordinate descent: for f(x) = xᵀAx, the gradient iteration becomes (if stable) a substochastic iteration in the eigenbasis of A; cycling through coordinates corresponds to signed permutation matrices whose weights follow from reflections around non-orthogonal directions and sum to 1 but may be negative, so coordinate-descent convergence can be studied via symmetric, possibly negative, transition matrices, for which clear tools exist in consensus.

Part 2: Limits on accelerating consensus (arXiv:1412.0402, in press at IEEE Trans. Automatic Control)

• Properly using one memory term x(t−1) − x(t) allows quadratically faster convergence [Muthukrishnan et al., 1998]. What about more memories?
• Result: if the graph eigenvalues can be anywhere in a known interval [a, b], then adding more memories does not improve the worst-case consensus eigenvalue (the proof is not very information-theoretic; see arXiv:1412.0402).
• Interesting links: optimization (is the Nesterov method not further improvable by using m(t−2), ...?); robust control (design a plant to be stable under feedback u = −k y for k in an interval); communication theory (a network node improves by taking its direct feedback to itself into account, but if the network is poorly known there is no benefit in accounting for longer loops).
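As referenced in Part 1, here is a minimal sketch of the classical special case, where G acts by permutations of the agents and a pairwise gossip step is a convex mixture of "stay in place" and "swap j with k". The graph, weights and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pairwise_consensus_step(x, j, k, alpha):
    """One gossip step written as 'partial swapping':
    x <- (1 - alpha) * x + alpha * P_jk x, where P_jk swaps entries j and k.
    For alpha = 1/2 this is the usual pairwise-average (midpoint) update."""
    swapped = x.copy()
    swapped[[j, k]] = swapped[[k, j]]
    return (1.0 - alpha) * x + alpha * swapped

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=8)                  # initial values of 8 agents
edges = [(i, (i + 1) % 8) for i in range(8)]        # cycle graph

for _ in range(500):
    j, k = edges[rng.integers(len(edges))]
    x = pairwise_consensus_step(x, j, k, alpha=0.5)

print(x)            # all entries close to the average of the initial values
print(x.std())      # ~0: consensus reached
```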

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Scaled Bregman distances (SBD) have turned out to be useful tools for simultaneous estimation and goodness-of-fit testing in parametric models of random data (streams, clouds). We show how SBD can additionally be used for model preselection (structure detection), i.e. for finding appropriate candidate model (sub)classes in order to support a desired decision under uncertainty. For this, we concentrate, by way of example, on the context of nonlinear recursive models with additional exogenous inputs; as special cases we include nonlinear regressions, linear autoregressive models (e.g. AR, ARIMA, SARIMA time series), and nonlinear autoregressive models with exogenous inputs (NARX). In particular, we outline a corresponding information-geometric 3D computer-graphical selection procedure. Some sample-size asymptotics are given as well.
 

New Model Search for Nonlinear Recursive Models, Regressions and Autoregressions
Wolfgang Stummer and Anna-Lena Kißlinger, FAU University of Erlangen-Nürnberg. Talk at GSI 2015, Palaiseau, 29/10/2015.

Outline
• Introduce a new method for model search (model preselection, structure detection) in data streams/clouds; the key technical tool is density-based probability distances/divergences with "scaling".
• This gives much flexibility for interdisciplinary, situation-based applications (also with cost functions, utility, etc.) and goal-specific handling of outliers and inliers (dampening, amplification), the latter not directly covered today.
• Give new, general, parameter-free asymptotic distributions for the involved data-derived distances/divergences, and outline a corresponding information-geometric 3D computer-graphical selection procedure.

Why distances between (non-)probability measures
• Distances D(P, Q) between two (non-)probability measures play a prominent role in modern statistical inference (parameter estimation, testing for goodness-of-fit, homogeneity or independence, clustering, change-point detection, Bayesian decision procedures) as well as in information theory, signal/image/speech processing, pattern recognition, feature extraction, machine learning, econometrics and statistical physics.
• Typical use: minimum distance estimation. With the empirical distribution P_N^emp of an i.i.d. sample of size N from Q_{θ_true}, the minimizer of D(P_N^emp, Q_θ) over θ is a minimum distance estimator (for the Kullback-Leibler case, the MLE); if the minimized distance remains large, the goodness of fit is bad, which leads to a test.

Time series and nonlinear regressions
• In time series the data are non-i.i.d.; e.g. an AR(2) model X_{m+1} − ψ_1 X_m − ψ_2 X_{m−1} = ε_{m+1} with i.i.d. innovations of parametric distribution Q_θ. Applying the fitted autoregression filter to the data and taking the empirical distribution P_{N,£}^orig of the left-hand side (with its relative-frequency mass function p_N^£) brings the problem back to comparing an empirical distribution with a candidate Q_θ.
• More generally, nonlinear autorecursions F_{£_{m+1}}(m+1, X_{m+1}, X_m, ..., X_k, Z_{k−}, a_{m+1}, ..., a_k) = ε_{m+1}, with nonlinear parametrized functions F, i.i.d. innovations with E[ε_{m+1}] = 0, deterministic exogenous inputs a, and a backlog input Z_{k−} to start the recursion. This covers in particular NARX models, nonlinear regressions with deterministic independent variables, AR(r), ARIMA(r,d,0) and SARIMA(r,d,0)(R,D,0)_s models.
• Two issues remain: which time series model X_i, and which distance D(·, ·).

Scaled Bregman divergences (Stummer 2007; extended in Stummer & Vajda 2012)
• For probability distributions P, Q and a (σ-)finite scaling measure M with densities p, q, m with respect to λ, and a strictly convex, twice-differentiable generator φ with φ(1) = 0, the Bregman divergence of P, Q scaled by M is
  B_φ(P, Q | M) = ∫ m(x) [ φ(p(x)/m(x)) − φ(q(x)/m(x)) − φ'(q(x)/m(x)) (p(x)/m(x) − q(x)/m(x)) ] dλ(x),
  with the corresponding sum in the discrete case. Example: φ(t) = (t − 1)² gives a weighted Pearson χ², B_φ(P, Q | M) = Σ_i (p(x_i) − q(x_i))² / m(x_i). (A small numerical sketch is given after the reference list below.)
• Composite scalings M = W(P, Q), i.e. m(x) = w(p(x), q(x)): w(u, v) = 1 recovers the unscaled/classical Bregman distance (discrete case: Pardo/Vajda 1997, 2003); φ_1(t) = t log t + 1 − t gives the Kullback-Leibler divergence (MLE); the power generators φ_α(t) = (t^α − 1 + α − αt)/(α(α − 1)), α ≠ 0, 1, give the density power divergences of Basu et al. (1998, 2013-15).
• New example (Kißlinger/Stummer 2015c): scaling by weighted r-th-power means w_{β,r}(u, v) = (β u^r + (1 − β) v^r)^{1/r}, β ∈ [0, 1]. For r = 1 (arithmetic-mean/mixture scaling), β = 0 yields all Csiszár φ-divergences (Pearson's χ² for φ_2), β = 1 yields Neyman's χ², and β ∈ [0, 1] yields the blended weight χ² of Lindsay (1994); r = 1/2 yields blended weight Hellinger distances (Lindsay 1994, Basu/Lindsay 1994); r → 0 yields geometric-mean scaling w_{β,0}(u, v) = u^β v^{1−β} (Kißlinger/Stummer 2015b). Other scale connectors (exponential-mean, median-based, smooth adjustable connectors) are possible.
• Robustness against outliers and inliers, together with asymptotic efficiency, is a question of a good choice of the scale connector w(·, ·) (Kißlinger & Stummer 2015b); this is vaguely similar to choosing a good copula in dependence modelling.

Universal model search by probability distance (UMSPD)
• Under the correct model (F_{£_0}, Q_{θ_0}), the fitted residuals behave like a size-N i.i.d. sample from Q_{θ_0}, so P_N^{£_0} → Q_{θ_0} and D_{α,β}(P_N^£, Q_{θ_0}) → 0 as N → ∞ for a broad family of scaled Bregman distances D_{α,β}(P_N^£, Q_θ) := B_{φ_α}(P_N^£, Q_θ | W_β(P_N^£, Q_θ)); today, the power generators φ_α and geometric-mean scaling.
• Procedure: (1) choose F_{£_{m+1}} from a principal parametric-function-family class; (2) choose a class of candidate innovation distributions {Q_θ : θ ∈ Θ}; (3) find a parameter sequence £ (often constant) and θ ∈ Θ such that D_{α,β}(P_N^£, Q_θ) ≈ 0 for large enough N and all (α, β) in a chosen rectangle; (4) preselect the model if the 3D score surface S = {(α, β, D_{α,β}(P_N^£, Q_θ))} is smaller than an appropriately chosen threshold T (a chi-square quantile, see below). Graphical implementation: plot the 3D preselection-score surface.
• Advantage of UMSPD: after preselection one can continue to work with the same D_{α,β}(·, ·) in order to perform, among all preselected candidate models, a statistically sound inference in terms of simultaneous exact parameter estimation and goodness-of-fit.
• Threshold via a limit theorem: let Q_{θ_0} be a finite discrete distribution with c ≥ 2 possible outcomes and strictly positive masses. Then for each α > 0, α ≠ 1 and each β ∈ [0, 1), the random scaled Bregman power distance satisfies 2N · B_{φ_α}(P_N^{£_0}, Q_{θ_0} | (P_N^{£_0})^β · Q_{θ_0}^{1−β}) → χ²_{c−1} in distribution as N → ∞. In terms of the χ²_{c−1} quantiles one derives the threshold T which the 3D preselection-score surface has to (partially) exceed in order to conclude, with an appropriate level of confidence, that the investigated model is not good enough to be preselected.

Further topics: scaled Bregman divergences for robust statistical inference with completely general asymptotic results for other choices of φ and w (Kißlinger & Stummer 2015b); change detection in data streams (Kißlinger & Stummer 2015c); explicit formulae for B_{φ_α}(P_{θ_1}, P_{θ_2} | P_{θ_0}) within the same exponential family, including Lévy processes (Stummer & Vajda 2012, Kißlinger & Stummer 2013); Bayesian decision making with non-stationary stochastic differential equations, e.g. non-stationary branching processes (Kammerer & Stummer 2010) and inhomogeneous binomial diffusion approximations (Stummer & Lao 2012).

References (as listed on the slides)
• Ali, S.M., Silvey, S.D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B 28, 131-140 (1966)
• Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549-559 (1998)
• Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011)
• Billings, S.A.: Nonlinear System Identification. Wiley, Chichester (2013)
• Csiszár, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. A-8, 85-108 (1963)
• Kißlinger, A.-L., Stummer, W.: Some decision procedures based on scaled Bregman distance surfaces. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2013, LNCS 8085, pp. 479-486. Springer, Berlin (2013)
• Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: Nielsen, F., Barbaresco, F. (eds.) GSI 2015, LNCS 9389. Springer, Berlin (2015a)
• Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman divergences. Preprint (2015b)
• Kißlinger, A.-L., Stummer, W.: A new information-geometric method of change detection. Preprint (2015c)
• Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987)
• Nock, R., Piro, P., Nielsen, F., Ali, W.B.H., Barlaud, M.: Boosting k-NN for categorization of natural scenes. Int. J. Comput. Vis. 100, 294-314 (2012)
• Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall, Boca Raton (2006)
• Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Trans. Inform. Theory 49(7), 1860-1868 (2003)
• Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988)
• Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503-1050504 (2007)
• Stummer, W., Vajda, I.: On Bregman distances and divergences of probability measures. IEEE Trans. Inform. Theory 58(3), 1277-1288 (2012)
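A small numerical sketch of the discrete scaled Bregman divergence above, with the power generator φ_α and geometric-mean scaling w_{β,0}(u, v) = u^β v^{1−β}. The function names and the example masses are mine, and the code assumes strictly positive probability mass functions, as in the theorem.

```python
import numpy as np

def phi_alpha(t, a):
    """Power generator phi_alpha(t) = (t^a - 1 + a - a*t) / (a*(a-1)), a not in {0, 1}."""
    return (t ** a - 1.0 + a - a * t) / (a * (a - 1.0))

def phi_alpha_prime(t, a):
    return (t ** (a - 1.0) - 1.0) / (a - 1.0)

def scaled_bregman(p, q, alpha=2.0, beta=0.5):
    """Discrete scaled Bregman divergence B_phi(P, Q | M) with geometric-mean
    scaling m = p^beta * q^(1 - beta); all masses must be strictly positive."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p ** beta * q ** (1.0 - beta)
    u, v = p / m, q / m
    terms = m * (phi_alpha(u, alpha) - phi_alpha(v, alpha)
                 - phi_alpha_prime(v, alpha) * (u - v))
    return float(np.sum(terms))

p = np.array([0.2, 0.3, 0.5])
q = np.array([0.25, 0.25, 0.5])
print(scaled_bregman(p, q, alpha=2.0, beta=0.0))   # beta = 0: a Pearson-type chi-square scaled by q
print(scaled_bregman(p, p, alpha=2.0, beta=0.5))   # zero: the divergence vanishes at P = Q
```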

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In the context of sensor networks, gossip algorithms are a popular, well-established technique for achieving consensus when sensor data are encoded in linear spaces. Gossip algorithms also have several extensions to nonlinear data spaces. Most of these extensions deal with Riemannian manifolds and use Riemannian gradient descent. This paper, instead, studies gossip in the broader CAT(κ) metric setting, which encompasses, but is not restricted to, several interesting cases of Riemannian manifolds. As it turns out, convergence can be guaranteed as soon as the data lie in a small enough ball of a mere CAT(κ) metric space. We also study convergence speed in this setting and establish linear rates of convergence.
 

Gossip in CAT(κ) metric spaces
Anass Bellachehab, Jérémie Jakubowicz. Télécom SudParis, Institut Mines-Télécom & CNRS UMR 5157. GSI 2015, Palaiseau, October 28.

Problem
• A network of N agents represented by a connected, undirected graph G = (V, E), where V is the set of agents and E the set of available communication links. At time t, agent v stores a value x_v(t) in a data space M, and X_t = (x_1(t), ..., x_N(t)) is the state of the whole network.
• Each agent has its own Poisson clock, ticking with a common intensity λ independently of the other clocks; when its clock ticks, an agent can compute and wake up a neighboring agent. The goal is to take the system from an initial state X(0) to a consensus state X_∞ = (x_∞, ..., x_∞) with x_∞ ∈ M.
• Random pairwise gossip (Xiao & Boyd '04): in the Euclidean case the two awake agents replace their values by the midpoint, x_n = [I − (1/2)(δ_{i_n} − δ_{j_n})(δ_{i_n} − δ_{j_n})ᵀ] x_{n−1}, and the network converges to the average of the initial values.
• Motivation for a metric setting: the Euclidean midpoint rule cannot address sphere positions, line orientations (projective spaces), solid orientations (rotations), subspaces (Grassmannians), phylogenetic trees, Cayley graphs, or reconfigurable systems.
• State of the art: consensus optimization on manifolds [Sarlette-Sepulchre '08], [Tron et al. '12], [Bonnabel '13]; synchronization on the circle [Sarlette et al. '08] and on SO(3) [Tron et al. '12]; the authors' previous work on distributed pairwise gossip in CAT(0) spaces. Caveat: here the problem is synchronization, i.e. attaining a consensus whatever its value, contrarily to the Euclidean case where random pairwise midpoints are known to converge to the initial average.

CAT(κ) spaces
• Model surfaces M_κ of constant sectional curvature κ: hyperbolic (κ < 0), Euclidean (κ = 0), spherical (κ > 0). A geodesic is a map γ: [0, l] → M with d(γ(t), γ(t')) = |t − t'|; when the geodesic joining a to b is unique it is denoted [a, b].
• Comparison triangles: a triangle (a, b, c) in M is compared with a triangle (a_κ, b_κ, c_κ) in M_κ having the same side lengths. The CAT(κ) inequality requires d(x, y) ≤ d(x_κ, y_κ) for corresponding points x ∈ [a, b] and y ∈ [a, c]. A metric space is CAT(κ) if every pair of points can be joined by a geodesic and every triangle of perimeter less than 2D_κ (with D_κ = π/√κ for κ > 0) satisfies the CAT(κ) inequality.

Algorithm and results
• Formal setting: time is discrete; at each step an agent V_t randomly wakes up and wakes a neighbor W_t, with P[{V_t, W_t} = {v, w}] = P_{v,w} > 0 if v ~ w. Update: x_{t,v} = Midpoint(x_{t−1,V_t}, x_{t−1,W_t}) for v ∈ {V_t, W_t}, and x_{t,v} = x_{t−1,v} otherwise.
• Previous result (CAT(0)): the algorithm is sound because geodesics exist and are unique; it converges to a consensus with probability 1 from any initial state, at a linear rate: with σ²(x) = Σ_{v~w} d²(x_v, x_w), there exists L < 0 such that E σ²(X_k) ≤ C_0 exp(Lk).
• New result (κ > 0, e.g. the sphere): provided the diameter of the initial set of values is less than D_κ/2, the algorithm is sound (geodesics exist and are unique under this restriction), converges to a consensus with probability 1, and does so at a linear rate for the disagreement σ_κ²(x) = Σ_{v~w} χ_κ(d(x_v, x_w)) with χ_κ(x) = 1 − cos(√κ x): there exists L ∈ (−1, 0) with E σ_κ²(X_k) ≤ C_0 exp(Lk).
• Sketch of proof: bound the per-step net balance of σ_κ² using the midpoint inequality χ_κ(d((p+q)/2, r)) ≤ (χ_κ(d(p, r)) + χ_κ(d(q, r)))/2; show that E σ_κ² decreases at each step by at least a fixed fraction (depending on N) of E ∆_κ(X_k), where ∆_κ is the P-weighted disagreement; use graph connectedness to compare σ_κ² and ∆_κ up to a graph-dependent constant C_G ≥ 1; conclude with the elementary lemma that a_{n+1} − a_n ≤ −β a_n with β ∈ (0, 1) implies a_n ≤ a_0 exp(−βn).
• Simulations on the sphere and on rotations are consistent with the result.

Summary: when the data belong to a complete CAT(κ) metric space and the initial values are close enough, the same algorithm makes sense and also converges linearly; this is consistent with simulations.
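A minimal simulation sketch of random pairwise midpoint gossip on the unit sphere (κ = 1), with the initial values restricted to a small cap as required by the theorem. The graph, sample sizes and helper names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sphere_midpoint(a, b):
    """Geodesic midpoint of two non-antipodal unit vectors on the round sphere:
    the normalised chordal average lies on the great circle through a and b."""
    m = a + b
    return m / np.linalg.norm(m)

rng = np.random.default_rng(2)
N = 10
base = np.array([0.0, 0.0, 1.0])
X = base + 0.3 * rng.standard_normal((N, 3))        # points in a small cap around the pole
X /= np.linalg.norm(X, axis=1, keepdims=True)

edges = [(i, (i + 1) % N) for i in range(N)]        # cycle graph

def disagreement(X):
    # sum of squared geodesic distances over the edges
    return sum(np.arccos(np.clip(X[i] @ X[j], -1, 1)) ** 2 for i, j in edges)

print(disagreement(X))          # initial disagreement
for _ in range(400):
    i, j = edges[rng.integers(len(edges))]
    m = sphere_midpoint(X[i], X[j])
    X[i], X[j] = m, m
print(disagreement(X))          # close to 0: the agents have synchronised on the sphere
```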

Optimal Transport (chaired by Jean-François Marcotorchino, Alfred Galichon)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
In this paper we relate the Equilibrium Assignment Problem (EAP), which underlies several economic models, to a system of nonlinear equations that we call the "nonlinear Bernstein-Schrödinger system", which is well known in the linear case but whose nonlinear extension does not seem to have been studied. We apply this connection to derive an existence result for the EAP and an efficient computational method.
 

T OPICS IN E QUILIBRIUM T RANSPORTATIONAlfred Galichon (NYU and Sciences Po)GSI, Ecole polytechnique, October 29, 2015G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 1/ 22 T HIS TALKThis talk is based on the following two papers:AG, Scott Kominers and Simon Weber (2015a). Costly Concessions: AnEmpirical Framework for Matching with Imperfectly Transferable Utility.AG, Scott Kominers and Simon Weber (2015b). The NonlinearBernstein-Schr¨dinger Equation in Economics, GSI proceedings.oG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 2/ 22 T HIS TALKAgenda:1. Economic motivation2. The mathematical problem3. Computation4. EstimationG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 3/ 22 T HIS TALKAgenda:1. Economic motivation2. The mathematical problem3. Computation4. EstimationG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 3/ 22 T HIS TALKAgenda:1. Economic motivation2. The mathematical problem3. Computation4. EstimationG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 3/ 22 T HIS TALKAgenda:1. Economic motivation2. The mathematical problem3. Computation4. EstimationG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 3/ 22 Section 1E CONOMIC MOTIVATIONG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 4/ 22 M OTIVATION : A MODEL OF LABOUR MARKETConsider a very simple model of labour market. Assume that apopulation of workers is characterized by their type x ∈ X , whereX = Rd for simplicity. There is a distribution P over the workers,which is assumed to sum to one.A population of firms is characterized by their types y ∈ Y (sayY = Rd ), and their distribution Q. It is assumed that there is the sametotal mass of workers and firms, so Q sums to one.Each worker must work for one firm; each firm must hire one worker.Let π (x, y ) be the probability of observing a matched (x, y ) pair. πshould have marginal P and Q, which is denotedπ ∈ M (P, Q ) .G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 5/ 22 O PTIMALITYIn the simplest case, the utility of a worker x working for a firm y atwage w (x, y ) will beα (x, y ) + w (x, y )while the corresponding profit of firm y isγ (x, y ) − w (x, y ) .In this case, the total surplus generated by a pair (x, y ) isα (x, y ) + w + γ (x, y ) − w = α (x, y ) + γ (x, y ) =: Φ (x, y )which does not depend on w (no transfer frictions). A central plannermay thus like to choose assignment π ∈ M (P, Q ) so tomaxπ ∈M(P ,Q )Φ (x, y ) d π (x, y ) .But why would this be the equilibrium solution?G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 6/ 22 E QUILIBRIUMThe equilibrium assignment is determined by an important quantity: thewages. Let w (x, y ) be the wage of employee x working for firm of typey.Let the indirect surpluses of worker x and firm y be respectivelyu (x ) = max {α (x, y ) + w (x, y )}yv (y ) = max {γ (x, y ) − w (x, y )}xso that (π, w ) is an equilibrium whenu (x ) ≥ α (x, y ) + w (x, y ) with equality if (x, y ) ∈ Supp (π )v (y ) ≥ γ (x, y ) − w (x, y ) with equality if (x, y ) ∈ Supp (π )By summation,u (x ) + v (y ) ≥ Φ (x, y ) with equality if (x, y ) ∈ Supp (π ) .G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 7/ 22 T HE M ONGE -K ANTOROVICH THEOREM OF O PTIMAL T RANSPORTATIONOne can show that the equilibrium outcome (π, u, v ) is such that π issolution to the primal Monge-Kantorovich Optimal Transportationproblemmaxπ ∈M(P ,Q )Φ (x, y ) d π (x, y )and (u, v ) is solution to the dual OT problemminu ,vu (x ) dP (x ) +v (y ) dQ (y )s.t. 
u (x ) + v (y ) ≥ Φ (x, y )Feasibility+Complementary slackness yield the desired equilibriumconditionsπ ∈ M (P, Q )u (x ) + v (y ) ≥ Φ (x, y )(x, y ) ∈ Supp (π ) =⇒ u (x ) + v (y ) = Φ (x, y )“Second welfare theorem”, “invisible hand”, etc.G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 8/ 22 E QUILIBRIUM VS . OPTIMALITYIs equilibrium always the solution to an optimization problem?It is not. This is why this talk is about “Equilibrium Transportation,”which contains, but is strictly more general than “OptimalTransportation”.G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 9/ 22 E QUILIBRIUM VS . OPTIMALITYIs equilibrium always the solution to an optimization problem?It is not. This is why this talk is about “Equilibrium Transportation,”which contains, but is strictly more general than “OptimalTransportation”.G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 9/ 22 I MPERFECTLY TRANSFERABLE UTILITYConsider the same setting as above, but instead of assuming thatworkers’ and firm’s payoffs are linear in surplus, assumeu (x ) = max {Uxy (w (x, y ))}yv (y ) = max {Vxy (w (x, y ))}xwhere Uxy (w ) is nondecreasing and continuous, and Vxy (w ) isnonincreasing and continuous.Motivation: taxes, decreasing marginal returns, risk aversion, etc. Ofcourse, Optimal Transportation case is recovered whenUxy (w ) = αxy + wVxy (w ) = γxy − w .G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 10/ 22 I MPERFECTLY TRANSFERABLE UTILITYFor (u, v ) ∈ R2 , letΨxy (u, v ) = min {t ∈ R : ∃w , u − t ≤ Uxy (w ) and v − t ≤ Vxy (w )}so that Ψ is nondecreasing in both variables and(u, v ) = (Uxy (w ) , Vxy (w )) for some w if and only if Ψxy (u, v ) = 0.Optimal Transportation case is recovered whenΨxy (u, v ) = (u + v − Φxy ) /2.As before, (π, w ) is an equilibrium whenu (x ) ≥ Uxy (w (x, y )) with equality if (x, y ) ∈ Supp (π )v (y ) ≥ Vxy (w (x, y )) with equality if (x, y ) ∈ Supp (π )We have therefore that (π, u, v ) is an equilibrium whenΨxy (u (x ) , v (y )) ≥ 0 with equality if (x, y ) ∈ Supp (π ) .G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 11/ 22 Section 2T HE MATHEMATICAL PROBLEMG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 12/ 22 E QUILIBRIUM TRANSPORTATION : DEFINITIONWe have therefore that (π, u, v ) is an equilibrium outcome when π ∈ M (P, Q )Ψxy (u (x ) , v (y )) ≥ 0.(x, y ) ∈ Supp (π ) =⇒ Ψxy (u (x ) , v (y )) = 0Problem: existence of an equilibrium outcome? This paper: yes in thediscrete case (X and Y finite), via entropic regularization.G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 13/ 22 R EMARK 1: LINK WITH G ALOIS CONNECTIONSAs soon as Ψxy is strictly increasing in both variables, Ψxy (u, v ) = 0expresses as−u = Gxy (v ) and v = Gxy1 (u )−where the generating functions Gxy and Gxy1 are decreasing and continuousfunctions. In this case, relations−u (x ) = max Gxy (v (y )) and v (y ) = max Gxy1 (u (x ))y ∈Yx ∈Xgeneralize the Legendre-Fenchel conjugacy. 
This pair of relations form aGalois connection; see Singer (1997) and Noeldeke and Samuelson (2015).G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 14/ 22 R EMARK 2: T RUDINGER ’ S LOCAL THEORY OF PRESCRIBED J ACOBIANSAssuming everything is smooth, and letting fP and fQ be the densities of Pand Q we have under some conditions that the equilibrium transportationplan is given by y = T (x ), where mass balance yields|det DT (x )| =f (x )g (T (x ))and optimality yieds−−∂x GxT1(x ) (u (x )) + ∂u GxT1(x ) (u (x ))u (x ) = 0which thus inverts intoT (x ) = e (x, u (x ) ,u (x )) .Trudinger (2014) studies Monge-Ampere equations of the form|det De (., u,u )| =fg (e (., u,u )).(more general than Optimal Transport where no dependence on u).G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 15/ 22 D ISCRETE CASEOur work (GKW 2015a and b) focuses on the discrete case, when Pand Q have finite support. Call px and qy the mass of x ∈ X andy ∈ Y respectively.In the discrete case, problem boils down to looking for (π, u, v ) suchthat πxy ≥ 0, ∑y πxy = px , ∑x πxy = qy.Ψ (u , v ) ≥ 0 xy x yπxy > 0 =⇒ Ψxy (ux , vy ) = 0G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 16/ 22 Section 3C OMPUTATIONG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 17/ 22 E NTROPIC REGULARIZATIONTake temperature parameter T > 0 and look for π under the formπxy = exp −Ψxy (ux , vy )TNote that when T → 0, the limit of Ψxy (ux , vy ) is nonnegative, andthe limit of πxy Ψxy (ux , vy ) is zero.G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 18/ 22 ¨T HE NONLINEAR B ERNSTEIN -S CHR ODINGER EQUATIONIf πxy = exp (−Ψxy (ux , vy ) /T ) , condition π ∈ M (P, Q ) boils downto set of nonlinear equations in (u, v ) ∑y ∈Y exp − Ψxy (ux ,vy ) = pxT ∑x ∈X exp − Ψxy (ux ,vy )T= qywhich we call the nonlinear Bernstein-Schr¨dinger equation.oIn the optimal transportation case, this becomes the classical B-Sequation ∑y ∈Y exp Φxy −ux −vy = px2T ∑x ∈X expG ALICHONΦxy −ux −vy2TE QUILIBRIUM T RANSPORTATION= qySLIDE 19/ 22 A LGORITHMΨ ( u ,v )Note that Fx : ux → ∑y ∈Y exp − xy Tx yis a decreasing andcontinuous function. Mild conditions on Ψ therefore ensure theexistence of ux so that Fx (ux ) = px .Our algorithm is thus a nonlinear Jacobi algorithm:0- Make an initial guess of vyk +1 to fit the p margins, based on the v k- Determine the uxxykk- Update the vy +1 to fit the qy margins, based on the ux +1 .- Repeat until v k +1 is close enough to v k .0kOne can proof that if vy is high enough, then the vy decrease to fixedpoint. Convergence is very fast in practice.G ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 20/ 22 Section 4S TATISTICAL E STIMATIONG ALICHONE QUILIBRIUM T RANSPORTATIONSLIDE 21/ 22 M AXIMUM L IKELIHOOD ESTIMATIONˆIn practice, one observes πxy and would like to estimate Ψ. Assumethat Ψ belongs to a parametric family Ψθ , so thatθθθ θπxy = exp −Ψxy ux , vy ∈ M (P, Q ).ˆThe log-likelihood l (θ ) associated to observation πxy isl (θ ) =θˆ∑ πxy log πxyxyθθ θˆ= − ∑ πxy Ψxy ux , vyxyand thus the maximum likelihood procedure consists inθθ θˆmin ∑ πxy Ψxy ux , vy .θG ALICHONxyE QUILIBRIUM T RANSPORTATIONSLIDE 22/ 22
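In the transferable-utility (optimal transport) case Ψxy(u, v) = (u + v − Φxy)/2, each coordinate update of the nonlinear Jacobi algorithm described above has a closed form (a log-sum-exp). The following Python fragment is a minimal sketch of that special case; it is an illustration with our own function and variable names, not the authors' code.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along an axis.
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def bernstein_schrodinger_tu(Phi, p, q, T=0.1, tol=1e-10, max_iter=10_000):
    """Nonlinear Jacobi / IPFP iteration for the regularized system in the
    transferable-utility case Psi_xy(u, v) = (u + v - Phi_xy) / 2, so that
    pi_xy = exp((Phi_xy - u_x - v_y) / (2 T)) must have margins p and q."""
    u, v = np.zeros(Phi.shape[0]), np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        # Solve sum_y exp((Phi_xy - u_x - v_y) / 2T) = p_x for u_x (closed form).
        u = 2 * T * (logsumexp((Phi - v[None, :]) / (2 * T), axis=1) - np.log(p))
        # Solve sum_x exp((Phi_xy - u_x - v_y) / 2T) = q_y for v_y.
        v_new = 2 * T * (logsumexp((Phi - u[:, None]) / (2 * T), axis=0) - np.log(q))
        if np.max(np.abs(v_new - v)) < tol:
            v = v_new
            break
        v = v_new
    pi = np.exp((Phi - u[:, None] - v[None, :]) / (2 * T))
    return pi, u, v

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 7))                   # toy joint surplus Phi_xy
p, q = np.full(5, 1 / 5), np.full(7, 1 / 7)     # worker and firm distributions
pi, u, v = bernstein_schrodinger_tu(Phi, p, q)
print(np.allclose(pi.sum(axis=1), p), np.allclose(pi.sum(axis=0), q))   # True True
```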

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
This note presents a short review of the Schrödinger problem and of the first steps that might lead to interesting consequences in terms of geometry. We stress the analogies between this entropy minimization problem and the renowned optimal transport problem, in search of a theory of lower-bounded curvature for metric spaces, including discrete graphs.
 

..Christian L´onardeUniversit´ Paris OuesteGSI’15´Ecole Polytechnique. October 28-30, 2015.....Some geometric aspects of theSchr¨dinger problemo Interpolations in P(X )X :Riemannian manifold(state space)P(X ) : set of all probability measures on Xµ0 , µ1 ∈ P(X )interpolate between µ0 and µ1 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1t Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt=0 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt=1 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt=0 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt = 0.25 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt = 0.5 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt = 0.75 Interpolations in P(X )Standard affine interpolation between µ0 and µ1µaff := (1 − t)µ0 + tµ1 ∈ P(X ), 0 ≤ t ≤ 1tt=1 Interpolations in P(X )....Affine interpolations require mass transference with infinite speed... Interpolations in P(X )..Denial of the geometry of XWe need interpolations built upon trans -portation, not tele -portation..Affine interpolations require mass transference with infinite speed... Interpolations in P(X )We seek interpolations of this type Interpolations in P(X )We seek interpolations of this typet=0 Interpolations in P(X )We seek interpolations of this typet = 0.25 Interpolations in P(X )We seek interpolations of this typet = 0.5 Interpolations in P(X )We seek interpolations of this typet = 0.75 Interpolations in P(X )We seek interpolations of this typet=1 Displacement interpolationµ1µ0 Displacement interpolationy = T (x)xyµ1µ0 Displacement interpolationgeodesicsµ1µ0 Displacement interpolationgeodesicsµ1µ0 Displacement interpolation Displacement interpolationxyγtxy Displacement interpolation Displacement interpolation Curvaturegeodesics and curvature are intimately linkedseveral geodesics give information on the curvature Curvaturegeodesics and curvature are intimately linkedseveral geodesics give information on the curvatureδ(t)θpδ(t) =...√()σp (S) cos2 (θ/2) 242(1 − cos θ) t 1 −t + O(t )6.... Displacement interpolationy = T (x)xyµ1µ0 Displacement interpolation.Respect geometry..we have already used geodesicshow to choose y = T (x) such that interpolations encrypt curvatureas best as possible?.no shock..... Displacement interpolation.Respect geometry..we have already used geodesicshow to choose y = T (x) such that interpolations encrypt curvatureas best as possible?......no shockperform optimal transportd : Riemannian distance.T : T# µ0 = µ1...Monge’s problem.∫.2. X d (x, T (x)) µ0 (dx) → min;.. Lazy gas experimentt=00
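On the real line, the displacement interpolation pictured above has an explicit description: the optimal map T pairs the quantiles of µ0 and µ1, and the interpolant at time t is the push-forward of µ0 by (1 − t)Id + tT. The short Python sketch below (ours, for illustration only) computes it for empirical measures, where the monotone map simply matches sorted samples.

```python
import numpy as np

def displacement_interpolation_1d(x0, x1, t):
    """McCann's displacement interpolation between two empirical measures on R
    with the same number of samples: the monotone (optimal) map pairs sorted
    samples, and each particle moves on the geodesic x_t = (1 - t) x + t T(x)."""
    return (1 - t) * np.sort(x0) + t * np.sort(x1)

rng = np.random.default_rng(1)
mu0 = rng.normal(loc=-2.0, scale=0.5, size=1000)   # samples of mu_0
mu1 = rng.normal(loc=+3.0, scale=1.0, size=1000)   # samples of mu_1
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    xt = displacement_interpolation_1d(mu0, mu1, t)
    print(f"t = {t:4.2f}   mean = {xt.mean():+.2f}   std = {xt.std():.2f}")
```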

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
This article leans on some previous results already presented in [10], based on Fréchet's works, Wilson's entropy and Minimal Trade models in connection with the MKP transportation problem (MKP stands for the Monge-Kantorovich Problem). Using the duality between “independence” and “indetermination” structures shown in that former paper, we are in a position to derive a novel approach to design a copula that is suitable and efficient for anomaly detection in IT systems analysis.
 

Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaOptimal Transport, Independance versusIndetermination duality, impact on a new CopulaDesignBenoit Huyot, Yves MabialaThales Communications and Security29 October 2015Benoit Huyot, Yves Mabiala1 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications Ba1 Cybersecurity problem overviewCurrent Intrusion Detection SystemsAnomaly based IDSIDS as a classification problem2 Properties of Copula FunctionCopula theory historicSklar’s Theorem and Frechet’s BoundsRegularity properties on copula function3 Copula theory used in anomalies detection applicationsClassification AUC with copula paradigmExperimental resultsBenoit Huyot, Yves Mabiala2 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaCurrent Intrusion Detection SystemsRule based approachesSuitable to detect previously known patternsRules are easily understandableEasy addition of new rulesButUnable to detect unknown patternsBenoit Huyot, Yves Mabiala3 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaAnomaly based IDSAnomaly based approachesSuitable to detect unknown patternsTime consuming to update modelAlerts are difficult to understand through existing toolsToo many false alertsButOur approach is an attempt to overcome these problemsBenoit Huyot, Yves Mabiala4 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaAnomaly based IDSAnomaly detection as a classification problemY is a binary random variable where Y = 0 if the event isabnormal Y = 1 else.p0 is the a priori attack probability define by p0 = P(Y ≤ 0)X represents the difference characteristics of the networkeventIf X is a p-dimensional random vector, the cumulativedistribution function will be denotedF (x) = P(X1 ≤ x1 , ..., Xp ≤ xp )Benoit Huyot, Yves Mabiala5 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaIDS as a classification problemScoring functionScoring function is defined as P(Y = 0|X = x)P(Y = 0, X = x)By definition we have P(Y = 0|X = x) =P(X = x)Anomalies are identified thanks to the classical Bayes’s rulemodelEmpirical estimation is difficult due to the ”Curse ofDimensionnality”Joint probabilities will be computed using copula theory toease computationsBenoit Huyot, Yves Mabiala6 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaCopula theory historicIntroduction to Copula theoryOriginated by M.Fr´chet in 1951eFr´chet, M. (1951): ”Sur les tableaux de corr´lations dont leseemarges sont donn´es”, Annales de l’Universit´ de Lyon,eeSection A no 14, 53-77A.Sklar gave a breakthrough in 1959Sklar, A. (1959), ”Fonctions de r´partition ` n dimensions etealeurs marges”, Publ. Inst. Statist. Univ. Paris 8: 229-231Benoit Huyot, Yves Mabiala7 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaSklar’s Theorem and Frechet’s BoundsMain results on copula functionTheorem (Sklar’s theorem)Given two continuous random variables X and Y in L1 , withcumulative distribution functions written F and G . 
It exists anunique function C, called, copula such as:P(X ≤ x, Y ≤ y ) = C(F (x), G (y ))Theorem (Fr´chet-Hoeffding’s Bounds)eGiven a copula function C, ∀(u, v ) ∈ [0, 1]2 we have the followingFr´chet’s bounds:eMax(u + v − 1, 0) ≤ C(u, v ) ≤ Min(u, v )Benoit Huyot, Yves Mabiala8 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaRegularity properties on copula function2-increasing property or Monge’s conditionsB + D = C(u1 , v2 )D + C = C(v1 , u2 )A + B + C + D = C(v1 , v2 )D = C(u1 , u2 )A = (A + B + C + D) − (B + D) − (D + C ) + D and A ≥ 0∀(u1 , v1 ) as 0 ≤ u1 ≤ v1 ≤ 1∀(u2 , v2 ) as 0 ≤ u2 ≤ v2 ≤ 1C(v1 , v2 ) − C(u1 , v2 ) − C(v1 , u2 ) + C(u1 , u2 ) ≥ 0Benoit Huyot, Yves Mabiala9 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaRegularity properties on copula functionCopula is an Holderian functionB + C + E = C(u2 , v2 ) − C(u1 , v1 )A + C + E = C(u2 , 1) − C(u1 , 1)B + C + D = C(v2 , 1) − C(v1 , 1)B + C + E ≤ (B + C + D) + (A + C + E )We obtain a 1-Holderian condition for the Copula C:∀(u1 , v1 , u2 , v2 ) ∈ [0, 1]4|C(u2 , v2 )−C(u1 , v1 )| ≤ |u2 −u1 |+|v2 −v1 |Benoit Huyot, Yves Mabiala10 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaCopula theory used in anomalies detection applicationsOnly unfrequent events could have a score greater than12Looking for attack remains to looking for rare eventsFr´chet’s Bounds gives useP(Y = 0|X ) ≤min(P(X ), P(Y = 0))P(X )and we get:P(Y = 0|X ) ≥1⇒ P(X ) ≤ 2.P(Y = 0)2Benoit Huyot, Yves Mabiala11 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaLower bound for anomalies detectionIt’s possible to show limitThe ”lower tail dependance” is defined as: λL = Limv →0λL ≤ Limv →0C(v , v )vC(u, v )vBenoit Huyot, Yves Mabiala12 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaVariation of the score functionWe want to study to variation of v →1v2v∂C(u, v ) − C(u, v )∂vC(u, v )in [0, 2p0 ]v≤0∂CC(u, v )(u, v ) ≤link to convexity∂vv∂⇔ v log C(u, v ) ≤ 1 link to Fisher’s information∂v⇔Benoit Huyot, Yves Mabiala13 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaClassification AUC with copula paradigmROC curve and AUCC(p0 , s)p01-Specificity (anti-Specificity): FalsesPositive Rate,(1 − C(p0 , s))1 − p0Sensitivity: True Positive Rate,AUC =121 − p0 −2p0 (1 − p0 )1(C(p0 , s) − 1)2 ds0In case of a bivariate random vector X we get:11AUC = K1 (p0 )−K2 (p0 )0(C2 (s1 , s2 ) − 1)20Benoit Huyot, Yves Mabiala∂2C2 (s1 , s2 )ds1 ds2∂s1 ∂s214 Optimal transport problemIn the Monge-Kantorovich problem we want to minimize followingquantity:AB1 2h(x, y ) −minhAB00Under constraints:A BThe solution is given by:h(x, y ) = 110 0Ah(x, y ) = g (y )20Bh(x, y ) = f (x)30h∗ (x, y ) =f (x) g (y )1+−BAABThe cumulative distribution functionassociated to the solution is:H ∗ (x, y ) = yG (y )xyF (x)+x−BAAB Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaClassification AUC with copula paradigmAlgorithm principleBenoit Huyot, Yves Mabiala16 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications 
BaExperimental resultsExperimental resultsQuantile level used for copula benchmarkQuantile level10−45.10−4 10−35.10−3Optimal Transport CopulaDetection rate18.64% 73.86% 74.32% 74.82%False alarms rate 23.15% 2.32%4.38%3.72%Clayton CopulaDetection rate0.0%0.0%19.28% 71.73%False alarms rate 0.0%0.0%0.63%36.76%Frechet’s upper bound CopulaDetection rate30.35% 31.39% 32.73% 36.93%False alarms rate 41.26% 38.68% 31.89% 27.48%Benoit Huyot, Yves Mabiala10−275.09%4.71%79.86%34.20%79.11%27.95%17 Outline Cybersecurity problem overview Properties of Copula Function Copula theory used in anomalies detection applications BaExperimental resultsThanks for your attention!Benoit Huyot, Yves Mabiala18 Link to Fisher’s InformationWe will use the following equation:∂∂vC(u, v ) =logC (u, v ).vC(u, v ) ∂v∂vThis condition is the statistical scoreThe variance of this quantity gives the Fisher’s Information SensitivitySensitivity represents how many events are well assigned toanomaliesˆSensitivity : P(Y = 0|Y = 0)ˆY = 0 when F (X ) ≤ s for a given threshold sˆY = 0 when X ∈ F −1 ([0; s])Sensitivity: P(X ∈ F −1 ([0; s])|p0 ) SensitivitySensitivity appears so as :ˆP(Y = 0|Y = 0) =ˆP(Y = 0, Y = 0)P(Y = 0)−1P(Y = 0, X ≤ FX (s))P(Y = 0)C(p0 , s)=p0= Specificity/AntispecificityAntispecificity represents how many misclassifications aregiven by the algorithmˆSpecificity : P(Y = 1|Y = 1)ˆY = 1 when F (X ) ≥ s for a given threshold sˆY = 1 when X ∈ F −1 ([s; 1])Specificity: P(X ∈ F −1 ([s; 1])|p0 ) AntispecificityAntispecificity appears using survival copula function as:ˆˆ1 − P(Y = 1|Y = 1) = P(Y = 0|Y > 0)ˆP(Y = 0)ˆP(Y > 0|Y = 0)=P(Y > 0)s=(1 − C(p0 , s))1 − p0 Area under ROC Curve (AUC)1PD (PF )dPFAUC =0Using an integration by substitution we obtain:1AUC =PD (s).0∂PF (s)ds∂sC(p0 , s)p0sAntispecificity PF (s) =(1 − C(p0 , s))1 − p0Sensitivity: PD (s)
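As a small numerical illustration of the Fréchet-bound argument above (a sketch, not the authors' implementation): plugging the upper Fréchet–Hoeffding bound C(u, v) = min(u, v) into the score P(Y = 0 | X = x) = C(p0, F(x))/F(x) shows that the score can reach 1/2 only when F(x) ≤ 2 p0, i.e. only for rare events.

```python
import numpy as np

def upper_frechet_copula(u, v):
    """Frechet-Hoeffding upper bound M(u, v) = min(u, v); every copula C
    satisfies max(u + v - 1, 0) <= C(u, v) <= min(u, v)."""
    return np.minimum(u, v)

def score_upper_bound(p0, s):
    """Bound on P(Y = 0 | X = x) = C(p0, F(x)) / F(x), with s = F(x)."""
    return upper_frechet_copula(p0, s) / s

p0 = 0.01                                  # a priori attack probability P(Y = 0)
s = np.linspace(1e-4, 1.0, 10_000)         # marginal probability F(x) of the event
bound = score_upper_bound(p0, s)
# The score can reach 1/2 only for events with F(x) <= 2 * p0 (rare events).
print(s[bound >= 0.5].max() <= 2 * p0)     # True
```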

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We present an overview of our recent work on implementable solutions to the Schrödinger bridge problem and their potential application to optimal transport and various generalizations.
 

Optimal Mass Transport overBridgesMichele PavonDepartment of MathematicsUniversity of Padova, ItalyGSI’15, Paris, October 29, 2015 Joint work with Yongxin Chen, Tryphon Georgiou, Department ofElectrical and Computer Engineering, University of MinnesotaA Venetian Schr¨dinger bridgeo Dynamic version of OMT“Fluid-dynamic” version of OMT (Benamou and Brenier (2000)):1inf(ρ,v)∂ρ+Rn012v(x, t) 2ρ(x, t)dtdx,· (vρ) = 0,∂tρ(x, 0) = ρ0(x),(1a)(1b)ρ(y, 1) = ρ1(y).(1c)Proposition 1 Let ρ∗(x, t) with t ∈ [0, 1] and x ∈ Rn, satisfy∂ρ∗∂t+· ( ψρ∗) = 0,ρ∗(x, 0) = ρ0(x),where ψ is the (viscosity) solution of the Hamilton-Jacobi equation∂ψ∂t+12ψ 2=0for some boundary condition ψ(x, 1) = ψ1(x). If ρ∗(x, 1) = ρ1(x),then the pair (ρ∗, v ∗) with v ∗(x, t) =ψ(x, t) is optimal for (1). Schr¨dinger’s Bridgeso• Cloud of N independent Brownian particles;• empirical distr. ρ0(x)dx and ρ1(y)dy at t = 0 and t = 1, resp.• ρ0 and ρ1 not compatible with transition mechanism1ρ1(y) =0p(t0, x, t1, y)ρ0(x)dx,where−n2p(s, y, t, x) = [2π(t − s)]exp −|x − y|22(t − s),s
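In the discrete setting, a Schrödinger bridge between two marginals can be computed by the classical Fortet / iterative-proportional-fitting scheme. The sketch below is a minimal illustration under simplifying assumptions (one-dimensional grid, Brownian prior kernel between the two end times); it is not the code behind the talk.

```python
import numpy as np

def schrodinger_bridge(rho0, rho1, K, n_iter=500):
    """Fortet / iterative proportional fitting: find nonnegative potentials
    (phi_hat, phi) such that pi = diag(phi_hat) K diag(phi) has marginals
    rho0 and rho1, where K is the prior transition kernel."""
    phi = np.ones_like(rho1)
    for _ in range(n_iter):
        phi_hat = rho0 / (K @ phi)       # fit the initial marginal
        phi = rho1 / (K.T @ phi_hat)     # fit the final marginal
    return phi_hat[:, None] * K * phi[None, :]

# One-dimensional grid with a Brownian prior kernel p(0, y, 1, x).
x = np.linspace(-4.0, 4.0, 201)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
rho0 = np.exp(-0.5 * (x + 2.0) ** 2); rho0 /= rho0.sum()    # initial marginal rho_0
rho1 = np.exp(-2.0 * (x - 1.0) ** 2); rho1 /= rho1.sum()    # final marginal rho_1
pi = schrodinger_bridge(rho0, rho1, K)
print(np.allclose(pi.sum(axis=1), rho0), np.allclose(pi.sum(axis=0), rho1))  # True True
```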

Information Geometry in Image Analysis (chaired by Yannick Berthoumieu, Geert Verdoolaege)

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
The current paper introduces new prior distributions on the zero-mean multivariate Gaussian model, with the aim of applying them to the classification of populations of covariance matrices. These new prior distributions are entirely based on the Riemannian geometry of the multivariate Gaussian model. More precisely, the proposed Riemannian Gaussian distribution has two parameters, the centre of mass Ȳ and the dispersion parameter σ. Its density with respect to the Riemannian volume element is proportional to exp(−d²(Y, Ȳ)/2σ²), where d(Y, Ȳ) is Rao's Riemannian distance. We derive its maximum likelihood estimators and propose an experiment on the VisTex database for the classification of texture images.
 

Geometric Science of Information 2015Non supervised classificationin the space of SPD matricesSalem Said – Lionel Bombrun – Yannick BerthoumieuLaboratoire IMS CNRS UMR 5218 – Universit´ de Bordeauxe29 October 2015Said et al. (IMS Bordeaux – CNRS UMR 5218)Geometric Science of Information 201529 October 20150 / 11 Context of our workOur project : Statistical learning in the space of SPD matricesOur team :3 members of IMS laboratory + 2 post docs (Hatem Hajri, Paolo Zanini)Target applications : remote sensing , radar signal processing , Neuroscience (BCI)Our partners : IMB (Marc Arnaudon + PhD student), Gipsa-lab, Ecole des MinesOur recent workhttp ://arxiv.org/abs/1507.01760Riemannian Gaussian distributions on the space of SPD matrices (in review, IEEE IT)Some of our problems :Given a population of SPD matrices (any size or structure)− Non-supervised learning of its class structure− Semi-parametric learning of its densityPlease look up our paper on Arxiv :-)Said et al. (IMS Bordeaux – CNRS UMR 5218)Geometric Science of Information 201529 October 20151 / 11 Geometric toolsStatistical manifold : Θ = SPD, Toeplitz, Block-Toeplitz, etc, matricesHessian or Fisher metric :ds 2 (θ ) = Hess Φ (dθ,dθ )Φ model entropy— Θ becomes a Riemannian homogeneous space of negative curvature ! !Example : 2 × 2 correlation (baby Toeplitz)Θ =1θ∗⇒ ds 2 (θ ) =θ1|θ | < 1|dθ | 2[1 − |θ | 2 ] 2Φ(θ ) = − log[1 − |θ | 2 ]Poincar´ disc modeleWhy do we use this ?– Suitable mathematical properties– Relation to entropy or “information”– Often leads to excellent performanceSaid et al. (IMS Bordeaux – CNRS UMR 5218)Geometric Science of Information 2015First place in IEEE BCI challenge29 October 20152 / 11 ContributionI-Introduction of Riemannian Gaussian distributionsA statistical model of a class/cluster :RiemannianGaussiandistribution¯G(θ, σ )[Pennec 2006]¯d 2 (θ, θ )¯p(θ | θ, σ ) = Z −1 (σ ) × exp −2σ 2Expression unknownin the literature¯d (θ , θ ) Riemannian distanceComputing Z (σ )Case where Θis the spaceof m × mcovariancematricesZ (σ ) =exp −Θ¯d 2 (θ, θ )dv (θ )2σ 2¯¯d (θ, θ ) = tr log θ −1 θ22dv (θ ) = det(θ ) −m+12dθ iji
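For reference, the two ingredients of the Riemannian Gaussian distribution G(Ȳ, σ) are easy to compute numerically: Rao's distance between SPD matrices and the unnormalized density exp(−d²(Y, Ȳ)/2σ²). The snippet below is a minimal sketch of our own (the normalizing factor Z(σ) is left aside).

```python
import numpy as np

def rao_distance(Y, Ybar):
    """Rao's (affine-invariant) Riemannian distance between SPD matrices:
    d(Y, Ybar) = || log(Ybar^{-1/2} Y Ybar^{-1/2}) ||_F
               = sqrt(sum_i log^2 lambda_i), lambda_i eigenvalues of Ybar^{-1} Y."""
    lam = np.linalg.eigvals(np.linalg.solve(Ybar, Y)).real
    return np.sqrt(np.sum(np.log(lam) ** 2))

def riemannian_gaussian_unnormalized(Y, Ybar, sigma):
    """Density of G(Ybar, sigma) up to the normalizing factor Z(sigma):
    p(Y | Ybar, sigma) proportional to exp(-d^2(Y, Ybar) / (2 sigma^2))."""
    return np.exp(-rao_distance(Y, Ybar) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3)); Ybar = A @ A.T + 3 * np.eye(3)   # SPD centre of mass
B = rng.normal(size=(3, 3)); Y = B @ B.T + 3 * np.eye(3)      # SPD sample
print(rao_distance(Y, Ybar), riemannian_gaussian_unnormalized(Y, Ybar, sigma=0.5))
```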

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
We present a new texture discrimination method for textured color images in the wavelet domain. In each wavelet subband, the correlation between the color bands is modeled by a multivariate generalized Gaussian distribution with fixed shape parameter (Gaussian, Laplacian). On the corresponding Riemannian manifold, the shape of texture clusters is characterized by means of principal geodesic analysis, specifically by the principal geodesic along which the cluster exhibits its largest variance. Then, the similarity of a texture to a class is defined in terms of the Rao geodesic distance on the manifold from the texture’s distribution to its projection on the principal geodesic of that class. This similarity measure is used in a classification scheme, referred to as principal geodesic classification (PGC). It is shown to perform significantly better than several other classifiers.
 

FACULTY OF ENGINEERING ANDARCHITECTUREColor Texture Discrimination using thePrincipal Geodesic Distance on a MultivariateGeneralized Gaussian ManifoldGeert Verdoolaege1,2 and Aqsa Shabbir1,31 Departmentof Applied Physics, Ghent University, Ghent, Belgiumfor Plasma Physics, Royal Military Academy (LPP–ERM/KMS),Brussels, Belgium3 Max-Planck-Institut für Plasmaphysik, D-85748 Garching, Germany2 LaboratoryGeometric Science of InformationParis, October 28–30, 2015 Overview1Color texture2Geometry of wavelet distributions3Principal geodesic classification4Classification experiments5Conclusions2 Overview1Color texture2Geometry of wavelet distributions3Principal geodesic classification4Classification experiments5Conclusions3 VisTex database128 × 128 subimages extracted from RGB images from 40 classes(textures)4 CUReT database200 × 200 RGB images from 61 classes with varying illumination andviewpoint5 Texture modelingStructure at various scalesStochasticityCorrelations between colors, neighboring pixels, etc.⇒ Multivariate wavelet distributions6 Overview1Color texture2Geometry of wavelet distributions3Principal geodesic classification4Classification experiments5Conclusions7 Generalized Gaussian distributionsUnivariate: generalized Gaussian distribution (zero mean):p(x|α, β) =βexp −2αΓ(1/β)|x|αβm-variate multivariate generalized Gaussian (MGGD, zero-mean):m2m2ββ1x Σ−1 x2π Γ2 |Σ|Shape parameterβ = 1: Gaussian; β = 1/2: Laplace (heavy tails)p(x|Σ, β) =Γm2βm2β12exp −8 MGGD geometry: coordinate system(Σ1 , β1 ) → (Σ2 , β2 ): find K such thatK Σ1 K = Im ,K Σ2 K ≡ Φ2 ≡ diag(λ1 , . . . , λp ),22λi2 eigenvalues of Σ−1 Σ21In fact,∀ Σ(t), t ∈ [0, 1]: K Σ(t)K ≡ Φ(t) ≡ diag(λ1 , . . . , λp ),22λi2 eigenvalues of Σ−1 Σ(t)1r i (t) ≡ ln[λi (t)]M. Berkane et al., J. Multivar. Anal., 63, 35–46, 1997G. Verdoolaege and P. Scheunders, J. Math. Imaging Vis., 43, 180–193, 20129 MGGD geometry: Fisher information metricgββ (β) =+1β21+m2β2Ψ1mm[ln(2)]2 + Ψ 1 +2β2βm2β+mln(2) + Ψβln(4) + Ψ 1 +1m1 + ln(2) + Ψ 1 +2β2β1gii (β) = 3bh −41gij (β) = bh − , i = j4m2βm2β+ Ψ1 1 +m2βgβi (β) = −bh ≡1 m + 2β4 m+210 MGGD geometry: geodesics and exponential mapGeodesic equations for fixed β:r i (t) ≡ ln(λi2 ) tGeodesic distance:1GD(Σ1 , Σ2 ) =  3bh −41/2i1i(r2 )2 + 2 bh −4i jr2 r2 i
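The geodesic distance on the fixed-β MGGD manifold only involves the quantities r_i = ln λ_i, where the λ_i are the eigenvalues of Σ1⁻¹Σ2. The snippet below is our reading of the formula above (a sketch, not the authors' code); for β = 1 it reduces to the classical Rao distance between zero-mean Gaussians, which is used as a sanity check.

```python
import numpy as np

def mggd_geodesic_distance(Sigma1, Sigma2, beta):
    """Geodesic distance between zero-mean MGGDs sharing the shape parameter
    beta, from r_i = log(lambda_i), lambda_i eigenvalues of Sigma1^{-1} Sigma2:
        GD^2 = (3 b_h - 1/4) sum_i r_i^2 + 2 (b_h - 1/4) sum_{i<j} r_i r_j,
    with b_h = (m + 2 beta) / (4 (m + 2))."""
    m = Sigma1.shape[0]
    r = np.log(np.linalg.eigvals(np.linalg.solve(Sigma1, Sigma2)).real)
    bh = (m + 2 * beta) / (4 * (m + 2))
    cross = 0.5 * (np.sum(r) ** 2 - np.sum(r ** 2))     # sum over pairs i < j
    return np.sqrt((3 * bh - 0.25) * np.sum(r ** 2) + 2 * (bh - 0.25) * cross)

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3)); S1 = A @ A.T + np.eye(3)
B = rng.normal(size=(3, 3)); S2 = B @ B.T + np.eye(3)
# Sanity check: for beta = 1 this is the classical Rao distance between
# zero-mean Gaussians, (1 / sqrt(2)) * sqrt(sum_i log^2 lambda_i).
lam = np.linalg.eigvals(np.linalg.solve(S1, S2)).real
print(mggd_geodesic_distance(S1, S2, beta=1.0),
      np.sqrt(0.5 * np.sum(np.log(lam) ** 2)))
```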

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Practical estimation of mixture models may be problematic when a large number of observations are involved: in such cases, online versions of Expectation-Maximization may be preferred, since they avoid the need to store all the observations before running the algorithm. We introduce a new online method well suited to cases where the number of observations is large and many mixture models need to be learned from different sets of points. Inspired by dictionary methods, our algorithm begins with a training step used to build a dictionary of components. The next step, which can be done online, amounts to populating the weights of the components for each arriving observation. The dictionary of components proves most useful when many mixtures are learned with the same dictionary, maximizing the return on investment of the training step. We evaluate the proposed method on an artificial dataset built from random Gaussian mixture models.
 

Information Geometry for mixturesCo-Mixture ModelsBag of componentsBag-of-components: an online algorithm forbatch learning of mixture modelsOlivier SchwanderFrank NielsenUniversité Pierre et Marie Curie, Paris, FranceÉcole polytechnique, Palaiseau, FranceOctober 29, 20151 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsExponential familiesDefinitionp(x ; λ) = pF (x ; θ) = exp ( t(x )|θ − F (θ) + k(x ))λ source parametert(x ) sufficient statisticθ natural parameterF (θ) log-normalizerk(x ) carrier measureF is a stricly convex and differentiable function·|· is a scalar product2 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsMultiple parameterizations: dual parameter spacesMultiple source parameterizationsSource Parameters (not unique)λ1 ∈ Λ1 , λ2 ∈ Λ2 , . . . , λn ∈ Λnθ=F (η)Legendre Transform(F, Θ) ↔ (F , H)θ∈ΘNatural Parametersη=F (θ)η∈HExpectation ParametersTwo canonical parameterizations3 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsBregman divergencesDefinition and propertiesBF (x y ) = F (x ) − F (y ) − x − y , F (y )F is a stricly convex and differentiable functionNo symmetry!Contains a lot of common divergencesSquared Euclidean, Mahalanobis, Kullback-Leibler,Itakura-Saito. . .4 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsBregman centroidsLeft-sided centroidmincωi BF (c xi )iRight-sided centroidmincωi BF (xi c)iClosed-formcL = F ∗ωi F (xi )icR =ωi xii5 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsLink with exponential families[Banerjee 2005]Bijection with exponential familieslog pF (x |θ) = −BF ∗ (t(x ) η) + F ∗ (t(x )) + k(x )Kullback-Leibler between exponential familiesbetween members of the same exponential familyKL(pF (x , θ1 ), pF (x , θ2 )) = BF (θ2 θ1 ) = BF (η1 η2 )Kullback-Leibler centroidsIn closed-form through the Bregman divergence6 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsMaximum likelihood estimatorA Bregman centroidη = arg maxˆlog pF (xi , η)ηiBF ∗ (t(xi ) η) −F ∗ (t(xi )) − k(xi )= arg minηiBF ∗ (t(xi ) η)= arg minη=does not depend on ηit(xi )iˆAnd θ =F (ˆ)η7 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsMixtures of exponential familiesm(x ; ω, θ) =ωi pF (x ; θi )1≤i≤kFixedFamily of the components PFNumber of components k(model selection techniquesto choose)ParametersWeightsiωi = 1Component parameters θiLearning a mixtureInput: observations x1 , . . . 
, xNOutput: ωi and θi8 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsExponential familiesBregman divergencesMixture modelsBregman Soft Clustering: EM for exponential families[Banerjee 2005]E-stepp(i, j) =M-stepηj = arg maxωj pF (xi , θj )m(xi )p(i, j) log pF (xi , θj )ηi= arg minη=ip(i, j) BF ∗ (t(xi ) η) −F ∗ (t(xi )) − k(xi )idoes not depend on ηp(i, j)t(xu )u p(u, j)9 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsMotivationAlgorithmsApplicationsJoint estimation of mixture modelsExploit shared information between multiple pointsetsto improve qualityto improve speedInspirationEfficient algorithmsDictionary methodsBuildingTransfer learningComparing10 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsMotivationAlgorithmsApplicationsCo-MixturesSharing components of all the mixtureskm1 (x |ω(1)(1)ωi pF (x | ηj ), η) =i=1...kmS (x |ω (S) , η) =(S)ωi pF (x | ηj )i=1Same η1 . . . ηk everywhereDifferent weights ω (l)11 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsMotivationAlgorithmsApplicationsco-Expectation-MaximizationMaximize the mean of the likelihoods on each mixturesE-stepA posterior matrix for each dataset(l)p (l) (i, j) =ωj pF (xi , θj )(l)m(xi |ω (l) , η)M-stepMaximization on each dataset(l)ηj=ip(i, j)(l)t(xu )p (l) (u, j)uAggregationηj =1SS(l)ηjl=112 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsMotivationAlgorithmsApplicationsVariational approximation of Kullback-Leibler[Hershey Olsen 2007]KKLVariationnal (m1 , m2 ) =(1)logpF (·; θj ))i=1(2)pF (·; θj ))jωj e −KL(pF (·; θi )j(1)ωiωj e −KL(pF (·; θi )With shared parametersPrecompute Dij = e −KL(pF (·| ηi ),pF (·| ηj ))Fast version(1)(1)KLvar (m1 m2 ) =ωiilogjωj e −Dijjωj e −Dij(2)13 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsMotivationAlgorithmsApplicationsco-SegmentationSegmentation from 5D RGBxy mixturesOriginalEMCo-EM14 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsMotivationAlgorithmsApplicationsTransfer learningIncrease the quality of one particular mixture of interestFirst image: only 1% of the pointsTwo other images: full set of pointsNot enough points for EM15 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsAlgorithmExperimentsBag of ComponentsTraining stepComix on some training setKeep the parametersCostly but offlineD = {θ1 , . . . 
, θK }Online learning of mixturesFor a new pointsetFor each observation arriving:arg max pF (xj , θ)θ∈Dorarg min BF (t(xj ), θ)θ∈D16 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsAlgorithmExperimentsNearest neighbor searchNaive versionLinear searchO(number of samples × number of components)Same order of magnitude as one step of EMImprovementComputational Bregman Geometry to speed-up the searchBregman Ball TreesHierarchical clusteringApproximate nearest neighbor17 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsAlgorithmExperimentsImage segmentationSegmentation on a random subset of the pixels100%10%1%EMBoC18 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsAlgorithmExperimentsComputation times120TrainingEMBoC100806040200Training100%10%1%19 / 20 Information Geometry for mixturesCo-Mixture ModelsBag of componentsAlgorithmExperimentsSummaryComixMixtures with shared componentsCompact description of a lot of mixturesFast KL approximationsDictionary-like methodsBag of ComponentsOnline methodPredictable time (no iteration)Works with only a few pointsFast20 / 20
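The online step of Bag-of-Components is essentially a nearest-component assignment followed by weight counting. Below is a minimal sketch with a hand-written dictionary of univariate Gaussian components (in practice the dictionary would come from the co-EM training step described above); it is an illustration, not the authors' implementation.

```python
import numpy as np

def gaussian_log_pdf(x, mu, var):
    # Log-density of N(mu, var) at x.
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def bag_of_components_weights(stream, dictionary):
    """Online BoC step: each arriving observation is assigned to the dictionary
    component with the highest likelihood, and the mixture weights are the
    normalized assignment counts."""
    counts = np.zeros(len(dictionary))
    for x in stream:
        scores = [gaussian_log_pdf(x, mu, var) for (mu, var) in dictionary]
        counts[int(np.argmax(scores))] += 1
    return counts / counts.sum()

# Hypothetical dictionary of (mean, variance) components; in practice it would
# be produced by the co-EM training step.
dictionary = [(-3.0, 1.0), (0.0, 0.5), (4.0, 2.0)]
rng = np.random.default_rng(4)
stream = np.concatenate([rng.normal(-3, 1, 300), rng.normal(4, 1.4, 700)])
rng.shuffle(stream)
print(bag_of_components_weights(stream, dictionary))   # roughly [0.3, 0.0, 0.7]
```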

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
Stochastic watershed is an image segmentation technique based on mathematical morphology which produces a probability density function of image contours. The estimated probabilities depend mainly on local distances between pixels. This paper introduces a variant of stochastic watershed where the probabilities of contours are computed from a Gaussian model of image regions. In this framework, the basic ingredient is the distance between pairs of regions, hence a distance between normal distributions. Several statistical distances for normal distributions are therefore compared, namely the Bhattacharyya distance, the Hellinger metric distance and the Wasserstein metric distance.
 

Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStatistical Gaussian Model of Image Regionsin Stochastic Watershed SegmentationJesús Angulojesus.angulo@mines-paristech.fr ; http://cmm.ensmp.fr/∼anguloMINES ParisTech, PSL-Research University,CMM-Centre de Morphologie MathématiqueGSI'2015 - 2nd Conference on Geometric Science of InformationEcole Polytechnique, Paris-Saclay (France) - October 28th-30th 20151 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMotivation: Unsupervised segmentation of genericimagesCustard: Color imageLarge homogenous areas, well contrasted objects as well as textured zonesand fuzzy boundaries2 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMotivation: Unsupervised segmentation of genericimagesCustard: its color gradient imageLarge homogenous areas, well contrasted objects as well as textured zonesand fuzzy boundaries3 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMotivation: Unsupervised segmentation of genericimagesCustard: pdf of contours using stochastic watershedUsing watershed based techniques large homogeneous areas areoversegmented and textured zones are not always well contoured4 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMotivation: Unsupervised segmentation of genericimagesCustard: h-dynamics watershed cut from SW pdf,h = 0.1Using watershed based techniques large homogeneous areas areoversegmented and textured zones are not always well contoured5 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMotivation: Unsupervised segmentation of genericimagesCustard: h-dynamics watershed cut from SW pdf,h = 0.3Using watershed based techniques large homogeneous areas areoversegmented and textured zones are not always well contoured6 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationContext and goal7 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationContext and goalContext:Segmentation approaches based on statistical modeling of pixels andregions, e.g, mean shift and statistical region mergingHierarchical contour detection and segmentation, e.g., machinelearned edge detection, watershed transformStochastic watershed (SW): to estimate a probability densityfunction of contours7 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationContext and goalContext:Segmentation approaches based on statistical modeling of pixels andregions, e.g, mean shift and statistical region mergingHierarchical contour detection and segmentation, e.g., machinelearned edge detection, watershed transformStochastic watershed (SW): to estimate a probability densityfunction of contoursGoal: Take into account regional information in the probabilityestimation by SW by means of a statistical gaussian model⇒Moreperceptual strength function of contours7 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationPlan1Stochastic Watershed using MonteCarlo Simulations2Multivariate Gaussian Model of Regions in SW3Perspectives8 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo Simulations1Stochastic Watershed using MonteCarlo Simulations2Multivariate Gaussian Model of Regions in SW3Perspectives9 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic 
Watershed using MonteCarlo SimulationsRegionalized Poisson pointsUniform random germsGenerate realizations of a Poisson point process with a constantintensityθ(i.e., average number of points per unit area)Random number of pointsBorel set), with areaparameterθ|D|,|D|,N(D)falling in a domainD(boundedfollows a Poisson distribution withi.e.,nPr{N(D)= n} = e −θ|D|Conditionally to the fact thatN(D) = n,(−θ|D|)n!theindependently and uniformly distributed overnumber of points inDisθ|D|n points areD , and the average(i.e., the mean and variance of aPoisson distribution is its parameter10 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsRegionalized Poisson pointsRegionalized random germsLet us suppose that the densityθis not constant; but considered asmeasurable positive-valued function, dened inlet us writeθ(D) =θ(x)d xNumber of points falling in a Borel setdensity functionθBaccording to a regionalizedθ(D),nPr{N(D)N(D) = n,For simplicity,follows a Poisson distribution of parameteri.e.,IfRd .then= n} = e −θ(D)(−θ(D))n!are independently distributed overDwith theprobability density function:θ(x) = θ(x)/θ(D)11 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsRegionalized Poisson pointsN random germs in an image m : E → {0, 1}θ(x) using inverse transform samplingGeneratedensity1Initialization:2Compute cumulative distribution function:3forj =1toaccording tom(xi ) = 0 ∀xi ∈ E ; P = Card(E )cdf (xi ) =k≤iPk=1θ(xk )θ(xk )N4rj ∼ U (1, P)5Find the value6m(xsj ) = 1sjsuch thatrj ≤ cdf (xsj ).12 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsStochastic watershed paradigmSpreading random germs as markers on the watershed segmentation.This arbitrary choice is stochastically balanced by the use of a givennumberMof realizations, in order to lter out non signicantuctuationsEach piece of contour may then be assigned the number of times itappears during the various simulations in order to estimate aprobability density function (pdf ) of contoursIn the case of uniformly distributed random germs, large regions willbe sampled more frequently than smaller regions and will be selectedmore oftenImage gradient as density for regionalization of random germsinvolves sampling high contrasted image areas: probability ofselecting a contour will oer a trade-o between strength of thecontours and size of the adjacent regions13 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsProbability density of contours using MonteCarlosimulations of watershedLet{mrkn (x)}M 1n=be a series ofMrealizations ofNspatiallydistributed random markers according to its gradient imagegEach realization of random germs considered as the marker imagefor a watershed segmentation of gradient imagegin order to obtainthe binary image:1if0WS(g , mrkn )(x) =ifx ∈ Watershedx ∈ Watershed/lineslinesProbability density function of contours is computed by the kerneldensity estimation method:pdf (x) =1MMWS(g , mrkn )(x) ∗ Kσ (x).n=1where the smoothing kernelwidthKσ (x)is a spatial Gaussian function ofσ14 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsProbability density of contours using MonteCarlosimulations of watershedColor imagef (x)Color 
gradientg (x)15 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsProbability density of contours using MonteCarlosimulations of watershed{mrkn (x)}M 1 : Mn=θ(x) = g (x)realizations ofNregionalized Poisson points of···{WS(g , mrkn )}1≤n≤M :Watershed segmentations···16 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsProbability density of contours using MonteCarlosimulations of watershedColor imagef (x)Density of contourspdf (x)Color gradientSegmented withg (x)h = 0.117 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationStochastic Watershed using MonteCarlo SimulationsProbability density of contours using MonteCarlosimulations of watershedColor imagef (x)Density of contourspdf (x)Color gradientSegmented withg (x)h = 0.318 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SW1Stochastic Watershed using MonteCarlo Simulations2Multivariate Gaussian Model of Regions in SW3Perspectives19 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SW⇒Watershed transformTessellationτofETessellationfrom watershedWS(x):(Finite) family of disjointopen sets (or classes, or regions)τ = {Rr }1≤r ≤N ,withi = j ⇒ R i ∩ Rj = ∅such thatE = ∪r RrWS(x) ⇔ WS(x) = E \ ∪r Rr = ∪li,jBoundary between regionsRiandRj(1≤ i, j ≤ N , i = j):Irregular arcsegmentli,j = ∂Ri ∩ ∂Rj20 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWColor regions as multivariate normal distributionsThe color image values restricted to each region of the partition,Pi = f (Ri ),can be modeled by dierent statistical distributionsHere we focuss on a multivariate normal modelPi ∼ N (µi , Σi ),of meanµiand covariance matrixΣiDierent (statistical) distances are dened in the space ofN (µi , Σi )Boundary li,j will be weighted with a function depending on thedistance betweenN (µi , Σi )andN (µj , Σj )21 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWDistances for multivariate normal distributionsBhattacharyya distanceDB (P1 , P2 )It measures the similarity of two discrete or continuous probabilitydistributionsP1andP2by computing the amount of overlapbetween the two statistical populations:DB (P1 , P2 ) = − logP1 (x)P2 (x)dxFor multivariate normal distributionsDB (P1 , P2 ) =18(µ1 −µ2 )T Σ−1 (µ1 −µ2 )+whereΣ=12log√det Σdet Σ1 det Σ2,Σ1 + Σ22Note that the rst term in the Bhattacharyya distance is related tothe Mahalanobis distance, both are the same when the covariance ofboth distributions is the same22 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWDistances for multivariate normal distributionsHellinger metric distance0≤ DB ≤ ∞DH (P1 , P2 )and it is symmetricDB (P1 , P2 ),butDBdoes not obeythe triangle inequality and therefore it is not a metricBhattacharyya distance can be metrized by transforming it into tothe following Hellinger metric distanceDH (P1 , P2 ) =1− exp (−DB (P1 , P2 )),For multivariate normal distributionsDH (P1 , P2 ) =1−√det Σdet Σ1 det Σ2Hellinger distance is anα=0α-divergence,−1/21e (− 4 (µ1 −µ2 )1 +Σ2 )−1 (µ1 −µ2 ))T (Σwhich corresponds to the caseand 
it is the solely being a metric distance. Hellinger distancecan be related to measure theory and asymptotic statistics23 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWDistances for multivariate normal distributionsWasserstein metric distanceDW (P1 , P2 )Wasserstein metric is a distance function dened between probabilitymeasuresµandνRnonis based on the notion optimal transport:W2 (µ, ν) = inf E( X − Y2 1 /2)where the inmum runs over all random vectorswithX ∼µand,(X , Y ) ∈ Rn × RnY ∼νFor the case of discrete distributions, it corresponds to thewell-known earth mover's distanceFor two multivariate normal distributions:µ1 − µ2DW (P1 , P2 ) =2+ Tr (Σ1 + Σ2 − 2Σ1,2 ),where1/21/2Σ1,2 = Σ1 Σ2 Σ1In particular, in the commutative case2DW (P1 , P2 ) = µ1 − µ221/2.Σ 1 Σ 2 = Σ 2 Σ1one has1/21.+ Σ−Σ1 /2 2F224 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimation25 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimationTo assign to each piece of contour li,j between regionsRiandRjthenormalized statistical distance between the color gaussiandistributionsPiandPj :πi,j =whereD(Pi , Pj )D(Pi , Pj ),lk,l ∈WS D(Pk , Pl )is any of the distances discussed above25 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimationTo assign to each piece of contour li,j between regionsRiandRjthenormalized statistical distance between the color gaussiandistributionsPiandPj :πi,j =whereD(Pi , Pj )D(Pi , Pj ),lk,l ∈WS D(Pk , Pl )is any of the distances discussed aboveFor any realizationnof SW, denotedWS(x, n),one can compute animage of weighted contours:Pr (x, n) =πi,jif0ifnx ∈ li,jnx ∈ li,j/25 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimationTo assign to each piece of contour li,j between regionsRiandRjthenormalized statistical distance between the color gaussiandistributionsPiandPj :D(Pi , Pj ),lk,l ∈WS D(Pk , Pl )πi,j =whereD(Pi , Pj )is any of the distances discussed aboveFor any realizationnof SW, denotedWS(x, n),one can compute animage of weighted contours:πi,jIntegrating across theMif0Pr (x, n) =ifnx ∈ li,jnx ∈ li,j/realizations, the MonteCarlo estimate ofthe probability density function of contours:pdf (x) =1MMPr (x, n) ∗ Kσ (x)n=125 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimation{WS(g , mrkn )}1≤n≤M :Watershed segmentations···{Pr (x, n)}1≤n≤M :Weighted contours (Bhattacharyya distance)···26 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimationBhattacharyya distanceColor imagef (x)Density of contourspdf (x)Color gradientSegmented withg (x)h = 0.0227 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimationDistance of meansHellinger distance1Bhattacharyya 
distanceWasserstein distance1 F. López-Mir, V. Naranjo, S. Morales, J. Angulo. Probability Density Function ofObject Contours Using Regional Regularized Stochastic Watershed. In IEEE ICIP'14.28 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo estimationDistance of meansHellinger distance2Bhattacharyya distanceWasserstein distance2 F. López-Mir, V. Naranjo, S. Morales, J. Angulo. Probability Density Function ofObject Contours Using Regional Regularized Stochastic Watershed. In IEEE ICIP'14.29 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo RobustestimationDistance of meansBhattacharyya distanceHellinger distanceWasserstein distance30 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWProbability density function MonteCarlo RobustestimationDistance of meansBhattacharyya distanceHellinger distanceWasserstein distance31 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationMultivariate Gaussian Model of Regions in SWComparison with SRMStatistical Region Merging (SRM)3depending on scale parameterSegmentation forSum of contours from nineQQ = 128Q256, 128, 64, 32, 16, 8, 4, 2, 1Segmentation for3 R.Nock, F. Nielsen. Statistical Region Merging.26(11):14521458, 2004.Q = 32IEEE Trans. on PAMI,32 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationPerspectives1Stochastic Watershed using MonteCarlo Simulations2Multivariate Gaussian Model of Regions in SW3Perspectives33 / 34 Statistical Gaussian Model of Image Regions in Stochastic Watershed SegmentationPerspectivesPerspectivesIn addition to the color, each pixeltensorxdescribed also by its structureT (x) ∈ SPD(2):Each regionEach regionRi : N (0, Σi ), Σi = |Ri |−1 x∈Ri T (x)Ri : The histogram of structure tensors {T (x)}x∈RiFrom to color to multi/hyper-spectral images: High-dimensionalcovariance matrix estimated locally in regionsSupervised segmentation: Distance learning from training images ofannotated contoursMATLAB code available.34 / 34
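The three statistical distances used above have closed forms for multivariate normal distributions. The snippet below is a small numerical sketch of those formulas (not the segmentation code itself).

```python
import numpy as np

def _sqrtm_spd(S):
    # Symmetric square root of a symmetric positive semi-definite matrix.
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def bhattacharyya(mu1, S1, mu2, S2):
    S = 0.5 * (S1 + S2)
    dmu = mu1 - mu2
    return (0.125 * dmu @ np.linalg.solve(S, dmu)
            + 0.5 * np.log(np.linalg.det(S)
                           / np.sqrt(np.linalg.det(S1) * np.linalg.det(S2))))

def hellinger(mu1, S1, mu2, S2):
    # D_H = sqrt(1 - BC) with Bhattacharyya coefficient BC = exp(-D_B).
    return np.sqrt(1.0 - np.exp(-bhattacharyya(mu1, S1, mu2, S2)))

def wasserstein(mu1, S1, mu2, S2):
    # W_2^2 = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}).
    r1 = _sqrtm_spd(S1)
    cross = _sqrtm_spd(r1 @ S2 @ r1)
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2 * cross))

mu1, S1 = np.array([0.2, 0.3, 0.4]), np.diag([0.02, 0.03, 0.01])   # region R_i
mu2, S2 = np.array([0.6, 0.5, 0.1]), np.diag([0.05, 0.02, 0.02])   # region R_j
print(bhattacharyya(mu1, S1, mu2, S2),
      hellinger(mu1, S1, mu2, S2),
      wasserstein(mu1, S1, mu2, S2))
```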

Creative Commons Attribution-ShareAlike 4.0 International
Voir la vidéo
A technique for spatial-spectral quantization of hyperspectral images is introduced: a quantized hyperspectral image is summarized by K spectra which represent the spatial and spectral structures of the image. The proposed technique is based on α-connected components on a region adjacency graph. Its main ingredient is a dissimilarity metric; in order to choose the metric that best fits the hyperspectral data manifold, different probabilistic dissimilarity measures are compared.
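As a small illustration of the kind of probabilistic dissimilarities compared in the paper (a sketch under the convention that each spectrum is normalized to a discrete probability vector, not the authors' code), here are a few of them between two toy spectra: the L1 distance, the spectral angle (SAM), the Hellinger distance and the χ² distance.

```python
import numpy as np

def normalize(s):
    # Turn a nonnegative spectrum into a discrete probability vector.
    return s / s.sum()

def l1_distance(p, q):
    return np.sum(np.abs(p - q))

def spectral_angle(p, q):
    # Spectral angle mapper (SAM), invariant to rescaling of the spectra.
    c = p @ q / (np.linalg.norm(p) * np.linalg.norm(q))
    return np.arccos(np.clip(c, -1.0, 1.0))

def hellinger(p, q):
    # Hellinger distance with the 1/sqrt(2) normalization convention.
    return np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q))

def chi2(p, q):
    m = 0.5 * (p + q)
    return np.sum((p - m) ** 2 / m)

bands = np.linspace(0.0, 1.0, 64)
p = normalize(np.exp(-((bands - 0.3) / 0.1) ** 2) + 0.05)   # toy spectrum 1
q = normalize(np.exp(-((bands - 0.5) / 0.2) ** 2) + 0.05)   # toy spectrum 2
print(l1_distance(p, q), spectral_angle(p, q), hellinger(p, q), chi2(p, q))
```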
 

Quantization of hyperspectral image manifold using probabilistic distances — Gianni Franchi and Jesús Angulo, MINES ParisTech, PSL Research University, CMM-Centre de Morphologie Mathématique, GSI'2015.
[Slide text not recoverable from the extraction. The deck covers: introduction to hyperspectral images, the state of the art and the data model; probabilistic distances (Minkowski norms, spectral angle mapper, Rényi divergences including the Hellinger and χ² cases, Mahalanobis distance, Earth Mover's Distance); the quantization technique based on α-connected components on a region adjacency graph; and classification results on the Pavia data.]
×Ø Ò
× ÓÒ ÝÔ Ö×Ô
ØÖ Ð Ñ׺¾¾ » ¾ ÉÙ ÒØ Þ Ø ÓÒ ÓÊ ×ÙÐØ×ÝÔ Ö×Ô
ØÖ Ð ÑÑ Ò ÓÐ Ù× Ò ÔÖÓÐ ×Ø
×Ø Ò
×Ê ×ÙÐØ׼ÇÊ ÒËÆÊľÄ∞¼º¼½¼º¼½¾¼º½¾º¼º¼½¼º¼¼¾¾¼º½¾º¼º¼½½¼º¼½¼º½¾º ¿ËÁÇÊ ÒËÆÊÌ Ð¼º¼½¼¼º¼¼º½¾º ¿Ê ×ÙÐØ× ÓÒ ÁÒËÔ ÖÒ ÈÒ ×Ë ÅÑÀ Ðмº¼½¼º¼¼º¼½¼º¼½¼º½¿¼º¼½¼º¼º¾¾¼º½¾º ¿º½¾ºÊ ×ÙÐØ× ÓÒ ÁÒ Ò È Ò × ÑŠнŠоÃÓÐÑÓ¼º¼½¼º¼¼º¼½ ¾¼º¼½¼º½¿¼º¼¼½¼º¿¼º¿¼º½¾º½½º¼½ÓÑÔ Ö ×ÓÒ Ó ÔÖÓχ¾¼º¿¼¼º¾¼º½½ º¼½Å ½¼º¼¾¼º¼¼º½¾ºËËα=½/¾Ëα=¾¼º¼½¼¼º¼¼º½¾º¼º¼½¼¼º¼¼º½¾º ½¼º¼½¼¼º¼¼º½¾ºÅ ¾¼º¼½ ¾¼º¼½¼º ¾Ð ×Ø
×Ø Ò
× ÓÒ ÝÔ Ö×Ô
ØÖ Ð Ñ׺¾¿ » ¾ ÉÙ ÒØ Þ Ø ÓÒ ÓÊ ×ÙÐØ×ÝÔ Ö×Ô
ØÖ Ð ÑÑ Ò ÓÐ Ù× Ò ÔÖÓÐ ×Ø
×Ø Ò
×Ê ×ÙÐØ×´ µ´ µ´
µ´ µ´µÙÖ ´ µ Ð× Ê
ÓÐÓÖ Ñ´Ù× Ò Ø Ö ×Ô
ØÖ Ð Ò ×µ Ó ÁÒ Ò È Ò ×ÝÔ Ö×Ô
ØÖ Ð Ñ º Ð× Ê
ÓÐÓÖ Ñ Ó Ø ÕÙ ÒØ ÞÝÔ Ö×Ô
ØÖ ÐÑ Ø Ò × ØÓ Ò ´ µ Ø ÒÓÖÑ ¾¸ ´
µ Ø Ë Å¸ ´ µ Ø χ¾ ×Ø Ò
¸ ´ µ ØÅ º¾ »¾ ÉÙ ÒØ Þ Ø ÓÒ ÓÊ ×ÙÐØ×ÝÔ Ö×Ô
ØÖ Ð ÑÑ Ò ÓÐ Ù× Ò ÔÖÓÐ ×Ø
×Ø Ò
×Ê ×ÙÐØ×´ µ´ µ´
µ´ µ´µÙÖ ´ µ Ð× Ê
ÓÐÓÖ Ñ´Ù× Ò Ø Ö ×Ô
ØÖ Ð Ò ×µ Ó È ÚÝÔ Ö×Ô
ØÖ Ð Ñ º Ð× Ê
ÓÐÓÖ Ñ Ó Ø ÕÙ ÒØ ÞÝÔ Ö×Ô
ØÖ ÐÑ Ã = ¿¼¼¼ Ø Ò × ØÓ Ò ´ µ Ø ÒÓÖÑ ¾¸ ´
µ Ø Ë Å¸ ´ µ Ø χ¾×Ø Ò
¸ ´ µ ØÅ º¾ »¾
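As a concrete illustration of two of the spectral dissimilarities listed above, the sketch below computes the spectral angle mapper (SAM) and a Minkowski (Lp) distance between two spectra. It is a minimal numpy example with made-up spectra, not the authors' pipeline.

```python
import numpy as np

def sam(x, y):
    """Spectral angle mapper: angle between two spectra, invariant to
    per-pixel illumination scaling (x -> a*x with a > 0)."""
    cos_angle = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

def minkowski(x, y, p=2):
    """Lp distance between two spectra."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

# Two synthetic spectra over a few bands (hypothetical values).
x = np.array([0.12, 0.35, 0.40, 0.28, 0.15])
y = 1.7 * x + np.random.default_rng(0).normal(0, 0.01, size=5)

print(sam(x, y))        # close to 0: same material under brighter illumination
print(minkowski(x, y))  # large: the L2 norm is sensitive to that scaling
```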

Optimal Transport and applications in Imagery/Statistics (chaired by Bertrand Maury, Jérémie Bigot)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Optimal transport (OT) is a major statistical tool for measuring similarity between features and for matching and averaging features. However, OT requires some relaxation and regularization to be robust to outliers. With relaxed methods, since one feature can be matched to several others, significant interpolation between different features arises. This is not an issue for comparison purposes, but it introduces strong, unwanted smoothing in transfer applications. We thus introduce a new regularized method based on a non-convex formulation that minimizes transport dispersion by enforcing one-to-one matching of features. The interest of the approach is demonstrated for color transfer.
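To fix ideas before the relaxation, here is a minimal sketch of plain (non-relaxed) optimal transport in one dimension, where the optimal assignment between two equal-size point clouds is obtained simply by sorting, as the slides below recall for d = 1. The per-channel treatment of colors is an illustrative shortcut, not the regularized method of the paper.

```python
import numpy as np

def transfer_1d(source, target):
    """Monotone (sorted) optimal transport map in 1D: the i-th smallest
    source value is sent to the i-th smallest target value."""
    order = np.argsort(source)
    new_values = np.empty_like(source)
    new_values[order] = np.sort(target)
    return new_values

def w2_1d(x, y):
    """Quadratic Wasserstein distance between two equal-size 1D point clouds."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

rng = np.random.default_rng(0)
src = rng.normal(0.3, 0.1, 1000)   # e.g. one color channel of the source image
tgt = rng.normal(0.6, 0.2, 1000)   # same channel of the target image
print(w2_1d(src, tgt))
recolored = transfer_1d(src, tgt)  # src pixels now follow the target histogram
```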
 

Introduction1 / 30Adaptive color transferwith relaxed optimal transportJulien Rabin1 , Sira Ferradans2 and Nicolas Papadakis31GREYC, University of Caen, 2 Data group, ENS, 3 CNRS, Institut de Mathématiques de BordeauxConference on Geometric Science of InformationJ. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport Introduction2 / 30Optimal transport on histogramsMonge-Kantorovitch (MK) discrete mass transportation problem:Map µ0 onto µ1 while minimizing the total transport cost�������������The two histograms must have the same mass.Optimal transport cost is called the Wasserstein distance (Earth Mover’sDistance)Optimal transport map is the application mapping µ0 onto µ1J. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport Introduction3 / 30Applications in Image Processing and Computer VisionOptimal transport as a framework to define statistical-based toolsApplications to many imaging and computer vision problems:• Robust dissimilarity measure (Optimal transport cost):Image retrieval [Rubner et al., 2000] [Pele and Werman, 2009]SIFT matching [Pele and Werman, 2008] [Rabin et al., 2009]3D shape recognition, Feature detection [Tomasi]Object segmentation [Ni et al., 2009] [Swoboda and Schnorr, 2013]• Tool for matching/interpolation (Optimal transport map):Non-rigid shape matching, image registration [Angenent et al., 2004]Texture synthesis and mixing [Ferradans et al., 2013]Histogram specification and averaging [Delon, 2004]Color transfer [Pitié et al., 2007], [Rabin et al., 2011b]Not to mention other applications (physics, economy, etc).J. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport Introduction4 / 30Color transferOptimal transport of µ onto νTarget image (µ)Target image after color transferSource image (ν)Limitations:• Mass conservation artifacts• Irregularity of optimal transport mapJ. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport Introduction5 / 30OutlineOutline:Part I. Computation of optimal transport between histogramsPart II. Optimal transport relaxation and regularizationApplication to color transferJ. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport Optimal transport framework6 / 30Part IWasserstein distance between histogramsJ. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport Optimal transport framework7 / 30Formulation for clouds of pointsDefinition: L2 -Wasserstein Distance Given two clouds of points1X , Y ⊂ Rd×N of N elements in Rd with equal masses N , the quadraticWasserstein distance is defined asW2 (X , Y )2 = minσ∈ΣN1NNXi − Yσ(i)2i=1where ΣN is the set of all permutations of N elements.J. Rabin, S. Ferradans, N. PapadakisAdaptive color transfer with relaxed optimal transport(1) Optimal transport framework7 / 30Formulation for clouds of pointsDefinition: L2 -Wasserstein Distance Given two clouds of points1X , Y ⊂ Rd×N of N elements in Rd with equal masses N , the quadraticWasserstein distance is defined asW2 (X , Y )2 = minσ∈ΣN1NNXi − Yσ(i)2(1)i=1where ΣN is the set of all permutations of N elements.⇔ Optimal Assignment problem, can be computed using standardsorting algorithms when d = 1J. Rabin, S. Ferradans, N. 
PapadakisAdaptive color transfer with relaxed optimal transport Optimal transport framework8 / 30Exact solution in unidimensional case (d = 1) for histogramsHistograms may be seen as clouds of points with non-uniform masses, sothatMmi δXi (x),µ(x)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We introduce the generalized Pareto distributions as a statistical model to describe thresholded edge-magnitude image filter results. Compared to the more common Weibull or generalized extreme value distributions, these distributions have at least two important advantages: the use of a high threshold value ensures that only the most important edge points enter the statistical analysis, and the estimation is computationally more efficient since a much smaller number of data points has to be processed. The generalized Pareto distributions with a common threshold of zero form a two-dimensional Riemannian manifold with the metric given by the Fisher information matrix. We compute the Fisher matrix for shape parameters greater than -0.5 and show that the determinant of its inverse is a product of a polynomial in the shape parameter and the squared scale parameter. We apply this result by using the determinant as a sharpness function in an autofocus algorithm. We test the method on a large database of microscopy images with given ground-truth focus results. We find that for the vast majority of the focus sequences the results are in the correct focal range. Cases where the algorithm fails are specimens with too few objects and sequences where contributions from different layers result in a multi-modal sharpness curve. Using the geometry of the manifold of generalized Pareto distributions, more efficient autofocus algorithms can be constructed, but these optimizations are not included here.
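A minimal sketch of how such a determinant-based sharpness function could be used. It assumes the standard maximum-likelihood asymptotics for the generalized Pareto distribution with threshold 0, for which the inverse Fisher matrix has determinant σ²(1+ξ)²(1+2ξ) for shape ξ > −0.5; the exact polynomial used in the paper may differ, and the gradient filter and threshold quantile here are illustrative choices, not the authors'.

```python
import numpy as np
from scipy.stats import genpareto

def gpd_sharpness(image, quantile=0.95):
    """Fit a generalized Pareto distribution (threshold 0) to thresholded
    gradient magnitudes and return sigma^2 * (1+xi)^2 * (1+2*xi),
    a determinant-of-inverse-Fisher sharpness score."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    u = np.quantile(mag, quantile)       # high threshold: keep only strong edges
    excesses = mag[mag > u] - u          # exceedances over the threshold
    xi, _, sigma = genpareto.fit(excesses, floc=0)
    return sigma**2 * (1 + xi)**2 * (1 + 2 * xi)

# Autofocus: pick the slice of a focal stack with the largest sharpness score.
# focal_stack = [slice0, slice1, ...]   # hypothetical list of 2D arrays
# best = max(range(len(focal_stack)), key=lambda i: gpd_sharpness(focal_stack[i]))
```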
 

Generalized Pareto Distributions, Image Statistics andAutofocusing in Automated MicroscopyReiner Lenz Microscopy34 slices changing focus along the optical axis Focal Sequence – First 4x16 images 4Focal Sequence – Next 4x16 images 5Focal Sequence – Final 4x16 images Total Focus6 7Observations• Auto-focus is easy••••It is independent on image content (what is in the image)It is independent of imaging method (how image is produced)It is fast (‘real-time’)It is local (which part of the image is in focus)• It is obviously useful in applications (microscopy, camera, …)• It is useful in understanding low-level vision processes• It is illustrates relation between scene-statistics and vision 8Processing Pipeline / TechniquesFilteringGroupRepresentationsThresholdingExtreme ValueStatisticsCritical PointsInformationGeometry 9FilteringRepresentations of dihedral GroupsMost images are defined on square gridsThe symmetry group of square grids is the dihedral group D(4)Consists of 8 elements: 4 rotations and 4 (rotation+reflection)For a 5x5 array choose six filter pairs resultingin a 6x2 vector at each pixel

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We study barycenters in the Wasserstein space Pp(E) of a locally compact geodesic space (E, d). In this framework, we define the barycenter of a measure ℙ on Pp(E) as its Fréchet mean. The paper establishes its existence and states consistency with respect to ℙ. We thus extend previous results on ℝd, with conditions on ℙ or on the sequence converging to ℙ for consistency.
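On the real line the 2-Wasserstein barycenter can be computed explicitly by averaging quantile functions, which gives a concrete Euclidean-case instance of the existence statement above. The sketch below handles equal-size empirical measures and is only an illustration, not the construction used in the paper's proofs.

```python
import numpy as np

def wasserstein2_barycenter_1d(samples, weights):
    """W2 barycenter of empirical measures on R with the same number of atoms:
    average the sorted samples, i.e. the empirical quantile functions."""
    sorted_samples = np.stack([np.sort(s) for s in samples])   # shape (J, n)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ sorted_samples                             # atoms of the barycenter

rng = np.random.default_rng(0)
mus = [rng.normal(m, 1.0, 500) for m in (-2.0, 0.0, 3.0)]
bary = wasserstein2_barycenter_1d(mus, weights=[0.25, 0.25, 0.5])
print(bary.mean())   # close to the weighted mean of the input locations
```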
 

Barycenter in Wasserstein spaces:existence and consistencyThibaut Le Gouic and Jean-Michel Loubes*Institut de Math´matiques de Marseillee´Ecole Centrale MarseilleInstitut Math´matique de Toulouse*eOctober 29th 20151 / 23 Barycenter in Wasserstein spacesBarycenterThe barycenter of a set {xi }1≤i≤J of Rd for J points endowed withweights (λi )1≤i≤J is defined asλi xi .1≤i≤JIt is characterized by being the minimizer ofx→λi x − xi2.1≤i≤J2 / 23 Barycenter in Wasserstein spacesBarycenterThe barycenter of a set {xi }1≤i≤J of Rd for J points endowed withweights (λi )1≤i≤J is defined asλi xi .1≤i≤JIt is characterized by being the minimizer ofx→λi x − xi2.1≤i≤JReplace (Rd , . ) by a metric space (E , d), and minimizeλi d(x, xi )2 .x→1≤i≤J2 / 23 Barycenter in Wasserstein spacesBarycenterLikewise, given a random variable/vector of law µ on Rd , itsexpectation EX is characterized by being the minimizer ofx →E X −x2.3 / 23 Barycenter in Wasserstein spacesBarycenterLikewise, given a random variable/vector of law µ on Rd , itsexpectation EX is characterized by being the minimizer ofx →E X −x2.→ extension to a metric space (it summarizes the informationstaying in a geodesic space)3 / 23 Barycenter in Wasserstein spacesBarycenterDefinition (p-barycenter)Given a probability measure µ on a geodesic space (E , d), the setarg min x ∈ E ;d(x, y )p dµ(y ) ,is called the set of p-barycenters of µ.4 / 23 Barycenter in Wasserstein spacesBarycenterDefinition (p-barycenter)Given a probability measure µ on a geodesic space (E , d), the setarg min x ∈ E ;d(x, y )p dµ(y ) ,is called the set of p-barycenters of µ.Existence ?4 / 23 1Geodesic space2Wasserstein space3Applications5 / 23 Barycenter in Wasserstein spacesGeodesic spaceDefinition (Geodesic space)A complete metric space (E , d) is said to be geodesic if for allx, y ∈ E , there exists z ∈ E such that1d(x, y ) = d(x, z) = d(z, y ).26 / 23 Barycenter in Wasserstein spacesGeodesic spaceDefinition (Geodesic space)A complete metric space (E , d) is said to be geodesic if for allx, y ∈ E , there exists z ∈ E such that1d(x, y ) = d(x, z) = d(z, y ).2Include many spaces (vectorial normed spaces, compactmanifolds, ...),6 / 23 Barycenter in Wasserstein spacesGeodesic spaceProposition (Existence)The p-barycenter of any probability measure on a locally compactgeodesic space, with finite moments of order p, exists.7 / 23 Barycenter in Wasserstein spacesGeodesic spaceProposition (Existence)The p-barycenter of any probability measure on a locally compactgeodesic space, with finite moments of order p, exists.Not unique e.g. 
the sphereNon positively curved space → unique barycenter,1-Lipschitz on 2-Wasserstein space.7 / 23 1Geodesic space2Wasserstein space3Applications8 / 23 Barycenter in Wasserstein spacesWasserstein metricDefinition (Wasserstein metric)Let µ and ν be two probability measures on a metric space (E , d)and p ≥ 1.The p-Wasserstein distance between µ and ν is defined aspWp (µ, ν) =infπ∈Γ(µ,ν)dE (x, y )p dπ(x, y ),where Γ(µ, ν) is the set of all probability measures on E × E withmarginals µ and ν.9 / 23 Barycenter in Wasserstein spacesWasserstein metricDefinition (Wasserstein metric)Let µ and ν be two probability measures on a metric space (E , d)and p ≥ 1.The p-Wasserstein distance between µ and ν is defined aspWp (µ, ν) =infπ∈Γ(µ,ν)dE (x, y )p dπ(x, y ),where Γ(µ, ν) is the set of all probability measures on E × E withmarginals µ and ν.Defined for any measure for which moments of order p arefinite : Ed(X , x0 )p < ∞ (denote this set Pp (E )),It is a metric on Pp (E ) ; (Pp (E ), Wp ) is called theWasserstein space,The topology of this metric is the weak convergence topologyand convergence of moments of order p.9 / 23 Barycenter in Wasserstein spacesWasserstein metricThe Wasserstein space of a complete geodesic space is acomplete geodesic space.(Pp (E ), Wp ) is locally compact ⇔ (E , d) is compact.(E , d) ⊂ (Pp (E ), Wp ) isometrically.Existence of the barycenter on (Pp (E ), Wp ) ?10 / 23 Barycenter in Wasserstein spacesMeasurable barycenter applicationDefinition (Measurable barycenter application)Let (E , d) be a geodesic space. (E , d) is said to admitmeasurable barycenter applications if for any J ≥ 1 and anyweights (λj )1≤j≤J , there exists a measurable application T fromE J to E such that for all (x1 , ..., xJ ) ∈ E J ,Jminx∈EJpλj d(T (x1 , ..., xJ ), xj )p .λj d(x, xj ) =j=1j=111 / 23 Barycenter in Wasserstein spacesMeasurable barycenter applicationDefinition (Measurable barycenter application)Let (E , d) be a geodesic space. (E , d) is said to admitmeasurable barycenter applications if for any J ≥ 1 and anyweights (λj )1≤j≤J , there exists a measurable application T fromE J to E such that for all (x1 , ..., xJ ) ∈ E J ,Jminx∈EJpλj d(T (x1 , ..., xJ ), xj )p .λj d(x, xj ) =j=1j=1Locally compact geodesic spaces admit measurable barycenterapplications.11 / 23 Barycenter in Wasserstein spacesExistence of barycenterTheorem (Existence of barycenter)Let (E , d) be a geodesic space that admits measurable barycenterapplications. Then any probability measure P on (Pp (E ), Wp ) hasa barycenter.12 / 23 Barycenter in Wasserstein spacesExistence of barycenterTheorem (Existence of barycenter)Let (E , d) be a geodesic space that admits measurable barycenterapplications. Then any probability measure P on (Pp (E ), Wp ) hasa barycenter.Barycenter is not unique e.g. :1E = Rd with P = 1 δµ1 + 2 δµ2 ,21µ1 = 1 δ(−1,−1) + 1 δ(1,1) and µ2 = 2 δ(1,−1) + δ(−1,1)2212 / 23 Barycenter in Wasserstein spacesExistence of barycenterTheorem (Existence of barycenter)Let (E , d) be a geodesic space that admits measurable barycenterapplications. Then any probability measure P on (Pp (E ), Wp ) hasa barycenter.Barycenter is not unique e.g. 
:1E = Rd with P = 1 δµ1 + 2 δµ2 ,21µ1 = 1 δ(−1,−1) + 1 δ(1,1) and µ2 = 2 δ(1,−1) + δ(−1,1)22Consistency of the barycenter ?12 / 23 Barycenter in Wasserstein spaces3 steps for existence1Multimarginal problem2Weak consistency3Approximation by finitely supported measures13 / 23 Barycenter in Wasserstein spacesPush forwardDefinition (Push forward)Given a measure ν on E and an measurable applicationT : E → (F , F), the push forward of ν by T is given byT#ν (A) = ν T −1 (A) , ∀A ∈ F.Probabilist version : X is a r.v. on (Ω, A, P), then PX = X#P .14 / 23 Barycenter in Wasserstein spacesMultimarginal problemTheorem (Barycenter and multi-marginal problem[Agueh and Carlier, 2011])Let (E , d) be a complete separable geodesic space, p ≥ 1 andJ ∈ N∗ . Given (µi )1≤i≤J ∈ Pp (E )J and weights (λi )1≤i≤J , thereexists a measure γ ∈ Γ(µ1 , ..., µJ ) minimizingγ→ˆλi d(xi , x)p d γ (x1 , ..., xJ ).ˆinfx∈E1≤i≤JIf (E , d) admits a measurable barycenter applicationT : E J → E then the measure ν = T# γ is a barycenter of(µi )1≤i≤JIf T is unique, ν is of the form ν = T# γ.15 / 23 Barycenter in Wasserstein spacesWeak consistencyTheorem (Weak consistency of the barycenter)Let (E , d) be a geodesic space that admits measurable barycenter.Take (Pj )j≥1 ⊂ Pp (E ) converging to P ∈ Pp (E ). Take anybarycenter µj of Pj .Then the sequence (µj )j≥1 is (weakly) tight and any limit point isa barycenter of P.16 / 23 Barycenter in Wasserstein spacesApproximation by finitely supported measureProposition (Approximation by finitely supported measure)For any measure P on Pp (E ) there exists a sequence of finitelysupported measures (Pj )j≥1 ⊂ Pp (E ) such thatWp (Pj , P) → 0 as j → ∞.17 / 23 Barycenter in Wasserstein spaces3 steps for existence1Multimarginal problem2Weak consistency3Approximation by finitely supported measures18 / 23 Barycenter in Wasserstein spaces3 steps for existence1Multimarginal problem→ existence of barycenter for P finitely supported.2Weak consistency3Approximation by finitely supported measures18 / 23 Barycenter in Wasserstein spaces3 steps for existence1Multimarginal problem→ existence of barycenter for P finitely supported.2Weak consistency→ existence of barycenter for probabilities that can beapproximated by measures with barycenters.3Approximation by finitely supported measures18 / 23 Barycenter in Wasserstein spaces3 steps for existence1Multimarginal problem→ existence of barycenter for P finitely supported.2Weak consistency→ existence of barycenter for probabilities that can beapproximated by measures with barycenters.3Approximation by finitely supported measures→ any probability can be approximated by a finitely supportedprobability measure.18 / 23 Barycenter in Wasserstein spacesConsistency of the barycenterTheorem (Consistency of the barycenter)Let (E , d) be a geodesic space that admits measurable barycenter.Take (Pj )j≥1 ⊂ Pp (E ) and P ∈ Pp (E ). Take any barycenter µj ofPj .Then the sequence (µj )j≥1 is totally bounded in (Pp (E ), Wp ) andany limit point is a barycenter of P.19 / 23 Barycenter in Wasserstein spacesConsistency of the barycenterTheorem (Consistency of the barycenter)Let (E , d) be a geodesic space that admits measurable barycenter.Take (Pj )j≥1 ⊂ Pp (E ) and P ∈ Pp (E ). 
Take any barycenter µj ofPj .Then the sequence (µj )j≥1 is totally bounded in (Pp (E ), Wp ) andany limit point is a barycenter of P.Imply continuity of barycenter when barycenter are unique.No rate of convergence (barycenter Lipschitz on (E , d)Lipschitz on Pp (E )).Imply compactness of the set of barycenters.19 / 23 1Geodesic space2Wasserstein space3Applications20 / 23 Barycenter in Wasserstein spacesStatistical application : improvement of measures accuracyTake (µn )1≤j≤J → µj when n → ∞ and weights (λj )1≤j≤J .iSet µn the barycenter of (µn )1≤j≤J .iBThen, as n → ∞,µ n → µB .B21 / 23 Barycenter in Wasserstein spacesStatistical application : improvement of measures accuracyTake (µn )1≤j≤J → µj when n → ∞ and weights (λj )1≤j≤J .iSet µn the barycenter of (µn )1≤j≤J .iBThen, as n → ∞,µ n → µB .BTexture mixing [Rabin et al., 2011]21 / 23 Barycenter in Wasserstein spacesStatistical application : growing number of measuresTake (µn )n≥1 such that1nnµi → P.i=1Set µn the barycenter ofB1nnδµi .i=1Then, as n → ∞,µ n → µBB22 / 23 Barycenter in Wasserstein spacesStatistical application : growing number of measuresTake (µn )n≥1 such that1nnµi → P.i=1Set µn the barycenter ofB1nnδµi .i=1Then, as n → ∞,µ n → µBBAverage of template deformation[Bigot and Klein, 2012],[Agull´-Antol´ et al., 2015]oın22 / 23 Agueh, M. and Carlier, G. (2011).Barycenters in the wasserstein space.SIAM Journal on Mathematical Analysis, 43(2) :904–924.Agull´-Antol´ M., Cuesta-Albertos, J. A., Lescornel, H., andoın,Loubes, J.-M. (2015).A parametric registration model for warped distributions withWasserstein’s distance.J. Multivariate Anal., 135 :117–130.Bigot, J. and Klein, T. (2012).Consistent estimation of a population barycenter in theWasserstein space.ArXiv e-prints.Rabin, J., Peyr´, G., Delon, J., and Bernot, M. (2011).eWasserstein Barycenter and its Application to Texture Mixing.SSVM’11, pages 435–446.23 / 23

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Univariate L-moments are expressed as projections of the quantile function onto an orthogonal basis of univariate polynomials. We present multivariate versions of L-moments expressed as collections of orthogonal projections of a multivariate quantile function onto a basis of multivariate polynomials. We propose to consider quantile functions defined as transports from the uniform distribution on [0; 1]^d onto the distribution of interest and present some properties of the resulting L-moments. The properties of estimated L-moments are illustrated for heavy-tailed distributions.
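For the univariate case, the L-moments being generalized here can be estimated directly from the order statistics via probability-weighted moments. The short sketch below uses Hosking's standard unbiased sample estimators for λ1–λ4; it only illustrates the univariate quantities, not the multivariate transport-based construction of the paper.

```python
import numpy as np

def sample_lmoments(x):
    """First four sample L-moments via probability-weighted moments (Hosking)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0                                   # location
    l2 = 2 * b1 - b0                          # dispersion
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2           # l3/l2 = L-skewness, l4/l2 = L-kurtosis

rng = np.random.default_rng(0)
print(sample_lmoments(rng.standard_t(df=3, size=10_000)))   # heavy-tailed sample
```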
 

Multivariate L-Moments Based on TransportsAlexis DecurningeHuawei TechnologiesGeometric Science of InformationOctober 29th, 2015 Outline1 L-momentsDefinition of L-moments2 Quantiles and multivariate L-momentsDefinitions and propertiesRosenblatt quantiles and L-momentsMonotone quantiles and L-momentsEstimation of L-momentsNumerical applications Definition of L-momentsL-moments of a distribution :if X1 ,...,Xr are real random variables with common cumulativedistribution function Fλr =1rr −1(−1)kk=0r −1E[Xr −k:r ]kwith X1:r ≤ X2:r ≤ ... ≤ Xr :r : order statisticsλ1 = E[X ] : localizationλ2 = E[X2:2 − X1:2 ] : dispersionτ3 =λ3λ2=E[X3:3 −2X2:3 +X1:3 ]E[X2:2 −X1:2 ]τ4 =λ4λ2=E[X4:4 −3X3:4 +3X2:4 −X1:4 ]E[X2:2 −X1:2 ]Existence if|x|dF (x) < ∞: asymmetry: kurtosis Characterization of L-momentsL-moments are projections of the quantile function on anorthogonal basis1λr=F −1 (t)Lr (t)dt0F −1 generalized inverse of FF −1 (t) = inf {x ∈ R such that F (x) ≥ t}Lr Legendre polynomial (orthogonal basis in L2 ([0, 1]))r(−1)kLr (t) =k=0rk2t r −k (1 − t)kL-moments completely characterize a distribution∞F −1 (t) =(2r + 1)λr Lr (t)r =1 Definition of L-moments (discrete distributions)L-moments for a multinomial distribution of supportx1 ≤ x2 ≤ ... ≤ xn and weights π1 , ..., πn ( n πi = 1)i=1nn(r )wi xi =λr =i=1iKri=1i−1πa− Kra=1with Kr the respective primitive of Lr : Kr = Lrπaa=1xi Empirical L-momentsU-statistics : mean of all subsequences of size r withoutreplacement1nrˆλr =1≤i1 <··· 1, many multivariatequantiles has been proposedQuantiles coming from depth functions (Tukey, Zuo andSerfling)Spatial Quantiles (Chaudhuri)Generalized quantile processes (Einmahl and Mason)Quantiles as quadratic optimal transports (Galichon andHenry) Multivariate quantilesWe define a quantile related to a probability measure ν as atransport from the uniform measure unif on [0; 1]d into ν.DefinitionLet U and X are random variables with respective measure µ andν.T is a transport from µ into ν if T (U) = X (we note T #µ = ν).Example of transport familiesOptimal/monotone transportsRosenblatt transportsMoser transports... Multivariate L-momentsX r.v. 
of interest with related measure ν such thatE[ X ] < ∞.DefinitionQ : [0; 1]d → Rd a transport from unif in [0; 1]d into ν.L-moment λα of multi-index α = (i1 , ..., id ) associated to Q :λα :=[0;1]dwith Lα (t1 , ..., td ) =Q(t1 , ..., td )Lα (t1 , ..., td )dt1 ...dtd ∈ Rd .dk=1Lik (tk ).⇒ Definition compatible with the univariate case : theunivariate quantile is a transport from the uniform measure on [0; 1]dinto the measure of interest (F −1 (U) = X ) Multivariate L-momentsL-moment of degree 1λ1 (= λ1,1,...,1 ) =[0;1]dQ(t1 , ..., td )dt1 ...dtd = E[X ].L-moments of degree 2 can be regrouped in a matrixΛ2 =[0;1]dwithQi (t1 , ..., td )(2tj − 1)dt1 ...dtd.1≤i,j≤dQ1 (t1 , ..., td )..Q(t1 , ..., td ) = .Qd (t1 , ..., td ) Multivariate L-moments : characterizationPropositionAssume that two quantiles Q and Q have same multivariateL-moments (λα )α∈Nd then Q = Q .∗Moreoverd(2ik + 1) L(i1 ,...,id ) (t1 , ..., td )λ(i1 ,...,id )Q(t1 , ..., td ) =(i1 ,...,id )∈Nd∗k=1A one-to-one correspondence between quantiles and randomvectors is sufficient to guarantee the characteriation of adistribution by its L-moments Monotone transportPropositionLet µ, ν be two probability measures on Rd , such that µ does notgive mass to "small sets".Then, there is exactly one measurable map T such that T #µ = νand T = ϕ for some convex function ϕ.These transports, gradient of convex functions, are calledmonotone transports by analogy with the univariate caseIf defined, the transport is solution to the quadratic optimaltransportϕ∗ = argu − T (u) 2 d µ(u)infT :T #µ=νRd Example : monotone quantile for a random vector withindependent marginalsX = (X1 , ..., Xd ) random vector with independent marginals.The monotone quantile of X is the collection of its marginalsquantiles Q1 (t1 )φ1 (t1 ) ....Q(t1 , ..., td ) = =..Qd (td )φd (td )Indeed, if φ(t1 , ..., td ) = φ1 (t1 ) + · · · + φd (td )φ=QThe associated L-moments are then= E[X ] λ1,...,1λ1...1,r ,1,...,1 = (0, . . . , 0, λr (Xi ), 0, . . . , 0)Tλα= 0 otherwise Monotone transport from the standard Gaussian distributionQ N the monotone distribution from unif onto the standardGaussian distribution N (0, Id ) defined by −1N (t1 )..Q N (t1 , .., td ) = .N −1 (td )T 0 the monotone transport from the standard Gaussiandistribution from ν (rotation equivariant)QTN0([0; 1]d , du) → (Rd , d N ) → (Rd , d ν)⇒ Q = T 0 ◦ Q N is then a quantile. Monotone transport from the standard Gaussiandistribution : Gaussian distribution with a random covarianceFor x ∈ Rd , A positive symmetric matrixϕ(x)= m.x + 1 x T Ax2T 0 (x) =ϕ(x) = m + Axd⇒ T 0 (Nd (0, Id )) = Nd (m, AAT ).The L-moments of a Gaussian with mean m and covarianceAAT are :λα =mif α = (1, ..., 1)Aλα (Nd (0, Id )) otherwiseIn particular, the L-moments of degree 2 :1Λ2 = (λ2,1...,1 . . . λ1,...,1,2 ) = √ A.π Monotone transport from the standard Gaussiandistribution : quasi-elliptic distributionFor x ∈ Rd , u convexϕ(x)= m.x + 1 u(x T Ax)2T 0 (x) = m + u (x T Ax)Ax.The L-moments of this distribution are thenλα =mARdu(x T Ax)Lif α = (1, ..., 1)α (N (x))xd N (x) otherwiseSi A = Id , T 0 (X ) follows a spherical distribution Monotone transport from the standard Gaussiandistribution : quasi-elliptic distributionAxFigure: Samples with T 0 (x) = − x T Ax and A = Id (left) or1 0.8A=(right)0.8 1 Estimation : general casex 1 , ..., x n ∈ Rd an iid sample issued from a same r.v. X withmeasure ν of quantile Q.Empirical measure : νn

Probability Density Estimation (chaired by Jesús Angulo, S. Said)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
The two main techniques of probability density estimation on symmetric spaces are reviewed in the hyperbolic case. For computational reasons we choose to focus on kernel density estimation, and we provide the expression of the Pelletier estimator on the hyperbolic space. The method is applied to density estimation of reflection coefficients derived from radar observations.
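A minimal sketch of a Pelletier-type kernel estimator on the Poincaré disk, using the hyperbolic distance between points of the unit disk and the volume-change factor sinh(r)/r quoted in the slides below. The Epanechnikov-style kernel and its 2D normalization are illustrative choices, not necessarily those of the paper.

```python
import numpy as np

def hyperbolic_dist(z, w):
    """Hyperbolic distance between points of the Poincaré disk (complex numbers)."""
    num = 2 * np.abs(z - w) ** 2
    den = (1 - np.abs(z) ** 2) * (1 - np.abs(w) ** 2)
    return np.arccosh(1 + num / den)

def epanechnikov_2d(u):
    """Compactly supported kernel, normalized to unit integral over R^2."""
    return np.where(u < 1.0, (2.0 / np.pi) * (1.0 - u ** 2), 0.0)

def pelletier_kde(x, samples, r):
    """f_hat(x) = (1/k) * sum_i r^{-2} * (d_i / sinh(d_i)) * K(d_i / r)."""
    d = np.array([hyperbolic_dist(x, xi) for xi in samples])
    theta = np.sinh(d) / np.where(d > 0, d, 1.0)      # volume change, -> 1 as d -> 0
    theta = np.where(d > 0, theta, 1.0)
    return np.mean((1.0 / r**2) * (1.0 / theta) * epanechnikov_2d(d / r))

rng = np.random.default_rng(0)
radius = 0.8 * np.sqrt(rng.uniform(size=200))          # toy sample inside the disk
angle = rng.uniform(0, 2 * np.pi, 200)
pts = radius * np.exp(1j * angle)
print(pelletier_kde(0.1 + 0.1j, pts, r=0.5))
```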
 

Probability density estimation on the hyperbolicspace applied to radar processingOctober 28, 2015Emmanuel Chevalliera , Frédéric Barbarescob , Jesús AnguloaabCMM-Centre de Morphologie Mathématique, MINES ParisTech; FranceThales Air Systems, Surface Radar Domain, Technical Directorate,Advanced Developments Department, 91470 Limours, Franceemmanuel.chevallier@mines-paristech.fr1/20Probability density estimation on the hyperbolic space Three techniques of non-parametric probability densityestimation:histogramskernelsorthogonal seriesThe Hyperbolic space of dimension 2Histograms, kernels and orthogonal series in the hyperbolicspaceDensity estimation of radar data in the Poincaré disk2/20Probability density estimation on the hyperbolic space Three techniques of non-parametric probability densityestimationHistograms:partition of the space into a set of binscounting the number of samples per bins3/20Probability density estimation on the hyperbolic space Kernels:a kernel is placed over each samplethe density is evaluated by summing the kernels4/20Probability density estimation on the hyperbolic space Orthogonal series: the true density f is studied through theestimation of the scalar products between f and an orthonormalbasis of real functions.Let f be the true densityf ,g =f g dµlet {ei } is a orthogonal Hilbert basis of real functions∞f =f , ei ei ,i=−∞sincefI , ei =fI ei dµ = E (ei (I )) ≈1nnei (I (pj ))j=1we can estimate f by:Nf ≈i=−N5/201nnei (I (pj )) ei = fˆ.j=1Probability density estimation on the hyperbolic space Homogeneity and isotropy considerationnon homogeneous binsnon istropic binsAbsence of prior on f : the estimation should be as homogeneousand isotropic as possible.→ choice of bins, kernels or orthogonal basis6/20Probability density estimation on the hyperbolic space Remark on homogeneity and isotropyFigure: Random variableX ∈ Circle .The underlying space is nothomogeneous and not isotropic, the density estimation can not considerevery points and directions in an equivalent way.7/20Probability density estimation on the hyperbolic space The 2 dimensional hyperbolic space and the Poincaré diskThe only space of constant negative sectional curvatureThe Poincaré disk is a model of hyperbolic geometry2dsD = 4dx 2 + dy 2(1 − x 2 − y 2 )2Homogeneous and isotropic8/20Probability density estimation on the hyperbolic space Density estimation in the hyperbolic space: histogramsA good tilling: homogeneous and isotropicThere are many polygonal tilings:There is no homotetic transformations for all λ ∈ RProblem: not always possible to scale the tiling to the studieddensity9/20Probability density estimation on the hyperbolic space Density estimation in the hyperbolic space: orthogonal series10/20Standard choice of basis: eigenfunctions of the Laplacian operator∆In Rn : (ei ) = Fourier basis → characteristic function densityestimator.f , [a, b] → R,∞f =f , ei ei ,i=−∞f , R → R,∞f =f , eω eω dω,ω=−∞Compact case: estimation of a sumNon compact case: estimation of an integralProbability density estimation on the hyperbolic space Density estimation in the hyperbolic space: orthogonal seriesOn the Poincaré disk D, solutions of ∆f = λf are known forf ,D → Rbut not for f , D ⊂ D → R with D compactComputational problem: the estimation involves an integral, evenfor bounded support functions11/20Probability density estimation on the hyperbolic space Kernel density estimation on Riemannian manifoldsK : R+ → R+ such that:i) Rd K (||x||)dx = 1,ii) Rd xK (||x||)dx = 0,iii) K (x > 1) = 0, sup(K (x)) = K 
(0).Euclidean kernel estimator:1fˆ =kk1irdK||x, xi ||rRiemannian case:K12/20||x − xi ||r→Kd(x − xi )rProbability density estimation on the hyperbolic space Figure: Volume changeθxiinduced by the exponential mapexpxθx : volume change (T M, Lebesgue) −→ (M, vol)Kernel density estimator proposed by Pelletier:1fˆ =kk13/201i1rdθxi (x)Kd(x, xi )rProbability density estimation on the hyperbolic space θxin the hyperbolic spaceθx can easily be computed in hyperbolic geometry.Polar coordinates at p ∈ D:at p ∈ D, if the geodesic of angle α of length r leads to q ,(r , α) ↔ qIn polar coordinates:ds 2 = dr 2 + sinh(r )2 dα2thusdvolpolar = sinh(r )drdαandθp ((r , θ)) =14/20sinh(r )rProbability density estimation on the hyperbolic space Density estimation in the hyperbolic space: kernelsKernel density estimator:1fˆ =kk1irdd(x, xi )Ksinh(d(x, xi ))Formulation as a convolutionFourier −Helgason←→d(x, xi )r0rthogonal seriesReasonable computational cost15/20Probability density estimation on the hyperbolic space Radar dataSuccession of input vector z = (z0 , .., zn−1 ) ∈ Cnz : background or target?Assumptions: z = (z0 , .., zn−1 ) is a centered Gaussian process.Centered → dened by its covariancer0r1..Rn = E [ZZ ] = rn−1∗r1 .r0 r1..r1rn−1rn−2 . r1 r0Rn ∈ T n : Toeplitz (additional stationary assumption) and SPDmatrix16/20Probability density estimation on the hyperbolic space Auto regressive model17/20Auto regressive model of order k:kajk zl−jzl = −ˆj=1k -th reection coecient :kµk = akDieomorphism ϕ:ϕ : T n → R∗ × Dn−1 , Rn → (P0 , µ1 , · · · , µn−1 )+(z0 , ..., zn−1 ) ↔ (P0 , µ1 , · · · , µn−1 )Probability density estimation on the hyperbolic space Geometry on18/20Tnϕ : T n → R∗ × Dn−1 , Rn → (P0 , µ1 , · · · , µn−1 )+metric on T n : product metric on R∗ × Dn−1+Multiple acquisitions of an identical background:distribution of the µk ?Potential use: identication of a non-background objectsProbability density estimation on the hyperbolic space Application of density estimation to radar dataµ1 , N = 0.007µ2 , N = 1.61µ3 , N = 14.86µ1 , N = 0.18µ2 , N = 2.13µ3 , N = 4.81Figure: First row: ground, second row: Rain19/20Probability density estimation on the hyperbolic space ConclusionThe density estimation on the hyperbolic space is not afundamentally dicult problemEasiest solution: kernelsFuture works:computation of the volume change in kernels for Riemannianmanifoldsdeepen the application for radar signalsThank you for your attention20/20Probability density estimation on the hyperbolic space

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We address here the problem of perceptual colour histograms. The Riemannian structure of perceptual distances is measured through standard sets of ellipses, such as Macadam ellipses. We propose an approach based on local Euclidean approximations that makes it possible to take into account the Riemannian structure of perceptual distances without introducing computational complexity during the construction of the histogram.
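A minimal sketch of the modified kernel evaluation: the Euclidean norm of the Lab kernel is replaced by a Mahalanobis-type norm ||v||_c² = vᵀ A_c v defined by the ellipse attached to the sample colour. The ellipse matrices and the nearest-neighbour "interpolation" used here are placeholders, not the interpolation studied in the paper.

```python
import numpy as np

# Hypothetical ellipse field: reference colours (in the ab plane) and the
# positive-definite matrix A_c of their just-noticeable-difference ellipse.
ref_colors = np.array([[10.0, 20.0], [40.0, -5.0], [-15.0, 30.0]])
ref_matrices = np.array([np.diag([1.0, 4.0]), np.diag([2.0, 1.0]), np.diag([1.5, 1.5])])

def ellipse_at(c):
    """Placeholder interpolation: take the matrix of the nearest reference colour."""
    return ref_matrices[np.argmin(np.linalg.norm(ref_colors - c, axis=1))]

def local_norm(v, A):
    return np.sqrt(v @ A @ v)

def kernel(u):
    return np.maximum(0.0, 1.0 - u ** 2)           # un-normalized Epanechnikov profile

def perceptual_density(x, samples, r):
    """Kernel density at colour x, each kernel shaped by the ellipse at its sample."""
    vals = [kernel(local_norm(x - s, ellipse_at(s)) / r) for s in samples]
    return np.mean(vals) / r**2                     # up to the kernel's normalization

samples = np.array([[12.0, 18.0], [11.0, 22.0], [38.0, -4.0]])   # pixel colours
print(perceptual_density(np.array([12.0, 20.0]), samples, r=5.0))
```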
 

Color Histograms using the perceptual metricOctober 28, 2015Emmanuel Chevalliera , Ivar Farupb , Jesús AnguloaCMM-Centre de Morphologie Mathématique, MINES ParisTech; FranceGjovik University College; Franceemmanuel.chevallier@mines-paristech.frab1/16Color Histograms using the perceptual metric Plan of the presentationFormalization of the notion of image histogramPerceptual metric and Macadam ellipsesDensity estimation in the space of colors2/16Color Histograms using the perceptual metric Image histogram : formalizationI :Ω → Vp → I (p)Ω: support space of pixels: rectangle/parallelepiped.V: the value space(Ω, µΩ ), (V , µV ), µΩ and µV are induced by the choosengeometries on Ω and V .Transport of µΩ on V : I ∗ (µΩ )Image histogram: estimation off =3/16dI ∗ (µΩ )dµVColor Histograms using the perceptual metric pixels: p ∈ Ω, uniformly distributed with respect to µΩ{I (p), p a pixel }: set of independent draws of the "randomvariable" I∗(µEstimation of f = dIdµVΩ ) from {I (p), p a pixel }:→ standard problem of probability density estimation4/16Color Histograms using the perceptual metric Perceptual color histogramsI :Ω → (M = colors, gperceptual )p →I (p)Assumption: the perceptual distances between colors is induced bya Riemannian metricThe manifold of colors was one of the rst example of Riemannianmanifold, suggested by Riemann5/16Color Histograms using the perceptual metric Macadam ellipses: just noticeable dierences6/16Chromaticity diagram (constant luminance):Ellipses: elementary unit balls → local L2 metricColor Histograms using the perceptual metric Lab spaceThe Euclidean metric of the Lab parametrization is supposed to bemore perceptual than other parametrizationsFigure: Macadam ellipses in the ab planHowever, the ellipses are clearly not balls7/16Color Histograms using the perceptual metric Modiction of the density estimator8/16Density → local notion. No need of knowing long geodesicsSmall distances → local approximation by an Euclidean metricNotations:dR : Perceptual metric||.||Lab : Canonical Euclidean metric of Lab||.||c : Euclidean metric on Lab induced by the ellipse at cSmall distances around c : ||.||c is "better" than ||.||LabColor Histograms using the perceptual metric Modiction of the density estimatorStandard kernel estimator:1fˆ(x) =k1pi ∈{pixels}r2K||x − I (pi )||LabrPossible modicationK||x − I (pi )||Labr→K||x − I (pi )||I (pi )rwhere ||.||I (pi ) is an Euclidean distance dened by the interpolatedellipse at I (pi ).9/16Color Histograms using the perceptual metric Generally, at c a color:limx→c||x − c||Lab||x − c||c= 1 = limx→cdR (x, c)dR (x, c)Thus, ∃A > 0 such that,∀R > 0, ∃x ∈ BLab (c, R), A <||x − c||−1 .dR (x, c)while ∃Rc = Rc,A such that,∀x ∈ BLab (c, Rc ),||x − c||c− 1 < A.dR (x, c)hencesupBLab (c,Rc )10/16||x − c||c−1dR (x, c)< A < supBLab (c,Rc )||x − c||−1dR (x, c)Color Histograms using the perceptual metric. When the scaling factor r is small enough:r ≤ Rc and Bc (c, r ) ⊂ BLab (c, Rc )x ∈ B(c, Rc ), Kbetter than Kx ∈ B(c, Rc ), K/11/16||x−c||cr||x−c||cr=K||x−c||Labr||x−c||Labr.=0Color Histograms using the perceptual metric Interpolation of a set of local metric: a deep question...What is a good interpolation?Interpolating a function: minimizing variation with respect toa metric.Interpolating a metric? 
No intrinsic method: depends on achoice of parametrization.Subject of the next study12/16Color Histograms using the perceptual metric Barycentric interpolation in the Lab space13/16Color Histograms using the perceptual metric Volume change(a)(b)Figure: (a): color photography (b): Zoom of the density change adaptedto colours present in the photography14/16Color Histograms using the perceptual metric experimental results(a)(b)(c)Figure: The canonical Euclidean metric of the ab projective plane in (a),the canonical metric followed by a division by the local density of theperceptual metric in (b) and the modied kernel formula in (c).15/16Color Histograms using the perceptual metric ConclusionA simple observation which improve the consistency of thehistogram without requiring additional computational costsFuture works will focus on:The interpolation of the ellipsesThe construction of the geodesics and their applicationsThank you for your attention16/16Color Histograms using the perceptual metric

Creative Commons Attribution-ShareAlike 4.0 International
See the video
Air traffic management (ATM) aims at providing companies with safe and ideally optimal aircraft trajectory planning. Air traffic controllers act on flight paths in such a way that no pair of aircraft comes closer than the regulatory separation norm. With the increase of traffic, it is expected that the system will reach its limits in the near future: a paradigm change in ATM is planned with the introduction of trajectory-based operations. This paper investigates a means of producing realistic air routes from the output of an automated trajectory design tool. For that purpose, an entropy associated with a system of curves is defined and a means of iteratively minimizing it is presented. The network produced is suitable for use in a semi-automated ATM system with a human in the loop.
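A small numerical sketch of the two ingredients described here and in the slides below: a curve-system density in which kernel contributions are weighted by local arc length (so slow aircraft do not over-contribute), and the entropy of that density evaluated as a Riemann sum on a grid. The Gaussian kernel, grid and step sizes are illustrative choices, not those of the paper.

```python
import numpy as np

def curve_density(grid, curves, h):
    """Arc-length-weighted kernel density of a system of planar curves on a grid.
    grid: (M, 2) points; curves: list of (n_i, 2) polylines; h: kernel width."""
    num = np.zeros(len(grid))
    total_length = 0.0
    for c in curves:
        seg = np.diff(c, axis=0)
        ds = np.linalg.norm(seg, axis=1)              # |gamma'(t)| dt per segment
        mid = 0.5 * (c[:-1] + c[1:])                  # segment midpoints
        d2 = ((grid[:, None, :] - mid[None, :, :]) ** 2).sum(-1)
        num += (np.exp(-d2 / (2 * h**2)) * ds).sum(axis=1)
        total_length += ds.sum()
    return num / total_length

def entropy(density, cell_area):
    p = density / (density.sum() * cell_area)         # normalize to unit integral
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask])) * cell_area

xs, ys = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([xs.ravel(), ys.ravel()])
t = np.linspace(0, 1, 100)[:, None]
curves = [np.hstack([t, 0.4 + 0.1 * t]), np.hstack([t, 0.6 - 0.1 * t])]   # two routes
d = curve_density(grid, curves, h=0.05)
print(entropy(d, cell_area=(1 / 49) ** 2))
```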
 

Entropy minimizing curvesApplication to automated ight path designS. PuechmorelENAC29th October 2015 Problem StatementFlight path planning•••Trac is expected to double by 2050 ;In future systems, trajectories will be negotiated and optimizedwell before the ights start ;But humans will be in the loop : generated ight plans mustcomply with operational constraints ;Muti-agent systems•••A promising approach to address the planning problem ;Does not end up with a human friendly trac !Idea : start with the proposed solution and rebuild a routenetwork from it. A curve optimization problemAn entropy criterion•••Route networks and currently made of straight segmentsconnecting beacons ;May be viewed as a maximally concentrated spatial densitydistribution ;Minimizing the entropy with such a density will intuitively yielda ight path system close to what is expected. Problem modelingDensity associated with a curve system••••A classical measure : counting the number of aircraft in eachbin of a spatial grid and averaging over time ;Suers from a severe aw : aircraft with low velocity willover-contribute ;May be corrected by enforcing invariance underre-parametrization of curves ;Combined with a non-parametric kernel estimate to yield :˜d: x →1Ni=1 0 K1Ni=1 Ω 0 K( x − γi (t) ) γi (t) dt( x − γi (t) ) γi (t) dtdx(1) Problem modeling IIThe entropy criterion••Kernel K is normalized over the domain Ω so as to have a unitintegral ;Density is directly related to lengths li , i = 1 . . . n of curvesγi , i = 1 . . . N :˜d: x →•1Ni=1 0 K( x − γi (t) ) γi (t) dtNi=1 li(2)Associated entropy is :E (γ1 , . . . , γN ) = −Ω˜˜d(x) log d(x) dx(3) Optimal curve displacement eldEntropy variation˜• d has•integral 1 over the domain Ω ;It implies that :−•∂E (γ1 , . . . , γN )( ) =∂γjΩ˜∂ d(x)˜( ) log d(x) dx∂γj(4)where is an admissible variation of curve γi .˜The denominator in the expression of d has derivative :γj (t)[0,1]γj (t), (t) dt = −γj (t)[0,1]γj (t),Ndt(5) Optimal curve displacement eldEntropy variation•˜The numerator of d has derivative :[0,1]−γj (t) − xγj (t) − x,γj (t)[0,1]γj (t)K ( γj (t) − x ) γj (t) dt (6)N,NK ( γj (t) − x ) dt(7) Optimal curve displacement eld IINormal move•Final expression yield a displacement eld normal to the curve :γj (t) − xγj (t) − xΩ˜K ( γj (t) − x ) log d(x)dx γj (t)N(8)−Ω+Ωγj (t)˜K ( γj (t) − x ) log d(x))dx˜˜d(x) log(d(x))dxNnγj (t)γj (t)(9)γj (t)liNi=1(10) ImplementationA gradient algorithm••••The move is based on a tangent vector in the tangent space toImm([0, 1], R3 )/Di+ ([0, 1) ;It is not directly implementable on a computer ;A simple, landmark based approach with evenly spaced pointswas used ;A compactly supported kernel (epanechnikov) was selected : it˜allows the computation of density d on GPUs as a textureoperation that is very fast. A output from the multi-agent systemIntegration in the complete system•Route building from initially conicting trajectories :Figure  Initial ight plans and nal ones Conclusion and future workAn integrated algorithm•••Entropy minimizer is now a part of the overall route designsystem ;Only a simple post-processing is necessary to output a usableairways network ;The complete algorithm is being ported to GPU.Future work : take the headings into account••The behavior is not completely satisfactory when routes areconverging in opposite directions ;An improved version will make use of entropy of a distributionin a Lie group (publication in progress).

Creative Commons Attribution-ShareAlike 4.0 International
We introduce a novel kernel density estimator for a large class of symmetric spaces and prove a minimax rate of convergence as fast as the minimax rate on Euclidean space. The rate is proven without any compactness assumptions on the space or Hölder-class assumptions on the densities. A main tool used in proving the convergence rate is the Helgason-Fourier transform, a generalization of the Fourier transform for semisimple Lie groups modulo maximal compact subgroups. This paper obtains a simplified formula in the special case when the symmetric space is the 2-dimensional hyperboloid.
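The key computational idea, convolution becoming pointwise multiplication under the (Helgason-)Fourier transform, can be illustrated with its Euclidean analogue. The sketch below smooths a binned empirical density by multiplying FFTs; this is the ordinary-KDE special case, not the symmetric-space estimator itself, and the bandwidth and grid are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 5000)

# Bin the data ("empirical density") on a periodic grid.
edges = np.linspace(-6, 6, 513)
centers = 0.5 * (edges[:-1] + edges[1:])
dx = centers[1] - centers[0]
hist, _ = np.histogram(samples, bins=edges, density=True)

# Gaussian kernel sampled on the same grid, re-centred for circular convolution.
h = 0.2
kernel = np.exp(-0.5 * (centers / h) ** 2) / (h * np.sqrt(2 * np.pi))

# Convolution theorem: F[K * f] = F[K] . F[f]  (a frequency cutoff could be applied here).
f_hat = np.fft.ifft(np.fft.fft(np.fft.ifftshift(kernel)) * np.fft.fft(hist)).real * dx
print(f_hat.max())   # close to the N(0,1) peak value 1/sqrt(2*pi) ≈ 0.399
```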
 

Kernel Density Estimation on Symmetric SpacesDena Marie AstaDepartment of StatisticsOhio State UniversitySupported by NSF grant DMS-1418124 and NSF Graduate Research Fellowship under grant DGE-1252522. Geometric Methods for Statistical Analysisq  Classical statistics assumes data is unrestricted on Euclidean spacenX¯= 1XXin i=1var[X] = E[X 2 ]E[X]2q  Exploiting the geometry of the data leads to faster and more accurate toolsimplicit geometry in non-Euclidean dataexplicit geometry in networks2 Motivation: Non-Euclidean DataDirectionalHeadingssphereMaterial Stress,Gravitational Lensing3x3 symmetricpositive definitematricesDiffusion TensorImagingNormalDistributions3x3 symmetricpositive definitematriceshyperboloid3 Nonparametric Methods: Non-Euclidean Dataq  Classical non-parametric estimators assume Euclidean structurekernel densityestimatorkernelregressionconditional densityestimatorq  Sometimes the given data has other geometric structure to exploit.4 Motivation: Non-Euclidean DistancesEuclidean distances are often not the right notion of distance between data points.DirectionalHeadingssphereMaterial Stress,Gravitational Lensing3x3 symmetricpositive definitematricesDiffusion TensorImagingNormalDistributions3x3 symmetricpositive definitematriceshyperboloid5 Motivation: Non-Euclidean DistancesEuclidean distances are often not the right notion of distance between data points.DirectionalHeadingssphereDistance between directional headings should be shortest path-length.6 Motivation: Non-Euclidean DistancesEuclidean distances are often not the right notion of distance between data points.standard deviationNormalDistributionshyperboloidmeanAn isometric representation of the hyperboloid is the Poincare Half-Plane. Each pointin either model represents a normal distribution. Distance is the Fisher Distance, whichis similar to KL-Divergence.7 Motivation: Non-Euclidean DistancesEuclidean distance not the right distance à Euclidean volume not the right volumeWe want to minimize risk for density estimation on a (Riemmanian) manifold.EfZtrue density(fMmanifoldˆfn )2 dµvolume measurebased onintrinsic distanceestimator basedon n samples8 Existing EstimatorsO(n-2s/(2s+d))optimal rate of convergence1(s=smoothness parameter, d=dimension)Euclidean KDEsubtraction undefined for general MnX ✓ x Xi ◆1ˆhf(X1 ,...,Xn ) (x) =Knh i=1hdivision by h undefined for general M9 Exploiting Geometry: Symmetriesq  symmetries = geometryq  symmetries make the smoothing of data (convolution by a kernel) tractableq  translations in Euclidean space are specific examples of symmetriesq  other spaces call for other symmetries10 Exploiting symmetries to convolveKernel density estimation is about convolving a kernel with the data.ˆhf(X1 ,...,Xn ) = Kh ⇤ empirical(X1 , . . . , Xn )density on thespace oftranslations on Rn(g ⇤ f )(x) =ZRng(t)f (xt) dtdensity on RnMore general spaces, depending on their geometry, we will requiresymmetries other than translations…11 Exploiting symmetries to convolveKernel density estimation is about convolving a kernel with the data.density on thespace oftranslations on Rnˆhf(X1 ,...,Xn ) = Kh ⇤ empirical(X1 , . . . 
, Xn )(g ⇤ f )(x) =ZRndensity on Rng(t)f (xt) dt =ZRng(Tt )f (Tt 1 (x)) dtTv (w) = v + wIdentify t with Tt and interpret g as a density on the space of Tt’s.More general spaces, depending on their geometry, we will requiresymmetries other than translations…12 Exploiting symmetries to convolveGeneralized kernel density estimation involves convolving a generalized kernel with the data.ˆhf(X1 ,...,Xn ) = Kh ⇤ empirical(X1 , . . . , Xn )(“empirical density”)density on the space G(g ⇤ f )(x) =density on XZg(T )f (TG1(x)) dTspace of symmetries on XX is a symmetric space, a space having a suitable space G of symmetries.13 G-Kernel Density Estimator: general formbandwidth and cutoff parameters“empirical density” on symmetric space Xˆh,Cf(X1 ,...,Xn ) = Kh ⇤ empirical(X1 , . . . , Xn )sample observationsdensity on group ofsymmetries GWe can use harmonic analysis on symmetric spaces to define andanalyze this estimator.1Asta,D., 2014.
  Harmonic Analysis on Symmetric SpacesFourier Transform: an isometryF : L2 (R) ⌧ L2 (R) : F1Helgason-Fourier Transform: for symmetric space X, an isometryH : L2 (X) ⌧ L2 (· · · ) : H1frequency space depends on the geometry of XThe (Helgason-)Fourier Transform sends convolutions to products.1Terras,A., 1985.
 15 Generalization: G-Kernel Density Estimatorassumptions on kernel and true density:q  The true density is sufficiently smooth (in Sobolev ball).q  The kernel transforms nicely with the space of dataq  The kernel is sufficiently smooth1Asta,D., 2014.
 16 G-Kernel Density EstimatorTHEOREM: G-KDE achieves the same minimax rate on symmetricspaces as the ordinary KDE achieves on Rd.1ˆh,Cf(X1 ,...,Xn ) = H1[(X1 ,...,Xn ) H[Kh ]IC ]O(n-2s/(2s+d))optimal rate of convergence1(s=Sobolev smoothness parameter, d=dimension)1Asta,D., 2014.
 17 Kernels on SymmetriesSymmetric Positive Definite (nxn) Matrices SPDn:Kernels are densities on space G=GLn of nxn invertiblematrices.Each GLn-matrix M determines an isometry (distance-preserving function):M: SPDn ⇠ SPDn=M (X)= M T XMHyperboloid H2:Kernels are densities on space G=SL2 of 2x2invertible matrices having determinant 1.Each SL2-matrix M determines an isometry (distance-preserving function):M: H2 ⇠ H2=M11 x + M12M (x) =M21 x + M2218 Kernels on SymmetriesHyperboloid H2:Kernels are densities on space G=SL2 of 2x2invertible matrices having determinant 1.Each SL2-matrix M determines an isometry (distance-preserving function):M: H2 ⇠ H2=M (x) =M11 x + M12M21 x + M22example of kernel K (hyperbolic version of the gaussian):solution to the heat equation on SL2: H[Kh ](s, k✓ ) / eh2 s2 h¯¯ssamples from K (pointsin SL2) represented inH2=SL2/SO219 Recap: G-KDEExploiting the geometric structure of the data type:q  Tractable data smoothing = convolving a kernel on a space of symmetriesq  Harmonic analysis on symmetric spaces allows us to prove minimax rateq  Symmetric spaces are general enough to include:DirectionalHeadingsMaterial Stress,Gravitational Lensing1Asta,Diffusion TensorImagingD., 2014.
 NormalDistributions20

Keynote speech Tudor Ratiu (chaired by Xavier Pennec)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
The goal of these lectures is to show the influence of symmetry in various aspects of theoretical mechanics. Canonical actions of Lie groups on Poisson manifolds often give rise to conservation laws, encoded in modern language by the concept of momentum maps. Reduction methods lead to a deeper understanding of the dynamics of mechanical systems. Basic results in singular Hamiltonian reduction will be presented. The Lagrangian version of reduction and its associated variational principles will also be discussed. The understanding of symmetric bifurcation phenomena for Hamiltonian systems is based on these reduction techniques. Time permitting, discrete versions of these geometric methods will also be discussed in the context of examples from elasticity.
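As a small numerical illustration of the momentum-map language used in the lectures: the cotangent-lifted SO(3) action on T*R³ has momentum map J(q, p) = q × p (angular momentum), and Noether's theorem says J is conserved by any rotation-invariant Hamiltonian. The sketch below integrates a central-force system and checks this conservation; the leapfrog integrator and the Kepler-type potential are arbitrary choices for illustration.

```python
import numpy as np

def central_force(q):
    """Force for the rotation-invariant potential V(q) = -1/|q| (Kepler-type)."""
    r = np.linalg.norm(q)
    return -q / r**3

def leapfrog(q, p, dt, steps):
    """Kick-drift-kick integration of the unit-mass Hamiltonian H = |p|^2/2 + V(q)."""
    for _ in range(steps):
        p = p + 0.5 * dt * central_force(q)
        q = q + dt * p
        p = p + 0.5 * dt * central_force(q)
    return q, p

q0 = np.array([1.0, 0.0, 0.0])
p0 = np.array([0.0, 0.8, 0.3])
q1, p1 = leapfrog(q0, p0, dt=1e-3, steps=20_000)

# Momentum map of the lifted SO(3) action: J(q, p) = q x p.
print(np.cross(q0, p0), np.cross(q1, p1))   # equal up to numerical error (Noether)
```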
 

SYMMETRY METHODS INGEOMETRIC MECHANICSTudor S. RatiuSection de Math´matiqueseEcole Polytechnique F´d´rale de Lausanne, Switzerlande etudor.ratiu@epfl.chGeometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28-30, 20151 PLAN OF THE PRESENTATION• Lie group actions and reduction of dynamics• The above in the Hamiltonian case• Properties of the momentum map• Regular reduction• Singular reduction• Regular cotangent bundle reductionGeometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28-30, 20152 M , N manifolds, N ⊂ M as subsets.N is an initial submanifold of M if the inclusion map i : N → Mis an immersion satisfying the following condition: for any smoothmanifold P and any map g : P → N , g is smooth if and only ifi ◦ g : P → M is smooth. The smooth manifold structure thatmakes N into an initial submanifold of M is unique.g◦iPg.//iM==NThe integral manifolds of an integrable generalized distribution(thus forming a generalized foliation) are initial.Infinitesimal generator ξM ∈ X(M ) associated to ξ ∈ g : Lie(G)dξM (m) :=Φexp tξ (m) = TeΦm · ξ.dt t=0ξM is a complete vector field with flow (t, m) → exp tξ · m.ξ ∈ g → ξM ∈ X(M ) is a Lie algebra antihomomorphismGeometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28-30, 20153 Isotropy, stabilizer, symmetry subgroup of m ∈ MGm := {g ∈ G | Φg (m) = m} ⊂ G,Gg·m = gGmg −1, ∀g ∈ Gclosed subgroup of G whose Lie algebra gm equalsgm = {ξ ∈ g | ξM (m) = 0}.Om ≡ G · m := {Φg (m) | g ∈ G} G-orbit of m∼Om g · m ←→ gGm ∈ G/Gm diffeomorphismOm initial submanifold of M• Transitive action: only one orbit, that is, Om = M• Free action: Gm = {e} for all m ∈ M(g, m) −→ (m, g · m) ∈ M × M is• Proper action: if Φ : G × Mproper. Equivalent to: for any two convergent sequences {mn} and{gn · mn} in M , there exists a convergent subsequence {gnk } in G.Examples of proper actions: compact group actions, SE(n) actingon Rn, Lie groups acting on themselves by translation.Geometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28-30, 20154 Fundamental facts about proper Lie group actions(i) The isotropy subgroups Gm are compact.(ii) The orbit space M/G is a Hausdorff topological space.(iii) If the action is free, M/G is a smooth manifold, and the canonical projection π : M → M/G defines on M the structure of a smoothleft principal G–bundle.(iv) If all the isotropy subgroups of the elements of M under the G–action are conjugate to a given subgroup H, then M/G is a smoothmanifold and π : M → M/G defines the structure of a smooth locallytrivial fiber bundle with structure group N (H)/H and fiber G/H.Normalizer of H is N (H) := {g ∈ G | gH = Hg}.(v) If the manifold M is paracompact then there exists a G-invariantRiemannian metric on it. (Palais)(vi) If the manifold M is paracompact then smooth G-invariantfunctions separate the G-orbits.Geometric Science of Information, Ecole Polytechnique, Paris-Saclay, October 28-30, 20155 Twisted productH ⊂ G Lie subgroup acting (left) on the manifold A. Right twistedaction of H on G × A, defined by(g, a) · h = (gh, h−1 · a),g, h ∈ G,a ∈ A,is free and proper. Twisted product G ×H A := (G × A)/H.TubeG acts properly on M . For m ∈ M , let H := Gm. 
• Tube. Let $G$ act properly on $M$, $m \in M$, $H := G_m$. A tube around the orbit $G\cdot m$ is a $G$-equivariant diffeomorphism $\varphi : G\times_H A \to U$, where $U$ is a $G$-invariant neighborhood of $G\cdot m$ and $A$ is some manifold on which $H$ acts.

• Slice Theorem. Let $G$ be a Lie group acting properly on $M$ at the point $m \in M$, $H := G_m$. Then there exists a tube $\varphi : G\times_H B \to U$ about $G\cdot m$, where $B$ is an open $H$-invariant neighborhood of $0$ in a vector space $H$-equivariantly isomorphic to $T_mM/T_m(G\cdot m)$, the $H$-representation being $h\cdot(v + T_m(G\cdot m)) := T_m\Phi_h\cdot v + T_m(G\cdot m)$. The slice is $S := \varphi([e, B])$, so that $U = G\cdot S$. From now on, $G$ is assumed to act properly on $M$.

• Dynamical consequences. Let $X \in \mathfrak X(U)^G$ with $U \subset M$ open and $G$-invariant, and let $S$ be a slice at $m \in U$. Then there is a tangential field $X_T \in \mathfrak X(G\cdot S)^G$, $X_T(z) = \xi(z)_M(z)$, where $\xi : G\cdot S \to \mathfrak g$ is smooth, $G$-equivariant and $\xi(z) \in \mathrm{Lie}(N(G_z))$ for all $z$; its flow is $T_t(z) = \exp t\xi(z)\cdot z$, so $X_T$ is complete. There is a normal field $X_N \in \mathfrak X(S)^{G_m}$, and for $z = g\cdot s$ one has $X(z) = X_T(z) + T_s\Phi_g(X_N(s)) = T_s\Phi_g(X_T(s) + X_N(s))$. If $N_t$ is the flow of $X_N$ on $S$, the integral curve of $X$ through $g\cdot s$ is $F_t(g\cdot s) = g(t)\cdot N_t(s)$, where $g(t) \in G$ solves $\dot g(t) = T_eL_{g(t)}\,\xi(N_t(s))$, $g(0) = g$. This is the tangential–normal decomposition of a $G$-invariant vector field (the Krupa decomposition in bifurcation theory).

• Geometric consequences. The orbit type set $M_{(H)} = \{z \in M \mid G_z \in (H)\}$, the fixed point set $M^H = \{z \in M \mid H \subset G_z\}$ and the isotropy type set $M_H = \{z \in M \mid H = G_z\}$ are embedded submanifolds of $M$; $M_H$ is open in $M^H$ but, in general, not closed in $M$. The group $N(H)/H$ acts freely and properly on $M_H$. A point $m$ is regular if there is a neighborhood $U_m$ with $\dim\mathcal O_z = \dim\mathcal O_m$ for all $z \in U_m$. Principal Orbit Theorem: if $M$ is connected, $M^{\mathrm{reg}} := \{m \in M \mid m \text{ regular}\}$ is connected, open and dense in $M$, and $M/G$ contains only one principal orbit type, which is connected, open and dense in $M/G$. Stratification Theorem: the connected components of all orbit type manifolds $M_{(H)}$ and their projections onto $M_{(H)}/G$ constitute a Whitney stratification of $M$ and $M/G$, respectively; this stratification of $M/G$ is minimal among all Whitney stratifications of $M/G$. G-Codistribution Theorem (Ortega [1998]): if $G$ acts properly on $M$ and $H := G_m$, then $\left(T_m(G\cdot m)^{\circ}\right)^{H} = \{\,\mathbf d f(m) \mid f \in C^\infty(M)^G\,\}$.

• Reduction of general vector fields. Let $G\times M \to M$ be proper and $X \in \mathfrak X(M)^G$ ($G$-equivariant) with flow $F_t$. Law of conservation of isotropy: every isotropy type submanifold $M_H := \{m \in M \mid G_m = H\}$ is preserved by $F_t$. With $\pi_H : M_H \to M_H/(N(H)/H)$ the projection and $i_H : M_H \to M$ the inclusion, $X$ induces a unique $H$-isotropy type reduced vector field $X^H$ on $M_H/(N(H)/H)$ by $X^H\circ\pi_H = T\pi_H\circ X\circ i_H$, whose flow $F_t^H$ satisfies $F_t^H\circ\pi_H = \pi_H\circ F_t\circ i_H$. For a compact linear action, the construction of $M_H/(N(H)/H)$ can be implemented using the invariant polynomials of the action and the theorems of Hilbert and Schwarz–Mather.

• The Hamiltonian case. Let $(M, \omega)$ be a symplectic manifold and $G$ a connected Lie group with Lie algebra $\mathfrak g$, acting freely, properly and symplectically ($\Phi_g^*\omega = \omega$ for all $g$). A momentum map $\mathbf J : M \to \mathfrak g^*$ satisfies $X_{\mathbf J^\xi} = \xi_M$, where $\mathbf J^\xi := \langle\mathbf J, \xi\rangle$. The non-equivariance (Souriau) group one-cocycle is $c(g) := \mathbf J(g\cdot m) - \mathrm{Ad}^*_{g^{-1}}\mathbf J(m)$, independent of $m$ if $M$ is connected. The affine action $\Theta(g, \mu) := \mathrm{Ad}^*_{g^{-1}}\mu + c(g)$ makes $\mathbf J$ $\Theta$-equivariant. Noether's Theorem: $\mathbf J$ is conserved along the flow of any $G$-invariant Hamiltonian. Moreover $\mathfrak g^*$ is an affine Lie–Poisson space,
$$\{f, h\}_\pm(\mu) := \pm\Big\langle\mu, \Big[\tfrac{\delta f}{\delta\mu}, \tfrac{\delta h}{\delta\mu}\Big]\Big\rangle \mp \Sigma\Big(\tfrac{\delta f}{\delta\mu}, \tfrac{\delta h}{\delta\mu}\Big), \qquad f, h \in C^\infty(\mathfrak g^*),$$
where the infinitesimal non-equivariance two-cocycle $\Sigma \in Z^2(\mathfrak g, \mathbb R)$ is $\Sigma(\xi, \eta) := \mathbf d c_\eta(e)\cdot\xi$, with $c_\eta(g) := \langle c(g), \eta\rangle$. Its symplectic leaves (reachable sets) are the $\Theta$-orbits $\mathcal O_\mu$, with $\omega^\pm_{\mathcal O_\mu}(\nu)(\xi_{\mathfrak g^*}(\nu), \eta_{\mathfrak g^*}(\nu)) = \pm\langle\nu, [\xi, \eta]\rangle \mp \Sigma(\xi, \eta)$, and $\mathbf J : M \to \mathfrak g^*_+$ is a Poisson map.

• Example: lifted actions on cotangent bundles. If $G$ acts on the manifold $Q$ and then by cotangent lift on $T^*Q$, the momentum map is $\langle\mathbf J(\alpha_q), \xi\rangle = \langle\alpha_q, \xi_Q(q)\rangle$ for all $\alpha_q \in T^*Q$, $\xi \in \mathfrak g$; it is $\mathrm{Ad}^*$-equivariant. Special case 1 (linear momentum): the configuration space of $N$ particles is $\mathbb R^{3N}$, $\mathbb R^3$ acts by translations $v\cdot(q_i) = (q_i + v)$, and $\mathbf J : T^*\mathbb R^{3N} \to \mathbb R^3$ is the linear momentum $\mathbf J(q_i, p_i) = \sum_{i=1}^N p_i$. Special case 2 (angular momentum): $SO(3)$ acts naturally on $\mathbb R^3$ and $\mathbf J : T^*\mathbb R^3 \to \mathbb R^3$ is the angular momentum $\mathbf J(q, p) = q\times p$.

• Example: symplectic linear actions. For a symplectic vector space $(V, \omega)$ and $G \subseteq Sp(V, \omega)$ acting naturally on $V$, the $\mathrm{Ad}^*$-equivariant momentum map $\mathbf J : V \to \mathfrak{sp}(V, \omega)^*$ is $\langle\mathbf J(v), \xi\rangle = \tfrac12\,\omega(\xi_V(v), v)$. Special case: Cayley–Klein parameters and the Hopf fibration. $SU(2)$ acts on $\mathbb C^2$ and $\langle\mathbf J(z, w), \xi\rangle = \tfrac12\,\omega\big(\xi\,(z, w)^T, (z, w)^T\big)$ for $z, w \in \mathbb C$, $\xi \in \mathfrak{su}(2)$. Using the Lie algebra isomorphism $(\mathfrak{su}(2), [\,,\,]) \cong (\mathbb R^3, \times)$, $x = (x^1, x^2, x^3) \leftrightarrow \tilde x := \tfrac12\begin{pmatrix}-ix^3 & -ix^1 - x^2\\ -ix^1 + x^2 & ix^3\end{pmatrix}$, and identifying $\mathfrak{su}(2)^* \cong \mathbb R^3$ by $\check\mu\cdot x := -2\langle\mu, \tilde x\rangle$, the momentum map becomes
$$\check{\mathbf J}(z, w) = -\tfrac12\,\big(2\bar w z,\ |z|^2 - |w|^2\big) \in \mathbb C\times\mathbb R \cong \mathbb R^3.$$
These are the Cayley–Klein parameters, or the Kustaanheimo–Stiefel coordinates, and $\check{\mathbf J}|_{S^3} : S^3 \to S^2_{1/2}$ is the Hopf fibration.
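As a quick sanity check of the formula for $\check{\mathbf J}$ above, here is a minimal NumPy sketch (our own code; the packing of $\mathbb C\times\mathbb R$ into $\mathbb R^3$ is our illustrative choice) verifying that points of $S^3 \subset \mathbb C^2$ are sent onto the sphere of radius $1/2$:

```python
import numpy as np

# check that J(z, w) = -1/2 (2 conj(w) z, |z|^2 - |w|^2) maps S^3 onto the sphere of radius 1/2
def hopf_map(z, w):
    a = -(np.conj(w) * z)                      # complex number holding the first two real components
    b = -0.5 * (abs(z) ** 2 - abs(w) ** 2)
    return np.array([a.real, a.imag, b])

rng = np.random.default_rng(0)
v = rng.normal(size=4)
v /= np.linalg.norm(v)                          # a random point of S^3, viewed in C^2
z, w = v[0] + 1j * v[1], v[2] + 1j * v[3]
print(np.linalg.norm(hopf_map(z, w)))           # ~0.5
```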
A similar construction appears in fluid dynamics: the Clebsch variables. The momentum map of the $SU(2)$-action on $\mathbb C^2$, the Cayley–Klein parameters, the Kustaanheimo–Stiefel coordinates, and the family of Hopf fibrations on concentric three-spheres in $\mathbb C^2$ are all the same map.

• Properties of the momentum map. $\mathrm{range}\,T_m\mathbf J = (\mathfrak g_m)^\circ$, so points with symmetry are points of bifurcation, and freeness of the action is equivalent to regularity of $\mathbf J$. $\ker T_m\mathbf J = (\mathfrak g\cdot m)^\omega$. The obstruction to the existence of $\mathbf J$ is the vanishing of the map $H^1(\mathfrak g, \mathbb R) := \mathfrak g/[\mathfrak g, \mathfrak g] \ni [\xi] \mapsto [\mathbf i_{\xi_M}\omega] \in H^1(M, \mathbb R)$. Also $\mathbf J^{[\xi, \eta]} = \{\mathbf J^\xi, \mathbf J^\eta\}$ for all $\xi, \eta \in \mathfrak g$ if and only if $T_m\mathbf J(\xi_M(m)) = -\operatorname{ad}^*_\xi\mathbf J(m)$ for all $m \in M$. Among all possible momentum maps for a given action there is at most one infinitesimally $\mathrm{Ad}^*$-equivariant one, and for $G$ connected, infinitesimal $\mathrm{Ad}^*$-equivariance is equivalent to $\mathrm{Ad}^*$-equivariance. If $H^1(\mathfrak g; \mathbb R) = 0$ or $H^1(M, \mathbb R) = 0$, then $\mathbf J$ exists; if $H^2(\mathfrak g; \mathbb R) = 0$, then $\mathbf J$ can be chosen equivariant. Whitehead lemmas: $\mathfrak g$ semisimple $\Rightarrow H^1(\mathfrak g; \mathbb R) = H^2(\mathfrak g; \mathbb R) = 0$. If $G$ is compact, $\mathbf J$ can always be chosen $\mathrm{Ad}^*$-equivariant. Reduction Lemma: $\mathfrak g_{\mathbf J(m)}\cdot m = \mathfrak g\cdot m \cap \ker T_m\mathbf J = \mathfrak g\cdot m \cap (\mathfrak g\cdot m)^\omega$ (the usual picture of $\mathbf J^{-1}(\mu)$, the orbits $G\cdot z$ and $G_\mu\cdot z$, and the symplectically orthogonal spaces).

• Momentum maps and isotropy type manifolds. $M_{G_m}$ is a symplectic submanifold of $M$ for any $m \in M$; this rests on the fact that for a compact Lie group $H$ and a symplectic representation space $(V, \omega)$, $V^H$ is a symplectic subspace of $V$. Let $M_{G_m}^m$ be the connected component of $M_{G_m}$ containing $m$, and $N(G_m)^m := \{n \in N(G_m) \mid n\cdot z \in M_{G_m}^m \text{ for all } z \in M_{G_m}^m\}$; it is a closed (hence also open) subgroup of $N(G_m)$ containing the identity component, so $\mathrm{Lie}(N(G_m)^m) = \mathrm{Lie}(N(G_m))$ and, since $(N(G_m)/G_m)^m = N(G_m)^m/G_m$, $\mathrm{Lie}(N(G_m)^m/G_m) = \mathrm{Lie}(N(G_m)/G_m)$. The group $L_m := N(G_m)^m/G_m$ acts freely, properly and canonically on $M_{G_m}^m$ by $\Psi(nG_m, z) := n\cdot z$, with momentum map $\mathbf J_{L_m}(z) := \Lambda\big(\mathbf J|_{M_{G_m}^m}(z) - \mathbf J(m)\big)$, where $\Lambda : (\mathfrak g_m^\circ)^{G_m} \to (\mathrm{Lie}(L_m))^*$ is the natural $L_m$-equivariant isomorphism defined by $\big\langle\Lambda(\beta), \tfrac{d}{dt}\big|_{t=0}(\exp t\xi)G_m\big\rangle = \langle\beta, \xi\rangle$ for $\beta \in (\mathfrak g_m^\circ)^{G_m}$, $\xi \in \mathrm{Lie}(N(G_m)^m) = \mathrm{Lie}(N(G_m))$. The non-equivariance one-cocycle of $\mathbf J_{L_m}$ is $\tau(l) = \Lambda\big(c(n) + n\cdot\mathbf J(m) - \mathbf J(m)\big)$ for $l = nG_m \in L_m$, $n \in N(G_m)^m$; so even if $\mathbf J$ is equivariant, the induced momentum map $\mathbf J_{L_m}$ is, in general, not.

• Convexity. If $\mathbf J : M \to \mathfrak g^*$ is equivariant and $G$, $M$ are compact and connected, the intersection of $\mathrm{range}\,\mathbf J$ with a Weyl chamber is a compact, convex polytope: the momentum polytope (Atiyah, Guillemin, Kirwan, Sternberg).
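As a concrete numerical illustration of Noether's theorem quoted in the previous part, with the angular-momentum example $\mathbf J(q, p) = q\times p$, here is a hedged sketch (our own code; the choice of Hamiltonian and integrator are ours): for a rotation-invariant Hamiltonian the momentum map is conserved along the flow.

```python
import numpy as np

# H = |p|^2/2 + |q|^2/2 is SO(3)-invariant, so J(q, p) = q x p should be conserved.
# A leapfrog step only kicks p along q and drifts q along p, so q x p is preserved exactly.
def step(q, p, dt):
    p = p - 0.5 * dt * q          # dV/dq = q
    q = q + dt * p
    p = p - 0.5 * dt * q
    return q, p

q, p, dt = np.array([1.0, 0.0, 0.2]), np.array([0.0, 1.0, -0.3]), 1e-3
J0 = np.cross(q, p)
for _ in range(10_000):
    q, p = step(q, p, dt)
print(np.linalg.norm(np.cross(q, p) - J0))   # ~1e-15: J is conserved along the flow
```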
• Delzant polytopes and toric manifolds. A Delzant polytope in $\mathbb R^n$ is a convex polytope that is also (i) simple: $n$ edges meet at each vertex; (ii) rational: the edges meeting at a vertex $p$ are of the form $p + tu_i$, $0 \le t < \infty$, $u_i \in \mathbb Z^n$, $i \in \{1, \dots, n\}$; (iii) smooth: the vectors $u_1, \dots, u_n$ can be chosen to be an integral basis of $\mathbb Z^n$. Delzant's Theorem: there is a bijection $\{\text{symplectic toric manifolds}\} \leftrightarrow \{\text{Delzant polytopes}\}$, $(M, \omega, \mathbb T^n, \mathbf J : M \to \mathbb R^n) \leftrightarrow \mathbf J(M)$.

• Marsden–Weinstein Reduction Theorem. If $\mu \in \mathbf J(M) \subset \mathfrak g^*$ is a regular value of $\mathbf J$ and the $G_\mu$-action on $\mathbf J^{-1}(\mu)$ is free and proper, where $G_\mu := \{g \in G \mid \Theta_g\mu = \mu\}$, then $(M_\mu := \mathbf J^{-1}(\mu)/G_\mu, \omega_\mu)$ is symplectic, with $\pi_\mu^*\omega_\mu = i_\mu^*\omega$, where $i_\mu : \mathbf J^{-1}(\mu) \to M$ is the inclusion and $\pi_\mu : \mathbf J^{-1}(\mu) \to \mathbf J^{-1}(\mu)/G_\mu$ the projection. The flow $F_t$ of $X_h$, $h \in C^\infty(M)^G$, leaves the connected components of $\mathbf J^{-1}(\mu)$ invariant and commutes with the $G$-action, so it induces a flow $F_t^\mu$ on $M_\mu$ by $\pi_\mu\circ F_t\circ i_\mu = F_t^\mu\circ\pi_\mu$; $F_t^\mu$ is Hamiltonian on $(M_\mu, \omega_\mu)$ for the reduced Hamiltonian $h_\mu \in C^\infty(M_\mu)$ given by $h_\mu\circ\pi_\mu = h\circ i_\mu$. Moreover, if $h, k \in C^\infty(M)^G$, then $\{h, k\}_\mu = \{h_\mu, k_\mu\}_{M_\mu}$.

• Orbit symplectic form from reduction. With $L_g(h) = gh$, $R_g(h) = hg$ the left and right translations and $\mathcal O_\mu := \{\mathrm{Ad}^*_g\mu \mid g \in G\}$ the coadjoint orbit through $\mu \in \mathfrak g^*$, take the special case $M = T^*G$ with the lift of the left action $g\cdot h := gh$. The momentum map is $\mathbf J_L(\alpha_g) = T_e^*R_g(\alpha_g) \in \mathfrak g^*$, and $(\mathbf J_L^{-1}(\mu)/G_\mu, \Omega_\mu) \cong (\mathcal O_\mu, \omega^-_{\mathcal O_\mu})$, where the orbit symplectic form is $\omega^\pm_{\mathcal O_\mu}(\nu)(\operatorname{ad}^*_\xi\nu, \operatorname{ad}^*_\eta\nu) = \pm\langle\nu, [\xi, \eta]\rangle$ for all $\xi, \eta \in \mathfrak g$, $\nu \in \mathcal O_\mu$. Via $\mathbf J_R$, $(T^*G)/G \cong \mathfrak g^*$ is a Lie–Poisson space for the bracket $\{f, h\}_-(\mu) := -\big\langle\mu, \big[\tfrac{\delta f}{\delta\mu}, \tfrac{\delta h}{\delta\mu}\big]\big\rangle$, $f, h \in C^\infty(\mathfrak g^*)$, and its symplectic leaves (reachable sets) are the $\mathcal O_\mu$.

• Reconstruction of dynamics. Given an integral curve $c_\mu(t)$ of $X_{h_\mu} \in \mathfrak X(M_\mu)$ and $m_0 \in \mathbf J^{-1}(\mu)$, one wants the integral curve $c(t)$ of $X_h \in \mathfrak X(M)$ with initial condition $m_0$. Pick a smooth curve $d(t) \subset \mathbf J^{-1}(\mu)$ with $d(0) = m_0$ and $\pi_\mu(d(t)) = c_\mu(t)$; then $c(t) = g(t)\cdot d(t)$ for a curve $g(t) \subset G_\mu$, obtained in two steps: 1) find a smooth curve $\xi(t) \subset \mathfrak g_\mu$ such that $\xi(t)_M(d(t)) = X_h(d(t)) - \dot d(t)$; 2) with this $\xi(t)$, solve $\dot g(t) = T_eL_{g(t)}\xi(t)$, $g(0) = e$. If $\mathcal A \in \Omega^1(\mathbf J^{-1}(\mu); \mathfrak g_\mu)$ is a connection on the $G_\mu$-principal bundle $\mathbf J^{-1}(\mu) \to M_\mu$ and $d(t)$ is chosen to be the horizontal lift of $c_\mu(t)$ through $m_0$ (i.e. $\mathcal A(d(t))(\dot d(t)) = 0$, $\pi_\mu(d(t)) = c_\mu(t)$, $d(0) = m_0$), then the solution of step 1) is $\xi(t) = \mathcal A(d(t))\big(X_h(d(t))\big)$.
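Step 2) above is an ordinary differential equation on the group. A hedged numerical sketch (our own code, for a matrix group where $T_eL_g\xi = g\,\xi$) shows the simplest way to integrate it while staying on the group, by piecewise exponentials:

```python
import numpy as np
from scipy.linalg import expm

# Integrate g'(t) = g(t) xi(t) on SO(3) with the stepping g_{k+1} = g_k exp(dt * xi(t_k)).
# xi(t) is assumed given, e.g. from the connection evaluated along the horizontal lift.
def hat(v):                                        # R^3 -> so(3)
    return np.array([[0., -v[2], v[1]], [v[2], 0., -v[0]], [-v[1], v[0], 0.]])

def reconstruct(xi_of_t, T=1.0, n=200):
    dt, g = T / n, np.eye(3)
    for k in range(n):
        g = g @ expm(dt * hat(xi_of_t(k * dt)))
    return g

# constant xi = 2*pi about the z-axis: after time 1 the full rotation returns to the identity
g1 = reconstruct(lambda t: np.array([0.0, 0.0, 2 * np.pi]))
print(np.round(g1, 6))
```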
• Orbit reduction. Assume $\Phi : G\times M \to M$ is a free, proper, symplectic action admitting a momentum map $\mathbf J : M \to \mathfrak g^*$, and $M$ is connected (not needed if $\mathbf J$ is equivariant). The affine coadjoint orbit $\mathcal O_\mu := \{\mathrm{Ad}^*_{g^{-1}}\mu + c(g) \mid g \in G\}$ is an initial submanifold of $\mathfrak g^*$. The Bifurcation Lemma ($\mathrm{range}\,T_m\mathbf J = (\mathfrak g_m)^\circ$) together with freeness ($\mathfrak g_m = \{0\}$) implies that $\mathbf J$ is a submersion onto an open subset of $\mathfrak g^*$, hence transversal to $\mathcal O_\mu$: for any $z \in \mathbf J^{-1}(\mathcal O_\mu)$, $(T_z\mathbf J)(T_zM) + T_{\mathbf J(z)}\mathcal O_\mu = \mathfrak g^*$. Therefore $\mathbf J^{-1}(\mathcal O_\mu)$ is an initial submanifold of $M$ of dimension $\dim M - \dim G_\mu$ whose tangent space at $z$ is $T_z(\mathbf J^{-1}(\mathcal O_\mu)) = (T_z\mathbf J)^{-1}(T_{\mathbf J(z)}\mathcal O_\mu) = \mathfrak g\cdot z + \ker(T_z\mathbf J)$. The $G$-action restricts to a free and proper action on the $G$-invariant initial submanifold $\mathbf J^{-1}(\mathcal O_\mu)$; the restricted action is smooth because $\Phi_{\mathcal O_\mu} : G\times\mathbf J^{-1}(\mathcal O_\mu) \to M$ is smooth and takes values in $\mathbf J^{-1}(\mathcal O_\mu)$, which is initial. Hence $M_{\mathcal O_\mu} := \mathbf J^{-1}(\mathcal O_\mu)/G$ is a manifold and $\pi_{\mathcal O_\mu} : \mathbf J^{-1}(\mathcal O_\mu) \to M_{\mathcal O_\mu}$ is a surjective submersion. Then: (i) there is a unique symplectic form $\omega_{\mathcal O_\mu}$ on $M_{\mathcal O_\mu}$ characterized by $\iota^*_{\mathcal O_\mu}\omega = \pi^*_{\mathcal O_\mu}\omega_{\mathcal O_\mu} + \mathbf J^*_{\mathcal O_\mu}\omega^+_{\mathcal O_\mu}$, where $\iota_{\mathcal O_\mu} : \mathbf J^{-1}(\mathcal O_\mu) \to M$, $\mathbf J_{\mathcal O_\mu} := \mathbf J|_{\mathbf J^{-1}(\mathcal O_\mu)}$, and $\omega^+_{\mathcal O_\mu}$ is the $+$-symplectic structure on the affine orbit; $(M_{\mathcal O_\mu}, \omega_{\mathcal O_\mu})$ is the symplectic orbit reduced space; (ii) for $h \in C^\infty(M)^G$, the flow $F_t$ of $X_h$ leaves the connected components of $\mathbf J^{-1}(\mathcal O_\mu)$ invariant and commutes with the $G$-action, so it induces a flow $F_t^{\mathcal O_\mu}$ on $M_{\mathcal O_\mu}$ with $\pi_{\mathcal O_\mu}\circ F_t\circ i_{\mathcal O_\mu} = F_t^{\mathcal O_\mu}\circ\pi_{\mathcal O_\mu}$; (iii) this flow is Hamiltonian for the reduced Hamiltonian $h_{\mathcal O_\mu} \in C^\infty(M_{\mathcal O_\mu})$ defined by $h_{\mathcal O_\mu}\circ\pi_{\mathcal O_\mu} = h\circ i_{\mathcal O_\mu}$, and $X_h$, $X_{h_{\mathcal O_\mu}}$ are $\pi_{\mathcal O_\mu}$-related; (iv) $h, k \in C^\infty(M)^G \Rightarrow \{h, k\} \in C^\infty(M)^G$ and $\{h, k\}_{\mathcal O_\mu} = \{h_{\mathcal O_\mu}, k_{\mathcal O_\mu}\}_{M_{\mathcal O_\mu}}$. This is a theorem in the Poisson category, whereas the point reduction theorem is in the symplectic category.

• Problems with the hypotheses of the Reduction Theorem. The hypotheses are too restrictive, even in classical examples such as Jacobi's elimination of the nodes; properness of the action cannot be eliminated because the theory of $G$-manifolds is needed. 1) How does one recover the conservation of isotropy? The momentum map seems incapable of capturing it: the level sets $\mathbf J^{-1}(\mu)$ are not the smallest invariant sets, and reduction ignores this point. 2) If the $G$-action is not free, $M_\mu$ is not a smooth manifold: what is the structure of the reduced topological space, and what is left that remains symplectic? 3) If $G$ is discrete, the momentum map is zero: what is reduction in that case? These are questions of bifurcation theory with symmetry; for generic vector fields a lot is known, for Hamiltonian vector fields almost nothing (a few papers).

• Singular point reduction. Given: $(M, \omega)$ connected, $m \in M$, $G$ acting symplectically on $M$ with momentum map $\mathbf J$, group one-cocycle $c(g) := \mathbf J(g\cdot z) - \mathrm{Ad}^*_{g^{-1}}\mathbf J(z)$, affine $G$-action $\Theta(g, \nu) := \mathrm{Ad}^*_{g^{-1}}\nu + c(g)$ on $\mathfrak g^*$, $\mu := \mathbf J(m)$, $G_\mu$ the $\Theta$-isotropy of $\mu$, $H := G_m \subseteq G_\mu$, and $M_H^m$ the connected component of $M_H$ containing $m$. Then: (i) $\mathbf J^{-1}(\mu)\cap(G_\mu\cdot M_H^m)$ is embedded in $M$; (ii) $M_\mu^{(H)} := [\mathbf J^{-1}(\mu)\cap(G_\mu\cdot M_H^m)]/G_\mu$ has a unique quotient manifold structure such that $\pi_\mu^{(H)} : \mathbf J^{-1}(\mu)\cap(G_\mu\cdot M_H^m) \to M_\mu^{(H)}$ is a surjective submersion; (iii) there is a unique symplectic form $\omega_\mu^{(H)}$ on $M_\mu^{(H)}$ characterized by $\iota_\mu^{(H)*}\omega = \pi_\mu^{(H)*}\omega_\mu^{(H)}$, where $\iota_\mu^{(H)}$ is the inclusion; the $(M_\mu^{(H)}, \omega_\mu^{(H)})$ are the singular symplectic point strata; (iv) for $h \in C^\infty(M)^G$, the flow of $X_h$ leaves the connected components of $\mathbf J^{-1}(\mu)\cap(G_\mu\cdot M_H^m)$ invariant, commutes with the $G_\mu$-action, and induces a flow $F_t^\mu$ on $M_\mu^{(H)}$ with $\pi_\mu^{(H)}\circ F_t\circ i_\mu^{(H)} = F_t^\mu\circ\pi_\mu^{(H)}$; (v) $F_t^\mu$ is Hamiltonian for the reduced Hamiltonian $h_\mu^{(H)} : M_\mu^{(H)} \to \mathbb R$, $h_\mu^{(H)}\circ\pi_\mu^{(H)} = h\circ i_\mu^{(H)}$, and $X_h$, $X_{h_\mu^{(H)}}$ are $\pi_\mu^{(H)}$-related; (vi) $h, k \in C^\infty(M)^G \Rightarrow \{h, k\} \in C^\infty(M)^G$ and $\{h, k\}_\mu^{(H)} = \{h_\mu^{(H)}, k_\mu^{(H)}\}_{M_\mu^{(H)}}$, where $\{\cdot,\cdot\}_{M_\mu^{(H)}}$ is the Poisson bracket induced by $\omega_\mu^{(H)}$.

• Sjamaar point reduction principle. Goal: realize the strata as usual reduced spaces. Starting from a proper symplectic $G$-action on $(M, \omega)$, fix $m \in M$, $H := G_m$, $\mu := \mathbf J(m)$, and set $N(H)^m := \{n \in N(H) \mid n\cdot M_H^m \subset M_H^m\}$; it is open, hence also closed, in $N(H)$, contains $H$, and $\mathrm{Lie}(N(H)^m/H) = \mathrm{Lie}(N(H)/H) =: \mathfrak l^m$. The group $L^m := N(H)^m/H$ acts freely, properly and symplectically on $M_H^m$ with momentum map $\mathbf J_{L^m}(z) = \Lambda\big(\mathbf J|_{M_H^m}(z) - \mu\big)$, where $\Lambda : (\mathfrak g_m^\circ)^H \to (\mathrm{Lie}\,L^m)^*$ is the $L^m$-equivariant isomorphism defined as before ($\mathfrak g_m^\circ \subseteq \mathfrak g^*$ is the annihilator of $\mathfrak g_m$ and $(\mathfrak g_m^\circ)^H$ its $H$-fixed points), and the non-equivariance one-cocycle is $\tau(nH) = \Lambda(c(n) + n\cdot\mu - \mu)$ for $n \in N(H)^m$. Then: (i) $\pi_\mu^{(H)} : \mathbf J^{-1}(\mu)\cap(G_\mu\cdot M_H^m) \to M_\mu^{(H)}$ is a smooth fiber bundle with fiber $G_\mu/H$ and structure group $N_{G_\mu}(H)^m/H$; (ii) $(M_H^m)_0 := \mathbf J_{L^m}^{-1}(0)/L^m_0 = [\mathbf J^{-1}(\mu)\cap M_H^m]/(N_{G_\mu}(H)^m/H)$, where $L^m_0 \ne L^m$ in general (the $L^m$-action is affine); (iii) $\pi_0 : \mathbf J_{L^m}^{-1}(0) \to (M_H^m)_0$ is a principal $L^m_0$-bundle, and the associated bundle $G_\mu/H \times_{N_{G_\mu}(H)^m/H}\big(\mathbf J^{-1}(\mu)\cap M_H^m\big) \to [\mathbf J^{-1}(\mu)\cap M_H^m]/(N_{G_\mu}(H)^m/H)$ is $G_\mu$-symplectomorphic to $\pi_\mu^{(H)}$; in particular, $(M_H^m)_0$ is symplectomorphic to $M_\mu^{(H)}$. Moreover, the sets $\mathbf J^{-1}(\mu)\cap(G_\mu\cdot M_H^m)$ form a Whitney (B) stratification of $\mathbf J^{-1}(\mu)$; the collection $\{M_\mu^{(H)}\}$ is a symplectic Whitney (B) stratification of the cone space $M_\mu := \mathbf J^{-1}(\mu)/G_\mu$; and each connected component of $M_\mu$ contains a unique open stratum, connected, open and dense in that component. There are similar theorems for orbit reduction; in the corresponding commutative diagram, relating at every level $\mathbf J^{-1}(\mu)/G_\mu$ and $\mathbf J^{-1}(\mathcal O_\mu)/G$ and their strata, $L_\mu$ is an isomorphism of cone (hence Whitney (B)) stratified spaces, in particular a homeomorphism; $L_\mu^{(H)}$ is its restriction to the stratum determined by $H := G_m$; and $f_\mu^{(H)}$, $f_{\mathcal O_\mu}^{(H)}$ are the Sjamaar principle symplectomorphisms.

• Cotangent bundle reduction – embedding version. Let $\Phi : G\times Q \to Q$ be a left, free, proper action, so $Q_\mu := Q/G_\mu$ is a smooth manifold and $\pi_{Q, Q_\mu} : Q \to Q_\mu$ is a principal $G_\mu$-bundle. Lift $\Phi$ to a $G$-action on $(T^*Q, \omega_Q)$; it is free, proper, and admits the equivariant momentum map $\langle\mathbf J(\alpha_q), \xi\rangle = \alpha_q(\xi_Q(q))$. Reduce at $\mu \in \mathfrak g^*$ to get a symplectic manifold $((T^*Q)_\mu, \Omega_\mu)$. Hypothesis: there exists a $G_\mu$-invariant one-form $\alpha_\mu$ on $Q$ taking values in $\mathbf J^{-1}(\mu)$. Then there exists a unique $\beta_\mu \in \Omega^2(Q_\mu)$ with $\pi_{Q, Q_\mu}^*\beta_\mu = \mathbf d\alpha_\mu$; $\beta_\mu$ is closed (not exact, in general); note that $\alpha_\mu$ does not drop to $Q_\mu$, whereas $\mathbf d\alpha_\mu$ does. Define $B_\mu := \pi_{Q_\mu}^*\beta_\mu \in \Omega^2(T^*Q_\mu)$, with $\pi_{Q_\mu} : T^*Q_\mu \to Q_\mu$ the projection. Theorem (embedding version): there is a symplectic embedding $\varphi_\mu : ((T^*Q)_\mu, (\omega_Q)_\mu) \hookrightarrow (T^*Q_\mu, \omega_{Q_\mu} - B_\mu)$ onto the vector subbundle $[T\pi_{Q, G_\mu}(V)]^\circ \subseteq T^*Q_\mu$, where $V \subset TQ$ is the subbundle of vectors tangent to the $G$-orbits, $V_q = \{\xi_Q(q) \mid \xi \in \mathfrak g\}$, and $\circ$ denotes the annihilator for the natural duality pairing between $TQ_\mu$ and $T^*Q_\mu$; $\varphi_\mu$ is a symplectomorphism onto $(T^*Q_\mu, \omega_{\mathrm{can}} - B_\mu)$ if and only if $\mathfrak g = \mathfrak g_\mu$. If $\mathcal A \in \Omega^1(Q; \mathfrak g)$ is a principal connection on $\pi_{Q, Q/G} : Q \to Q/G$ with curvature $\mathcal B \in \Omega^2(Q; \mathfrak g)$, one can choose $\alpha_\mu(q) := \mathcal A(q)^*\mu$, so that $\mathbf d\alpha_\mu = \big\langle\mu, \mathcal B + \tfrac12[\mathcal A\wedge\mathcal A]\big\rangle \in \Omega^2(Q)$.

• Cotangent bundle reduction – fibration (bundle) version. The reduced space $(T^*Q)_\mu \to T^*(Q/G)$ is a locally trivial fiber bundle with typical fiber $\mathcal O_\mu$. This is not good enough, because it says nothing about the symplectic form of $(T^*Q)_\mu$ in terms of the symplectic structure of $T^*(Q/G)$ and the orbit symplectic structure on $\mathcal O_\mu$; one first studies the (easier) Poisson situation. With a principal connection $\mathcal A \in \Omega^1(Q; \mathfrak g)$ on $\pi_{Q, Q/G}$, horizontal spaces $H_q = \{v_q \in T_qQ \mid \mathcal A(v_q) = 0\}$, vertical spaces $V_q = \{\xi_Q(q) \mid \xi \in \mathfrak g\}$, vertical and horizontal projections $\mathrm{ver}_q(v_q) := [\mathcal A(q)(v_q)]_Q(q)$, $\mathrm{hor}_q(v_q) := v_q - \mathrm{ver}_q(v_q)$, and horizontal lift $\mathrm{Hor}_q := (T_q\pi_{Q, Q/G}|_{H_q})^{-1} : T_{[q]}(Q/G) \to H_q$, one forms the adjoint bundle $\tilde{\mathfrak g} := (Q\times\mathfrak g)/G \to Q/G$ (a vector bundle with fibers isomorphic to $\mathfrak g$) and the vector bundle isomorphism $\alpha_{\mathcal A} : TQ/G \to T(Q/G)\oplus\tilde{\mathfrak g}$, $[v_q] \mapsto (T_q\pi(v_q), [q, \mathcal A(q)(v_q)])$, with inverse $(v_{[q]}, [q, \xi]) \mapsto [\mathrm{Hor}_q v_{[q]} + \xi_Q(q)]$ and dual $(\alpha_{\mathcal A}^{-1})^* : T^*Q/G \to T^*(Q/G)\oplus\tilde{\mathfrak g}^*$, $[\alpha_q] \mapsto (\mathrm{Hor}_q^*\alpha_q, [q, \mathbf J(\alpha_q)])$. The connection induces an affine connection on $W := T^*(Q/G)\oplus\tilde{\mathfrak g}^* \to Q/G$ with an exterior covariant derivative $\mathbf d^{\mathcal A}$, and the push-forward of the Poisson bracket by $(\alpha_{\mathcal A}^{-1})^*$ is the gauged Lie–Poisson bracket: for $f, g \in C^\infty(W)$ and $w = (\alpha_{[q]}, [q, \mu]) \in W$,
$$\{f, g\}_W(w) = \omega_{Q/G}(\alpha_{[q]})\big(\mathbf d^{\mathcal A}f(w), \mathbf d^{\mathcal A}g(w)\big) + \big\langle[q, \mu], \tilde{\mathcal B}(\alpha_{[q]})\big(\mathbf d^{\mathcal A}f(w), \mathbf d^{\mathcal A}g(w)\big)\big\rangle - \Big\langle w, \Big[\tfrac{\delta f}{\delta w}, \tfrac{\delta g}{\delta w}\Big]\Big\rangle,$$
where $\tfrac{\delta f}{\delta w}$ is the fiber derivative and $\tilde{\mathcal B}$ is the two-form induced by the curvature, $\tilde{\mathcal B}([q])\big(T_q\pi_{Q/G}u_q, T_q\pi_{Q/G}v_q\big) := [q, \mathrm{Curv}_{\mathcal A}(q)(u_q, v_q)]$.

• Symplectic leaves of the gauged Lie–Poisson bracket on $T^*(Q/G)\oplus\tilde{\mathfrak g}^*$: solved by Perlmutter in his 1999 thesis and, in final form, by Marsden–Perlmutter [2000]. For a coadjoint orbit $\mathcal O \subset \mathfrak g^*$, let $\tilde{\mathcal O} := (Q\times\mathcal O)/G \to Q/G$ be the associated fiber bundle; then $T^*(Q/G)\times_{Q/G}\tilde{\mathcal O}$ is a fiber bundle over $Q/G$ whose fiber at $[q]$ is $T^*_{[q]}(Q/G)\times\tilde{\mathcal O}_{[q]}$, and $(\alpha_{\mathcal A}^{-1})^*\big(\mathbf J^{-1}(\mathcal O)/G\big) = T^*(Q/G)\times_{Q/G}\tilde{\mathcal O} \subset T^*(Q/G)\oplus\tilde{\mathfrak g}^*$. The reduced symplectic form $\omega_{\mathcal O}$ on $(T^*Q)_{\mathcal O} := \mathbf J^{-1}(\mathcal O)/G$ pushes forward by $(\alpha_{\mathcal A}^{-1})^*$ to $\omega_{\mathcal A} = \omega_{Q/G} - \beta$, where $\beta \in \Omega^2(\tilde{\mathcal O})$ is uniquely determined by $\pi^*_{Q\times\mathcal O, \tilde{\mathcal O}}\beta = \mathbf d\alpha + \pi^*_{Q\times\mathcal O, \mathcal O}\omega^+_{\mathcal O}$, with $\alpha \in \Omega^1(Q\times\mathcal O)$ given by $\alpha(q, \nu)\big(u_q, -\operatorname{ad}^*_\xi\nu\big) := \langle\nu, \mathcal A(q)(u_q)\rangle$; $\mathbf d\alpha$ has an explicit expression in terms of the curvature $\mathrm{Curv}_{\mathcal A}$ and the vertical–horizontal splitting $u_q = \xi_Q(q) + \mathrm{hor}_q u_q$ given by $\mathcal A$. Hence $\big((T^*Q)_{\mathcal O}, (\omega_Q)_{\mathcal O}\big) \cong \big(T^*(Q/G)\times_{Q/G}\tilde{\mathcal O}, \omega_{\mathcal A}\big)$ as symplectic manifolds.

• Reconstruction of dynamics for cotangent bundles. Given an integral curve $c_\mu(t)$ of $X_{h_\mu} \in \mathfrak X((T^*Q)_\mu)$ and $\alpha_q \in \mathbf J^{-1}(\mu)$, the integral curve of $X_h$ through $\alpha_q$ is $c(t) = g(t)\cdot d(t)$, where $d(t)$ is the horizontal lift of $c_\mu(t)$ for a connection $\mathcal A \in \Omega^1(\mathbf J^{-1}(\mu); \mathfrak g_\mu)$ and $g(t)$ solves $\dot g(t) = T_eL_{g(t)}\xi(t)$, $g(0) = e$. Everything comes down to the choice of a convenient connection and finding $\xi(t) \subset \mathfrak g_\mu$ in terms of $d(t)$.
1) If $G_\mu \cong S^1$ or $\mathbb R$, let $\zeta \in \mathfrak g_\mu$ be a basis and identify $a \in \mathbb R$ with $a\zeta$. Take $\mathcal A = \tfrac{1}{\langle\mu, \zeta\rangle}\theta_\mu \in \Omega^1(\mathbf J^{-1}(\mu))$, where $\theta_\mu$ is the pull-back to $\mathbf J^{-1}(\mu)$ of the canonical one-form $\theta_Q$ ($\omega_Q = -\mathbf d\theta_Q$); the curvature is $\mathrm{Curv}_{\mathcal A} = -\tfrac{1}{\langle\mu, \zeta\rangle}\omega_\mu \in \Omega^2((T^*Q)_\mu)$, and $\xi(t) = \mathbf d h(\Lambda)(d(t))$, where $\Lambda = p_i\tfrac{\partial}{\partial p_i}$ is the unique vector field on $T^*Q$ satisfying $\mathbf d\theta_Q(\Lambda, \cdot) = \theta_Q$.
2) A connection $\mathcal A \in \Omega^1(Q; \mathfrak g_\mu)$ on the left $G_\mu$-principal bundle $Q \to Q/G_\mu$ induces a connection on $\mathbf J^{-1}(\mu)$ by $\mathcal A(\alpha_q)(V_{\alpha_q}) := \mathcal A(q)\big(T_{\alpha_q}\pi_Q V_{\alpha_q}\big)$; then $\xi(t) = \mathcal A(q(t))\big(\mathbb Fh(d(t))\big)$, where $\mathbb Fh : T^*Q \to TQ$ is the fiber derivative and $q(t) := \pi_Q(d(t)) \subset Q$.
3) If $(Q, \langle\cdot,\cdot\rangle)$ is a Riemannian manifold and $G$ acts by isometries, the mechanical connection is defined by requiring that its horizontal bundle be the orthogonal complement of the vertical bundle: $\mathcal A_{\mathrm{mech}}(q)(u_q) := \mathbb I_\mu(q)^{-1}\mathbf J(u_q^\flat)$, where $u_q^\flat := \langle u_q, \cdot\rangle \in T_q^*Q$ and $\mathbb I_\mu(q) : \mathfrak g_\mu \to \mathfrak g_\mu^*$ is the $\mu$-locked inertia tensor $\mathbb I_\mu(q)(\zeta)(\eta) := \langle\zeta_Q(q), \eta_Q(q)\rangle$. This is a special case of 2), and $\xi(t) = \mathcal A_{\mathrm{mech}}(q(t))\,d(t)$.
4) Simple mechanical systems: the Hamiltonian is $h = k + v\circ\pi_Q$, where $k$ is the kinetic energy of the cometric on $T^*Q$ determined by a Riemannian metric on $Q$, $v \in C^\infty(Q)$ is $G$-invariant, and $G$ acts by isometries. The reconstruction is quite explicit. Step 1): use the symplectic embedding $\varphi_\mu : ((T^*Q)_\mu, (\omega_Q)_\mu) \to (T^*(Q/G_\mu), \omega_{Q/G_\mu} - B_\mu)$ induced by the mechanical connection; $\varphi_\mu(c_\mu(t))$ is an integral curve of the Hamiltonian system on $(T^*(Q/G_\mu), \omega_{Q/G_\mu} - B_\mu)$ given by the kinetic energy of the quotient Riemannian metric on $Q/G_\mu$ and the quotient of the amended potential $v_\mu := h\circ\alpha_\mu \in C^\infty(Q)$; compute $\varphi_\mu(c_\mu(t)) \subset T^*(Q/G_\mu)$ and $q_\mu(t) := \pi_{Q/G_\mu}(\varphi_\mu(c_\mu(t))) \subset Q/G_\mu$. Step 2): using the mechanical connection $\mathcal A_{\mathrm{mech}} \in \Omega^1(Q; \mathfrak g_\mu)$, horizontally lift $q_\mu(t)$ to a curve $q_h(t) \subset Q$ with $q_h(0) = q$. Step 3): determine $\xi(t) \subset \mathfrak g_\mu$ from the algebraic equation $\langle\xi(t)_Q(q_h(t)), \eta_Q(q_h(t))\rangle = \langle\mu, \eta\rangle$ for all $\eta \in \mathfrak g_\mu$; thus $\dot q_h(0)$ and $\xi(0)_Q(q)$ are the horizontal and vertical components of the vector corresponding to $\alpha_q$ in $T_qQ$. Step 4): solve $\dot g(t) = T_eL_{g(t)}\xi(t)$ in $G_\mu$ with $g(0) = e$. Step 5): with $q_h(t)$ from Step 2) and $g(t)$ from Step 4), define $q(t) := g(t)\cdot q_h(t)$; this is the base integral curve of the simple mechanical system with Hamiltonian $h = k + v\circ\pi_Q$ satisfying $q(0) = q$, and $\dot q(t)^\flat \subset T^*Q$ is the integral curve of $X_h$ with $\dot q(0)^\flat = \alpha_q$.

• Interesting special cases. (a) If $G_\mu$ is Abelian, the equation in Step 4) has the solution $g(t) = \exp\big(\int_0^t\xi(s)\,ds\big)$. (b) If $G_\mu = S^1$ with basis $\zeta$ of $\mathfrak g_\mu$, Step 3) can be solved explicitly: $\xi(t) = \frac{\langle\mu, \zeta\rangle}{\|\zeta_Q(q_h(t))\|^2}\,\zeta$, hence $q(t) = \exp\Big(\int_0^t\frac{\langle\mu, \zeta\rangle}{\|\zeta_Q(q_h(s))\|^2}\,ds\ \zeta\Big)\cdot q_h(t)$. (c) If $G$ is compact and $(\cdot,\cdot)$ is a positive definite inner product on $\mathfrak g$, invariant under the adjoint $G$-action and satisfying $(\zeta, \eta) = \langle\zeta_Q(q), \eta_Q(q)\rangle$ for all $q \in Q$, $\zeta, \eta \in \mathfrak g$, then $\xi \in \mathfrak g_\mu$ is uniquely determined by $(\xi, \cdot) = \mu|_{\mathfrak g_\mu}$ and $g(t) = \exp(t\xi)$. (d) If $G$ is solvable, write $g(t) = \exp(f_1(t)\xi_1)\cdots\exp(f_n(t)\xi_n)$ for a basis $\{\xi_1, \dots, \xi_n\} \subset \mathfrak g$; Wei and Norman [1964] have shown that $\dot g(t) = T_eL_{g(t)}\xi(t)$ can then be solved by quadratures for all the functions $f_1(t), \dots, f_n(t)$. (e) If $\dot\xi(t) = \alpha(t)\xi(t)$ for a known function $\alpha(t)$, then $g(t) = \exp(f(t)\xi(t))$ solves $\dot g(t) = T_eL_{g(t)}\xi(t)$, where $f(t) = \int_0^t\exp\big(\int_s^t\alpha(r)\,dr\big)\,ds$. The conditions in (c) are very strong, but they hold for the Kaluza–Klein construction. Many of these formulas are very useful when one wants to compute geometric phases. What happens if the action of $G$ on $Q$ is not free? Only partial results of Perlmutter and Rodríguez-Olmos; the general case is open.
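A hedged numerical illustration of special case (a) above (our own code; commuting diagonal matrices stand in for an Abelian matrix group): when the generators commute, the time-stepped solution of $\dot g = g\,\xi(t)$ collapses to the exponential of the integrated generator.

```python
import numpy as np
from scipy.linalg import expm

def xi(t):                                   # commuting (diagonal) generators
    return np.diag([np.sin(t), -0.5 * t, 1.0])

T, n = 2.0, 4000
dt = T / n
g_step, integral = np.eye(3), np.zeros((3, 3))
for k in range(n):
    g_step = g_step @ expm(dt * xi(k * dt))  # stepped solution of g' = g xi
    integral += dt * xi(k * dt)              # Riemann sum of \int_0^T xi(s) ds
print(np.linalg.norm(g_step - expm(integral)))   # ~1e-14: the product collapses to exp of the sum
```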

Dimension reduction on Riemannian manifolds (chaired by Xavier Pennec, Alain Trouvé)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
This paper presents derivations of evolution equations for the family of paths that, in the Diffusion PCA framework, are used to approximate the data likelihood. These paths, formally interpreted as most probable paths, generalize geodesics in that they extremize an energy functional on the space of differentiable curves on a manifold with connection. We discuss how the paths arise as projections of geodesics for a (non-bracket-generating) sub-Riemannian metric on the frame bundle. Evolution equations in coordinates are derived for both the metric and the cometric formulations of the sub-Riemannian geometry. We furthermore show how rank-deficient metrics can be mixed with an underlying Riemannian metric, and we use this construction to show how the evolution equations can be implemented on finite-dimensional LDDMM landmark manifolds.
 

Anisotropic Distributions on Manifolds, Diffusion PCA, and Evolution Equations
Stefan Sommer (sommer@diku.dk), Department of Computer Science, Faculty of Science, University of Copenhagen — GSI 2015, Paris, October 29, 2015

• Statistics on manifolds: the Fréchet mean $\operatorname{argmin}_{x\in M}\frac1N\sum_{i=1}^N d(x, y_i)^2$ and its extensions PGA (Fletcher et al. '04), GPCA (Huckemann et al. '10), HCA (Sommer '13), PNS (Jung et al. '12), BS (Pennec '15).
• Infinitesimally defined distributions and MLE: the aim is a family $N_M(\mu, \Sigma)$ of anisotropic Gaussian-like distributions on a manifold, fitted by MLE/MAP. In $\mathbb R^n$, Gaussian distributions are the transition distributions of the diffusion $dX_t = dW_t$; on $(M, g)$, Brownian motion is the transition distribution of a stochastic process obtained from the Eells–Elworthy–Malliavin construction, or the solution of the heat equation $\partial_t p(t, x) = \tfrac12\Delta p(t, x)$. The definition is infinitesimal ($dX_t$) rather than global ($p_t(x; y) \propto e^{-\|x-y\|^2}$).
• MLE of diffusion processes: the Eells–Elworthy–Malliavin construction gives a map $\mathrm{Diff} : FM \to \mathrm{Dens}(M)$; $N_M := \mathrm{Diff}(FM) \subset \mathrm{Dens}(M)$ is the set of (normalized) transition densities of $FM$ diffusions. With $\gamma = \mathrm{Diff}(x, X_\alpha)$, the log-likelihood is $\ln L(x, X_\alpha) = \ln L(\gamma) = \sum_{i=1}^N\ln p_\gamma(y_i)$, the estimated template is $\operatorname{argmax}_{(x, X_\alpha)\in FM}\ln L(x, X_\alpha)$ (MLE of the data $y_i$ under $y \sim \gamma \in N_M$), and Diffusion PCA (Sommer '14), $\operatorname{argmax}\ln L(x, X_\alpha + \varepsilon I)$, generalizes probabilistic PCA (Tipping, Bishop '99; Zhang, Fletcher '13).
• Most probable paths: in the Euclidean case the density $p_t(x; y) \propto e^{-(x-y)^T\Sigma^{-1}(x-y)}$ is the transition density of a diffusion with stationary generator, and $x - y$ is the most probable path from $y$ to $x$. On manifolds: which distributions correspond to anisotropic Gaussians $N(x, \Sigma)$, and what is the most probable path from $y$ to $x$?
• Anisotropic diffusions and holonomy: for a driftless diffusion $dX_t = \sigma\,dW_t$ in $\mathbb R^n$ with $\sigma \in M_{n\times d}$, the diffusion field is $\sigma$ and the infinitesimal generator involves $\sigma\sigma^T$; because of curvature, a stationary field/generator cannot be defined on a manifold due to holonomy.
• Stochastic development (Eells–Elworthy–Malliavin construction): $X_t$ is an $\mathbb R^n$-valued Brownian motion (driving process), $U_t$ an $FM$-valued (sub-elliptic) diffusion, $Y_t$ an $M$-valued process (target). The frame bundle $F(M)$ consists of pairs $u = (x, X_\alpha)$ with $x \in M$ and $X_\alpha$ a frame for $T_xM$; curves in the horizontal part of $F(M)$ correspond to curves in $M$ together with parallel transport of frames. With horizontal vector fields $H_i(u) = (\pi_*)^{-1}(u_i)$, $i = 1, \dots, n$, the SDEs are $dX_t = \mathrm{Id}_n\,dB_t$, $X_0 = 0$ (driving); $dU_t = H_i(U_t)\circ dX_t^i$, $U_0 = (x_0, X_\alpha)$, $X_\alpha \in GL(\mathbb R^n, T_{x_0}M)$ (frame bundle diffusion); $Y_t = \pi_{FM}(U_t)$ (target). (Figures: frame bundle diffusion $U_t$; estimated MLE templates.)
• Most probable paths: in $\mathbb R^n$, straight lines are most probable for stationary diffusion processes; the Onsager–Machlup functional of a curve $\sigma_t$ on $M$ is $L(\sigma_t) = -\tfrac12\|\dot\sigma(t)\|_g^2 + \tfrac1{12}R(\sigma(t))$ (for the driving process, $R = 0$). Definition (MPPs for the driving process): if $Y_t = \pi(\varphi(X_t))$, then $\sigma$ is a most probable path for the driving process if $\sigma = \operatorname{argmin}_{c\in H(\mathbb R^d),\ \varphi(c)(1) = x}\,\big(-\!\int_0^1 L(c_t)\,dt\big)$. Proposition: with $Y_\alpha$ a frame for $T_yM$ and $Y_t = \pi(\varphi_{(y, Y_\alpha)}(X_t))$, MPPs for the driving process $X_t$ map to geodesics of a lifted sub-Riemannian metric on $FM$, $\langle\tilde w, \tilde w\rangle_{FM} = \langle X_\alpha^{-1}\pi_*\tilde w, X_\alpha^{-1}\pi_*\tilde w\rangle_{\mathbb R^n}$. In the isotropic case MPPs for the driving process map to geodesics, and if $-\ln L(x, X_\alpha) \approx c + \tfrac1N\sum_i p(\mathrm{MPP}(x, y_i))$ then the Fréchet mean approximately coincides with the MLE. (Figure: MPPs on $S^2$ for increasing anisotropy, covariances $\mathrm{diag}(1, 1)$, $\mathrm{diag}(2, .5)$, $\mathrm{diag}(4, .25)$.)
• Sub-Riemannian geometry on $FM$: the frame $X_\alpha : \mathbb R^n \to T_xM$ gives the inner product $\langle v, w\rangle_{X_\alpha} = \langle X_\alpha^{-1}v, X_\alpha^{-1}w\rangle_{\mathbb R^n}$ and the optimal control problem with nonholonomic constraints $x_t = \arg\min_{c_t, c_0 = x, c_1 = y}\int_0^1\|\dot c_t\|^2_{X_{\alpha, t}}dt$; this defines a sub-Riemannian metric $G$ on $TFM$ and the equivalent problem of minimizing $\int_0^1\|(\dot c_t, \dot C_{\alpha, t})\|^2_{HFM}\,dt$ subject to the horizontality constraint $(\dot c_t, \dot C_{\alpha, t}) \in H_{(c_t, C_{\alpha, t})}FM$. The MPP evolution equations are the corresponding sub-Riemannian Hamilton–Jacobi equations, written in coordinates $(x^i, X_\alpha^i)$ with $W^{kl} = \delta^{\alpha\beta}X_\alpha^kX_\beta^l$ encoding the inner product.
• Landmark LDDMM: using the landmark Christoffel symbols (Micheli et al. '08), the transported frame is mixed with the cometric: on the bundle $F^dM$ of rank-$d$ linear maps $\mathbb R^d \to T_xM$, the cometric is $g_{F^dM} + \lambda g_R$, $\langle\xi, \tilde\xi\rangle = \delta^{\alpha\beta}(\xi|\pi_*^{-1}X_\alpha)(\tilde\xi|\pi_*^{-1}X_\beta) + \lambda\langle\xi, \tilde\xi\rangle_{g_R}$, so the whole frame need not be transported. (Figure: LDDMM landmark MPPs — isotropic, with added horizontal variance, with added vertical variance.)
• Statistical manifold (geometry of $\Gamma$): $\mathrm{Dens}(M) = \{\gamma \in \Omega^n(M) : \int_M\gamma = 1,\ \gamma > 0\}$ with the Fisher–Rao metric $G^{FR}_\gamma(\alpha, \beta) = \int_M\tfrac{\alpha}{\gamma}\tfrac{\beta}{\gamma}\gamma$; the image $\Gamma$ of $\mathrm{Diff} : FM \to \mathrm{Dens}(M)$ is a finite-dimensional subset of $\mathrm{Dens}(M)$, naturally defined on the bundle of symmetric positive $T^2$ tensors.
• Summary: an infinitesimal definition of anisotropic normal distributions $N_M(\mu, \Sigma)$ on $M$; the diffusion map $\mathrm{Diff} : FM \to \mathrm{Dens}(M)$ from the Eells–Elworthy–Malliavin construction (stochastic development); MLE of template and covariance in $FM$; MPPs for driving processes generalize geodesics, being sub-Riemannian geodesics.
References: Sommer, Diffusion Processes and PCA on Manifolds, Oberwolfach extended abstract (Asymptotic Statistics on Stratified Spaces), 2014; Sommer, Anisotropic Distributions on Manifolds: Template Estimation and Most Probable Paths, IPMI 2015; Sommer, Evolution Equations with Anisotropic Distributions and Diffusion PCA, GSI 2015; Svane, Sommer, Similarities, SDEs, and Most Probable Paths, SIMBAD 2015 extended abstract; Sommer, Svane, Holonomy, Curvature, and Anisotropic Diffusions, MOTR 2015 extended abstract.
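To make the Euclidean starting point of this construction concrete, here is a hedged NumPy sketch (our own code, not from the talk): the anisotropic diffusion $dX_t = \sigma\,dW_t$ in $\mathbb R^2$ has Gaussian transition densities with covariance $t\,\sigma\sigma^T$, which is what the frame $X_\alpha$ encodes infinitesimally on a manifold.

```python
import numpy as np

# Simulate many Euler-Maruyama paths of dX_t = sigma dW_t and compare the sample
# covariance at time t = 1 with sigma @ sigma.T (the infinitesimal generator's matrix).
rng = np.random.default_rng(0)
sigma = np.array([[2.0, 0.0], [0.5, 0.5]])
n_paths, n_steps = 20_000, 200
dt = 1.0 / n_steps
X = np.zeros((n_paths, 2))
for _ in range(n_steps):
    X += (rng.normal(size=(n_paths, 2)) * np.sqrt(dt)) @ sigma.T
print(np.round(np.cov(X.T), 2))        # ~ sigma @ sigma.T
print(np.round(sigma @ sigma.T, 2))
```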

Creative Commons Attribution-ShareAlike 4.0 International
See the video
This paper addresses the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. Current methods like Principal Geodesic Analysis (PGA) and Geodesic PCA (GPCA) minimize the distance to a "geodesic subspace". This makes it possible to build sequences of nested subspaces which are consistent with a forward component analysis approach. However, these methods cannot be adapted to a backward analysis and they are not symmetric in the parametrization of the subspaces. We propose in this paper a new and more general type of family of subspaces in manifolds: barycentric subspaces are implicitly defined as the locus of points which are weighted means of k + 1 reference points. Depending on the generalization of the mean that we use, we obtain the Fréchet/Karcher barycentric subspaces (FBS/KBS) or the affine span (with exponential barycenter). This definition restores the full symmetry between all parameters of the subspaces, unlike geodesic subspaces, which intrinsically privilege one point. We show that this definition locally defines a submanifold of dimension k and that it generalizes, in some sense, geodesic subspaces. Like PGA, barycentric subspaces allow the construction of a forward nested sequence of subspaces which contains the Fréchet mean. However, the definition also allows the construction of a backward nested sequence which may not contain the mean. As this definition relies on points and does not explicitly refer to tangent vectors, it can be extended to non-Riemannian geodesic spaces. For instance, principal subspaces may naturally span several strata in stratified spaces, which is not the case with more classical generalizations of PCA.
 

Barycentric Subspaces and Affine Spans in Manifolds
Xavier Pennec, Asclepios team, INRIA Sophia-Antipolis – Méditerranée, France, and Côte d'Azur University (UCA) — GSI, 30-10-2015

• Statistical analysis of geometric features: computational anatomy deals with noisy geometric measures — tensors and covariance matrices, curves and tracts, surfaces and shapes, images, deformations — and these data live on non-Euclidean manifolds.
• Low-dimensional subspace approximation: examples include the manifold of cerebral ventricles (Etyngier, Keriven, Ségonne 2007) and the manifold of brain images (Gerber et al., Medical Image Analysis, 2009). When the embedding structure is already a manifold (e.g. Riemannian), this is not manifold learning (LLE, Isomap, ...) but submanifold learning.
• Outline: PCA in manifolds (tPCA / PGA / GPCA / HCA); affine span and barycentric subspaces; conclusion.
• Bases of algorithms in Riemannian manifolds: the exponential map (normal coordinate system) $\mathrm{Exp}_x$ is geodesic shooting parameterized by the initial tangent vector, and $\mathrm{Log}_x$ is the development of the manifold in the tangent space along geodesics; geodesics become straight lines with the Euclidean distance. The local-to-global domain is star-shaped, limited by the cut locus, and covers the whole manifold if it is geodesically complete. Algorithms are reformulated with $\mathrm{Exp}_x$ and $\mathrm{Log}_x$, and a vector becomes a bi-point (no more equivalence classes).
• Statistical tools and moments: the Fréchet/Karcher mean minimizes the variance $\sum_i d(x, y_i)^2$.
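A minimal sketch (our own illustrative code) of the $\mathrm{Exp}_x$/$\mathrm{Log}_x$ reformulation just mentioned: the Karcher/Fréchet mean on the unit sphere $S^2$, computed by iterating $x \leftarrow \mathrm{Exp}_x\big(\tfrac1N\sum_i\mathrm{Log}_x(y_i)\big)$ until the tangent-space mean vanishes.

```python
import numpy as np

def exp_map(x, v):                       # sphere exponential at x, v tangent to x
    nv = np.linalg.norm(v)
    return x if nv < 1e-12 else np.cos(nv) * x + np.sin(nv) * v / nv

def log_map(x, y):                       # sphere logarithm: tangent vector at x pointing to y
    p = y - np.dot(x, y) * x
    npn = np.linalg.norm(p)
    return np.zeros(3) if npn < 1e-12 else np.arccos(np.clip(np.dot(x, y), -1, 1)) * p / npn

rng = np.random.default_rng(3)
Y = rng.normal(size=(50, 3)) + np.array([0.0, 0.0, 5.0])   # a cluster near the north pole
Y /= np.linalg.norm(Y, axis=1, keepdims=True)

x = Y[0]
for _ in range(50):
    x = exp_map(x, np.mean([log_map(x, y) for y in Y], axis=0))
print(x)   # close to the north pole: the gradient of the variance vanishes at the mean
```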

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We present a novel method that adaptively deforms a polysphere (a product of spheres) into a single high-dimensional sphere, which then allows for principal nested spheres (PNS) analysis. Applying our method to skeletal representations of simulated bodies, as well as to data from real human hippocampi, yields promising results with respect to dimension reduction. Specifically, in comparison to composite PNS (CPNS), our method of principal nested deformed spheres (PNDS) captures the essential modes of variation with lower-dimensional representations.
 

Dimension Reduction on Polyspheres with Application to Skeletal Representations
Benjamin Eltzner, University of Göttingen (joint work with Stephan Huckemann and Sungkyu Jung) — Geometric Science of Information, 2015-10-30

• Dimension reduction on manifolds: PCA relies on linearity; tangent space approaches ignore geometry and periodic topology; intrinsic approaches rely on manifold geometry and come in two classes: forward methods (submanifold dimension $d = 1, 2, 3, \dots$; they need "good" geodesics and a construction scheme) and backward methods ($d = D-1, D-2, D-3, \dots$; they need a rich, parametric set of submanifolds).
• Polysphere dimension reduction: almost all geodesics of $P^D = S^{d_1}_{r_1}\times\cdots\times S^{d_K}_{r_K}$ are dense in a torus $(S^1)^K$, and the isometry group $\mathrm{isom}(P^D) = SO(d_1+1)\times\cdots\times SO(d_K+1)$ has low symmetry, so there is no generic rich set of submanifolds.
• Deformation for unit spheres: dimension reduction methods exist for spheres — GPCA [Huckemann, Ziezold, Advances in Applied Probability 38 (2006) 299–319], HPCA [Sommer, GSI 2013, LNCS 8085, pp. 76–83], PNS [Jung, Dryden, Marron, Biometrika 99 (2012) 551–568] — so the idea is to recursively deform the polysphere into a sphere, $f : P^D \to S^D$. For two unit spheres with squared line elements $ds_1^2 = \sum_{k=1}^{d_1}\big(\prod_{j=1}^{k-1}\sin^2\varphi_{1,j}\big)d\varphi_{1,k}^2$ and $ds_2^2 = \sum_{k=1}^{d_2}\big(\prod_{j=1}^{k-1}\sin^2\varphi_{2,j}\big)d\varphi_{2,k}^2$, the deformed metric is $ds^2 = ds_2^2 + \big(\prod_{j=1}^{d_2}\sin^2\varphi_{2,j}\big)ds_1^2$. The remaining degrees of freedom are the rotation and the ordering of the spheres.
• Fixing the degrees of freedom: for the rotation, embed $S^{d_i}_{r_i}$ into $\mathbb R^{d_i+1}$, determine the Fréchet mean $\hat\mu_i$ and rotate along a geodesic to move it to the positive $x_{i, d_i+1}$-direction (the north pole). For the ordering, compute the data spread $s_i = \sum_{n=1}^N d^2(\psi_{i,n}, \hat\mu_i)$ and choose the permutation $p$ such that $s_{p^{-1}(1)}$ is maximal and $s_{p^{-1}(K)}$ is minimal; this minimizes the distortion due to the factors $\sin^2\varphi_j$, i.e. the deviation from the polysphere geometry.
• Mapping data points: embedding $S^{d_i} \subset \mathbb R^{d_i+1}$, the map is $y_j = x_{2,j}$ for $1 \le j \le d_2$ and $y_{d_2+k} = x_{2, d_2+1}\,x_{1,k}$ for $1 \le k \le d_1+1$ (a small numerical sketch follows after this summary). For different radii, rescale $x_{1,j}\mapsto\tilde x_{1,j} = R_1x_{1,j}$ and $x_{i,j}\mapsto\tilde x_{i,j} = R_ix_{i,j}$ for $i > 1$ and use $\tilde x$ in the definition of the $y$ coordinates; this yields an ellipsoid in $\mathbb R^{d_2+d_1+1}$, and as a final step all $y$-vectors are normalized to a common length $R$ (a mean of the radii $R_j$). (Illustration: map from the polysphere to an ellipsoid, then to a sphere.)
• A brief review of Principal Nested Spheres (PNS): PNS determines a sequence $S^K \supset S^{K-1} \supset\cdots\supset S^2 \supset S^1 \supset\{\mu\}$ by recursively fitting a small subsphere $S^{d-1}\subset S^d$ minimizing the sum of squared geodesic projection distances; at every projection the signed projection distance (residual) is saved. The parameter space dimension for $S^{d-1}\subset S^d$ is $p = d + 1$, compared to linear PCA where for $\mathbb R^{d-1}\subset\mathbb R^d$ it is $p = d$.
• Skeletal representation (s-rep) parameter space: an s-rep consists of (1) a two-dimensional mesh of $m\times n$ skeletal points and (2) spokes from the mesh points to the surface (image from J. Schulz et al., Journal of Computational and Graphical Statistics 24.2 (2015), p. 539). The parameters are the size of the centered mesh, the spoke lengths, the normalized mesh points and the spoke directions, $Q = \mathbb R_+\times\mathbb R^K\times S^{3mn-1}\times(S^2)^K$; the polysphere deformation applied to $S^{3mn-1}\times(S^2)^K$ yields a single sphere $S^{5mn+2m+2n-5}$.
• Dimension reduction for real s-reps: PNDS (deform the polysphere to a sphere, then apply PNS) versus CPNS (PNS on the spheres individually, then linear PCA on the joint residuals). (Figure: PNDS vs. CPNS residual variances for s-reps of 51 hippocampi [Pizer et al., in Breuß, Bruckstein, Maragos (eds.), Springer, Berlin, 2013, pp. 93–115].)
• Dimension reduction for simulated s-reps: for simulated twisted ellipsoids, the first three PNDS components carry about 92.0%, 6.0% and 0.6% of the variance, against about 62.7%, 32.1% and 2.2% for CPNS. (Figure: scatter plots of residual signed distances for the first three components.)
• Reflection on parameter space dimension: PNS on $S^D$ uses $p = \tfrac12D(D+3) - 1$ parameters, PCA on $\mathbb R^D$ uses $p = \tfrac12D(D+1)$. (Figure: simulated twisted ellipsoid data projected to the second PNDS component, a small two-sphere, with the first component, a small circle, inside.)
• Conclusion: a deformation procedure mapping data on a polysphere to a sphere; the construction aims at minimizing geometric distortion; it achieves lower-dimensional representations than CPNS; its success is rooted in the higher parameter space dimension.
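The point mapping quoted above is easy to check numerically. A hedged sketch (our own code) for two unit spheres $S^{d_1}$, $S^{d_2}$ verifies that the pair is sent to a single point of $S^{d_1+d_2}$:

```python
import numpy as np

# y_j = x2_j for j <= d2 and y_{d2+k} = x2_{d2+1} * x1_k; since |x1| = |x2| = 1, |y| = 1.
rng = np.random.default_rng(4)

def random_sphere_point(d):
    v = rng.normal(size=d + 1)
    return v / np.linalg.norm(v)

d1, d2 = 3, 2
x1, x2 = random_sphere_point(d1), random_sphere_point(d2)
y = np.concatenate([x2[:d2], x2[d2] * x1])    # x2[d2] is the last coordinate x2_{d2+1}
print(y.shape, np.linalg.norm(y))             # (d1 + d2 + 1,), 1.0
```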

Creative Commons Attribution-ShareAlike 4.0 International
See the video
This paper studies the affine-invariant Riemannian distance on the Riemann-Hilbert manifold of positive definite operators on a separable Hilbert space. This is the generalization of the Riemannian manifold of symmetric, positive definite matrices to the infinite-dimensional setting. In particular, in the case of covariance operators in a Reproducing Kernel Hilbert Space (RKHS), we provide a closed form solution, expressed via the corresponding Gram matrices.
 

Affine-invariant Riemannian distance between infinite-dimensional covariance operators
Hà Quang Minh, Istituto Italiano di Tecnologia, Italy

• From finite to infinite dimensions — outline: 1) review of the finite-dimensional setting: the affine-invariant Riemannian metric on the manifold of symmetric positive definite matrices; 2) infinite-dimensional generalization: the Riemann–Hilbert manifold of positive definite unitized Hilbert–Schmidt operators; 3) affine-invariant Riemannian distance between Reproducing Kernel Hilbert Space (RKHS) covariance operators.
• Positive definite matrices: $\mathrm{Sym}^{++}(n)$, the symmetric positive definite $n\times n$ matrices, have been studied extensively and have numerous practical applications: brain imaging (Arsigny et al. 2005, Dryden et al. 2009, Qiu et al. 2015); computer vision — object detection (Tuzel et al. 2008, Tosato et al. 2013), image retrieval (Cherian et al. 2013), visual recognition (Jayasumana et al. 2015); radar signal processing (Barbaresco 2013, Formont et al. 2013); machine learning / kernel learning (Kulis et al. 2009).
• Differentiable manifold viewpoint: the tangent space is $T_P\mathrm{Sym}^{++}(n) \cong \mathrm{Sym}(n)$, and the affine-invariant Riemannian metric is $\langle A, B\rangle_P = \langle P^{-1/2}AP^{-1/2}, P^{-1/2}BP^{-1/2}\rangle_F = \mathrm{tr}[P^{-1}AP^{-1}B]$, with the Frobenius inner product $\langle A, B\rangle_F = \mathrm{tr}(A^TB)$. Affine invariance: $\langle CAC^T, CBC^T\rangle_{CPC^T} = \langle A, B\rangle_P$ for any $C \in GL(n)$ (Siegel 1943, Mostow 1955, Pennec et al. 2006, Bhatia 2007, Moakher and Zéraï 2011, Bini and Iannazzo 2013).
• The manifold is geodesically complete with nonpositive curvature; the geodesic joining $P, Q \in \mathrm{Sym}^{++}(n)$ is $\gamma_{PQ}(t) = P^{1/2}(P^{-1/2}QP^{-1/2})^tP^{1/2}$, and the exponential map $\mathrm{Exp}_P(V) = P^{1/2}\exp(P^{-1/2}VP^{-1/2})P^{1/2}$ is defined on all of $T_P\mathrm{Sym}^{++}(n)$.
• Riemannian distance: $d_{\mathrm{aiE}}(A, B) = \|\log(A^{-1/2}BA^{-1/2})\|_F$, where $\log$ is the principal matrix logarithm ($A = UDU^T = U\,\mathrm{diag}(\lambda_1, \dots, \lambda_n)\,U^T$ gives $\log A = U\log(D)U^T = U\,\mathrm{diag}(\log\lambda_1, \dots, \log\lambda_n)\,U^T$). It is affine-invariant: $d_{\mathrm{aiE}}(CAC^T, CBC^T) = d_{\mathrm{aiE}}(A, B)$ for any $C \in GL(n)$.
• Other metrics and distances: the Log-Euclidean metric, a bi-invariant Riemannian metric (Arsigny et al. 2007), with $d_{\mathrm{logE}}(A, B) = \|\log(A) - \log(B)\|_F$; and non-Riemannian Bregman divergences such as the Stein divergence, whose square root is a metric (Sra 2012), $d_{\mathrm{stein}}(A, B) = \log\frac{\det\left(\frac{A+B}{2}\right)}{\sqrt{\det(A)\det(B)}}$.
• Covariance matrices: for a Borel probability distribution $\rho$ on $\mathbb R^n$ with $\int_{\mathbb R^n}|x|^2\,d\rho(x) < \infty$, the mean vector is $\mu = \mathbb E_\rho[x] = \int_{\mathbb R^n}x\,d\rho(x)$ and the covariance matrix is $C = \mathbb E_\rho[(x-\mu)(x-\mu)^T] = \mathbb E[xx^T] - \mu\mu^T$. For Gaussians $\rho_1 \sim N(\mu, C_1)$ and $\rho_2 \sim N(\mu, C_2)$, $d_{\mathrm{aiE}}(C_1, C_2)$ is, up to the constant factor $\sqrt2$, the Fisher–Rao distance between $\rho_1$ and $\rho_2$.
• Empirical covariance matrices: for a data matrix $\mathbf x = [x_1, \dots, x_m]$ randomly sampled from $X = \mathbb R^n$, with $m$ observations $x_i \in \mathbb R^n$, the empirical mean vector is $\mu_{\mathbf x} = \frac1m\sum_{i=1}^m x_i$.
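A minimal numerical sketch (our own code, not the paper's) of the finite-dimensional distance recalled above, together with a check of its affine invariance:

```python
import numpy as np
from scipy.linalg import eigh

# d_aiE(A, B) = || log(A^{-1/2} B A^{-1/2}) ||_F; the eigenvalues of A^{-1/2} B A^{-1/2}
# are the generalized eigenvalues of (B, A), so the distance is sqrt(sum(log(lambda)^2)).
def d_aie(A, B):
    lam = eigh(B, A, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def random_spd(n, rng):
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(5)
A, B = random_spd(4, rng), random_spd(4, rng)
C = rng.normal(size=(4, 4))                     # almost surely invertible
print(d_aie(A, B))
print(d_aie(C @ A @ C.T, C @ B @ C.T))          # same value up to numerical error
```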

Creative Commons Attribution-ShareAlike 4.0 International
See the video
We develop a generic framework to build large deformations from a combination of base modules. These modules constitute a dynamical dictionary to describe transformations. The method, built on a coherent sub-Riemannian framework, defines a metric on modular deformations and characterises optimal deformations as geodesics for this metric. We will present a generic way to build local affine transformations as deformation modules, and display examples.
 

A sub-Riemannian modular approach for diffeomorphic deformations
GSI 2015
Barbara Gris
Advisors: Alain Trouvé (CMLA) and Stanley Durrleman (ICM)
gris@cmla.ens-cachan.fr
October 30, 2015

Outline
  • Introduction
  • Deformation modules: definition and first examples; modular large deformations; combining deformation modules
  • Numerical results

Introduction
"Is it possible to mechanize human intuitive understanding of biological pictures that typically exhibit a lot of variability but also possess characteristic structure?"
Ulf Grenander, Hands: a Pattern Theoretic Study of Biological Shapes, 1991

Structure in data suggests structure in deformations. Deformations are obtained by flowing time-dependent vector fields, $\dot\varphi_t = v_t \circ \varphi_t$, $\varphi_{t=0} = \mathrm{Id}$; the question is which type of vector fields to allow.

Previous works: locally affine deformations
Poly-affine [C. Seiler, X. Pennec, and M. Reyes. Capturing the multiscale anatomical shape variability with polyaffine transformation trees. Medical Image Analysis, 2012]:
$$v(x) = \sum_i w_i(x) A_i(x).$$
Here the deformation structure does not evolve with the flow.

Previous works: shape spaces (S. Arguillère)
[S. Arguillère. Géométrie sous-riemannienne en dimension infinie et applications à l'analyse mathématique des formes. PhD thesis, 2014]: the deformation structure is imposed by the shapes and the action of vector fields. Related works:
  • LDDMM [M. I. Miller, L. Younes, and A. Trouvé. Diffeomorphometry and geodesic positioning systems for human anatomy, 2014]
  • Higher-order momentum [S. Sommer, M. Nielsen, F. Lauze, and X. Pennec. Higher-order momentum distributions and locally affine LDDMM registration. SIAM Journal on Imaging Sciences, 2013]
  • Sparse LDDMM [S. Durrleman, M. Prastawa, G. Gerig, and S. Joshi. Optimal data-driven sparse parameterization of diffeomorphisms for population analysis. In Information Processing in Medical Imaging, pages 123-134. Springer, 2011]
In these approaches the deformation structure evolves with the flow, but there is no control on the deformation structure.

Previous works: constraints
Diffeons [L. Younes. Constrained diffeomorphic shape evolution. Foundations of Computational Mathematics, 2012].

Our model: deformation modules
Purpose: incorporate constraints in the deformation model, and merge different constraints into a complex one.

Deformation modules: definition and first examples
A deformation module contains a space of shapes and can generate vector fields that are of a particular type (this is the deformation structure) and depend on the state of the shape, so that the deformation structure evolves with the flow.

Example: local translation of scale σ. A module is a tuple $M = (O, H, V, \zeta, \xi, c)$, where $O$ is a shape space (S. Arguillère) and $c$ is the cost; there exists $C > 0$ such that for all $(o, h) \in O \times H$:
$$|\zeta(o, h)|_V^2 \le C\, c(o, h).$$
Other first examples, with generated vector fields illustrated on the slides: local scaling of scale σ, local rotation of scale σ, local translation of scale σ and fixed direction.

Modular large deformations
Studied trajectories: $t \mapsto (o_t, h_t) \in O \times H$ such that $\dot o_t = \xi_{o_t}(v_t)$ where $v_t = \zeta_{o_t}(h_t) \in \zeta_{o_t}(H)$. Solutions of $\dot\varphi^v_t = v_t \circ \varphi^v_t$, $\varphi^v_{t=0} = \mathrm{Id}$, exist; $\varphi^v$ is called a modular large deformation (an example is shown on the slides).

Combining deformation modules
Features:
  • if $c^i_{o_i}(h_i) = |\zeta^i_{o_i}(h_i)|^2_{V_i}$, then the cost of the combination is the sum $c_o(h) = \sum_i |\zeta^i_{o_i}(h_i)|^2_{V_i}$, rather than the norm $|\sum_i \zeta^i_{o_i}(h_i)|^2_V$ of the total vector field
  • geometrical descriptors are transported by the global vector field
  • coherent mathematical framework: any modules can be combined
(An example of modular large deformation obtained by combination is shown on the slides.)

Matching problem
Minimize
$$\int_0^1 c_{o_t}(h_t)\, dt + g\big(\varphi^v_{t=1} \cdot f_{\mathrm{source}}, f_{\mathrm{target}}\big), \qquad v_t = \zeta_{o_t}(h_t),$$
with the data term $g$ built on [N. Charon and A. Trouvé. The varifold representation of non-oriented shapes for diffeomorphic registration, 2013]. (Numerical matching results are shown as figures on the slides.)

Conclusion
We have presented a coherent mathematical framework to build modular large deformations, and showed how to easily incorporate constraints in a deformation model and merge different constraints into a global one.
"Is it possible to mechanize human intuitive understanding of biological pictures that typically exhibit a lot of variability but also possess characteristic structure?" (Ulf Grenander, Hands: a Pattern Theoretic Study of Biological Shapes, 1991)
Thank you for your attention!
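To make the flow equation and the locally weighted fields above concrete, here is a minimal NumPy sketch, not the authors' implementation: it assumes Gaussian weights, time-constant controls, and explicit Euler integration, and the names modular_velocity and flow are illustrative. It builds a velocity field as a sum of Gaussian-weighted local translations and transports both the deformed points and the module centers (geometrical descriptors) with the same field.

```python
import numpy as np

def modular_velocity(x, centers, vectors, sigma):
    """Velocity at points x (N, 2): v(x) = sum_i w_i(x) d_i, with Gaussian
    weights w_i centered at the module centers z_i (scale sigma)."""
    diff = x[:, None, :] - centers[None, :, :]               # (N, M, 2)
    w = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))  # (N, M)
    return w @ vectors                                        # (N, 2)

def flow(points, centers, vectors, sigma, steps=20):
    """Integrate dphi/dt = v(phi) with explicit Euler steps; the centers
    (geometrical descriptors) are transported by the same global field."""
    x, z = points.copy(), centers.copy()
    dt = 1.0 / steps
    for _ in range(steps):
        x = x + dt * modular_velocity(x, z, vectors, sigma)
        z = z + dt * modular_velocity(z, z, vectors, sigma)
    return x

# Example: deform a small grid with two local translation modules.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5)), -1).reshape(-1, 2)
centers = np.array([[0.25, 0.5], [0.75, 0.5]])
vectors = np.array([[0.2, 0.0], [0.0, 0.2]])
print(flow(grid, centers, vectors, sigma=0.2)[:3])
```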

Optimization on Manifold (chaired by Pierre-Antoine Absil, Rodolphe Sepulchre)

Creative Commons Attribution-ShareAlike 4.0 International
See the video
The Riemannian trust-region algorithm (RTR) is designed to optimize differentiable cost functions on Riemannian manifolds. It proceeds by iteratively optimizing local models of the cost function. When these models are exact up to second order, RTR boasts a quadratic convergence rate to critical points. In practice, building such models requires computing the Riemannian Hessian, which may be challenging. A simple idea to alleviate this difficulty is to approximate the Hessian using finite differences of the gradient. Unfortunately, this is a nonlinear approximation, which breaks the known convergence results for RTR. We propose RTR-FD: a modification of RTR which retains global convergence when the Hessian is approximated using finite differences. Importantly, RTR-FD reduces gracefully to RTR if a linear approximation is used. This algorithm is available in the Manopt toolbox.
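To illustrate the finite-difference idea in the abstract, here is a Euclidean sketch (not the RTR-FD implementation itself; the name finite_diff_hess_vec is illustrative): the Hessian-vector product is approximated from two gradient evaluations, and because the step length is scaled by the norm of the direction, the resulting map is not linear in that direction, which is what breaks the classical convergence analysis.

```python
import numpy as np

def finite_diff_hess_vec(grad, x, v, eps=1e-5):
    """Approximate Hess f(x)[v] by a forward difference of the gradient along v.
    (Euclidean sketch: on a manifold the step would be taken along a retraction
    and the two gradients compared after a vector transport.)"""
    nv = np.linalg.norm(v)
    if nv == 0:
        return np.zeros_like(v)
    h = eps / nv                                # step scaled by ||v||, so the
    return (grad(x + h * v) - grad(x)) / h      # map v -> approximation is nonlinear

# Example on f(x) = 0.5 x^T A x, whose exact Hessian-vector product is A v.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)); A = A + A.T
grad = lambda x: A @ x
x, v = rng.standard_normal(4), rng.standard_normal(4)
print(finite_diff_hess_vec(grad, x, v))  # ~ A @ v
print(A @ v)
```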
 

Ditch the Hessian Hassle with Riemannian Trust Regions
Nicolas Boumal, Inria & ENS Paris
Geometric Science of Information, GSI 2015, Oct. 30, 2015, Paris

The goal is to optimize a smooth function on a smooth manifold. The Trust Region method is like Newton's with a safeguard: on the tangent space, optimize the model in a trust region
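For context, the local model referred to here is, in the standard Riemannian trust-region formulation (standard notation, not spelled out on the slide), the quadratic model on the tangent space at the current iterate $x_k$:
$$\min_{\eta \in T_{x_k}\mathcal{M},\ \|\eta\|_{x_k} \le \Delta_k} \; m_k(\eta) = f(x_k) + \langle \operatorname{grad} f(x_k), \eta \rangle_{x_k} + \tfrac{1}{2} \langle H_k[\eta], \eta \rangle_{x_k},$$
where $H_k$ is the Riemannian Hessian of $f$ at $x_k$ (or an approximation of it, as in RTR-FD) and $\Delta_k$ is the trust-region radius; the candidate iterate $R_{x_k}(\eta_k)$, obtained through a retraction $R$, is accepted or rejected, and $\Delta_k$ adjusted, according to the ratio of actual to predicted decrease.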