From Geometry and Physics to Computational Linguistics

28/10/2015
Publication GSI2015
OAI : oai:www.see.asso.fr:11784:14257
DOI : 10.23723/11784/14257

Abstract

I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic 




Licence

Creative Commons Attribution-ShareAlike 4.0 International

<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns="http://datacite.org/schema/kernel-4"
          xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
    <identifier identifierType="DOI">10.23723/11784/14257</identifier>
    <creators>
        <creator>
            <creatorName>Matilde Marcolli</creatorName>
        </creator>
    </creators>
    <titles>
        <title>From Geometry and Physics to Computational Linguistics</title>
    </titles>
    <publisher>SEE</publisher>
    <publicationYear>2015</publicationYear>
    <resourceType resourceTypeGeneral="Text">Text</resourceType>
    <dates>
        <date dateType="Created">Sat 7 Nov 2015</date>
        <date dateType="Updated">Wed 31 Aug 2016</date>
        <date dateType="Submitted">Sun 25 Jun 2017</date>
    </dates>
    <alternateIdentifiers>
        <alternateIdentifier alternateIdentifierType="bitstream">0f362ab6fddee5ff7f575a53769250f70e58ef51</alternateIdentifier>
    </alternateIdentifiers>
    <formats>
        <format>application/pdf</format>
    </formats>
    <version>29156</version>
    <descriptions>
        <description descriptionType="Abstract">I will show how techniques from geometry (algebraic geometry and topology) and physics (statistical physics) can be applied to Linguistics, in order to provide a computational approach to questions of syntactic</description>
    </descriptions>
</resource>

From Geometry and Physics to Computational Linguistics
Matilde Marcolli
Geometric Science of Information, Paris, October 2015
Matilde Marcolli: Geometry, Physics, Linguistics

A Mathematical Physicist’s adventures in Linguistics

Based on:
1. Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
2. Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
3. Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
4. Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you

What is Linguistics?
• Linguistics is the scientific study of language
  - What is Language? (langage, lenguaje, ...)
  - What is a Language? (langue, lengua, ...)
  Similar to ‘What is Life?’ or ‘What is an organism?’ in biology
• natural language as opposed to artificial (formal, programming, ...) languages
• The point of view we will focus on: Language is a kind of Structure
  - It can be approached mathematically and computationally, like many other kinds of structures
  - The main purpose of mathematics is the understanding of structures

• How are different languages related? What does it mean that they come in families?
• How do languages evolve in time? Phylogenetics, Historical Linguistics, Etymology
• How does the process of language acquisition work? (Neuroscience)
• Semiotic viewpoint (mathematical theory of communication)
• Discrete versus Continuum (probabilistic methods versus discrete structures)
• Descriptive or Predictive?
  to be predictive, a science needs good mathematical models

A language exists at many different levels of structure
An Analogy: Physics looks very different at different scales:
• General Relativity and Cosmology ($\geq 10^{10}$ m)
• Classical Physics ($\sim 1$ m)
• Quantum Physics ($\leq 10^{-10}$ m)
• Quantum Gravity ($10^{-35}$ m)
Despite dreams of a Unified Theory, we deal with different mathematical models for different levels of structure

Similarly, we view language at different “scales”:
• units of sound (phonology)
• words (morphology)
• sentences (syntax)
• global meaning (semantics)
We expect to be dealing with different mathematical structures and different models at these various levels.
Main level I will focus on: Syntax

The Linguistics view of syntax kind of looks like this... (figure: Alexander Calder, Mobile, 1960)

Modern Syntactic Theory:
• grammaticality: judgement on whether a sentence is well formed (grammatical) in a given language; the i-language gives people the capacity to decide on grammaticality
• generative grammar: produce a set of rules that correctly predict the grammaticality of sentences
• universal grammar: the ability to learn grammar is built into the human brain, e.g. properties like the distinction between nouns and verbs are universal
... is universal grammar a falsifiable theory?

Principles and Parameters (Government and Binding) (Chomsky, 1981)
• principles: general rules of grammar
• parameters: binary variables (on/off switches) that distinguish languages in terms of syntactic structures
• Example of parameter: head-directionality (head-initial versus head-final)
English is head-initial, Japanese is head-final
VP = verb phrase, TP = tense phrase, DP = determiner phrase

...but not always so clear-cut: German can use both structures
der auf seine Kinder stolze Vater (“the father proud of his children”, head-final) or er ist stolz auf seine Kinder (“he is proud of his children”, head-initial)
AP = adjective phrase, PP = prepositional phrase
• Corpus-based statistical analysis of head-directionality (Haitao Liu, 2010): a continuum between head-initial and head-final

Examples of Parameters
• Head-directionality
• Subject-side
• Pro-drop
• Null-subject
Problems
• Interdependencies between parameters
• Diachronic changes of parameters in language evolution

Dependent parameters
• null-subject parameter: can drop the subject
Example: among Romance languages, Italian and Spanish have null-subject (+), French does not (-): it rains, piove, llueve, il pleut
• pro-drop parameter: can drop pronouns in sentences
• Pro-drop controls Null-subject
How many independent parameters? Geometry of the space of syntactic parameters?

Persistent Topology of Syntax
• Alexander Port, Iulia Gheorghita, Daniel Guth, John M. Clark, Crystal Liang, Shival Dasu, Matilde Marcolli, Persistent Topology of Syntax, arXiv:1507.05134
Databases of Syntactic Parameters of World Languages:
1. Syntactic Structures of World Languages (SSWL) http://sswl.railsplayground.net/
2. TerraLing http://www.terraling.com/
3. World Atlas of Language Structures (WALS) http://wals.info/

Persistent Topology of Data Sets
how data cluster around topological shapes at different scales

Vietoris–Rips complexes
• set $X = \{x_\alpha\}$ of points in Euclidean space $\mathbb{E}^N$, with distance $d(x, y) = \|x - y\| = \left( \sum_{j=1}^{N} (x_j - y_j)^2 \right)^{1/2}$
• Vietoris–Rips complex $R(X, \epsilon)$ of scale $\epsilon$ over a field $K$: $R_n(X, \epsilon)$ is the $K$-vector space spanned by all unordered $(n+1)$-tuples of points $\{x_{\alpha_0}, x_{\alpha_1}, \ldots, x_{\alpha_n}\}$ in $X$ where all pairs have distances $d(x_{\alpha_i}, x_{\alpha_j}) \leq \epsilon$
• inclusion maps $R(X, \epsilon_1) \hookrightarrow R(X, \epsilon_2)$ for $\epsilon_1 < \epsilon_2$ induce maps in homology by functoriality, $H_n(X, \epsilon_1) \to H_n(X, \epsilon_2)$
• barcode diagrams: births and deaths of persistent generators

Persistent Topology of Syntactic Parameters
• Data: 252 languages from SSWL with 115 parameters
• if all world languages are considered together, there is too much noise in the persistent topology: subdivide by language families
• Principal Component Analysis: reduce the dimensionality of the data
• compute Vietoris–Rips complex and barcode diagrams
Persistent $H_0$: clustering of data in components – language subfamilies
Persistent $H_1$: clustering of data along closed curves (circles) – linguistic meaning?
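The Vietoris–Rips construction and the persistent $H_0$ barcode can be illustrated with a small, dependency-free sketch. It assumes the points are already given as tuples of floats (for instance after the PCA step, which is not reproduced here), and the function names `rips_simplices` and `h0_barcode` are hypothetical, not taken from the authors' code. Only $H_0$ is computed: its bars depend only on the 1-skeleton of the Rips filtration, so union-find over edges sorted by length is enough. Persistent $H_1$ needs a full boundary-matrix reduction, for which a dedicated persistent homology library would normally be used.

```python
import itertools
import math


def euclidean(x, y):
    """d(x, y) = ||x - y|| = (sum_j (x_j - y_j)^2)^(1/2)."""
    return math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))


def rips_simplices(points, eps, max_dim=2):
    """Simplices of the Vietoris-Rips complex R(X, eps) up to dimension max_dim:
    all (n+1)-subsets of point indices whose pairwise distances are <= eps."""
    simplices = []
    for n in range(max_dim + 1):
        for s in itertools.combinations(range(len(points)), n + 1):
            if all(euclidean(points[i], points[j]) <= eps
                   for i, j in itertools.combinations(s, 2)):
                simplices.append(s)
    return simplices


def h0_barcode(points):
    """Persistent H_0 barcode of the Rips filtration as (birth, death) pairs.

    Every point is born at scale 0; when an edge of length eps first joins two
    components, one of them dies at eps (union-find over edges sorted by length).
    The last surviving component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((euclidean(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(len(points)), 2))
    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, length))
    bars.append((0.0, math.inf))
    return bars
```

For two well-separated pairs of points, `h0_barcode([(0, 0), (0, 1), (10, 0), (10, 1)])` returns `[(0.0, 1.0), (0.0, 1.0), (0.0, 10.0), (0.0, inf)]`: two short bars and two long ones, i.e. two persistent components, which is the kind of signal read as language subfamilies above.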

Sources of Persistent $H_1$
• “Hopf bifurcation” type phenomenon
• two different branches of a tree closing up in a loop
two different types of phenomena of historical linguistic development within a language family

Persistent Topology of Indo-European Languages
• Two persistent generators of $H_0$ (Indo-Iranian, European)
• One persistent generator of $H_1$

Persistent Topology of Niger–Congo Languages
• Three persistent components of $H_0$ (Mande, Atlantic-Congo, Kordofanian)
• No persistent $H_1$

The origin of persistent $H_1$ of Indo-European Languages?
Naive guess: the Anglo-Norman bridge ... but that is lexical, not syntactic

Answer: No, it is not the Anglo-Norman bridge!
(figure: Persistent topology of the Germanic+Latin languages)

Answer: It’s all because of Ancient Greek!
(figure: Persistent topology with the Hellenic (and Indo-Iranian) branch removed)

Syntactic Parameters as Dynamical Variables
• Example: Word Order: SOV, SVO, VSO, VOS, OVS, OSV
Very uneven distribution across world languages
• Word order distribution: a neuroscience explanation?
  - D. Kemmerer, The cross-linguistic prevalence of SOV and SVO word orders reflects the sequential and hierarchical representation of action in Broca’s area, Language and Linguistics Compass, 6 (2012) N.1, 50–66.
• Internal reasons for diachronic switch?
  - F. Antinucci, A. Duranti, L. Gebert, Relative clause structure, relative clause perception, and the change from SOV to SVO, Cognition, Vol.7 (1979) N.2, 145–176.

Changes over time in Word Order
• Ancient Greek: switched from Homeric to Classical
  - A. Taylor, The change from SOV to SVO in Ancient Greek, Language Variation and Change, 6 (1994) 1–37
• Sanskrit: different word orders allowed, but the prevalent one in Vedic Sanskrit is SOV (switched at least twice by influence of Dravidian languages)
  - F.J. Staal, Word Order in Sanskrit and Universal Grammar, Springer, 1967
• English: switched from Old English (transitional between SOV and SVO) to Middle English (SVO)
  - J. McLaughlin, Old English Syntax: a handbook, Walter de Gruyter, 1983.
Syntactic Parameters are Dynamical in Language Evolution

Spin Glass Models of Syntax
• Karthik Siva, Jim Tao, Matilde Marcolli, Spin Glass Models of Syntax and Language Evolution, arXiv:1508.00504
  – focus on linguistic change caused by language interactions
  – think of syntactic parameters as spin variables
  – spin interaction tends to align (ferromagnet)
  – strength of interaction proportional to bilingualism (MediaLab)
  – role of temperature parameter: probabilistic interpretation of parameters
  – not all parameters are independent: entailment relations
  – Metropolis–Hastings algorithm: simulate evolution

The Ising Model of spin systems on a graph G
• configurations of spins $s: V(G) \to \{\pm 1\}$
• magnetic field $B$ and correlation strength $J$: Hamiltonian
$$H(s) = -J \sum_{e \in E(G):\, \partial(e) = \{v, v'\}} s_v s_{v'} - B \sum_{v \in V(G)} s_v$$
• first term measures degree of alignment of nearby spins
• second term measures alignment of spins with direction of magnetic field

Equilibrium Probability Distribution
• Partition Function $Z_G(\beta) = \sum_{s: V(G) \to \{\pm 1\}} \exp(-\beta H(s))$
• Probability distribution on the configuration space: Gibbs measure $P_{G,\beta}(s) = \dfrac{e^{-\beta H(s)}}{Z_G(\beta)}$
• low energy states weigh most
• at low temperature (large $\beta$) the ground state dominates; at higher temperature (small $\beta$) higher energy states also contribute

Average Spin Magnetization
$$M_G(\beta) = \frac{1}{\# V(G)} \sum_{s: V(G) \to \{\pm 1\}} \sum_{v \in V(G)} s_v \, P(s)$$
• Free energy $F_G(\beta, B) = \log Z_G(\beta, B)$
$$M_G(\beta) = \frac{1}{\# V(G)} \, \frac{1}{\beta} \left( \frac{\partial F_G(\beta, B)}{\partial B} \right) \Big|_{B=0}$$

Ising Model on a 2-dimensional lattice
• $\exists$ a critical temperature $T = T_c$ where a phase transition occurs
• for $T > T_c$ the equilibrium state has $m(T) = 0$ (computed with respect to the equilibrium Gibbs measure $P_{G,\beta}$)
• demagnetization: on average as many up as down spins
• for $T < T_c$ we have $m(T) > 0$: spontaneous magnetization

Syntactic Parameters and Ising/Potts Models
• characterize a set of $n = 2^N$ languages $L_i$ by binary strings of $N$ syntactic parameters (Ising model)
• or by ternary strings (Potts model), taking values $\pm 1$ for parameters that are set and $0$ for parameters that are not defined in a certain language
• a system of $n$ interacting languages = graph $G$ with $n = \# V(G)$
• languages $L_i$ = vertices of the graph (e.g. a language that occupies a certain geographic area)
• languages that interact with each other = edges $E(G)$ (geographical proximity, or high volume of exchange for other reasons)

(figure: graph of language interaction (detail) from the Global Language Network of MIT MediaLab, with interaction strengths $J_e$ on edges based on the number of book translations (or Wikipedia edits))

• with only one syntactic parameter, we would have an Ising model on the graph $G$: configurations $s: V(G) \to \{\pm 1\}$ set the parameter at all the locations on the graph
• variable interaction energies along edges (some pairs of languages interact more than others)
• interaction strengths $J_e$ on the edges (no external magnetic field term): Hamiltonian
$$H(s) = -\sum_{e \in E(G):\, \partial(e) = \{v, v'\}} \sum_{i=1}^{N} J_e \, s_{v,i} \, s_{v',i}$$
• with $N$ parameters, configurations are $s = (s_1, \ldots, s_N): V(G) \to \{\pm 1\}^N$
• if all $N$ parameters are independent, this is like having $N$ non-interacting copies of an Ising model on the same graph $G$ (or $N$ independent choices of an initial state in an Ising model on $G$)

Metropolis–Hastings
• detailed balance condition $P(s) P(s \to s') = P(s') P(s' \to s)$ for probabilities of transitioning between states (Markov process)
• transition probabilities $P(s \to s') = \pi_A(s \to s') \cdot \pi(s \to s')$, with $\pi(s \to s')$ the conditional probability of proposing state $s'$ given state $s$, and $\pi_A(s \to s')$ the conditional probability of accepting it
• Metropolis–Hastings choice of acceptance distribution (Gibbs)
$$\pi_A(s \to s') = \begin{cases} 1 & \text{if } H(s') - H(s) \leq 0 \\ \exp\left(-\beta \left(H(s') - H(s)\right)\right) & \text{if } H(s') - H(s) > 0 \end{cases}$$
satisfying detailed balance
• selection probabilities $\pi(s \to s')$: single-spin-flip dynamics
• ergodicity of the Markov process $\Rightarrow$ unique stationary distribution

Example: Single parameter dynamics
Subject-Verb parameter
Initial configuration: most languages in SSWL have +1 for Subject-Verb; use interaction energies from MediaLab data

Equilibrium: at low temperature all aligned to +1; at high temperature: (figure)
Temperature: fluctuations in bilingual users between different structures (“code-switching” in Linguistics)

Entailment relations among parameters
• Example: $\{p_1, p_2\}$ = {Strong Deixis, Strong Anaphoricity}

         p1    p2
  ℓ1    +1    +1
  ℓ2    -1     0
  ℓ3    +1    +1
  ℓ4    +1    -1

$\{\ell_1, \ell_2, \ell_3, \ell_4\}$ = {English, Welsh, Russian, Bulgarian}

Modeling Entailment
• variables: $S_{\ell,p_1} = \exp(\pi i X_{\ell,p_1}) \in \{\pm 1\}$, $S_{\ell,p_2} \in \{\pm 1, 0\}$ and $Y_{\ell,p_2} = |S_{\ell,p_2}| \in \{0, 1\}$
• Hamiltonian $H = H_E + H_V$, with
$$H_E = H_{p_1} + H_{p_2} = -\sum_{\ell, \ell' \in \text{languages}} J_{\ell\ell'} \left( \delta_{S_{\ell,p_1}, S_{\ell',p_1}} + \delta_{S_{\ell,p_2}, S_{\ell',p_2}} \right)$$
$$H_V = \sum_{\ell} H_{V,\ell} = \sum_{\ell} J_\ell \, \delta_{X_{\ell,p_1}, Y_{\ell,p_2}}, \qquad J_\ell > 0 \text{ (anti-ferromagnetic)}$$
• two parameters: temperature as before, and the coupling energy of entailment
• if $p_1$ is frozen and only $p_2$ evolves: Potts model with external magnetic field

Acceptance probabilities
$$\pi_A(s \to s \pm 1 \pmod 3) = \begin{cases} 1 & \text{if } \Delta H \leq 0 \\ \exp(-\beta \Delta H) & \text{if } \Delta H > 0 \end{cases}$$
$$\Delta H := \min\{ H(s + 1 \bmod 3),\, H(s - 1 \bmod 3) \} - H(s)$$

Equilibrium configuration $(p_1, p_2)$ (HT/LT = high/low temperature, HE/LE = high/low entailment energy)

         HT/HE      HT/LE      LT/HE      LT/LE
  ℓ1    (+1, 0)    (+1, -1)   (+1, +1)   (+1, -1)
  ℓ2    (+1, -1)   (-1, -1)   (+1, +1)   (+1, -1)
  ℓ3    (-1, 0)    (-1, +1)   (+1, +1)   (-1, 0)
  ℓ4    (+1, +1)   (-1, -1)   (+1, +1)   (-1, 0)

(figure: average value of spin $p_1$ (left) and $p_2$ (right) in the low entailment energy case)

Syntactic Parameters in Kanerva Networks
• Jeong Joon Park, Ronnel Boettcher, Andrew Zhao, Alex Mun, Kevin Yuh, Vibhor Kumar, Matilde Marcolli, Prevalence and recoverability of syntactic parameters in sparse distributed memories, arXiv:1510.06342
  – Address two issues: the relative prevalence of different syntactic parameters and their “degree of recoverability” (as a sign of underlying relations between parameters)
  – If information about one parameter is corrupted in the data of a group of languages, can it be recovered from the data of the other parameters?
  – Answer: different parameters have different degrees of recoverability
  – Used 21 parameters and 165 languages from the SSWL database

Kanerva networks (sparse distributed memories)
• P. Kanerva, Sparse Distributed Memory, MIT Press, 1988.
• field $\mathbb{F}_2 = \{0, 1\}$, vector space $\mathbb{F}_2^N$ with large $N$
• uniform random sample of $2^k$ hard locations with $2^k \ll 2^N$
• median Hamming distance between hard locations
• Hamming spheres of radius slightly larger than the median value (access sphere)
• writing to the network: to store a datum $X \in \mathbb{F}_2^N$, each hard location in the access sphere of $X$ gets its $i$-th coordinate (initialized at zero) incremented depending on the $i$-th entry of $X$
• reading at a location: the $i$-th entry is determined by majority rule over the $i$-th entries of all data stored in hard locations within the access sphere
Kanerva networks are good at reconstructing corrupted data

Procedure
• 165 data points (languages) stored in a Kanerva Network in $\mathbb{F}_2^{21}$ (choice of 21 parameters)
• corrupting one parameter at a time: analyze recoverability
• language bit-string with a single corrupted bit used as read location, and the resulting bit-string compared to the original bit-string (Hamming distance)
• resulting average Hamming distance used as score of recoverability (lowest = most easily recoverable parameter)

Parameters and frequencies
01 Subject-Verb (0.64957267)
02 Verb-Subject (0.31623933)
03 Verb-Object (0.61538464)
04 Object-Verb (0.32478634)
05 Subject-Verb-Object (0.56837606)
06 Subject-Object-Verb (0.30769232)
07 Verb-Subject-Object (0.1923077)
08 Verb-Object-Subject (0.15811966)
09 Object-Subject-Verb (0.12393162)
10 Object-Verb-Subject (0.10683761)
11 Adposition-Noun-Phrase (0.58974361)
12 Noun-Phrase-Adposition (0.2905983)
13 Adjective-Noun (0.41025642)
14 Noun-Adjective (0.52564102)
15 Numeral-Noun (0.48290598)
16 Noun-Numeral (0.38034189)
17 Demonstrative-Noun (0.47435898)
18 Noun-Demonstrative (0.38461539)
19 Possessor-Noun (0.38034189)
20 Noun-Possessor (0.49145299)
A01 Attributive-Adjective-Agreement (0.46581197)

(figure: overall effect related to relative prevalence of a parameter)

(figure: more refined effect after normalizing for prevalence (syntactic dependencies))

• Overall effect relating recoverability in a Kanerva Network to the prevalence of a given parameter among languages (depends only on frequencies: it is seen in random data with assigned frequencies)
• Additional effects (that deviate from the random case) which detect possible dependencies among syntactic parameters: increased recoverability beyond the effect based on frequency
• Possible neuroscience implications? Kanerva Networks as models of human memory (parameter prevalence linked to neuroscience models)
• More refined data if divided by language families?

Phylogenetic Linguistics (WORK IN PROGRESS)
• Constructing family trees for languages (sometimes possibly graphs with loops)
• Main information about subgrouping: shared innovation, a specific change with respect to other languages in the family that only happens in a certain subset of languages
  - Example: among Mayan languages, the Huastecan branch is characterized by initial w becoming voiceless before a vowel, ts becoming t, q becoming k, ...; the Quichean branch by the velar nasal becoming a velar fricative, ć becoming č (prepalatal affricate to palato-alveolar)...
Known result by traditional Historical Linguistics methods:
Known result by traditional Historical Linguistics methods: Matilde Marcolli Geometry, Physics, Linguistics Mayan Language Tree Matilde Marcolli Geometry, Physics, Linguistics Computational Methods for Phylogenetic Linguistics • Peter Foster, Colin Renfrew, Phylogenetic methods and the prehistory of languages, McDonald Institute Monographs, 2006 • Several computational methods for constructing phylogenetic trees available from mathematical and computational biology • Phylogeny Programs http://evolution.genetics.washington.edu/phylip/software.html • Standardized lexical databases: Swadesh list (100 words, or 207 words) Matilde Marcolli Geometry, Physics, Linguistics • Use Swadesh lists of languages in a given family to look for cognates: - without additional etymological information (keep false positives) - with additional etymological information (remove false positives) • Two further choices about loan words: - remove loan words - keep loan words • Keeping loan words produces graphs that are not trees • Without loan words it should produce trees, but small loops still appear due to ambiguities (di↵erent possible trees matching same data) ... more precisely: coding of lexical data ... Matilde Marcolli Geometry, Physics, Linguistics Coding of lexical data • After compiling lists of cognate words for pairs of languages within a given family (with/without lexical information and loan words) • Produce a binary string S(L1, L2) = (s1, . . . , sN) for each pair of languages L1, L2, with entry 0 or 1 at the i-th word of the lexical list of N words if cognates for that meaning exist in the two languages or not (important to pay attention to synonyms) • lexical Hamming distance between two languages d(L1, L2) = #{i 2 {1, . . . 
, N} | si = 1} counts words in the list that do not have cognates in L1 and L2 Matilde Marcolli Geometry, Physics, Linguistics Distance-matrix method of phylogenetic inference • after producing a measure of “genetic distance” Hamming metric dH(La, Lb) • hierarchical data clustering: collecting objects in clusters according to their distance • simplest method of tree construction: neighbor joining (1) - create a (leaf) vertex for each index a (ranging over languages in given family) (2) - given distance matrix D = (Dab) distances between each pair Dab = dH(La, Lb) construct a new matrix Q-test Q = (Qab) with Qab = (n 2)Dab nX k=1 Dak nX k=1 Dbk this matrix Q decides first pairs of vertices to join Matilde Marcolli Geometry, Physics, Linguistics (3) - identify entries Qab with lowest values: join each such pair (a, b) of leaf vertices to a newly created vertex vab (4) - set distances to new vertex by d(a, vab) = 1 2 Dab + 1 2(n 2) nX k=1 Dak nX k=1 Dbk ! d(b, vab) = Dab d(a, vab) d(k, vab) = 1 2 (Dak + Dbk Dab) (5) - remove a and b and keep vab and all the remaining vertices and the new distances, compute new Q matrix and repeat until tree is completed Matilde Marcolli Geometry, Physics, Linguistics Neighborhood-Joining Method for Phylogenetic Inference Matilde Marcolli Geometry, Physics, Linguistics Example of a neighbor-joining lexical linguistic phylogenetic tree from Delmestri-Cristianini’s paper Matilde Marcolli Geometry, Physics, Linguistics N. Saitou, M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees, Mol Biol Evol. Vol.4 (1987) N. 4, 406-425. R. Mihaescu, D. Levy, L. Pachter, Why neighbor-joining works, arXiv:cs/0602041v3 A. Delmestri, N. Cristianini, Linguistic Phylogenetic Inference by PAM-like Matrices, Journal of Quantitative Linguistics, Vol.19 (2012) N.2, 95-120. F. Petroni, M. Serva, Language distance and tree reconstruction, J. Stat. Mech. 
(2008) P08012 Matilde Marcolli Geometry, Physics, Linguistics Syntactic Phylogenetic Trees (instead of lexical) • instead of coding lexical data based on cognate words, use binary variables of syntactic parameters • Hamming distance between binary string of parameter values • shown recently that one gets an accurate reconstruction of the phylogenetic tree of Indo-European languages from syntactic parameters only • G. Longobardi, C. Guardiano, G. Silvestri, A. Boattini, A. Ceolin, Towards a syntactic phylogeny of modern Indo-European languages, Journal of Historical Linguistics 3 (2013) N.1, 122–152. • G. Longobardi, C. Guardiano, Evidence for syntax as a signal of historical relatedness, Lingua 119 (2009) 1679–1706. Matilde Marcolli Geometry, Physics, Linguistics Work in Progress • Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli, Syntactic Phylogenetic Trees, in preparation ...coming soon to an arXiv near you – Assembled a phylogenetic tree of world languages using the SSWL database of syntactic parameters – Ongoing comparison with specific historical linguistic reconstruction of phylogenetic trees – Comparison with Computational Linguistic reconstructions based on lexical data (Swadesh lists) and on phonetical analysis – not all linguistic families have syntactic parameters mapped with same level of completeness... di↵erent levels of accuracy in reconstruction Matilde Marcolli Geometry, Physics, Linguistics
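The distance-matrix pipeline described for syntactic phylogenetic trees (Hamming distance between parameter strings, then the neighbor-joining steps (1)–(5)) can be sketched in plain Python. This is a minimal illustration of the standard Saitou–Nei procedure, not the authors' implementation: `hamming` and `neighbor_joining` are hypothetical names, ties in $Q$ are broken by taking the first pair found, parameter strings are compared position by position (so a 0 for an undefined parameter counts as different from ±1, which is an assumption), and the result is an unrooted tree given as an edge list with branch lengths.

```python
def hamming(u, v):
    """Hamming distance between two equal-length strings of parameter values."""
    if len(u) != len(v):
        raise ValueError("parameter strings must have the same length")
    return sum(a != b for a, b in zip(u, v))


def neighbor_joining(names, matrix):
    """Neighbor joining on a symmetric distance matrix (list of lists, zero diagonal).

    Returns the unrooted tree as a list of (node, node, branch_length) edges;
    newly created internal vertices are named v1, v2, ...
    """
    dist = {a: {b: matrix[i][j] for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    created = 0
    while len(nodes) > 2:
        n = len(nodes)
        row = {a: sum(dist[a][k] for k in nodes) for a in nodes}
        # steps (2)-(3): Q_ab = (n-2) D_ab - sum_k D_ak - sum_k D_bk; join the minimum
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        a, b = min(pairs, key=lambda p: (n - 2) * dist[p[0]][p[1]]
                   - row[p[0]] - row[p[1]])
        created += 1
        v = f"v{created}"
        # step (4): branch lengths to the new vertex, then distances to the others
        d_av = 0.5 * dist[a][b] + (row[a] - row[b]) / (2 * (n - 2))
        edges += [(a, v, d_av), (b, v, dist[a][b] - d_av)]
        dist[v] = {v: 0.0}
        for k in nodes:
            if k not in (a, b):
                dist[v][k] = dist[k][v] = 0.5 * (dist[a][k] + dist[b][k] - dist[a][b])
        # step (5): replace a and b by v and repeat
        nodes = [k for k in nodes if k not in (a, b)] + [v]
    x, y = nodes
    edges.append((x, y, dist[x][y]))
    return edges
```

As a check, on the additive distance matrix of a toy tree where leaves A, B hang off one internal vertex with lengths 1 and 2, leaves C, D off another with lengths 3 and 4, and the two internal vertices are 5 apart, the sketch returns exactly those five branch lengths. For real data, `matrix[i][j]` would be `hamming(params[i], params[j])` over the languages' parameter strings.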
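Returning to the spin glass model of syntax: the single-parameter dynamics (an Ising Hamiltonian with edge strengths $J_e$, evolved by Metropolis–Hastings with single-spin-flip proposals) can be sketched as follows. The graph, the weights and all helper names are hypothetical toy inputs, not the MediaLab data or the authors' code; the optional field term $B$ from the first Ising Hamiltonian is kept with default 0, and the energy is recomputed from scratch at each step for clarity rather than speed.

```python
import math
import random


def energy(spins, edges, B=0.0):
    """Ising energy H(s) = -sum_e J_e s_v s_v' - B sum_v s_v.

    spins: dict vertex -> +1 or -1 (one syntactic parameter per language)
    edges: list of (v, w, J_e) with interaction strength J_e on edge {v, w}
    """
    interaction = sum(J * spins[v] * spins[w] for v, w, J in edges)
    return -interaction - B * sum(spins.values())


def metropolis_step(spins, edges, beta, rng, B=0.0):
    """One single-spin-flip Metropolis-Hastings update, applied in place.

    Proposes flipping a uniformly chosen vertex and accepts with probability
    1 if dH <= 0, else exp(-beta * dH). Returns True if the flip is kept.
    """
    v = rng.choice(sorted(spins))
    before = energy(spins, edges, B)
    spins[v] = -spins[v]
    dH = energy(spins, edges, B) - before
    if dH <= 0 or rng.random() < math.exp(-beta * dH):
        return True
    spins[v] = -spins[v]  # rejected: undo the flip
    return False


def magnetization(spins):
    """Average spin (1/#V) sum_v s_v of a single configuration."""
    return sum(spins.values()) / len(spins)


def simulate(spins, edges, beta, steps, seed=0, B=0.0):
    """Run `steps` Metropolis-Hastings updates and return the final magnetization."""
    rng = random.Random(seed)
    for _ in range(steps):
        metropolis_step(spins, edges, beta, rng, B)
    return magnetization(spins)
```

At large $\beta$ (low temperature) an all-aligned configuration essentially never accepts a flip, matching the "all aligned to +1" equilibrium; at $\beta = 0$ every proposed flip is accepted, the fully disordered limit.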
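For the Kanerva network experiments, a small sketch of a sparse distributed memory over $\mathbb{F}_2^N$, together with the single-bit-corruption recoverability score from the Procedure slide, may help. Several choices here are assumptions: counters are incremented for a 1 bit and decremented for a 0 bit (Kanerva's usual convention; the talk only says "incremented depending on" the entry), the access radius is passed in directly instead of being derived from the median Hamming distance between hard locations, and the class and function names are hypothetical.

```python
import random


def hamming(u, v):
    """Number of coordinates in which two bit-tuples differ."""
    return sum(a != b for a, b in zip(u, v))


class SparseDistributedMemory:
    """Minimal Kanerva-style sparse distributed memory over F_2^N (a sketch)."""

    def __init__(self, n_bits, n_hard, radius, seed=0):
        rng = random.Random(seed)
        self.radius = radius
        # uniform random sample of hard locations in F_2^N
        self.addresses = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                          for _ in range(n_hard)]
        # one counter per coordinate per hard location, initialized at zero
        self.counters = [[0] * n_bits for _ in range(n_hard)]

    def _access_sphere(self, x):
        """Indices of hard locations within Hamming distance `radius` of x."""
        return [h for h, a in enumerate(self.addresses)
                if hamming(a, x) <= self.radius]

    def write(self, x):
        """Store x at address x: +1 for each 1 bit, -1 for each 0 bit."""
        for h in self._access_sphere(x):
            for i, bit in enumerate(x):
                self.counters[h][i] += 1 if bit else -1

    def read(self, x):
        """Majority rule over the counters in the access sphere of x.
        Ties (including an empty sphere) are read as 0."""
        sums = [0] * len(x)
        for h in self._access_sphere(x):
            for i, c in enumerate(self.counters[h]):
                sums[i] += c
        return tuple(1 if s > 0 else 0 for s in sums)


def recoverability(memory, data, i):
    """Average Hamming distance between each stored bit-string and the value read
    back when bit i is corrupted (lower = more easily recoverable parameter)."""
    total = 0
    for x in data:
        corrupted = list(x)
        corrupted[i] ^= 1
        total += hamming(memory.read(tuple(corrupted)), x)
    return total / len(data)
```

In the setting of the talk one would store the 165 language bit-strings of length 21 and compute `recoverability(memory, data, i)` for each parameter index `i`; the sizes and radius would have to be tuned, which this sketch does not attempt.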