New model search for nonlinear recursive models, regressions and autoregressions

28/10/2015
Publication GSI2015
OAI : oai:www.see.asso.fr:11784:14296

Abstract

Scaled Bregman distances (SBD) have turned out to be useful tools for simultaneous estimation and goodness-of-fit testing in parametric models of random data (streams, clouds). We show how SBD can additionally be used for model preselection (structure detection), i.e. for finding appropriate candidates of model (sub)classes in order to support a desired decision under uncertainty. For this, we concentrate exemplarily on the context of nonlinear recursive models with additional exogenous inputs; as special cases we include nonlinear regressions, linear autoregressive models (e.g. AR, ARIMA, SARIMA time series), and nonlinear autoregressive models with exogenous inputs (NARX). In particular, we outline a corresponding information-geometric 3D computer-graphical selection procedure. Some sample-size asymptotics is given as well.



Licence

Creative Commons Attribution-ShareAlike 4.0 International

<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/11784/14296</identifier><creators><creator><creatorName>Wolfgang Stummer</creatorName></creator><creator><creatorName>Anna-Lena Kißlinger</creatorName></creator></creators><titles>
            <title>New model search for nonlinear recursive models, regressions and autoregressions</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2015</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><subjects><subject>Scaled Bregman distances</subject><subject>Model selection</subject><subject>Nonlinear regression</subject><subject>AR SARIMA NARX</subject><subject>Autorecursions</subject><subject>3D score surface</subject></subjects><dates>
	    <date dateType="Created">Sun 8 Nov 2015</date>
	    <date dateType="Updated">Wed 31 Aug 2016</date>
            <date dateType="Submitted">Mon 15 Oct 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">42dbaf804cdc342d32d8dd6ca7d1ad31b4905175</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>24686</version>
        <descriptions>
            <description descriptionType="Abstract">
Scaled Bregman distances SBD have turned out to be useful tools for simultaneous estimation and goodness-of-fit-testing in parametric models of random data (streams, clouds). We show how SBD can additionally be used for model preselection (structure detection), i.e. for finding appropriate candidates of model (sub)classes in order to support a desired decision under uncertainty. For this, we exemplarily concentrate on the context of nonlinear recursive models with additional exogenous inputs; as special cases we include nonlinear regressions, linear autoregressive models (e.g. AR, ARIMA, SARIMA time series), and nonlinear autoregressive models with exogenous inputs (NARX). In particular, we outline a corresponding information-geometric 3D computer-graphical selection procedure. Some sample-size asymptotics is given as well.

</description>
        </descriptions>
    </resource>

New model search for nonlinear recursive models, regressions and autoregressions
Wolfgang Stummer and Anna-Lena Kißlinger
FAU University of Erlangen-Nürnberg
Talk at GSI 2015, Palaiseau, 29/10/2015

Outline
• introduce a new method for model search (model preselection, structure detection) in data streams/clouds; key technical tool: density-based probability distances/divergences with "scaling"
• gives much flexibility for interdisciplinary, situation-based applications (also with cost functions, utility, etc.)
• goal-specific handling of outliers and inliers (dampening, amplification) is not directly covered today
• give new, general, parameter-free asymptotic distributions for the involved data-derived distances/divergences
• outline a corresponding information-geometric 3D computer-graphical selection procedure

29/10/2015 | Wolfgang Stummer and Anna-Lena Kißlinger | GSI 2015 | 3

WHY distances between (non-)probability measures (1)
• "distances" D(P, Q) between two (non-)probability measures P, Q play a prominent role in modern statistical inference:
  • parameter estimation,
  • testing for goodness of fit resp. homogeneity resp. independence,
  • clustering,
  • change-point detection,
  • Bayesian decision procedures,
as well as in other research fields such as
  • information theory,
  • signal processing, including image and speech processing,
  • pattern recognition,
  • feature extraction,
  • machine learning,
  • econometrics, and
  • statistical physics.

WHY distances between (non-)probability measures (2)
• suppose we want to describe the proximity/distance/closeness/similarity D(P, Q) of two (non-)probability distributions P and Q:
  • either two "theoretical" distributions, e.g. P = N(µ1, σ1²), Q = N(µ2, σ2²),
  • or two (empirical) distributions representing data (e.g. derived from frequencies, histograms, ...),
  • or one of each −→ today
• P, Q may live on R^d, or on "spaces of functions with appropriate properties": e.g. potential future scenarios of a time series, or a continuous-time stochastic process (e.g. functional data)
• exemplary statistical uses of distances D(P, Q) −→

WHY distances between probability measures (3)
Applic. 1: the plane = all probability distributions (on R, R^d, a path space, ...); on it we have a "distance", say D(P, Q).
E.g. P := P^emp_N := (1/N) · Σ_{i=1}^N δ_{X_i}[·], the empirical distribution of an iid sample X_1, ..., X_N of size N from Q_{θ_true}; it puts equal "weight" 1/N on each data point.
θ̂ = minimum distance estimator (e.g. θ̂ = MLE for D(P^emp_N, Q_θ) = Kullback-Leibler).
However, D(P^emp_N, Q_θ̂) may still be large −→ "bad goodness of fit" −→ test.

Time Series and Nonlinear Regressions (1)
In time series, the data (describing random variables) ..., X_1, X_2, ... are non-iid. E.g. the autoregressive model AR(2) of order 2:
X_{m+1} − ψ_1 · X_m − ψ_2 · X_{m−1} = ε_{m+1},   m ≥ k,
where (ε_{m+1})_{m≥k} is a family of independent and identically distributed (iid) random variables on some space Y having parametric distribution Q_θ (θ ∈ Θ).
Compact notation: take the parameter vector £ := (2, ψ_1, ψ_2), the backshift operator B defined by B X_m := X_{m−1}, the 2-polynomial ψ_1 · B + ψ_2 · B², and the identity operator 1 given by 1 X_m := X_m −→ the left-hand side becomes
F_£(X_{m+1}, X_m, X_{m−1}, ..., X_k) = (1 − Σ_{j=1}^2 ψ_j B^j) X_{m+1}
−→ as data-derived distribution we take the empirical distribution of the left-hand side
P^orig_{N,£}[·] := P[·; X_{k−1}, ..., X_{k+N}; £] := (1/N) · Σ_{i=1}^N δ_{F_£(X_{k+i}, X_{k+i−1}, ..., X_k)}[·]
with histogram-according probability mass function (relative frequencies)
p^£_N(y) = #{i ∈ {1, ..., N} : F_£(X_{k+i}, ..., X_k) = y} / N = #{i : X_{k+i} − ψ_1 · X_{k+i−1} − ψ_2 · X_{k+i−2} = y} / N

Time Series and Nonlinear Regressions (2)
−→ two issues: which time series model for the X_i, and which distance D(·, ·)?

Time Series and Nonlinear Regressions (3)
More generally: nonlinear autorecursions in the sense of
F_{£_{m+1}}(m+1, X_{m+1}, X_m, X_{m−1}, ..., X_k, Z_{k−}, a_{m+1}, a_m, a_{m−1}, ..., a_k) = ε_{m+1},   m ≥ k,
• where (F_{£_{m+1}})_{m≥k} is a sequence of nonlinear functions parametrized by £_{m+1} ∈ Γ,
• (ε_{m+1})_{m≥k} are iid with parametric distribution Q_θ (θ ∈ Θ),
• (a_m)_{m≥k} are independent variables which are non-stochastic (deterministic) today,
• the "backlog-input" Z_{k−} denotes the additional input on X and a before time k, needed to get the recursion started.
Today, we assume k > −∞ and E_{Q_θ}[ε_{m+1}] = 0, and that the initial data X_k as well as the backlog-input Z_{k−} are deterministic.
Special case: X_{m+1} = g(f_{£_{m+1}}(m+1, X_m, X_{m−1}, ..., X_k, Z_{k−}, a_{m+1}, a_m, a_{m−1}, ..., a_k), ε_{m+1}) for some appropriate functions f_{£_{m+1}} and g, e.g. g(u, v) := u + v or g(u, v) := u · v
−→ (ε_{m+1})_{m≥k} can be interpreted as "randomness-driving innovations (noise)".

Time Series and Nonlinear Regressions (4)
Our general context covers in particular:
• NARX models = nonlinear autoregressive models with exogenous input: the above special case with constant parameter vector £_{m+1} ≡ £ and additive g.
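As a concrete illustration of the residual-based empirical distribution P^orig_{N,£} from the AR(2) example, here is a minimal Python sketch. The simulation parameters, the innovation law N(0,1), and the one-decimal rounding grid are our own hypothetical choices for illustration, not taken from the talk:

```python
import random
from collections import Counter


def ar2_residual_pmf(x, psi1, psi2, decimals=1):
    """Empirical pmf of the AR(2) residuals F(X_{m+1}, ...) =
    X_{m+1} - psi1*X_m - psi2*X_{m-1}.

    Residuals are rounded to a grid so the result is a finite discrete
    (histogram-type) distribution, matching the talk's discrete setting."""
    residuals = [round(x[i] - psi1 * x[i - 1] - psi2 * x[i - 2], decimals)
                 for i in range(2, len(x))]
    n = len(residuals)
    return {y: c / n for y, c in Counter(residuals).items()}


# simulate an AR(2) path with standard normal innovations
random.seed(0)
psi1, psi2 = 0.5, -0.3  # hypothetical AR(2) coefficients
x = [0.0, 0.0]          # deterministic initial data X_k
for _ in range(5000):
    x.append(psi1 * x[-1] + psi2 * x[-2] + random.gauss(0.0, 1.0))

pmf = ar2_residual_pmf(x, psi1, psi2)
```

The rounding step plays the role of the histogram binning that makes p^£_N a finite discrete pmf; with the true ψ_1, ψ_2 the recovered residuals are exactly the iid innovations, so the pmf approximates the N(0,1) density on the grid.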
Especially:
• nonlinear regressions with deterministic independent variables: the only involved X is X_{m+1}
• AR(r) = linear autoregressive models (time series) of order r ∈ N (recall the above example with r = 2)
• ARIMA(r,d,0) = linear autoregressive integrated models (time series) of order r ∈ N_0 and d ∈ N_0
• SARIMA(r,d,0)(R,D,0)_s = linear seasonal autoregressive integrated models (time series) with order d ∈ N_0 of non-seasonal differencing, order r ∈ N_0 of the non-seasonal AR part, season length s ∈ N_0, order D ∈ N_0 of seasonal differencing, and order R ∈ N_0 of the seasonal AR part.

Divergences / similarity measures (1)
• so far: motivations for WHY to measure the proximity/distance/closeness/similarity D(P, Q); here P = P^orig_{N,£}[·] (= empirical distribution of the iid noises) and Q = Q_θ (= candidate for the true distribution of the iid noises)
• now: HOW to measure, i.e. which "distance" D(P, Q) to use?
• prominent examples of D(P, Q): relative entropy (Kullback-Leibler information discrimination) −→ MDE = MLE, Hellinger distance, Pearson's chi-square divergence, Csiszar's f-divergences, ... −→ all will be covered by our much more general context
• DESIRE: a toolbox {D_{φ,M}(P, Q) : φ ∈ Φ, M ∈ M} which is far-reaching and flexible (reflected by different choices of the "generator" φ and the scaling measure M) and which also covers robustness issues

Divergences / similarity measures (2)
• from now on: probability distributions P, Q on (X, A) and a non-probability distribution/(σ-)finite measure M on (X, A); we assume that all three have densities w.r.t. a σ-finite measure λ:
p(x) = dP/dλ(x),  q(x) = dQ/dλ(x)  and  m(x) = dM/dλ(x)  for λ-almost all x ∈ X (for today: mostly X ⊂ R)
• furthermore we take a "divergence (distance) generating function" φ : (0, ∞) → R which (for today) is twice differentiable and strictly convex; without loss of generality we also assume φ(1) = 0; the limit φ(0) := lim_{t↓0} φ(t) always exists (but may be ∞)

Scaled Bregman Divergences (1)
Definition (Stummer 2007, extended in Stummer & Vajda 2012, IEEE Trans. Inf. Theory). The Bregman divergence (distance) of probability distributions P, Q scaled by the (σ-)finite measure M on (X, A) is defined by
B_φ(P, Q | M) := ∫_X m(x) · [ φ(p(x)/m(x)) − φ(q(x)/m(x)) − φ′(q(x)/m(x)) · (p(x)/m(x) − q(x)/m(x)) ] dλ(x)
• if X = {x_1, x_2, ..., x_s}, where s may be infinite, and λ is a counting measure −→ p(·), q(·), m(·) are classical probability mass functions ("counting densities"):
B_φ(P, Q | M) = Σ_{i=1}^s m(x_i) · [ φ(p(x_i)/m(x_i)) − φ(q(x_i)/m(x_i)) − φ′(q(x_i)/m(x_i)) · (p(x_i)/m(x_i) − q(x_i)/m(x_i)) ]
e.g. φ(t) = (t − 1)² −→ B_φ(P, Q | M) = Σ_{i=1}^s (p(x_i) − q(x_i))² / m(x_i), a weighted Pearson χ².
Example: P := P^emp_N := (1/N) · Σ_{i=1}^N δ_{ε_i}[·], the empirical distribution of an iid sample of size N from Q_{θ_true}; the corresponding pmf is the relative frequency p(x) := p^emp_N(x) := (1/N) · #{j ∈ {1, ..., N} : ε_j = x}; Q := Q_θ, where the "hypothetical candidate distribution" Q_θ has pmf q(x) := q_θ(x); M := W(P^emp_N, Q_θ) with pmf m(x) = w(p^emp_N(x), q_θ(x)) > 0 for some function w(·, ·).

Discrete case with φ(t) = φ_α(t) and m(x) = w_β(p(x), q(x)): 3D presentation; exemplary goal: ≈ 0 for all α, β.

B_φ(P, Q | M) with composite scalings M = W(P, Q) (1)
• from now on: M = W(P, Q), i.e. m(x) = w(p(x), q(x)) for some function w(·, ·)
• w(u, v) = 1 −→ unscaled/classical Bregman distance (discrete case: Pardo/Vajda 1997, 2003)
  e.g. for the generator φ_1(t) = t log t + 1 − t −→ Kullback-Leibler divergence (MLE)
  e.g. for the power functions φ_α(t) := (t^α − 1 + α − α·t) / (α(α − 1)), α ≠ 0, 1 −→ density power divergences of Basu et al. 1998, Basu et al. 2013/14/15
• new example (Kißlinger/Stummer 2015c): scaling by weighted r-th-power means w_{β,r}(u, v) := (β · u^r + (1 − β) · v^r)^{1/r}, β ∈ [0, 1], r ∈ R\{0}
• e.g. r = 1: arithmetic-mean scaling (mixture scaling)
  subcase β = 0: w_{0,1}(u, v) = v −→ all Csiszar φ-divergences/disparities; for φ_2(t) one gets Pearson's chi-square divergence
  subcase β = 1 and φ_2(t) −→ Neyman's chi-square divergence
  subcase β ∈ [0, 1] and φ_2(t) −→ blended weight chi-square divergence, Lindsay 1994
  subcase β ∈ [0, 1] and φ_α(t) −→ Stummer/Vajda (2012), Kißlinger/Stummer (2013, 2015a)
• e.g. r = 1/2: w_{β,1/2}(u, v) = (β · √u + (1 − β) · √v)²; subcase β ∈ [0, 1] and φ_2(t) −→ blended weight Hellinger distance: Lindsay (1994), Basu/Lindsay (1994)
• e.g. r → 0: geometric-mean scaling w_{β,0}(u, v) = u^β · v^{1−β}, Kißlinger/Stummer (2015b)

Some scale connectors w(u, v) (for any generator φ) (1)
(a) w_{0,1}(u, v) = v (Csiszar divergences)
(b) w_{0.45,1}(u, v) = 0.45 · u + 0.55 · v
(c) w_{0.45,0.5}(u, v) = (0.45 · √u + 0.55 · √v)²
(d) w_{0.45,0}(u, v) = u^0.45 · v^0.55

Scale connectors w(u, v) which are NOT r-th-power means
(e) WEXPM: w_{0.45,f̃_6}(u, v) = (1/6) · log(0.45 · e^{6u} + 0.55 · e^{6v})
(g) w^med_{0.45}(u, v) = med{min{u, v}, 0.45, max{u, v}}
(j) w^smooth_adj(u, v) with h_in = −0.5, h_out = 0.3, δ = 10^{−7}, etc.
(k) parameter description for w_adj(u, v)

Robustness
Obtaining robustness against outliers and inliers (i.e. high unusualness in the data, surprising observations), as well as the (asymptotic) efficiency of our procedure, is a question of a good choice of the scale connector w(·, ·), which we call a density-pair adjustment function −→ another long paper, Kißlinger & Stummer 2015b −→ another talk. This is vaguely similar to the task of choosing a good copula in (inter-)dependence-modelling frameworks. We end up with a new, transparent, far-reaching 3D computer-graphical "geometric" method.

Universal model search UMSPD (1)
Recall the two issues: which time series model for the X_i, and which distance D(·, ·). Now: model search in detail; basic idea (for finite discrete distributions): under the correct ("true") model ((F_{£0_{m+1}}), Q_{θ0}) the sequence (F_{£0_{k+i}}(k+i, X_{k+i}, X_{k+i−1}, ..., X_k, Z_{k−}, a_{k+i}, ..., a_k))_{i=1,...,N} behaves like a size-N sample from an iid sequence with distribution Q_{θ0}, i.e.
P^{£0}_N[·] := (1/N) · Σ_{i=1}^N δ_{F_{£0_{k+i}}(k+i, X_{k+i}, X_{k+i−1}, ..., X_k, Z_{k−}, a_{k+i}, ..., a_k)}[·] −→ Q_{θ0}[·] as N → ∞,
and thus D_{α,β}(P^{£0}_N, Q_{θ0}) −→ 0 as N → ∞, for a very broad family D := {D_{α,β}(·, ·) : α ∈ [α_min, α_max], β ∈ [β_min, β_max]} of distances, where we use the SBDs
D_{α,β}(P^£_N, Q_θ) := B_{φ_α}(P^£_N, Q_θ | W_β(P^£_N, Q_θ))
for an α-family of generators φ_α(·) (today: the above power functions) and a β-family of scale connectors W_β(·, ·) (today: geometric-mean scaling w_{β,0}(u, v) = u^β · v^{1−β}).

Universal model search UMSPD (2)
We introduce the universal model search by probability distance (UMSPD):
1. choose (F_{£_{m+1}})_{m≥k} from a principal parametric-function-family class;
2. choose some prefixed class of parametric candidate distributions {Q_θ : θ ∈ Θ};
3. find a parameter sequence £ := (£_{m+1})_{m≥k} (often constant) and a θ ∈ Θ such that D_{α,β}(P^£_N, Q_θ) ≈ 0 for large enough sample size N and all (α, β) ∈ [α_min, α_max] × [β_min, β_max];
4. preselect the model ((F_{£_{m+1}}), Q_θ) if the "3D score surface" (the "mountains") S := {(α, β, D_{α,β}(P^£_N, Q_θ)) : α ∈ [α_min, α_max], β ∈ [β_min, β_max]} stays below some appropriately chosen threshold T (namely, a chi-square quantile, see below).

Universal model search UMSPD (3)
Graphical implementation: plot the 3D preselection-score surface S.

Universal model search UMSPD (4)
ADVANTAGE OF UMSPD: after the preselection process one can continue to work with the same D_{α,β}(·, ·) in order to perform, amongst all preselected candidate models, statistically sound inference in terms of simultaneous exact parameter estimation and goodness of fit. One issue remains to be discussed for UMSPD: the choice of the threshold T.

Universal model search UMSPD (5)
We exemplarily show how to quantify the above preselection criterion "the 3D surface S should stay below a threshold T" by some sound asymptotic analysis for the above special choices φ_α(·) and w_β(·, ·). The cornerstone is the following limit theorem.
Theorem. Let Q_{θ0} be a finite discrete distribution with c := |Y| ≥ 2 possible outcomes and strictly positive densities q_{θ0}(y) > 0 for all y ∈ Y. Then for each α > 0, α ≠ 1, and each β ∈ [0, 1[, the random scaled Bregman power distance
2N · B_{φ_α}(P^{£0}_N, Q_{θ0} | (P^{£0}_N)^β · Q_{θ0}^{1−β}) =: 2N · B(α, β; £0, θ0; N)
is asymptotically chi-squared distributed, in the sense that 2N · B(α, β; £0, θ0; N) → χ²_{c−1} in law as N → ∞.
In terms of the corresponding χ²_{c−1}-quantiles, one can derive the threshold T which the 3D preselection-score surface S has to (partially) exceed in order to believe, with an appropriate level of confidence, that the investigated model ((F_{£_{m+1}})_{m≥k}, Q_θ) is not good enough to be preselected.
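To make the UMSPD score surface and the chi-square threshold concrete, here is a small Python sketch for the discrete case with the power generators φ_α and geometric-mean scaling w_{β,0}. The toy counts, the candidate pmf (fair die), the (α, β) grid, and the 95% level are hypothetical choices for illustration; the quantile χ²_{0.95, 5} ≈ 11.07 is hard-coded to keep the sketch dependency-free:

```python
def phi_alpha(t, a):
    # power generator phi_alpha(t) = (t^a - 1 + a - a*t) / (a*(a-1)), a not in {0, 1}
    return (t ** a - 1 + a - a * t) / (a * (a - 1))


def phi_alpha_prime(t, a):
    # derivative of phi_alpha: (t^(a-1) - 1) / (a - 1)
    return (t ** (a - 1) - 1) / (a - 1)


def sbd(p, q, a, b):
    """Scaled Bregman distance B_{phi_a}(P, Q | M) with geometric-mean
    scaling m = p^b * q^(1-b), for strictly positive pmfs p, q
    given as aligned lists over the same finite outcome space."""
    total = 0.0
    for pi, qi in zip(p, q):
        m = pi ** b * qi ** (1 - b)
        u, v = pi / m, qi / m
        total += m * (phi_alpha(u, a) - phi_alpha(v, a)
                      - phi_alpha_prime(v, a) * (u - v))
    return total


# toy "3D score surface": observed relative frequencies p vs. candidate pmf q
counts = [170, 160, 180, 150, 175, 165]   # made-up outcome counts, c = 6
N = sum(counts)
p = [c / N for c in counts]
q = [1 / 6] * 6                            # candidate: fair die

alphas = [0.5, 1.5, 2.0, 3.0]
betas = [0.0, 0.25, 0.5, 0.75]
surface = {(a, b): 2 * N * sbd(p, q, a, b) for a in alphas for b in betas}

# preselection check via the limit theorem: df = c - 1 = 5,
# 95% quantile chi2_{0.95,5} ~ 11.07 (hard-coded here)
T = 11.07
preselected = all(score < T for score in surface.values())
```

A quick consistency check on the implementation: for α = 2 and β = 0 the formula collapses to half of Pearson's chi-square divergence, Σ (p − q)²/(2q), matching the w_{0,1} subcase on the earlier slide.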
Further Topics
• scaled Bregman divergences can be used for robust statistical inference, with "completely general asymptotic results" for other choices of φ(·) and w(·, ·) −→ Kißlinger & Stummer (2015b)
• scaled Bregman divergences can be used for change detection in data streams −→ Kißlinger & Stummer (2015c)
• explicit formulae exist for B_{φ_α}(P_{θ1}, P_{θ2} | P_{θ0}), where P_{θ1}, P_{θ2}, P_{θ0} stem from the same arbitrary exponential family, cf. Stummer & Vajda (2012), Kißlinger & Stummer (2013); including stochastic processes (Levy processes)
• we can do Bayesian decision making with important processes:
  • non-stationary stochastic differential equations
  • e.g. non-stationary branching processes −→ Kammerer & Stummer (2010)
  • e.g. inhomogeneous binomial diffusion approximations −→ Stummer & Lao (2012)

Summary
• introduced a new method for model search (model preselection, structure detection) in data streams/clouds; key technical tool: density-based probability distances/divergences with "scaling"
• gives much flexibility for interdisciplinary, situation-based applications (also with cost functions, utility, etc.)
• gave a new parameter-free asymptotic distribution result for the involved data-derived distances/divergences
• outlined a corresponding information-geometric 3D computer-graphical selection procedure

References
Ali, S.M., Silvey, S.D.: A general class of coefficients of divergence of one distribution from another. J. Roy. Statist. Soc. B 28, 131–140 (1966)
Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85, 549–559 (1998)
Basu, A., Shioya, H., Park, C.: Statistical Inference: The Minimum Distance Approach. CRC Press, Boca Raton (2011)
Billings, S.A.: Nonlinear System Identification. Wiley, Chichester (2013)
Csiszar, I.: Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten. Publ. Math. Inst. Hungar. Acad. Sci. A-8, 85–108 (1963)
Kißlinger, A.-L., Stummer, W.: Some decision procedures based on scaled Bregman distance surfaces. In: Nielsen, F., Barbaresco, F. (eds.): GSI 2013, LNCS 8085, pp. 479–486. Springer, Berlin (2013)
Kißlinger, A.-L., Stummer, W.: New model search for nonlinear recursive models, regressions and autoregressions. In: Nielsen, F., Barbaresco, F. (eds.): GSI 2015, LNCS 9389. Springer, Berlin (2015a)
Kißlinger, A.-L., Stummer, W.: Robust statistical engineering by means of scaled Bregman divergences. Preprint (2015b)
Kißlinger, A.-L., Stummer, W.: A new information-geometric method of change detection. Preprint (2015c)
Liese, F., Vajda, I.: Convex Statistical Distances. Teubner, Leipzig (1987)
Nock, R., Piro, P., Nielsen, F., Ali, W.B.H., Barlaud, M.: Boosting k-NN for categorization of natural scenes. Int. J. Comput. Vis. 100, 294–314 (2012)
Pardo, L.: Statistical Inference Based on Divergence Measures. Chapman & Hall, Boca Raton (2006)
Pardo, M.C., Vajda, I.: On asymptotic properties of information-theoretic divergences. IEEE Trans. Inf. Theory 49(7), 1860–1868 (2003)
Read, T.R.C., Cressie, N.A.C.: Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer, New York (1988)
Stummer, W.: Some Bregman distances between financial diffusion processes. Proc. Appl. Math. Mech. 7(1), 1050503–1050504 (2007)
Stummer, W., Vajda, I.: On Bregman distances and divergences of probability measures. IEEE Trans. Inf. Theory 58(3), 1277–1288 (2012)