Constructing Universal, Non-Asymptotic Confidence Sets for Intrinsic Means on the Circle

Matthias Glock and Thomas Hotz
Institut für Mathematik, Technische Universität Ilmenau, 98684 Ilmenau, Germany
{matthias.glock,thomas.hotz}@tu-ilmenau.de

Abstract. We construct confidence sets for the set of intrinsic means on the circle based on i.i.d. data which guarantee coverage of the entire latter set for finite sample sizes without any further distributional assumptions. Simulations demonstrate the applicability of the construction even when there are multiple intrinsic means.

Keywords: circular data, intrinsic means, Fréchet means, universal confidence sets

1 Introduction

We are concerned with circular statistics, i.e. with the analysis of data which take values on the unit circle $S^1$. Such data occur often in practice, e.g. as measurements of wind directions in meteorology, or other data with a periodic interpretation like the times of day at which patients are admitted to some hospital unit. Good references for circular statistics which include many more examples are [2, 9, 11], amongst others.

Here, we will focus on intrinsic means, which are Fréchet means with respect to the intrinsic distance on the circle. To be specific, we will henceforth assume that $X, X_1, \dots, X_n$ (for some sample size $n \in \mathbb{N}$) are independent and identically distributed random variables taking values on the (unit) circle $S^1$. For convenience, we will think of angular measurements and identify $S^1$ with $(-\pi, \pi]$, calculating modulo $2\pi$ whenever necessary, so that we can treat $X, X_1, \dots, X_n$ as real-valued.

Of course, the circle is not a vector space, so the population (or sample) mean cannot be defined through integration (or averaging). But, following [3], we observe that in a Euclidean space the mean is the unique minimiser of the expected (or summed) squared distances to the random point (or the data). Therefore, given a metric $d$ on $S^1$, we accordingly define the set of Fréchet (population) means to be

$$M = \operatorname*{argmin}_{\mu \in S^1} F(\mu),$$

where $F \colon S^1 \to [0, \infty)$ is the Fréchet functional given by $F(\mu) = \mathrm{E}\, d(X, \mu)^2$ for $\mu \in S^1$, i.e. $M$ is the set of minimisers of $F$.

There are two popular metrics used on $S^1$: if one embeds $S^1$ as the unit circle in $\mathbb{C}$, $\{\exp(ix) : x \in (-\pi, \pi]\}$, then the extrinsic (or chordal) distance is given by $|\exp(ix) - \exp(iy)|$ for $x, y \in (-\pi, \pi]$. On the other hand, there is the intrinsic (or arc-length) distance $d$ given by $d(x, y) = \min\{|x - y + 2\pi k| : k \in \mathbb{Z}\}$ for $x, y \in (-\pi, \pi]$. A comparison between Fréchet means on the circle with respect to these two metrics may be found in [5]. In this article, we are concerned with Fréchet means with respect to the latter, intrinsic distance, which are called intrinsic means, and we aim to construct a confidence set $C$ given $X_1, \dots, X_n$ which contains the set $M$ of intrinsic means with probability at least $1 - \alpha$ for any pre-specified $\alpha \in (0, 1)$.
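For concreteness, both distances are easy to compute numerically. The following minimal Python sketch is our own illustration (the paper's implementation, mentioned in Sect. 4, is in R, and the function names here are ours): the intrinsic distance is evaluated via angle wrapping, the extrinsic distance via the embedding into $\mathbb{C}$.

```python
import numpy as np

def intrinsic_dist(x, y):
    """Arc-length distance d(x, y) = min_k |x - y + 2*pi*k| for angles in (-pi, pi]."""
    return np.abs((x - y + np.pi) % (2 * np.pi) - np.pi)

def extrinsic_dist(x, y):
    """Chordal distance |exp(ix) - exp(iy)|, shown only for comparison."""
    return np.abs(np.exp(1j * x) - np.exp(1j * y))

x, y = 0.9 * np.pi, -0.9 * np.pi
print(intrinsic_dist(x, y))   # 0.2*pi: the shorter arc runs "through" the point pi
print(extrinsic_dist(x, y))   # 2*sin(0.1*pi): the corresponding chord length
```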
The analysis of intrinsic means on the circle is not trivial; the main reason for this is the fact that for any $x \in S^1$, the squared distance to that point, $S^1 \ni \mu \mapsto d(x, \mu)^2$, is continuously differentiable everywhere except at the point $x^*$ "opposite" $x$ which maximises the distance to $x$, i.e. at the cut locus of $x$ given by $x^* = x + \pi$ for $x \in (-\pi, 0]$ and $x^* = x - \pi$ for $x \in (0, \pi]$. Consequently, $F$ need not be everywhere differentiable. However, $F$ is differentiable at any intrinsic mean $m \in M$ with a vanishing derivative there, $F'(m) = \mathrm{E}\, 2(X - m) = 0$ (calculated modulo $2\pi$), while its cut locus $m^*$ carries probability measure $P(X = m^*) = 0$ [10], cf. also [6].

Since intrinsic means are defined as minimisers of the Fréchet functional $F$, given data $X_1, \dots, X_n$ it would be natural to consider the minimisers of the empirical Fréchet functional $\hat F_n \colon S^1 \to [0, \infty)$ with

$$\hat F_n(\mu) = \frac{1}{n} \sum_{i=1}^n d(X_i, \mu)^2 \qquad (1)$$

as M-estimators, i.e. the so-called empirical Fréchet means. Since $\hat F_n(\mu)$ converges to $F(\mu)$ almost surely (a.s.) for every $\mu \in S^1$, one might expect the empirical means to be close to the population means, and derive asymptotic confidence sets based on the asymptotic (for $n \to \infty$) behaviour of the empirical means. In fact, one can prove the following result [6, 12]: if $M = \{0\}$ (unique population mean), then any measurable selection of empirical Fréchet means $\hat\mu_n$ converges a.s. to $0$, and if the distribution of $X$ features a continuous Lebesgue density $f$ in a neighbourhood of $0^* = \pi$ with $f(\pi) < \frac{1}{2\pi}$, then

$$\sqrt{n}\,\hat\mu_n \xrightarrow{\;\mathcal{D}\;} \mathcal{N}\!\left(0,\ \frac{\mathrm{E}\, X^2}{\bigl(1 - 2\pi f(\pi)\bigr)^2}\right),$$

while in case $f(\pi) = \frac{1}{2\pi}$ a central limit theorem with a slower rate might hold.

In order to derive asymptotic confidence sets from this central limit theorem, one would need to ensure that $M$ contains only a single point, which imposes a restriction on the distribution of $X$, and that this distribution features a continuous Lebesgue density smaller than $\frac{1}{2\pi}$ at the cut locus of the intrinsic mean; both conditions would e.g. be fulfilled for distributions with a unimodal density [5]. Then, one could either somehow estimate the asymptotic variance consistently or use a bootstrap approach to obtain asymptotic confidence sets for the unique intrinsic (population) mean, cf. [1] where this has been developed for distributions having no mass in an entire neighbourhood of the cut locus.

Fig. 1. (a) The distribution in Ex. 1 gives equal weight to the three (small, red) points, which also constitute $M$ (thick, white points); an example confidence set from Sect. 4 based on $n = 10{,}000$ points and $1 - \alpha = 90\%$ is also shown (thick, blue line). (b) The distribution in Ex. 2 comprises a point mass (small, red point) and a segment with uniform density (medium, red line) opposite, such that $M$ is an entire (thick, white) segment; an example confidence set from Sect. 4 based on $n = 10{,}000$ points and $1 - \alpha = 90\%$ is also shown (thick, blue line).

This approach based on the asymptotic distribution of the empirical Fréchet mean has several drawbacks: one is that it guarantees only an approximate coverage probability of $1 - \alpha$ for finite sample sizes, where the quality of the approximation is usually unknown; another is that the assumptions justifying this approach are difficult to check in practice. In particular, judging whether they are fulfilled only by looking at an empirical Fréchet mean may be quite misleading, as the following example shows.

Example 1 (equilateral triangle). Let the distribution of $X$ give equal weight to 3 points forming an equilateral triangle, i.e. set $P(X = -\tfrac{2}{3}\pi) = P(X = 0) = P(X = \tfrac{2}{3}\pi) = \tfrac{1}{3}$, see Fig. 1(a). It is easy to see that these very points form the set of intrinsic means, i.e. $M = \{-\tfrac{2}{3}\pi, 0, \tfrac{2}{3}\pi\}$ in this case, cf. [5]. For a large sample, however, the empirical measure will comprise 3 different weights with large probability, so that the empirical Fréchet mean will be unique and close to one of the point masses, opposite of which there is no mass at all. Therefore, it will appear as if the assumptions for the central limit theorem are fulfilled though they are not.
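To see this phenomenon numerically, the following Python sketch (ours, not the authors' code; the sample size, grid resolution and random seed are arbitrary choices) draws a sample from the equilateral-triangle distribution and evaluates the empirical Fréchet functional of (1) on a fine grid; the grid minimiser is typically unique and close to one of the three atoms, exactly as described above.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen arbitrarily for reproducibility

def wrap(z):
    """Wrap real numbers into (-pi, pi]; |wrap(x - y)| is the intrinsic distance."""
    return -((-z + np.pi) % (2 * np.pi) - np.pi)

# Example 1: equal mass at -2*pi/3, 0 and 2*pi/3; these three points are exactly M
atoms = np.array([-2 * np.pi / 3, 0.0, 2 * np.pi / 3])
sample = rng.choice(atoms, size=1000)

# Evaluate the empirical Frechet functional (1) on a fine grid
grid = np.linspace(-np.pi, np.pi, 3600, endpoint=False)
Fhat = np.mean(wrap(sample[:, None] - grid[None, :]) ** 2, axis=0)

# Typically the empirical minimiser is unique and close to ONE of the three atoms,
# although the population functional F is minimised at all three of them.
print("empirical Frechet mean (grid):", grid[np.argmin(Fhat)])
print("F_n at the three atoms:       ",
      np.mean(wrap(sample[:, None] - atoms[None, :]) ** 2, axis=0))
```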
We do not know of any constructions of confidence sets for intrinsic means which are applicable if there is more than one intrinsic mean, let alone an entire segment of intrinsic means as in the following example taken from [6, Example 1, case 0b].

Example 2 (point mass with uniform density at the cut locus). Let the distribution of $X$ comprise a point mass at $0$ with weight $0.6$ as well as a Lebesgue continuous part with density $\frac{1}{2\pi}\,\chi_{(-\pi, -0.6\pi] \cup [0.6\pi, \pi]}$, where $\chi$ denotes the characteristic function of the corresponding segment, see Fig. 1(b). A straightforward calculation shows that the set of intrinsic means is then given by $M = [-0.4\pi, 0.4\pi]$, see [6].

These examples ask for confidence sets which are both universal, i.e. they require no distributional assumptions besides the observations being i.i.d., and non-asymptotic, i.e. they guarantee coverage of $M$ with probability at least $1 - \alpha$ for any finite sample size $n \in \mathbb{N}$. Such confidence sets have been constructed for extrinsic means, i.e. Fréchet means w.r.t. the extrinsic distance on the circle, using geometric considerations for that particular distance in [7, 8].

Our construction of such confidence sets for intrinsic means utilises mass concentration inequalities to control both the empirical Fréchet functional $\hat F_n$ (Sect. 2) and its derivative (Sect. 3). We then provide simulation results for the two examples above (Sect. 4) before finally discussing the results obtained as well as further research (Sect. 5).

2 Controlling the Empirical Functional

For our first step, recall that at every point $\mu \in S^1$, the empirical Fréchet functional $\hat F_n(\mu) = \frac{1}{n} \sum_{i=1}^n d(X_i, \mu)^2$ will be close to the population Fréchet functional $F(\mu) = \mathrm{E}\, d(X, \mu)^2$ by the law of large numbers; the deviation may be quantified by a mass concentration inequality since $S^1$ is compact, whence the squared distance is bounded. In fact, since $S^1$ is compact and the squared distance is Lipschitz, it will suffice to bound the difference between $\hat F_n$ and $F$ at finitely many points on a regular grid (using the union bound) in order to estimate it uniformly on the entire circle; we may then conclude with large probability that points where $\hat F_n$ is large cannot be intrinsic means.

For this, we partition $(-\pi, \pi]$ into $J \in \mathbb{N}$ intervals of identical length,

$$I_{j,J} = \Bigl(-\pi + (j-1)\tfrac{2\pi}{J},\ -\pi + j\tfrac{2\pi}{J}\Bigr] \quad \text{for } j = 1, \dots, J,$$

whose closures are given by the closed balls with centres $\mu_{j,J} = -\pi + (2j-1)\frac{\pi}{J}$ and radius $\delta_J = \frac{\pi}{J}$.

In order to control the deviation of $\hat F_n$ from $F$ at each $\mu_{j,J}$ we employ Hoeffding's inequality [4]: if $U_1, \dots, U_n$, $n \in \mathbb{N}$, are independent random variables taking values in the bounded interval $[a, b]$, $-\infty < a < b < \infty$, then

$$P\bigl(|\bar U_n - \mathrm{E}\, \bar U_n| \ge t\bigr) \le 2 \exp\Bigl(-\frac{2nt^2}{(b-a)^2}\Bigr) \qquad (2)$$

for any $t \in [0, \infty)$, where $\bar U_n = \frac{1}{n} \sum_{i=1}^n U_i$.

Now fix some $\mu \in S^1$. Then, since the maximal (intrinsic) distance of two points on the circle is $\pi$ and $\mathrm{E}\, \hat F_n(\mu) = F(\mu)$, we obtain

$$P\bigl(|\hat F_n(\mu) - F(\mu)| \ge t\bigr) \le 2 \exp\Bigl(-\frac{2nt^2}{\pi^4}\Bigr)$$

for any $t \in [0, \infty)$. Moreover, for any $\nu, x \in S^1$,

$$|d(x, \nu)^2 - d(x, \mu)^2| = \bigl(d(x, \nu) + d(x, \mu)\bigr)\,|d(x, \nu) - d(x, \mu)| \le 2\pi\, d(\nu, \mu)$$

by the bound on $d$ and the reverse triangle inequality, so the mapping $S^1 \ni \nu \mapsto d(x, \nu)^2$ is Lipschitz with constant $2\pi$ for any fixed $x \in S^1$. This implies that $\hat F_n$ and $F$ are also Lipschitz with that very constant, so that $\hat F_n - F$ is Lipschitz with constant $4\pi$. Therefore, bounding $|\hat F_n(\mu) - F(\mu)|$ for $\mu \in I_{j,J}$ by $|\hat F_n(\mu_{j,J}) - F(\mu_{j,J})| + 4\pi\delta_J$, the union bound gives

$$P\Bigl(\sup_{\mu \in S^1} |\hat F_n(\mu) - F(\mu)| \ge t + 4\pi\delta_J\Bigr) \le 2J \exp\Bigl(-\frac{2nt^2}{\pi^4}\Bigr).$$

If we want the right-hand side to equal $\beta \in (0, 1)$, we have to choose $t = \sqrt{-\frac{\pi^4}{2n} \log\frac{\beta}{2J}}$, which leads to

$$P\Bigl(\sup_{\mu \in S^1} |\hat F_n(\mu) - F(\mu)| \ge \sqrt{-\tfrac{\pi^4}{2n} \log\tfrac{\beta}{2J}} + \tfrac{4\pi^2}{J}\Bigr) \le \beta.$$

Since by definition $F(m) = \inf_{\mu \in S^1} F(\mu)$ for any intrinsic mean $m \in M$, the triangle inequality gives

$$\sup_{m \in M} \hat F_n(m) - \inf_{\mu \in S^1} \hat F_n(\mu) = \sup_{m \in M} \bigl(\hat F_n(m) - F(m)\bigr) + \inf_{\mu \in S^1} F(\mu) - \inf_{\mu \in S^1} \hat F_n(\mu) \le \sup_{m \in M} |\hat F_n(m) - F(m)| + \sup_{\mu \in S^1} |\hat F_n(\mu) - F(\mu)| \le 2 \sup_{\mu \in S^1} |\hat F_n(\mu) - F(\mu)|.$$

Thus, choosing $\beta = \frac{\alpha}{2}$, we have obtained our first confidence set for the set $M$ of intrinsic means:

Proposition 1. Let

$$C_1 = \Bigl\{\mu \in S^1 : \hat F_n(\mu) < \inf_{\nu \in S^1} \hat F_n(\nu) + \Delta_1\Bigr\} \qquad (3)$$

where the critical value $\Delta_1 > 0$ is given by $\Delta_1 = \pi^2 \Bigl(\sqrt{-\frac{2}{n} \log\frac{\alpha}{4J}} + \frac{8}{J}\Bigr)$. Then we have $P(C_1 \supseteq M) \ge 1 - \frac{\alpha}{2}$.
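A minimal Python sketch of this first step may look as follows (our illustration, not the authors' R code; the names `conf_set_C1` and `wrap`, and the choice to evaluate $C_1$ only at the interval centres $\mu_{j,J}$, are ours).

```python
import numpy as np

def wrap(z):
    """Wrap real numbers into (-pi, pi]; |wrap(x - y)| is the intrinsic distance."""
    return -((-z + np.pi) % (2 * np.pi) - np.pi)

def conf_set_C1(sample, alpha, J):
    """Grid approximation of C1 from Proposition 1: returns the interval centres
    mu_{j,J} retained in C1 together with the critical value Delta_1."""
    n = len(sample)
    delta1 = np.pi ** 2 * (np.sqrt(-2.0 / n * np.log(alpha / (4 * J))) + 8.0 / J)
    centres = -np.pi + (2 * np.arange(1, J + 1) - 1) * np.pi / J
    Fhat = np.mean(wrap(np.asarray(sample)[:, None] - centres[None, :]) ** 2, axis=0)
    return centres[Fhat < Fhat.min() + delta1], delta1

# e.g. for a sample from the equilateral-triangle distribution of Example 1:
rng = np.random.default_rng(0)
sample = rng.choice([-2 * np.pi / 3, 0.0, 2 * np.pi / 3], size=1000)
C1_centres, delta1 = conf_set_C1(sample, alpha=0.1, J=200)   # J chosen ad hoc here
```

Evaluating only at the centres is a crude discretisation of (3); since $\hat F_n$ is piecewise quadratic with breakpoints only at points opposite the observations, an exact treatment would also be possible but is not needed for illustration.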
Note that $J \in \mathbb{N}$ may be selected in advance by numerical optimisation such that $\Delta_1$ becomes minimal. Even in the most favourable situation, however, when $P(X = 0) = 1$, we a.s. have $F(\mu) = \hat F_n(\mu) = \mu^2$ for any $\mu \in (-\pi, \pi]$, so that $C_1$ has Lebesgue measure of the order of $\bigl(\frac{\log(n \log n)}{n}\bigr)^{1/4}$ for large $n$ (for $J$ of the order of $\sqrt{n \log n}$), which would give a somewhat slow rate of convergence. This is due to the fact that $F$ itself behaves like a quadratic function at the minimum; this will be improved upon by considering the derivative $F'$ which behaves linearly at the minimum where it vanishes. Notwithstanding this problem, we observe that $\sup_{\mu \in S^1} |\hat F_n(\mu) - F(\mu)|$ and $\Delta_1$ converge to zero in probability when $n$ tends to infinity, which shows that $C_1$ in a certain sense converges to $M$ in probability, thus ensuring consistency of our approach.

3 Controlling the Derivative

Recall that $F$ is differentiable at any intrinsic mean while $\hat F_n$ is differentiable except at points opposite observations, which occur opposite an intrinsic mean only with probability 0 [6]. If the derivative of $\hat F_n$ exists, i.e. for any $\mu \in S^1$ with $\mu^* \notin \{X_1, \dots, X_n\}$, it is given by

$$\hat F_n'(\mu) = \frac{2}{n} \sum_{i=1}^n [X_i - \mu] \qquad (4)$$

where representatives for the $X_i$, $i = 1, \dots, n$, need to be chosen in $\mathbb{R}$ such that $[X_i - \mu] \in (-\pi, \pi]$. Otherwise, i.e. in case $\mu^* \in \{X_1, \dots, X_n\}$, we simply define $\hat F_n'(\mu)$ by (4).

This is utilised as follows: partitioning $(-\pi, \pi]$ into $n$ disjoint intervals $I_{k,n}$, $k = 1, \dots, n$ (using the notation from Sect. 2), let $K = \{k : I_{k,n} \cap M \neq \emptyset\}$ be the set of indices of intervals which contain an intrinsic mean, and choose one intrinsic mean $m_k \in M$ for every $k \in K$, whence $M \subseteq \bigcup_{k \in K} I_{k,n}$. Since $2[X_i - m_k]$ in (4) takes values in $[-2\pi, 2\pi]$, we can employ Hoeffding's inequality (2) again to get

$$P\bigl(|\hat F_n'(m_k)| \ge t\bigr) \le 2 \exp\Bigl(-\frac{nt^2}{8\pi^2}\Bigr)$$

for any $k \in K$, where we used $\mathrm{E}\, \hat F_n'(m_k) = 0$. The union bound readily implies

$$P\Bigl(\exists k \in K : \inf_{\mu \in I_{k,n} \cap M} |\hat F_n'(\mu)| \ge t\Bigr) \le 2|K| \exp\Bigl(-\frac{nt^2}{8\pi^2}\Bigr)$$

where $|K|$ is the cardinality of $K$; choosing $t > 0$ such that the right-hand side becomes $\frac{\alpha}{2}$ then gives

$$P\Bigl(\exists k \in K : \inf_{\mu \in I_{k,n} \cap M} |\hat F_n'(\mu)| \ge \sqrt{-\tfrac{8\pi^2}{n} \log\tfrac{\alpha}{4|K|}}\Bigr) \le \frac{\alpha}{2}. \qquad (5)$$

Unfortunately, $K$ is not known in advance, but it can be estimated using the confidence set $C_1$ for $M$ constructed in Sect. 2: let $\hat K = \{k : I_{k,n} \cap C_1 \neq \emptyset\}$. Then, whenever $C_1 \supseteq M$ we have $\hat K \supseteq K$, in particular $|\hat K| \ge |K|$, and thus $M \subseteq C_1 \cap \bigcup_{k \in \hat K} I_{k,n}$. So, setting

$$C_2 = \bigcup_{k \in \hat K :\ \inf_{\mu \in I_{k,n} \cap C_1} |\hat F_n'(\mu)| < \Delta_2} I_{k,n} \qquad (6)$$

with the critical value $\Delta_2 = \sqrt{-\frac{8\pi^2}{n} \log\frac{\alpha}{4|\hat K|}}$ (which will then be larger than the one in (5) based on $K$) allows us finally to construct the desired confidence set:

Proposition 2. Let $X_1, \dots, X_n$, $n \in \mathbb{N}$, be independent and identically distributed random points on $S^1$, and let $\alpha \in (0, 1)$ be given. Then, the confidence set

$$C = C_1 \cap C_2 \qquad (7)$$

based on the sets $C_1$ and $C_2$ constructed in (3) and (6) above, respectively, is a $(1 - \alpha)$-confidence set for the set $M$ of intrinsic means, i.e. it fulfils $P(C \supseteq M) \ge 1 - \alpha$.
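Combining both steps, a grid-based Python sketch of the full construction might look as follows (ours, not the authors' implementation; the returned boolean mask is only a discretisation of $C$ on an evaluation grid, the resolution `n_grid` should be chosen much larger than $n$, and the memory-hungry vectorisation is accepted for clarity).

```python
import numpy as np

def wrap(z):
    """Wrap real numbers into (-pi, pi]; |wrap(x - y)| is the intrinsic distance."""
    return -((-z + np.pi) % (2 * np.pi) - np.pi)

def conf_set_C(sample, alpha, J, n_grid=10_000):
    """Illustrative discretisation of C = C1 ∩ C2 (Propositions 1 and 2).
    Returns the evaluation grid and a boolean mask of grid points kept in C."""
    sample = np.asarray(sample)
    n = len(sample)
    grid = np.linspace(-np.pi, np.pi, n_grid, endpoint=False) + np.pi / n_grid

    diffs = wrap(sample[:, None] - grid[None, :])     # [X_i - mu] in (-pi, pi]
    Fhat = np.mean(diffs ** 2, axis=0)                # empirical Frechet functional (1)
    Fhat_prime = 2.0 * np.mean(diffs, axis=0)         # its derivative (4)

    # Step 1 (Sect. 2): C1 of Proposition 1
    delta1 = np.pi ** 2 * (np.sqrt(-2.0 / n * np.log(alpha / (4 * J))) + 8.0 / J)
    in_C1 = Fhat < Fhat.min() + delta1

    # Step 2 (Sect. 3): C2 of (6), built on the partition of (-pi, pi] into n intervals
    k_idx = np.minimum(((grid + np.pi) * n / (2 * np.pi)).astype(int), n - 1)
    K_hat = np.unique(k_idx[in_C1])                   # intervals meeting C1, i.e. K-hat
    delta2 = np.sqrt(-8.0 * np.pi ** 2 / n * np.log(alpha / (4 * len(K_hat))))
    in_C2 = np.zeros(n_grid, dtype=bool)
    for k in K_hat:
        in_interval = (k_idx == k)
        if np.min(np.abs(Fhat_prime[in_interval & in_C1])) < delta2:
            in_C2 |= in_interval                      # keep the whole interval I_{k,n}
    return grid, in_C1 & in_C2
```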
4 Simulations

The construction of the confidence set $C$ in (7) has been implemented within the statistical software package R [13]. For illustration, we show results for the two examples introduced in Sect. 1.

We simulated Ex. 1 for sample sizes $n = 10^2, 10^3, 10^4, 10^5, 10^6$ and $1 - \alpha = 90\%$; for each simulation it was checked whether $M$ was covered by $C$, and the Lebesgue measure of $C$ was computed. This was independently repeated 1,000 times for each sample size. The result of one simulation for $n = 10^4$ is shown in Fig. 1(a). Averages (and standard deviations) of the Lebesgue measure of $C$ computed over the repetitions for each sample size are reported in Tab. 1.

Table 1. Average Lebesgue measure of $C$ over 1,000 repetitions of Ex. 1 for different sample sizes (rounded, ± standard deviation).

    n       avg. Lebesgue measure (± st. dev.)
    10^2    6.25 (± 0.07)
    10^3    2.76 (± 0.01)
    10^4    0.94 (± 0.00)
    10^5    0.32 (± 0.00)
    10^6    0.11 (± 0.00)

As Ex. 2 leads to numerically more involved calculations, it was simulated only for sample sizes $n = 10^2, 10^3, 10^4, 10^5$ and $1 - \alpha = 90\%$; again, for each simulation it was checked whether $M$ was covered by $C$, and the Lebesgue measure of $C$ was computed, but this was independently repeated only 100 times for each sample size. Since $M$ has positive Lebesgue measure, averages (and standard deviations) of the Lebesgue measure of $C \setminus M$, i.e. the confidence sets' excess size, computed over the repetitions for each sample size are given in Tab. 2.

Table 2. Average Lebesgue measure of $C \setminus M$ over 100 repetitions of Ex. 2 for different sample sizes (rounded, ± standard deviation).

    n       avg. Lebesgue measure (± st. dev.)
    10^2    2.63 (± 0.30)
    10^3    0.92 (± 0.10)
    10^4    0.31 (± 0.03)
    10^5    0.11 (± 0.01)

In both settings we found that $M$ was covered in all simulations, which is to be expected since our use of mass concentration inequalities results in quite conservative confidence sets. This, however, may to a certain extent be the price one has to pay in order to obtain non-asymptotic, universal confidence sets guaranteeing coverage of all of $M$. Nonetheless, we observe that the (excess) size of the confidence sets decreases roughly (up to a log-factor) like $n^{-1/2}$, so the second step in our construction had the desired effect of yielding confidence sets of a size usually obtained for M-estimators, while the first step was necessary to ensure consistency by also removing local minimisers of $F$.
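The simulation loop just described can be sketched as follows (ours, in Python rather than the paper's R implementation; it reuses `conf_set_C` from the sketch after Proposition 2, and both the coverage check and the Lebesgue measure are grid approximations with a smaller sample size and fewer repetitions than in Tab. 1, so the numbers are only indicative).

```python
import numpy as np

rng = np.random.default_rng(1)
atoms = np.array([-2 * np.pi / 3, 0.0, 2 * np.pi / 3])   # Example 1
M = atoms                                                 # its set of intrinsic means
alpha, n, J, reps = 0.1, 1000, 200, 100                   # smaller than in the paper

covered, sizes = 0, []
for _ in range(reps):
    sample = rng.choice(atoms, size=n)
    grid, in_C = conf_set_C(sample, alpha, J)             # sketch after Proposition 2
    cell = 2 * np.pi / len(grid)
    # approximate coverage check: the grid point nearest to each intrinsic mean is in C
    nearest = np.argmin(np.abs(grid[None, :] - M[:, None]), axis=1)
    covered += bool(np.all(in_C[nearest]))
    sizes.append(in_C.sum() * cell)                       # approximate Lebesgue measure of C

print("empirical coverage:           ", covered / reps)
print("average Lebesgue measure of C:", np.mean(sizes))
```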
5 Discussion and Outlook

We would like to stress again that for both examples asymptotic confidence sets cannot easily be constructed since neither example features a unique intrinsic mean; indeed, to the best of our knowledge, the given construction is the first of a confidence set for $M$ applicable in such situations.

Of course, the given construction may be repeated for more general compact metric spaces as long as one can construct the necessary grids of points at which to control the functionals; carrying this out for other interesting spaces will be left for further research. We also note that the construction may be improved upon by taking the "variance of the estimator" into account; this corresponds here to making use of the knowledge about $F$ when controlling $\hat F_n$.

References

1. Bhattacharya, R.N., Patrangenaru, V.: Large sample theory of intrinsic and extrinsic sample means on manifolds II. The Annals of Statistics 33(3), 1225–1259 (2005)
2. Fisher, N.I.: Statistical analysis of circular data. Cambridge University Press, Cambridge (1993)
3. Fréchet, M.: Les éléments aléatoires de nature quelconque dans un espace distancié. Annales de l'Institut Henri Poincaré 10(4), 215–310 (1948)
4. Hoeffding, W.: Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301), 13–30 (1963)
5. Hotz, T.: Extrinsic vs intrinsic means on the circle. In: Nielsen, F., Barbaresco, F. (eds.) Geometric Science of Information, Lecture Notes in Computer Science, vol. 8085, pp. 433–440. Springer-Verlag, Heidelberg (2013)
6. Hotz, T., Huckemann, S.: Intrinsic means on the circle: Uniqueness, locus and asymptotics. Annals of the Institute of Statistical Mathematics 67(1), 177–193 (2015)
7. Hotz, T., Kelma, F., Wieditz, J.: Universal, Non-asymptotic Confidence Sets for Circular Means, pp. 635–642. Springer International Publishing, Cham (2015)
8. Hotz, T., Kelma, F., Wieditz, J.: Non-asymptotic confidence sets for circular means. Entropy 18(10), 375 (2016)
9. Jammalamadaka, S.R., SenGupta, A.: Topics in Circular Statistics, Series on Multivariate Analysis, vol. 5. World Scientific, Singapore (2001)
10. Le, H., Barden, D.: On the measure of the cut locus of a Fréchet mean. Bulletin of the London Mathematical Society 46(4), 698–708 (2014)
11. Mardia, K.V., Jupp, P.E.: Directional Statistics. Wiley, New York (2000)
12. McKilliam, R.G., Quinn, B.G., Clarkson, I.V.L.: Direction estimation by minimum squared arc length. IEEE Transactions on Signal Processing 60(5), 2115–2124 (2012)
13. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna (2013), http://www.R-project.org/