Visualizing projective shape space


Creative Commons: none (all rights reserved)


<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/2552/4895</identifier><creators><creator><creatorName>John Kent</creatorName></creator></creators><titles>
            <title>Visualizing projective shape space</title></titles>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
	    <date dateType="Created">Tue 17 Sep 2013</date>
	    <date dateType="Updated">Wed 31 Aug 2016</date>
            <date dateType="Submitted">Sun 10 Feb 2019</date>
	    <alternateIdentifier alternateIdentifierType="bitstream">c8239ca94dc5c9686cf7745f87ec9d776cb7eec3</alternateIdentifier>
            <description descriptionType="Abstract"></description>

John Kent, University of Leeds
j.t.kent@leeds.ac.uk, http://maths.leeds.ac.uk/~john
GSI, August 2013

Overview
This talk is about a camera view of a “scene”, where the scene contains either a set of collinear points in the plane (imaged on a one-dimensional film) or a set of coplanar points in three dimensions (imaged on a two-dimensional film). We are interested in the information in the scene that is invariant to the location of the focal point of the camera and the orientation of the film. Thus we are looking for features of the scene that are invariant under the group of projective transformations. Such features are known as projective invariants, and the collection of information they carry is called “projective shape”. Unfortunately, projective invariants, as usually formulated, are not suitable for quantitative statistical analysis: there is no obvious metric between different sets of projective invariants. The purpose of this talk is to give a standardized representation of projective shape that is amenable to metric comparisons.

The simplest case: 4 collinear points
For much of the talk we focus on the simplest case (k = 4 points in m = 1 dimension), where there is just one projective invariant, the cross ratio. We then generalize the methodology to higher values of k and m. The next slides illustrate the main issues. First is a figure containing a scene of 4 collinear points, the focal point of a camera, and a linear film. The effect of changing the camera position is then illustrated by two images of my back garden taken from different positions.

[Figure: Camera view of 4 collinear points.]
[Figure: My back garden. View 1 of lanterns; View 2 of lanterns.]

The cross ratio
Given four numbers $u_1, \dots, u_4$ (representing coordinates of four labelled collinear points in a two-dimensional scene), the cross ratio is defined by
$$\tau = \frac{(u_2 - u_1)(u_4 - u_3)}{(u_3 - u_1)(u_4 - u_2)}.$$
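As a minimal sketch (Python, using exact rational arithmetic from the standard `fractions` module), the cross ratio can be computed directly from this definition and checked to be unchanged by a one-dimensional projective map u → (au + b)/(cu + d) with ad − bc ≠ 0. The coordinates 0, 1, 3, 7 and the map coefficients are made-up illustration values, not the lantern data, and the function names are hypothetical.

```python
from fractions import Fraction


def cross_ratio(u1, u2, u3, u4):
    """Cross ratio tau = (u2 - u1)(u4 - u3) / ((u3 - u1)(u4 - u2))."""
    return (u2 - u1) * (u4 - u3) / ((u3 - u1) * (u4 - u2))


def projective_map(u, a, b, c, d):
    """1-D projective map u -> (a*u + b) / (c*u + d); assumes ad - bc != 0 and c*u + d != 0."""
    return (a * u + b) / (c * u + d)


# Illustrative landmark coordinates (hypothetical, not measured data).
u = [Fraction(0), Fraction(1), Fraction(3), Fraction(7)]
tau = cross_ratio(*u)

# Apply an arbitrary projective map (a, b, c, d) = (2, 1, 1/2, 3), so ad - bc = 11/2.
v = [projective_map(x, 2, 1, Fraction(1, 2), 3) for x in u]

print(tau, cross_ratio(*v))  # 2/9 2/9
```

Exact fractions are used so that the invariance shows up as exact equality rather than agreement up to rounding.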
It can be shown that the cross ratio is the one and only projective invariant in this situation. If the landmarks are relabelled (there are 24 permutations), the cross ratio takes 6 possible forms. If the original value of τ is restricted to the interval (0, 1/2), their ranges together cover the real line (apart from the interval endpoints):
$$\begin{array}{cccccc} \tau & 1-\tau & \dfrac{1}{1-\tau} & \dfrac{1}{\tau} & -\dfrac{1-\tau}{\tau} & -\dfrac{\tau}{1-\tau} \\[4pt] (0, \tfrac12) & (\tfrac12, 1) & (1, 2) & (2, \infty) & (-\infty, -1) & (-1, 0) \end{array}$$

Cross ratios in the back garden
From the two images of my back garden, I extracted the coordinates of the lanterns and computed τ in each case. The answers are very similar, as expected: $\tau_1 = 0.489$ and $\tau_2 = 0.487$.

Unsuitability of the cross ratio for metric comparisons
The behavior of the cross ratio under relabelling underscores its unsuitability for metric comparisons. For example, two cross ratios near 0 (say $\tau_1 = 0.1$ and $\tau_2 = 0.01$) look very close together on the τ scale (|0.1 − 0.01| = 0.09) but far apart on the 1/τ scale (|10 − 100| = 90). Thus the labelling of the landmarks affects metric comparisons between cross ratios. What to do? We shall look at a geometric solution (limited to 4 collinear landmarks) and an algebraic solution (for more landmarks and higher dimensions).

Geometric standardization for 4 collinear landmarks
Suppose the four landmarks are labelled A, B, C, D in increasing order along the line. Draw two semicircles, one with diameter AC and the other with diameter BD. The two semicircles intersect in a point, O say. Make this point the focal point of the camera, and switch from linear film to circular film. The image of a landmark is now a pair of antipodal points on the circle. Since O lies on both semicircles, the angles AOC and BOD are right angles (the angle in a semicircle). The angle AOB, say δ, is related to the cross ratio by $\tau = \sin^2\delta$: the cross ratio of four collinear points equals the corresponding ratio of sines of the angles they subtend at O, namely $\sin(AOB)\sin(COD)/\{\sin(AOC)\sin(BOD)\}$, and here AOC = BOD = π/2 while COD = AOB = δ (both are complements of BOC). Further, under relabelling the cross ratio takes the following forms in terms of δ:
$$\sin^2\delta,\quad \cos^2\delta,\quad \sec^2\delta,\quad \csc^2\delta,\quad -\tan^2\delta,\quad -\cot^2\delta.$$

[Figure: Geometric choice of preferred focal point.]
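Both claims on these slides can be checked numerically with a short sketch, assuming illustrative coordinates A, B, C, D = 0, 1, 3, 7 (chosen arbitrarily). The first part enumerates all 24 relabellings with `itertools.permutations` and collects the distinct cross ratios. The second part finds the preferred focal point O: a point P = (x, y) sees AC at a right angle exactly when (x − a)(x − c) + y² = 0, so subtracting the same condition for BD gives x = (bd − ac)/(b + d − a − c), after which sin²(AOB) should equal τ. All helper names are hypothetical.

```python
import itertools
import math
from fractions import Fraction


def cross_ratio(u1, u2, u3, u4):
    return (u2 - u1) * (u4 - u3) / ((u3 - u1) * (u4 - u2))


def relabelled_values(u):
    """Distinct cross ratios over all 24 relabellings of the four landmarks."""
    return sorted({cross_ratio(*perm) for perm in itertools.permutations(u)})


def preferred_focal_point(a, b, c, d):
    """Upper intersection O of the semicircles on diameters AC and BD (needs a < b < c < d)."""
    x = (b * d - a * c) / (b + d - a - c)
    y = math.sqrt((x - a) * (c - x))  # from (x - a)(x - c) + y^2 = 0
    return x, y


def sin2_angle(o, p, q):
    """sin^2 of the angle POQ, for points p and q on the line y = 0."""
    v1 = (p - o[0], -o[1])
    v2 = (q - o[0], -o[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return cross ** 2 / ((v1[0] ** 2 + v1[1] ** 2) * (v2[0] ** 2 + v2[1] ** 2))


a, b, c, d = 0.0, 1.0, 3.0, 7.0
print(relabelled_values([Fraction(x) for x in (a, b, c, d)]))  # six distinct values

o = preferred_focal_point(a, b, c, d)
print(cross_ratio(a, b, c, d), sin2_angle(o, a, b))  # both approximately 2/9
print(sin2_angle(o, a, c), sin2_angle(o, b, d))      # right angles: both approximately 1
```

The relabelling check uses exact fractions so that the six values can be compared without rounding; the geometric check necessarily uses floating point because of the square root.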
[Figure: Standardized image of 4 collinear points on circular film (standardized configuration Y).]

Homogeneous coordinates
To understand why this choice of focal point is useful for metric comparisons, we need some algebraic calculations. The first step is to construct homogeneous coordinates. Starting with the four real coordinates $u_1, \dots, u_4$, construct a 4 × 2 “augmented” configuration matrix by adding a column of ones,
$$X = \begin{pmatrix} u_1 & 1 \\ u_2 & 1 \\ u_3 & 1 \\ u_4 & 1 \end{pmatrix} = \begin{pmatrix} x_1^T \\ x_2^T \\ x_3^T \\ x_4^T \end{pmatrix},$$
where $x_i^T$ denotes the ith row of X, and think of each row as defined only up to a scalar multiple. (In general X is a k × p matrix, with p = m + 1.)

Projective shape as an equivalence class of matrices
It can be shown that the projective shape is precisely the information in X that is invariant under the transformations $X \to DXB$, where $D = \mathrm{diag}(d_i)$ is a k × k nonsingular diagonal matrix (because the distance between the focal point and each landmark in the scene is unknown) and B (p × p) is nonsingular (representing the effect of the focal point position). Thus projective shape can be described in terms of an equivalence class of matrices. How can we choose a preferred element of the equivalence class?

Tyler standardization for projective shape (1)
For projective shape, recall that $X \equiv DXB$. Let us choose D and B so that after standardization
(a) the rows of X are unit vectors, $x_i^T x_i = 1$, i = 1, ..., k, and
(b) the columns of X are orthonormal up to a factor k/p, $X^T X = (k/p) I_p$.
Choice of D: since each row of X is defined only up to a multiplicative constant, we can scale each row of X either so that its last element is 1, as in the augmented matrix above (the conventional choice, appropriate for flat film), or so that it has norm 1 (the Tyler choice, appropriate for spherical film), in both cases with the focal point of the camera at the origin.
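A small numerical sketch of the equivalence class X → DXB, assuming NumPy: the cross ratio can be written in homogeneous coordinates as a ratio of 2 × 2 determinants, det[x1; x2] det[x3; x4] / (det[x1; x3] det[x2; x4]), because det[x_i; x_j] = u_i − u_j when x_i = (u_i, 1). Replacing X by DXB multiplies each det[x_i; x_j] by d_i d_j det(B), and these factors cancel in the ratio. The random D and B below are arbitrary test values, and the function names are hypothetical.

```python
import numpy as np


def augmented(u):
    """k x 2 augmented configuration matrix with rows x_i^T = (u_i, 1)."""
    u = np.asarray(u, dtype=float)
    return np.column_stack([u, np.ones_like(u)])


def cross_ratio_homogeneous(X):
    """Cross ratio of the four rows of a 4 x 2 matrix, from 2 x 2 determinants."""
    def det(i, j):
        return np.linalg.det(X[[i, j]])
    return det(0, 1) * det(2, 3) / (det(0, 2) * det(1, 3))


rng = np.random.default_rng(0)
X = augmented([0.0, 1.0, 3.0, 7.0])

# Nonzero diagonal D (random magnitudes and signs) and a random 2 x 2 B.
D = np.diag(rng.uniform(0.5, 2.0, size=4) * rng.choice([-1.0, 1.0], size=4))
B = rng.normal(size=(2, 2))  # nonsingular with probability 1

print(cross_ratio_homogeneous(X), cross_ratio_homogeneous(D @ X @ B))  # equal up to rounding
```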
The existence of a solution for D and B is due to Dave Tyler, who developed a similar result in the context of robust estimation of a covariance matrix in multivariate analysis. In general D and B must be found numerically using an iterative algorithm.

Tyler standardization for projective shape (2)
Let $Y = DXB$ denote the Tyler-standardized configuration obtained with the optimal D and B. Then the rows $y_i$, i = 1, ..., k, are unit vectors and the columns are orthonormal up to a factor k/p. On our spherical film the $y_i$ are “uniformly spread” around the unit sphere in $\mathbb{R}^p$ in terms of their moment of inertia matrix,
$$Y^T Y = \sum_{i=1}^{k} y_i y_i^T = (k/p) I_p.$$
Note that Y is unique up to (a) multiplying each row of Y by ±1, and (b) multiplying Y on the right by a p × p orthogonal matrix. How can we remove these remaining indeterminacies in Y?

Embedding
From a standardized configuration Y, define an “absolute inner product” matrix M (k × k) by $m_{ij} = |y_i^T y_j|$, i, j = 1, ..., k. Then
(a) $m_{ij}$ is invariant under sign changes of each row and under rotation/reflection of the data around the circle;
(b) at least for p = 2, it is possible to reconstruct the projective shape of Y from M;
(c) hence, at least for p = 2, M is a representation of the projective shape of Y.

Tyler standardization for 4 collinear points
In the case k = 4, p = 2 it can be shown that a standardized configuration Y takes the form
$$Y = \begin{pmatrix} v(-\delta/2)^T \\ v(\delta/2)^T \\ v(\pi/2 - \delta/2)^T \\ v(\pi/2 + \delta/2)^T \end{pmatrix} \equiv \begin{pmatrix} c & -s \\ c & s \\ s & c \\ s & -c \end{pmatrix},$$
where $v(\theta) = (\cos\theta, \sin\theta)^T$, $c = \cos(\delta/2)$, $s = \sin(\delta/2)$ and $0 < \delta < \pi/4$ (matching the restriction τ ∈ (0, 1/2) used earlier). The last row on the right is $-v(\pi/2 + \delta/2)^T$, which is allowed because each row is defined only up to sign. This form is unique up to (a) permutation of the landmarks, (b) the sign of each row, and (c) rotation/reflection of the data around the circle. Then τ is related to δ by one of the trigonometric functions
$$\sin^2\delta,\quad \cos^2\delta,\quad \sec^2\delta,\quad \csc^2\delta,\quad -\tan^2\delta,\quad -\cot^2\delta,$$
depending on the permutation.
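The slides leave the iterative algorithm unspecified. One common choice, assumed here, is Tyler's fixed-point iteration for a p × p shape matrix Σ, namely Σ ← (p/k) Σ_i x_i x_i^T / (x_i^T Σ^{-1} x_i), rescaled to unit trace at each step; at convergence B = Σ^{-1/2}, and D rescales the rows of XB to unit length. The sketch below (NumPy; function names hypothetical) assumes no two rows of X are proportional, i.e. no two landmarks coincide, and also computes the absolute inner product matrix M.

```python
import numpy as np


def tyler_standardize(X, tol=1e-12, max_iter=100_000):
    """Return Y = D X B with unit-length rows and Y^T Y = (k/p) I_p (sketch)."""
    X = np.asarray(X, dtype=float)
    k, p = X.shape
    sigma = np.eye(p) / p
    for _ in range(max_iter):
        inv = np.linalg.inv(sigma)
        w = 1.0 / np.einsum("ij,jk,ik->i", X, inv, X)  # w_i = 1 / (x_i^T Sigma^{-1} x_i)
        new = (p / k) * (X.T * w) @ X                   # (p/k) sum_i w_i x_i x_i^T
        new /= np.trace(new)                            # fix the arbitrary overall scale
        converged = np.max(np.abs(new - sigma)) < tol
        sigma = new
        if converged:
            break
    vals, vecs = np.linalg.eigh(sigma)
    B = vecs @ np.diag(vals ** -0.5) @ vecs.T           # symmetric Sigma^{-1/2}
    Z = X @ B
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)  # row rescaling plays the role of D


def absolute_inner_products(Y):
    """Embedding M with entries m_ij = |y_i^T y_j|."""
    return np.abs(Y @ Y.T)


X = np.column_stack([[0.0, 1.0, 3.0, 7.0], np.ones(4)])
Y = tyler_standardize(X)
M = absolute_inner_products(Y)
print(np.round(Y.T @ Y, 6))  # approximately 2 * I_2, since k/p = 4/2
print(np.round(M, 4))
```

For these ordered landmarks (τ = 2/9), M should show structural zeros in positions (1, 3) and (2, 4), with m14² ≈ τ and m12² ≈ 1 − τ, consistent with the closed form for k = 4 given on this slide.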
[Figure: Standardized representation of 4 collinear points (standardized configuration Y).]

Embedding for 4 collinear points
In this case
$$Y = \begin{pmatrix} c & -s \\ c & s \\ s & c \\ s & -c \end{pmatrix}, \qquad c = \cos(\delta/2),\ s = \sin(\delta/2),$$
where $0 < \delta < \pi/2$. Then
$$M = \begin{pmatrix} 1 & C & 0 & S \\ C & 1 & S & 0 \\ 0 & S & 1 & C \\ S & 0 & C & 1 \end{pmatrix},$$
where $C = \cos\delta$ and $S = \sin\delta$. Note that $m_{12}^2 + m_{13}^2 + m_{14}^2 = 1$ with one structural 0, so $(m_{12}, m_{13}, m_{14})$ is a point with nonnegative coordinates on the unit sphere in $\mathbb{R}^3$, one of which vanishes. Hence the possible values of M can be represented as the edges of a spherical triangle on the unit sphere in $\mathbb{R}^3$.

[Figure: Projective shape space for 4 collinear points as a spherical triangle, with labels A=B, A=C, A=D (vertices) and A~B, A~C, A~D (edges), and marked cross-ratio values 0, 0.5, 1, 2, −1 and ±∞.]

Interpretation of the spherical triangle
The position of the structural 0 in M is closely related to the ordering of the landmarks. In particular, it identifies which pairs of landmarks are perpendicular in the circular film image. In our earlier picture with ordered landmarks A, B, C, D, the angles AOC (and hence also BOD) were right angles, corresponding to the edge with $m_{13} = 0$. At one end of this edge (i.e. at a vertex of the spherical triangle), landmarks A and B coalesce (as do landmarks C and D). At the other vertex, landmarks A and D coalesce (as do landmarks B and C).

Why corners?
Why does the spherical triangle representation of projective shape space for 4 collinear landmarks have corners? In terms of the cross ratio, $\tau = (B - A)(D - C)/\{(C - A)(D - B)\}$, there seems to be no reason for corners. For example, hold A < C < D fixed and let B vary through the extended real line. Then the cross ratio varies bijectively through the extended real line. Away from the singularity at B = D, the cross ratio is an infinitely differentiable function of B. In particular, there is no hint of a singularity as B passes through A and C. But at these points (B = A and B = C) the cross ratio takes the values 0 and 1, respectively, corresponding to two of the vertices of projective shape space. Where do these singularities (i.e. vertices or corners) come from?
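The following sketch (NumPy) checks the formula for M from the closed-form Y and traces the “Why corners?” experiment. The helper `triangle_point_from_tau` is a hypothetical function derived here rather than taken from the slides: it places the structural zero at m13 when 0 ≤ τ ≤ 1 (A perpendicular to C), at m12 when τ > 1 (A perpendicular to B) and at m14 when τ < 0 (A perpendicular to D), and solves for the other two coordinates from the trigonometric forms of τ listed on the earlier slides. The fixed values A, C, D = 0, 3, 7 and δ = 0.6 are arbitrary.

```python
import numpy as np


def canonical_Y(delta):
    """Closed-form standardized configuration for ordered landmarks A, B, C, D."""
    c, s = np.cos(delta / 2), np.sin(delta / 2)
    return np.array([[c, -s], [c, s], [s, c], [s, -c]])


def triangle_point_from_tau(tau):
    """Hypothetical map tau -> (m12, m13, m14) on the spherical triangle."""
    if 0 <= tau <= 1:   # edge m13 = 0 (A perpendicular to C): tau = sin^2(delta)
        return np.array([np.sqrt(1 - tau), 0.0, np.sqrt(tau)])
    if tau > 1:         # edge m12 = 0 (A perpendicular to B): tau = 1 / sin^2(angle A to C)
        return np.array([0.0, np.sqrt(1 - 1 / tau), np.sqrt(1 / tau)])
    # edge m14 = 0 (A perpendicular to D): tau = -tan^2(angle A to B)
    return np.array([np.sqrt(1 / (1 - tau)), np.sqrt(-tau / (1 - tau)), 0.0])


delta = 0.6
Y = canonical_Y(delta)
M = np.abs(Y @ Y.T)
point = M[0, 1:]                      # (m12, m13, m14)
print(point, np.sum(point ** 2))      # (cos delta, 0, sin delta), squared norm 1
print(triangle_point_from_tau(np.sin(delta) ** 2))

# Hold A < C < D fixed and move B: the path turns a corner at B = A (tau = 0)
# and again at B = C (tau = 1).
A, C, D = 0.0, 3.0, 7.0
for B in [-2.0, -0.1, 0.0, 0.1, 1.0, 2.9, 3.0, 3.1, 5.0]:
    tau = (B - A) * (D - C) / ((C - A) * (D - B))
    print(B, round(tau, 3), np.round(triangle_point_from_tau(tau), 3))
```

As B passes through A, the printed point moves from the edge m14 = 0 to the vertex (1, 0, 0) and then along the edge m13 = 0, so the smooth path in τ becomes a path with a corner on the triangle.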
The reasons for corners
(a) The first answer is that when B approaches one of the other three landmarks, e.g. B → A, Tyler standardization forces the other two landmarks to come together as well. Thus the single-pair singularity in the simple cross ratio description (B = A) is actually a double-pair singularity (B = A, D = C) in the Tyler-standardized description.
(b) Further, there are two distinct ways to move away from a singularity (e.g. B = A, D = C) in terms of the separation of the landmarks. On one edge (the lower edge of the spherical triangle), A is separated from C (and hence B is separated from D). On the other edge (the left edge of the spherical triangle), A is separated from D (and hence B is separated from C).
(c) The rank of the Tyler-standardized configuration Y drops from 2 to 1 at the corners.

Further ideas I: Statistical issues
It is possible to do distribution theory in some simple cases (e.g. 4 iid normally distributed landmarks on the line), but the results are complicated, the pdfs have singularities at the corners of the spherical triangle, and such models are not very realistic. A more promising approach is to look in more detail at the effect of small-scale variability about a fixed configuration/projective shape. However, the pose of the object affects the distribution of projective shape.

Further ideas II: Four types of projective shape space
In many cases there is partial information about the camera: (a) oriented vs. unoriented, and (b) directional vs. axial.
(a) For an oriented camera we know which side of the scene the camera lies on; mathematically, we know whether det(B) is positive or negative. Conversely, for an unoriented camera the sign of det(B) is unknown.
(b) For a directional camera we know whether an image point lies between the focal point of the camera and the corresponding real-world point, or whether the focal point lies between the image point and the real-world point.
For an axial camera this information is not available. Mathematically, in terms of the k × k diagonal matrix D, we require $d_i > 0$ for a directional camera, and merely $d_i \neq 0$ for an axial camera.

Which version of projective shape space to use?
Projective geometry focuses mainly on an unoriented axial camera. However, in real life a camera is usually oriented and directional. We now illustrate these ideas for the simplest situation of k = 4 collinear points (m = 1 dimension).

Comments for 4 collinear points
Directional vs. axial: for a directional camera, the red “X”s are observed. For an axial camera, we cannot distinguish each red “X” from the opposite point on the circle.
Oriented vs. unoriented: for an oriented camera, we see the circle as given. For an unoriented camera, we cannot distinguish the circle from its reflection.
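As an illustrative sketch of these distinctions (NumPy; the matrices below are arbitrary example values and `pair_signs` is a hypothetical helper), consider the signs of the 2 × 2 determinants det[x_i; x_j]. Under X → DXB each one is multiplied by d_i d_j det(B), so the full sign pattern is preserved when every d_i > 0 and det(B) > 0, as for an oriented directional camera. A transformation with det(B) < 0 (which includes a reflection) flips every sign, while a negative d_i, allowed for an axial camera, flips the signs of the pairs involving landmark i.

```python
import numpy as np


def pair_signs(X):
    """Signs of det[x_i; x_j] over all pairs i < j of rows of a k x 2 matrix."""
    k = X.shape[0]
    return np.array([np.sign(np.linalg.det(X[[i, j]]))
                     for i in range(k) for j in range(i + 1, k)])


X = np.column_stack([[0.0, 1.0, 3.0, 7.0], np.ones(4)])
B_pos = np.array([[2.0, 1.0], [0.0, 1.0]])    # det(B) = 2 > 0
B_neg = np.array([[2.0, 1.0], [0.0, -1.0]])   # det(B) = -2 < 0 (includes a reflection)
D_pos = np.diag([1.0, 2.0, 0.5, 3.0])         # all d_i > 0: directional
D_mix = np.diag([1.0, -2.0, 0.5, 3.0])        # d_2 < 0: possible only for an axial camera

s0 = pair_signs(X)
print(np.array_equal(pair_signs(D_pos @ X @ B_pos), s0))    # True
print(np.array_equal(pair_signs(D_pos @ X @ B_neg), -s0))   # True
print(pair_signs(D_mix @ X @ B_pos) * s0)  # -1 exactly for the pairs (1,2), (2,3), (2,4)
```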