Comparison of linear modularization criteria of networks using relational metric

28/08/2013
OAI : oai:www.see.asso.fr:2552:4901
DOI :

Résumé

Comparison of linear modularization criteria of networks using relational metric

Média

Voir la vidéo

Métriques

94
12
1.53 Mo
 application/pdf
bitcache://322d659701578ea85d78559a668e130de9b37b61

Licence

Creative Commons Aucune (Tous droits réservés)

Sponsors

Sponsors scientifique

logo_smf_cmjn.gif

Sponsors financier

logo_gdr-mia.png
logo_inria.png
image010.png
logothales.jpg

Sponsors logistique

logo-minesparistech.jpg
logo-universite-paris-sud.jpg
logo_supelec.png
Séminaire Léon Brillouin Logo
logo_cnrs_2.jpg
logo_ircam.png
logo_imb.png
<resource  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xmlns="http://datacite.org/schema/kernel-4"
                xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4/metadata.xsd">
        <identifier identifierType="DOI">10.23723/2552/4901</identifier><creators><creator><creatorName>Patricia Conde Céspedes</creatorName></creator></creators><titles>
            <title>Comparison of linear modularization criteria of networks using relational metric</title></titles>
        <publisher>SEE</publisher>
        <publicationYear>2013</publicationYear>
        <resourceType resourceTypeGeneral="Text">Text</resourceType><dates>
	    <date dateType="Created">Tue 17 Sep 2013</date>
	    <date dateType="Updated">Sun 25 Dec 2016</date>
            <date dateType="Submitted">Fri 10 Aug 2018</date>
	</dates>
        <alternateIdentifiers>
	    <alternateIdentifier alternateIdentifierType="bitstream">322d659701578ea85d78559a668e130de9b37b61</alternateIdentifier>
	</alternateIdentifiers>
        <formats>
	    <format>application/pdf</format>
	</formats>
	<version>25044</version>
        <descriptions>
            <description descriptionType="Abstract"></description>
        </descriptions>
    </resource>
.

Comparison of linear modularization criteria of networks using relational metric Patricia Conde C´espedes LSTA, Paris 6 August 2013 Thesis supervised by J.F. Marcotorchino (Thales Scientific Director) Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 1 / 35 Outline 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 2 / 35 Introduction and objective Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 3 / 35 Introduction and objective Description of the problem Objective: Compare the partitions found by different linear criteria Nowadays, we can find networks everywhere: (biology, computer programming, marketing, etc). Some practical applications are: Cyber-Marketing: Cyber-Security: It is difficult to analyse a network directly because of its big size. Therefore, we need to decompose it in clusters or modules ⇐⇒ modularize it. Different modularization criteria have been formulated in different contexts in the last few years and we need to compare them. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 4 / 35 Introduction and objective Graph partition Definition of module or community. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 5 / 35 Mathematical Relational representations of criteria Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 6 / 35 Mathematical Relational representations of criteria Mathematical Relational modeling Modularizing a graph G(V, E) ⇔ defining an equivalence relation on V . Let X be a square matrix of order N = |V | defining an equivalence relation on V as follows: xii = 1 if i and i are in the same cluster ∀i, i ∈ V × V 0 otherwise (1) We present a modularization criterion as a linear function to optimize: Max X F(A, X) (2) subject to the constraints of an equivalence relation: xii ∈ {0, 1} Binarity (3) xii = 1 ∀i Reflexivity xii − xi i = 0 ∀(i, i ) Symmetry xii + xi i − xii ≤ 1 ∀(i, i , i ) Transitivity Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 7 / 35 Mathematical Relational representations of criteria Properties verified by linear criteria Every linear criteria is separable as it can be written in the general form (it is possible to separate the data from the variables): F(X) = N i=1 N i =1 φ(aii )xii + constant (4) where aii is the general term of the adjacency matrix A and φ(aii ) is a function of the adjacency matrix only. Besides, the criterion is balanced if it can be written in the form: F(X) = N i=1 N i =1 φ(aii )xii + N i=1 N i =1 ¯φ(aii )¯xii (5) Where: ¯xii = 1 − xii represents the opposite relation of X, noted ¯X. φ(aii ) ≥ 0 ∀i, i and ¯φ(aii ) ≥ 0 ∀i, i are non negative functions verifying: N i=1 N i =1 φii > 0 and N i=1 N i =1 ¯φii > 0. As we will see later the functions φ and ¯φ behave as ”costs”. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 8 / 35 Mathematical Relational representations of criteria The property of Linear balance Given a graph If ¯φii = 0 ∀i, i all the nodes are clustered together, then κ = 1. If φii = 0 ∀i, i all nodes are separated, then κ = N If N i=1 N i =1 φii = N i=1 N i =1 ¯φii the criterion is a null model and therefore it has a resolution limit. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 9 / 35 Mathematical Relational representations of criteria Existing linear functions Function Relational notation Zahn-Condorcet (1785, 1964) FZC(X) = N i=1 N i =1 (aii xii + ¯aii ¯xii ) Owsi´nski-Zadro˙zny (1986) FZOZ (X) = N i=1 N i =1 ((1−α)aii xii +α¯aii ¯xii ) with 0 < α < 1 Newman-Girvan (2004) FNG(X) = 1 2M N i=1 N i =1 aii − ai.a.i 2M xii Table: Linear criteria Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 10 / 35 Mathematical Relational representations of criteria Three new linear criteria Function Relational notation Deviation to uniformity (2013) FUNIF(X) = 1 2M N i=1 N i =1 aii − 2M N2 xii Deviation to indeter- mination (2013) FDI(X) = 1 2M N i=1 N i =1 aii − ai. N − a.i N + 2M N2 xii Balanced modularity (2013) FBM (X) = N i=1 N i =1 (aii − Pii ) xii + (¯aii − ¯Pii )¯xii where Pii = ai.a.i 2M and ¯Pii = ¯aii − (N−ai.)(N−a.i ) N2−2M Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 11 / 35 Mathematical Relational representations of criteria Interpretation of new linear criteria Uniformity structure Indetermination structure Duality independance and indetermination structure: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 12 / 35 Mathematical Relational representations of criteria Some properties of these new criteria Whereas the Newman-Girvan modularity is based on the ”deviation from independance” structure, the DI index criterion is based on the ”deviation to the indetermination” structure. All these three new criteria are null models as Newman-Girvan modularity. The balanced modularity is a balanced version of Newman-Girvan modularity. If all the nodes had the same degree : dav = N i=1 ai N = 2M N all the new criteria would have the same behavior as Newman-Girvan modularity does: FNG ≡ FUNIF ≡ FBM ≡ FDI. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 13 / 35 Algorithm and some results Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 14 / 35 Algorithm and some results The number of clusters The Louvain algorithm is easy to adapt to Separable criteria. Data Jazz Internet N = 198 N = 69949 M = 2742 M = 351380 Function κ κ Zahn-Condorcet 38 40123 Owsi´nski-Zadro˙zny 6 α = 2% 456 α < 1% Newman-Girvan 4 46 Deviation to uniformity 20 173 Deviation to indetermination 6 45 Balanced modularity 5 46 How to explain these differences? Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 15 / 35 Comparison of criteria Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 16 / 35 Comparison of criteria Impact of merging two clusters Now let us suppose we want to merge two clusters C1 and C2 in the network of sizes n1 and n2 respectively. Let us suppose as well they are connected by l edges and they have average degree d1 av et d2 av respectively. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 17 / 35 Comparison of criteria Impact of merging two clusters What is the contribution of merging two clusters to the value of each criterion? The contribution C of merging two clusters will be: C = n1 i∈C1 n2 i ∈C2 (φii − ¯φii ) (6) The objective is to compare function φ(.) to function ¯φ(.) If C > 0 the criterion merges the two clusters, the contribution is a gain. If C < 0 the criterion separates the two clusters, the contribution is a cost. l is the number of edges between clusters C1 and C2. ¯l is the number of missing edges between clusters C1 and C2. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 18 / 35 Comparison of criteria Contribution of merging two clusters The contribution for the Zahn-Condorcet criterion: CZC = (l − ¯l) = l − n1n2 2 (7) The Zahn-Condorcet criterion requires that the connexions within the cluster be bigger than the absence of connexions ⇐⇒ the number of connections l between C1 and C2 must be at least as half as the possible connexions between the two subgraphs. This criterion does not have resolution limit as the contribution depends only upon local properties: l, ¯l, n1, n2. The contribution does not depend on the size of the network. With this criterion we obtain many small clusters or cliques, some of them are sigle nodes. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 19 / 35 Comparison of criteria Contribution of merging two clusters The contribution for the Owsi´nski-Zadro˙zny criterion: COZ = (l − αn1n2) 0 < α < 1 (8) As this Zahn-Condorcet criterion is so exigent we obtain many small clusters, the Owsi´nski-Zadro˙zny criterion gives the choice to define the minimum required percentage of within-cluster α edges. This coefficient defines the balance between φ and ¯φ. For α = 0.5 the wsi´nski-Zadro˙zny criterion ≡ the Zahn-Condorcet criterion. α is defined by the user as the minimum required fraction of within- cluster edges. This criterion does not have resolution limit as the contribution depend only upon local properties: l, ¯l, n1, n2. The contribution does not depend on the size of the network. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 20 / 35 Comparison of criteria Impact of merging two clusters The contribution for the Newman-Girvan criterion: CNG = l − n1n2 d1 avd2 av 2M (9) The contribution depends on the degree distribution of the clusters. This criterion has a resolution limit since the contribution depends on global properties of the whole network M. The optimal partition has no clusters with a single node. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 21 / 35 Comparison of criteria Impact of merging two clusters The contribution for the deviation to Uniformity criterion: CUNIF = l − n1n22M N2 (10) This criterion is a particular case of Zahn-Condorcet (or Owsi´nski-Zadro˙zny) criterion with α = 2M N2 which can be interpreted as a density occupancy of edges among the nodes δ. To merge the two clusters l n1n2 > δ the fraction of within clusters edges must be greater than the global density of edges δ. This criterion has a resolution limit since the contribution depends on global properties of the whole network M and N. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 22 / 35 Comparison of criteria Impact of merging two clusters The contribution for the deviation to indetermination criterion: CDI = l − n1n2 d1 av N + d2 av N − 2M N2 (11) The contribution depends on the degree distribution of the clusters and on their sizes. This criterion has a resolution limit since the contribution depends on global properties of the whole network M and N. It favors big cluster with high average degree and small clusters with low average degree. So, the degree distribution of each cluster obtained by this criterion tends to be more homogeneous than that of the clusters obtained by optimizing the Newman-Girvan criterion. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 23 / 35 Comparison of criteria Impact of merging two clusters The contribution for the Balanced modularity criterion: CBM = 2l + n1n2 (N − d1 av)(N − d2 av) N2 − 2M − n1n2 − n1n2 d1 avd2 av 2M (12) The contribution depends on the degree distribution of the clusters and on the sizes of the clusters. This criterion has a resolution limit since the contribution depends on global properties of the whole network M and N. Depending upon δ and dav this criterion behaves like a regulator between the Newman-Girvan criterion and the deviation to indetermination criterion. On one hand the degree distribution within clusters is more homogeneous than that found with Newman-Girvan criterion. On the other hand, the degree distribution within clusters is more heterogeneous than that found with deviation to indetermination criterion. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 24 / 35 Comparison of criteria Summary by criterion Criterion Characteristics of the clustering Zahn-Condorcet Clusters have a fraction of within cluster edges greater than 50%. Owsi´nski-Zadro˙zny A generalization of ZC criterion where the user defines the minimum fraction of within cluster edges. Deviation to Unifor- mity The OZ criterion for α = δ the density of edges among the nodes. Newman-Girvan It has a resolution limit and the optimal clus- tering does not contain isolated nodes. Deviation to uniformity Within cluster degree distribution is more homogeneous than that found by the Newman-Girvan criterion. Balanced modularity It behaves like a regulator between the NG criterion and the DI criterion. . Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 25 / 35 Applications Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 26 / 35 Applications ”Data: Zachary karate club network” A network of friendships between the 34 members of a karate club at a US university. N = 34 nodes, M = 78 edges, dav = 4.6 and δ = 0.13 Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 27 / 35 Applications ”Data: Zachary karate club network” The number of clusters per criterion (with Louvain Algorithm): Criterion Number of clusters Single nodes Zahn-Condorcet 19 12 Owsi´nski-Zadro˙zny 7 (α = 0.2) 3 Deviation to uniformity 6 2 Newman-Girvan 4 Deviation to indetermination 4 Balanced modularity 4 The partitions found with Newman-Girvan, Deviation to indetermination and the Balanced modularity are the same. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 28 / 35 Applications ”Data: Zachary karate club network” Density of within cluster edges: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 29 / 35 Applications ”Data: Zachary karate club network” Partitions obtained by the criteria: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 30 / 35 Applications The Jazz network A network of jazz musicians, N = 198, M = 2742, dav ∼= 27.7 and δ = 0.14. The clusters found by three criteria: Newman-Girvan nj dj av σj cvj 62 32.3 18.5 0.57 53 30.5 16.2 0.53 61 20.3 14.1 0.69 22 28.4 20.1 0.71 Balanced modularity nj dj av σj cvj 60 33.1 18.2 0.55 53 31.3 16.3 0.52 61 20.3 14.1 0.69 23 26 19.4 0.75 1 1 0 0 Deviation to indetermination nj dj av σj cvj 63 19.8 14.2 0.71 63 33.7 16 0.48 18 13.8 5.2 0.37 51 36.4 17.7 0.49 2 2.5 2.1 0.85 1 1 0 0 Where nj is the size of the cluster, dj av is the average degree, σj is the standard deviation and cvj is the coefficient of variation of the degree of the cluster j. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 31 / 35 Applications The Jazz network The coefficient of variation for the three criteria: Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 32 / 35 Conclusions Plan 1 Introduction and objective 2 Mathematical Relational representations of criteria 3 Algorithm and some results 4 Comparison of criteria 5 Applications 6 Conclusions Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 33 / 35 Conclusions Conclusions We presented 6 modularization criteria in Relational Analysis notation, which allowed us to easily calculate their contribution, cost or gain, when merging two clusters. We analysed important characteristics of different criteria. We compared the differences found in the partitions provided by each criterion. However the 3 criteria we introduced have nearly the same properties they differ depending mainly on the degree distribution, on the sizes of the clusters and on global characteristics of the graph. Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 34 / 35 Conclusions Thanks for your attention! Patricia Conde C´espedes (LSTA, Paris 6) Comparison of linear modularization criteria of networks using relational metricAugust 2013 35 / 35