Assortativity of kNN graphs

by Patryk Małek

Problem definition

Whether there exists a correlation between:

k parameter of kNN graphs (nodes' number of outgoing connections)
and its assortativity coefficient

Where assortativity is a tendency of nodes to connect to similar nodes

(by similar we take into account in-degree of nodes)

Why is research of assortativity important?

We can investigate whether real world datasets/networks tend to have nodes more keen to connect to nodes alike them or rather connect to random nodes.

Examples of (dis)assortative networks:

Social networks, friendship - assortative
- we connect with people that are alike us in terms of age, nationality, income, etc
Internet - (slightly) disassortative
- Internet nodes can connect to any other node

Research, solution implementation

Created solution calculating assortativity coefficient vs k parameter of kNN graph

Example dataset .csv file, with 4 instances, each with 6 attributes

-10.29700756;-11.66678905;11.56066895;-2.08131576;4.04465580;4.08681488
2.13813472;3.50308204;-0.06407356;0.98330212;2.00134420;1.47868252
-2.19252634;0.83010828;0.45290488;1.08559155;1.96038878;-0.87715614
-0.01794434;0.44264925;0.62042207;-1.13598573;0.08879852;0.18696856

Euclidean metric has been used to calculate distance between nodes

Assortativity coefficient calculation

public double compute() {
  int numLinks = g.getEdgeCount();
  double[] degSrc = new double[numLinks];
  double[] degDst = new double[numLinks];
  
  Collection links = g.getEdges();
  Iterator lit = links.iterator();
  int i = 0;
  while (lit.hasNext()) {
    Link l = lit.next();
    Pair endPoints = g.getEndpoints(l);
    degSrc[i] = g.degree(endPoints.getFirst());
    degDst[i] = g.degree(endPoints.getSecond());
    i++;
  }
  
  PearsonsCorrelation pc = new PearsonsCorrelation();
  return pc.correlation(degSrc, degDst);
}

One of results received via implemented solution

Results and conclusions:

k-NN graphs tend to be dissasortative or show no assortative mixing for small k (k = 1)
k-NN graphs tend to be assortative for large k
Assortativity for large k occurs regardless of hubness phenomena

Follow-up work

Take into account different distance metrics, e.g. manhattan
Use different measures for Pearson's Correlation Coefficient, e.g. person's age

Any questions?

Code available at
github.com/pmalek/knngraphs

Built with bespoke.js