Section 49 Clusters of similar maps

Markiczy and Goldberg (1995) suggest a method for summarising a set of causal maps, one each from a set of respondents, which does not rely on piecing them together but on finding typical clusters of maps, and then presenting the central map from each cluster along with the typical respondent characteristics.

The way they suggest of clustering graphs depends on the concept of distance between graphs. There are various methods Gao et al. (2010) to measure the distance between (pairs of) graphs, some making use of the idea of the maximum common subgraph. The most common of these methods (like Levenshtein distance) do not make use of node or edge attributes – information like the direction, polarity or strength of the influence between nodes.

Markiczy and Goldberg (1995) also point out that also here we need to ask ourselves if the absence of a variable or an arrow is relevant or not.

For example, is the graph

B->C->D->E->F->G->H->I

more similar to

B->C

or to

B->C->D->F->H->I

On the one hand, the second is a perfect subgraph of the first – it has, in a sense, no mistakes; on the other hand, the third has three mini-maps in common with the first (but also some incompatible parts).

It probably makes sense to penalise the second more than the third, because if we were to say that 1 and 2 were close, we would produce results with very small graphs, and we can already see which are the most popular mini-maps just by looking at the frequency of arrows.

I haven’t yet implemented this and am looking for a suitable R package.