Tuesday, June 11, 2013

How the NSA could have stopped Paul Revere using metadata

London, 1772.

I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in certain recent events and the assurances of various respectable parties that the government was merely “sifting through this so-called metadata” and that the “information acquired does not include the content of any communications”. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. ...

The analysis in this report is based on information gathered by our field agent Mr David Hackett Fischer and published in an Appendix to his lengthy report to the government. ...

Rest assured that we only collected metadata on these people, and no actual conversations were recorded or meetings transcribed. All I know is whether someone was a member of an organization or not. Surely this is but a small encroachment on the freedom of the Crown’s subjects. ...

If you want to follow along yourself, there is a secret repository containing the data and the appropriate commands for your portable analytical engine.

Here is what the data look like. ...

The organizations are listed in the columns, and the names in the rows. As you can see, membership is represented by a “1”. So this Samuel Adams person (whoever he is) belongs to the North Caucus, the Long Room Club, the Boston Committee, and the London Enemies List. I must say, these organizational names sound rather belligerent. ...
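For readers without the original table to hand, the layout just described might be sketched as follows. This is a minimal illustration, not the data itself: only Samuel Adams's row follows the text, the other two rows are invented placeholders, and the names `GROUPS`, `MEMBERSHIPS`, and `to_matrix` are assumptions of this sketch.

```python
# Columns: the four organizations the text names for Samuel Adams.
GROUPS = ["North Caucus", "Long Room Club", "Boston Committee", "London Enemies List"]

# Rows: one entry per person. Samuel Adams's memberships follow the text;
# "Person B" and "Person C" are invented placeholders for illustration only.
MEMBERSHIPS = {
    "Samuel Adams": {"North Caucus", "Long Room Club",
                     "Boston Committee", "London Enemies List"},
    "Person B": {"North Caucus"},
    "Person C": {"Long Room Club", "Boston Committee"},
}


def to_matrix(memberships, groups):
    """Return (names, rows) where rows[i][j] is 1 if person i belongs to group j."""
    names = sorted(memberships)
    rows = [[1 if g in memberships[name] else 0 for g in groups] for name in names]
    return names, rows


names, matrix = to_matrix(MEMBERSHIPS, GROUPS)
for name, row in zip(names, matrix):
    print(f"{name:<14}", row)
```

Note that nothing in this structure records what anyone said or wrote; each cell is a bare yes or no.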

I cannot show you the whole Person by Person matrix, because I would have to kill you. I jest, I jest! It is just because it is rather large. But here is a little snippet of it. At this point in the eighteenth century, a 254x254 matrix is what we call “Bigge Data”. I have an upcoming EDWARDx talk about it. You should come. ...

You can see here that Mr Appleton and Mr John Adams were connected through shared membership in one group, while Mr John Adams and Mr Samuel Adams shared memberships in two of our seven groups. ...
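One common way to obtain such a Person by Person matrix is to multiply the membership matrix by its own transpose, so that each cell counts the groups two people share. The excerpt does not show how its matrix was produced, so treat the following as a sketch under that assumption. The membership rows are toy values chosen only to agree with the counts quoted above (one shared group for Appleton and John Adams, two for the two Adamses); they are not the real records.

```python
# Toy person-by-group matrix. Rows are chosen only to match the counts quoted
# in the text; they are NOT the actual membership records.
PEOPLE = ["John Adams", "Samuel Adams", "Appleton"]
A = [
    [1, 1, 0, 0],  # John Adams   (illustrative)
    [1, 1, 1, 1],  # Samuel Adams (four memberships, as in the text)
    [1, 0, 1, 0],  # Appleton     (illustrative)
]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


# Cell [i][j] counts the groups that person i and person j have in common.
# The diagonal simply counts each person's own memberships.
person_by_person = matmul(A, transpose(A))

for name, row in zip(PEOPLE, person_by_person):
    print(f"{name:<13}", row)
```

The result is symmetric, and any off-diagonal cell greater than zero can be read as a tie between two people.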

Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.

Once again, I remind you that I know nothing of Mr Revere, or his conversations, or his habits or beliefs, his writings (if he has any) or his personal life. All I know is this bit of metadata, based on membership in some organizations. ...

...we could calculate a betweenness centrality measure for everyone in our matrix, which is roughly the number of “shortest paths” between any two people in our network that pass through the person of interest. It is a way of asking “If I have to get from person a to person z, how likely is it that the quickest way is through person x?” Here are the top betweenness scores for our list of suspected terrorists...
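To make the measure concrete, here is a minimal sketch of betweenness centrality on a small unweighted network, using Brandes' algorithm. The excerpt does not say which implementation produced its scores, so this is an illustration of the idea rather than the method actually used. The graph is invented: two tight pairs, `a`-`b` and `c`-`d`, which can only reach each other through a single person `x`.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph.

    graph maps each node to a list of its neighbours. For every pair of
    other nodes, a node's score grows by the fraction of shortest paths
    between that pair which pass through it (Brandes' algorithm).
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s: count shortest paths (sigma) and
        # remember which neighbours precede each node on those paths.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was visited from both ends, so halve the totals.
    return {v: c / 2 for v, c in score.items()}


# Invented toy network: x is the only link between {a, b} and {c, d}.
toy = {
    "a": ["b", "x"],
    "b": ["a", "x"],
    "x": ["a", "b", "c", "d"],
    "c": ["x", "d"],
    "d": ["x", "c"],
}

scores = betweenness(toy)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, scores[node])
```

In this toy network every shortest path between the two pairs runs through `x`, so `x` alone receives a non-zero score. That is the pattern a bridging figure produces.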

Perhaps I should not say “terrorists” so rashly. But you can see how tempting it is. Anyway, look—there he is again, this Mr Revere! Very interesting. ...

At the present time, alas, the technology required to automatically collect the required information is beyond our capacity. But I say again, if a mere scribe such as I—one who knows nearly nothing—can use the very simplest of these methods to pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine, then just think what weapons we might wield in the defense of liberty one or two centuries from now.