Mining the Wikileaks 9/11 Pager Data

I spent Thanksgiving and the day after relaxing in my own peculiar way--by mining the Wikileaks 9/11 pager data.

Here are some early results:

I started by pulling out all the email addresses from the pages and storing them in their own table, with keys to their original page. I also pulled the unique pager numbers from the pages. What I got was a bipartite directed graph with one side being emails and the other being pagers, with messages functioning as edges. Using Django, Graphviz, Cinelerra and a bunch of other tools, I was able to make a video of the graph as it lights up on each relevant page.
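For illustration, here is a minimal Python sketch of that extraction step. It is not the original Django code: the `(pager, text)` page layout, the `build_edges` name, the simplified email pattern, and the sample pages are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Simplified pattern (an assumption); real-world addresses are messier than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def build_edges(pages):
    """Map each (email, pager) pair to the indexes of the pages that link them.

    `pages` is a list of (pager_number, message_text) tuples, a hypothetical
    layout chosen for this sketch.
    """
    edges = defaultdict(list)
    for index, (pager, text) in enumerate(pages):
        for email in EMAIL_RE.findall(text):
            # Lowercase so the same address in different cases is one node.
            edges[(email.lower(), pager)].append(index)
    return dict(edges)


# Made-up sample pages, not taken from the real data set.
SAMPLE_PAGES = [
    ("1000001", "From: news@example.com Breaking story"),
    ("1000002", "From: news@example.com Breaking story"),
    ("1000002", "ops@example.org says server down"),
]

if __name__ == "__main__":
    for (email, pager), page_indexes in build_edges(SAMPLE_PAGES).items():
        print(email, "->", pager, page_indexes)
```

Keeping the page indexes on each edge is what would let an animation light up an edge whenever one of its pages comes up.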

Here's a single image that shows the whole graph at once:

Beware! This is a 3.3-megabyte image--and that's shrunken down from the 64-megabyte monster on my laptop. The graph is truly immense. Notice the large subgraphs in the center and the smaller ones at the outside. At the very edges are tiny two-node subgraphs showing emails and pagers that are unique to each other. Very beautiful, really, this picture of human activity in the midst of intense trauma.

The big, dense circular subgraph in the center is made up of people who are getting major news feeds off their pagers, particularly CNN Breaking News. It covers several thousand unique connections, including a kind of fascinating bunch from Bank of America, where a lot of employees get work emails over the pager network.

Up and to the left is a subgraph with another set of companies, AT&T and IBM. It's just barely a single subgraph--there's only one pager number that gets both AT&T and IBM emails. That's probably a contractor from IBM assigned to AT&T or vice-versa.

Another large subgraph has employees from Mastercard. It's substantially smaller, though still large enough to be kind of interesting to read.

Note that the IBM-AT&T and Mastercard subgraphs weren't picked out by me--they just happened to not be connected to anything else, like CNN or Yahoo news updates. There are probably a bunch of interesting conversations buried in the big subgraph, like the Bank of America pages, but you'd have to tease them out some more.
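Subgraphs that "just happen" to be disconnected are the connected components of the graph once edge direction is ignored. A rough sketch of finding them with a breadth-first search follows; the `connected_components` name and the list of `(email, pager)` pairs it takes are assumptions for illustration, not the tooling used for the image.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the node sets of each component, largest first.

    `edges` is an iterable of (email, pager) pairs (a hypothetical layout).
    Nodes are tagged so an email and a pager can never collide by name.
    """
    neighbors = defaultdict(set)
    for email, pager in edges:
        a, b = ("email", email), ("pager", pager)
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in neighbors:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            # Visit every neighbor that has not been reached yet.
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)
```

In a graph like the one described here, the first component returned would be the big news-feed blob, and the two-node components at the end of the list would be the emails and pagers that are unique to each other.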

I've also been scraping the IP addresses, domain names, and telephone numbers. More on that when I get around to making those graphs. One quick tidbit that I tweeted earlier: the pager that was getting the largest number of unique IP addresses was getting emails from none other than Enron. That person was probably a sys-admin assigned to their servers.
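Ranking pagers by unique IP addresses could look something like the sketch below. It assumes the same hypothetical `(pager, text)` page layout, only handles dotted IPv4 addresses, and the `unique_ips_by_pager` name is invented for the example.

```python
import re
from collections import defaultdict

# Four dot-separated groups of 1-3 digits; range-checked separately below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def is_ipv4(candidate):
    """Reject look-alikes such as 999.1.1.1 that the regex lets through."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))


def unique_ips_by_pager(pages):
    """Return (pager, set_of_ips) pairs, most unique addresses first."""
    ips = defaultdict(set)
    for pager, text in pages:
        for candidate in IPV4_RE.findall(text):
            if is_ipv4(candidate):
                ips[pager].add(candidate)
    return sorted(ips.items(), key=lambda item: len(item[1]), reverse=True)
```

The first entry in the returned list is the pager with the most distinct addresses, which is the kind of query that would surface a sys-admin's pager.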

