Twenty Four Hours of #swineflu

I've been doing more research on Twitter recently, mostly looking at back channels from conferences (more on that to come). I wanted to post a quick analysis, though, of a recent story that blew up big--the swine flu outbreak (tracked on Twitter, in part, via the #swineflu hashtag).

Using the Twitter search API and the excellent NetworkX software package, I pulled down and analyzed the tweets from roughly 24 hours before the time I started, which pretty much corresponds to the major growth of the topic on the Twitter service.

I pulled 11574 tweets in total, then made a directed graph using NetworkX. To do this, I looked for any occurrence of @something in each tweet--not just the replied-to tweeter that shows up in the API data. I found that a lot of people reply to more than one tweeter at the same time, so I took every instance of "@" to be a reply. I then made each tweeter a node and each reply from one tweeter to another an edge.
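Here's a minimal sketch of that graph-building step. It is not the software I used, just an illustration: the `build_reply_graph` helper, the sample tweets, and the choice to lowercase usernames are all assumptions, and the `@(\w+)` pattern is deliberately naive (it would also pick up the domain in an email address).

```python
import re

import networkx as nx

# Naive pattern: an "@" followed by word characters counts as a reply target.
MENTION = re.compile(r"@(\w+)")


def build_reply_graph(tweets):
    """Build a directed graph from an iterable of (author, text) pairs.

    Hypothetical helper: every @mention in a tweet becomes an edge
    from the author to the mentioned tweeter.
    """
    graph = nx.DiGraph()
    for author, text in tweets:
        author = author.lower()  # assumption: treat usernames case-insensitively
        graph.add_node(author)
        for mentioned in MENTION.findall(text):
            graph.add_edge(author, mentioned.lower())
    return graph


# Made-up sample data, not real tweets from the data set.
sample = [
    ("alice", "RT @cdcemergency: wash your hands #swineflu"),
    ("bob", "@alice @carol have you seen this? #swineflu"),
    ("carol", "@bob yes, thanks #swineflu"),
]
graph = build_reply_graph(sample)
print(sorted(graph.edges()))
```

Because a `DiGraph` stores at most one edge per ordered pair, repeated replies between the same two tweeters collapse into a single edge in this sketch.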

Here's a graph of the largest "strongly connected component" in the results. This is the biggest set of tweeters in which everyone can reach everyone else by following chains of replies, so every member is both replying and being replied to. It does a reasonably good job of showing some of the major players, like AndrewPWilson, who is the biggest hub (more on him below). What's really interesting is what is not shown--I can't find CDC Emergency, the biggest authority, anywhere in there.

#swineflu's Largest Strongly Connected Component: The 24-hour twitterfall ending at around 5 p.m. Eastern time on 4/26/2009.
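For anyone who wants to try this, here's a small sketch of pulling the largest strongly connected component out of a directed graph with NetworkX. The toy edges are invented for illustration. It also hints at why a pure authority can be missing from the picture: a node that is replied to but never replies cannot be part of a cycle.

```python
import networkx as nx

# Toy reply graph (made-up nodes, not from the data set).
graph = nx.DiGraph()
graph.add_edges_from([
    ("a", "b"), ("b", "c"), ("c", "a"),  # a, b and c can all reach each other
    ("c", "d"),                          # d is replied to but never replies
    ("e", "a"),                          # e replies but is never replied to
])

# strongly_connected_components yields one set of nodes per component.
largest = max(nx.strongly_connected_components(graph), key=len)

# Copy the induced subgraph so it can be drawn or analyzed on its own.
core = graph.subgraph(largest).copy()
print(sorted(core.nodes()))
```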

What is a hub and an authority? Bearing in mind that I'm pretty new at this myself, hubs and authorities are determined by the HITS algorithm. A hub, as I understand it, is a node that links to the highest-scoring authorities--kind of like a really reliable directory of important references. An authority is a node that has been linked to from the highest-scoring hubs--it's been acknowledged, by the most important referrers, to be the most important source of information.

HITS is a bit like the famous PageRank algorithm used by Google. Both wind up identifying the most important things linked to by important other things (I have the PageRank results listed below, too, if you're interested). The cool thing about HITS is that I can get a sense, using my own terms, of who is a reporter and who is a source (n.b.: I'm using "source" in the journalistic sense, not in graph-theory terms.)
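To make the hub and authority idea concrete, here's a hand-rolled sketch of the HITS iteration in plain Python. NetworkX ships its own version as `nx.hits`; the `hits` function below, its fixed iteration count, and the toy edges are my own simplifications for illustration, not what produced the numbers in this post.

```python
def hits(edges, iterations=50):
    """Return (hubs, authorities) dicts for a list of (source, target) edges.

    Simplified sketch: a fixed number of iterations, with scores
    rescaled to sum to 1 after every pass.
    """
    nodes = {node for edge in edges for node in edge}
    hubs = {node: 1.0 for node in nodes}
    auths = dict(hubs)
    for _ in range(iterations):
        # An authority's score is the sum of the hub scores pointing at it.
        auths = {node: 0.0 for node in nodes}
        for source, target in edges:
            auths[target] += hubs[source]
        # A hub's score is the sum of the authority scores it points at.
        hubs = {node: 0.0 for node in nodes}
        for source, target in edges:
            hubs[source] += auths[target]
        # Rescale so the scores stay comparable between passes.
        auth_total = sum(auths.values()) or 1.0
        hub_total = sum(hubs.values()) or 1.0
        auths = {node: score / auth_total for node, score in auths.items()}
        hubs = {node: score / hub_total for node, score in hubs.items()}
    return hubs, auths


# Made-up example: one "reporter" who cites two sources, plus two fans.
edges = [
    ("reporter", "agency"), ("reporter", "blog"),
    ("fan1", "agency"), ("fan2", "agency"),
]
hubs, auths = hits(edges)
top_hub = max(hubs, key=hubs.get)
top_authority = max(auths, key=auths.get)
print(top_hub, top_authority)
```

In this toy graph the node that links to both sources comes out as the top hub, and the source that everyone links to comes out as the top authority, which is the reporter/source split described above.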

From the results I get, AndrewPWilson is the highest-scoring reporter, referencing the highest-scoring sources, including CDC Emergency, mashable, and others. Wilson is also the #8 authority, so plenty of other big hubs are replying to his tweets, too. He's primarily a reporter, but is also a respected source retweeted by other folks.

Wilson, according to his page, is a member of the U.S. Health and Human Services social media team. No, I had no idea such a thing existed either, but I'm happy to find out about it--he's clearly on the ball. So, having Wilson appear from this data as a hub and lesser authority is not a big surprise, given his occupation.

Less clear is who the other hubs are. thetuyoboard is a tweeter from the suburbs of Bogota. Unfortunately, I don't read Spanish, so I'm not able to get a sense of his posts, but he has been retweeting quite a lot. From there on, the scores of the hubs drop pretty quickly.

Update: I just got a very cool reply from thetuyoboard:

@mike_j_edwards thing I do: Try to keep friends on the #swineflu know translating info to spanish via facebook via twitter. Thks 4 d post.

It's awesome that thetuyoboard took up the cause to translate the major updates on Twitter for the Spanish-speaking community. It's an interesting example of the spontaneous specialization that occurs in even an ad hoc community.

CDC Emergency is the top authority. Tweeters are taking its reports as canon and retweeting them like crazy. CDC Emergency is far and away the biggest authority, compared to the next biggest, mashable, who appears to be tweeting about the various social media resources and occurrences related to the flu. Lots of people are passing along those resources, too.

There are some conspicuous absences. cnnbrk, with the second-largest number of followers on Twitter, behind Ashton Kutcher and just above Britney Spears, is nowhere to be found. This is partly due to CNN's seeming hesitation to use hashtags--they just won't show up in the analysis. But I add in tweeters who have been retweeted (even if they do not use hashtags themselves), so they should at least show up somewhere. It's possible that the story is being made more or less directly between the sources, like CDC Emergency and Wilson, and the Twitter community. That's only a guess. It'd also be interesting to see what would happen if Kutcher or Spears tweeted with #swineflu, but that hasn't happened yet.

Below are my raw results, including some measures of centrality (betweenness, degree) that I haven't discussed. If you think it's interesting, let me know. I'm going to make the software I used for this available pretty soon, so keep checking back. In the meantime, I'm really interested in hearing what people think, so leave a comment or hit me up on Twitter if you like.
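As a rough guide to how numbers like these can be computed, here's a sketch using NetworkX's built-in centrality functions on a toy graph. The edges and the `top` helper are made up for illustration. Note that NetworkX normalizes degree centrality by dividing by the number of other nodes, which is why scores in a large graph come out as small fractions.

```python
import networkx as nx

# Toy reply graph (invented nodes, not from the data set).
graph = nx.DiGraph()
graph.add_edges_from([
    ("reporter", "agency"), ("reporter", "blog"),
    ("fan1", "agency"), ("fan2", "agency"),
    ("fan1", "reporter"), ("blog", "fan2"),
])


def top(scores, n=3):
    """Hypothetical helper: the n highest-scoring (node, score) pairs."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


indegree = nx.in_degree_centrality(graph)      # share of other nodes replying to you
outdegree = nx.out_degree_centrality(graph)    # share of other nodes you reply to
betweenness = nx.betweenness_centrality(graph) # how often you sit on shortest paths

for name, scores in [("indegree", indegree),
                     ("outdegree", outdegree),
                     ("betweenness", betweenness)]:
    print(name, top(scores))
```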

Update: Just posted the BIG version of the tweeter graph, which makes looking at the connections a lot easier.

Betweenness centrality

igeldard 0.002
lauras 0.003
malbonster 0.003
zen2012 0.003
thebeerlady 0.003
alonis 0.004
swineflu2009 0.004
autumn_meadows 0.005
hyperlocavore 0.006
andrewpwilson 0.007

Indegree centrality

mpoppel 0.009
terrycojones 0.011
murnahan 0.012
andrewpwilson 0.013
swineflu2009 0.014
veratect 0.016
jimmy_wales 0.016
breakingnews 0.016
mashable 0.047
cdcemergency 0.094

Outdegree centrality

retweet_bot 0.004
alonis 0.004
polymath22 0.005
asteris 0.005
the_tech_update 0.005
lyne_robichaud 0.007
malbonster 0.007
hyperlocavore 0.009
zen2012 0.009
andrewpwilson 0.021

Pagerank

swineflu2009 0.004
murnahan 0.004
terrycojones 0.005
jimmy_wales 0.006
breakingnews 0.007
veratect 0.009
jeanettecole 0.017
cdc 0.019
mashable 0.020
cdcemergency 0.040

Hubs

twellness 0.01
standingfirmcm 0.01
brucepatrick23 0.01
fingertipnews 0.01
birdflugov 0.01
retweet_bot 0.01
madlolscientist 0.01
zen2012 0.01
thetuyoboard 0.02
andrewpwilson 0.15

Authorities

hadramie 0.00
veratect 0.00
andrewpwilson 0.00
mpoppel 0.01
breakingnews 0.01
swineflu2009 0.01
webnex 0.01
cdc 0.01
mashable 0.01
cdcemergency 0.47

Metadata

Total tweets 11574
HT (heard through) 12
OH (overheard) 52
RT (retweet) 3204
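Counts like these can be approximated with a simple pattern match. The sketch below is an assumption about how one might do it, not necessarily how I did: it counts tweets that contain RT, HT, or OH as a standalone uppercase token, and the `count_markers` helper and sample texts are invented.

```python
import re
from collections import Counter

# Match RT, HT or OH only as whole uppercase words (so "OHIO" is ignored).
MARKER = re.compile(r"\b(RT|HT|OH)\b")


def count_markers(texts):
    """Count how many tweets contain each marker at least once."""
    counts = Counter()
    for text in texts:
        # set() so a tweet with a chain of RTs is only counted once.
        for marker in set(MARKER.findall(text)):
            counts[marker] += 1
    return counts


# Made-up sample texts.
texts = [
    "RT @cdcemergency: update #swineflu",
    'OH: "is it safe to fly?" #swineflu',
    "HT @mashable for the map #swineflu",
    "no marker here, greetings from OHIO",
    "RT @a RT @b a retweet chain",
]
counts = count_markers(texts)
print(dict(counts))
```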

Copyright Mike Edwards 2006-2009. All content available under the Creative Commons Attribution ShareAlike license, unless otherwise noted.