The Reach of a Tweet

What are we doing?

Have you ever wondered how many people could see a tweet once it has been retweeted by a lot of people? Could you find out using Python and the Twitter API? Let's do it!

Which tweet are we looking at?

My latest tweet received a ton more attention than any of my previous ones. It was retweeted by 45 people, including a high-profile tweeter. If you haven't seen it or can't remember, it was the one from which I got this gif:

Talk is cheap. Show me the code!

We are going to use tweepy, "An easy-to-use Python library for accessing the Twitter API".

In [1]:
import tweepy

You thought I would show you my secret keys? You have to create your own app at apps.twitter.com and get yours.

In [2]:
from secret_tweepy import consumer_key, consumer_secret, access_token, token_secret
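The contents of secret_tweepy.py are not shown here, so the following is only a sketch of one way such a module could be laid out, assuming the four keys are kept in environment variables. The variable names (TWITTER_CONSUMER_KEY and so on) and the load_keys helper are made up for illustration.

```python
# secret_tweepy.py (hypothetical layout; the real file is not shown in this post)
import os

# Assumed environment variable names, chosen only for this example.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_TOKEN_SECRET",
)


def load_keys(env=os.environ):
    """Return the four credentials as a tuple, in the order of KEY_NAMES."""
    missing = [name for name in KEY_NAMES if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return tuple(env[name] for name in KEY_NAMES)


# In the real module you would then expose the names imported above:
# consumer_key, consumer_secret, access_token, token_secret = load_keys()
```

Keeping the keys out of the notebook this way means you can share the notebook without leaking them.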

With your keys at hand, you can get access to the Twitter API.

In [3]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, token_secret)
api = tweepy.API(auth)

We'll get our tweet by its id. It's the same number that appears at the end of the tweet's URL: twitter.com/carneiroblogbr/status/693201831216943104

In [4]:
TWEET_TO_ANALYZE = 693201831216943104
tweet = api.get_status(TWEET_TO_ANALYZE)

Just to be sure, let's take a look at that tweet's text:

In [5]:
print(tweet.text)
Juggling is a lot of fun!! So is #opencv ! Thanks, @nedbat #python https://t.co/QwEKYMNRMF https://t.co/aF8op1AL37 https://t.co/zaADj2DPUF

The Twitter API, which tweepy wraps, returns at most a hundred retweets per call. Luckily (?) we didn't get that much attention.

In [6]:
retweets = api.retweets(TWEET_TO_ANALYZE, 100)
print('Retweets count: %d' % len(retweets))
Retweets count: 45

Now we can build a dictionary of the people who retweeted us. Each retweeter's screen_name will be the key and their follower count will be the value.

In [7]:
retweeters = {}
for retweet in retweets:
    retweeters[retweet.user.screen_name] = retweet.user.followers_count
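If you want to try this step without calling the API, here is a minimal sketch that uses stand-in objects instead of real retweets. The followers_by_screen_name helper and the sample users are made up for illustration; only the screen_name and followers_count attributes come from the code above.

```python
from types import SimpleNamespace


def followers_by_screen_name(retweets):
    """Map each retweeter's screen_name to their followers_count."""
    return {rt.user.screen_name: rt.user.followers_count for rt in retweets}


# Stand-ins that only mimic the two attributes we read from a retweet.
fake_retweets = [
    SimpleNamespace(user=SimpleNamespace(screen_name="alice", followers_count=10)),
    SimpleNamespace(user=SimpleNamespace(screen_name="bob", followers_count=25)),
]

sample = followers_by_screen_name(fake_retweets)
print(sample)
```

The dict comprehension does the same job as the loop above, just in one expression.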

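The moment of truth! Let's sum up all of the follower counts and see how many people could have seen our tweet (if they were paying attention). Keep in mind this is an upper bound: anyone who follows more than one retweeter is counted more than once.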

In [8]:
print('Tweet reach: %d' % sum(retweeters.values()))
Tweet reach: 207308

Wow! That's great! Over 200k!! Please consider that I only have 121 followers...

Who are these retweeters?

Some magic ahead: a lambda function! Calling retweeters.items() gives us (key, value) pairs, that is, tuples of (screen_name, followers_count). To see the most followed user first, we sort by the second element of each tuple (the value, at index [1]) and reverse the order. And let's keep only the top 10 to save some space.

In [9]:
most_influence = sorted(retweeters.items(),
                        key=lambda rt: rt[1],
                        reverse=True)[:10]
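If the lambda feels too magical, the same top-N selection can be sketched with the standard library's heapq.nlargest and operator.itemgetter. This is just an alternative, not what the notebook uses; top_retweeters and the sample data are made up for illustration.

```python
import heapq
from operator import itemgetter


def top_retweeters(retweeters, n=10):
    """Return the n (screen_name, followers_count) pairs with most followers."""
    # itemgetter(1) picks the second element of each pair, like lambda rt: rt[1]
    return heapq.nlargest(n, retweeters.items(), key=itemgetter(1))


sample = {"a": 5, "b": 50, "c": 20}
top_two = top_retweeters(sample, n=2)
print(top_two)
```

For a dict with only 45 entries there is no practical difference; sorted() with a slice is perfectly fine.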

Instead of printing it all in one line, let's print it pretty!

In [10]:
from pprint import pprint
pprint(most_influence)
[('codinghorror', 189319),
 ('nedbat', 4250),
 ('neilkod', 2447),
 ('bostonpython', 1555),
 ('doppenhe', 1311),
 ('software_daily', 911),
 ('MalwareMinigun', 641),
 ('r0ml', 637),
 ('h_rules', 516),
 ('csegura', 455)]
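Just out of curiosity, here is a quick sketch of how much of the total reach comes from the top retweeter alone. It only reuses the two numbers printed above; the share helper is made up for illustration.

```python
total_reach = 207308     # "Tweet reach" printed earlier
top_followers = 189319   # follower count of the first entry in the list above


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


top_share = share(top_followers, total_reach)
print('Top retweeter share: %.1f%%' % top_share)
```

A single retweet accounts for over 90% of the reach.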

Thank you, @codinghorror!

If you want to try it yourself, feel free to get this code (or even the whole IPython notebook) from my GitHub repo: https://github.com/ocarneiro/twitter-reach