Twitter data

Our goal is to create a program (daemon) that runs at a fixed interval and queries Twitter for tweets matching a given search query. We then want to store these tweets in a database for later analysis.

Goals

The goal of this project is to create a program that runs at a fixed interval and retrieves recent tweets from Twitter matching a certain search query. The program should be able to:

  • Retrieve tweets from Twitter matching any search criteria
  • Store retrieved tweets in a database
  • Handle network failures gracefully
  • Obey Twitter rate limits
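One possible way to approach the "handle network failures gracefully" goal is to retry a failed request with an increasing delay. The following is only a minimal sketch under stated assumptions: the names call_with_retry, max_attempts, and base_delay are made up for illustration, and fetch stands for whatever function performs the actual request.

```python
import time


def call_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_attempts is reached.

    The delay doubles after each failure (base_delay, 2 * base_delay, ...).
    The sleep argument is injectable so the function can be exercised
    without actually waiting.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except OSError:
            # Built-in network errors such as ConnectionError are OSError
            # subclasses; requests' RequestException derives from IOError,
            # which is the same class in Python 3.
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
```

A daemon could wrap each query in call_with_retry and simply skip the current cycle if the final attempt still fails.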

Introduction

Luckily for us, Twitter offers an official API through which we can interact with their databases. To begin you will need to create an account with Twitter; we will use this account to request an API key for our application. You can get started by cloning the orie-5270-twitter.git repository.

Documentation

The Twitter API is well documented. The documentation home page is available at https://dev.twitter.com/overview/documentation. Interesting sections include authorizing your application and the actual REST API.

Requesting an API key

Log into your Twitter account, navigate to https://apps.twitter.com, and click the “Create New App” button. Once created you can click on your project to manage your API keys. It is very important that you keep your API keys secret. If you believe you have compromised your API key you need to go to this page and revoke the key immediately.
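One common way to keep keys out of your source code (and out of the repository) is to read them from environment variables at startup. This is a sketch rather than a requirement of the project; the variable names TWITTER_API_KEY and TWITTER_API_SECRET_KEY and the function load_api_keys are assumptions made for this example.

```python
import os


def load_api_keys(environ=os.environ):
    """Read the API key pair from environment variables.

    TWITTER_API_KEY and TWITTER_API_SECRET_KEY are hypothetical names
    chosen for this example.
    """
    try:
        return environ['TWITTER_API_KEY'], environ['TWITTER_API_SECRET_KEY']
    except KeyError as exc:
        raise RuntimeError(
            'Missing environment variable: {}'.format(exc.args[0]))
```

With this in place the keys never appear in a committed file; they are set in the shell (or a service configuration) that launches the program.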

First steps

Your first goal should be to create a program that can request a bearer token, which your program can then use to make subsequent API calls. Twitter's documentation on this flow details the steps that need to be taken; the following sections follow along with that documentation.

Throughout this section we will use the same keys as in the documentation, namely we assume we have

API_KEY = 'xvz1evFS4wEEPTGEFPHBog'
API_SECRET_KEY = 'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg'

You will want to proceed with the keys for your application, although you should check to make sure your code correctly encodes these keys.

Encoding the consumer key and secret key

Relevant documentation

We join the API key and API secret key with a colon to create the bearer credentials.

bearer_credentials = '{}:{}'.format(API_KEY, API_SECRET_KEY)

We then base64-encode these credentials.

import base64
encoded = base64.b64encode(bytes(bearer_credentials, 'utf-8'))
credentials = str(encoded, 'utf-8')

Note that we first encode the bearer_credentials string as UTF-8 bytes, because b64encode operates on bytes, and then decode the base64 result back to a standard Python string.
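To check that your code encodes keys correctly, it can help to wrap the steps above in a function and run it on the sample keys. The sketch below does this; the function name encode_credentials is an assumption for this example, and the expected output is the base64 encoding of the sample key pair.

```python
import base64


def encode_credentials(api_key, api_secret_key):
    """Return the base64-encoded 'key:secret' string used for Basic auth."""
    bearer_credentials = '{}:{}'.format(api_key, api_secret_key)
    encoded = base64.b64encode(bearer_credentials.encode('utf-8'))
    return encoded.decode('utf-8')


# Sample keys from the documentation (not real credentials).
credentials = encode_credentials(
    'xvz1evFS4wEEPTGEFPHBog',
    'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg',
)
print(credentials)
# eHZ6MWV2RlM0d0VFUFRHRUZQSEJvZzpMOHFxOVBaeVJnNmllS0dFS2hab2xHQzB2SldMdzhpRUo4OERSZHlPZw==
```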

Obtaining a bearer token

Relevant documentation

Once you have constructed your credentials from above, you are ready to request a bearer token from Twitter. Using the requests library, this can be done as follows.

import requests

headers = {
    'Authorization': 'Basic {}'.format(credentials),
    'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
}
data = 'grant_type=client_credentials'
auth_endpoint = 'https://api.twitter.com/oauth2/token'
r = requests.post(auth_endpoint, headers=headers, data=data)

The variable r is a response object that we should inspect for our result. First, it should return the 200 status code (success).

if r.status_code != 200:
    raise RuntimeError('Bad status code: {}'.format(r.status_code))

You should of course use a more reasonable exception type, since your application will need to handle errors gracefully. Assuming you received the correct response, you now need to verify you were given a bearer token. We convert the content of the response to its JSON representation.

resp = r.json()
if resp['token_type'] != 'bearer':
    raise RuntimeError('Bad token type: {}'.format(resp['token_type']))

Again, use a better exception type. Finally, we can get our bearer token.

bearer_token = resp['access_token']

This token should be saved in your program to be used in future API calls. Note that this token will need to be refreshed periodically, so it should be possible to repeat this entire process of obtaining a bearer token with a single function call.
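The steps in this section could be collected into a single function along the following lines. This is a sketch, not the required design: the names TwitterAuthError and get_bearer_token are assumptions, and the post parameter is an addition made here so that the function can be tried with a stand-in for requests.post without touching the network.

```python
class TwitterAuthError(Exception):
    """Raised when a bearer token cannot be obtained (hypothetical name)."""


AUTH_ENDPOINT = 'https://api.twitter.com/oauth2/token'


def get_bearer_token(credentials, post=None):
    """Exchange base64-encoded credentials for a bearer token.

    post is any callable that accepts the same arguments as requests.post;
    it defaults to requests.post.
    """
    if post is None:
        import requests  # imported lazily so a stand-in can be used offline
        post = requests.post

    headers = {
        'Authorization': 'Basic {}'.format(credentials),
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
    }
    r = post(AUTH_ENDPOINT, headers=headers,
             data='grant_type=client_credentials')

    if r.status_code != 200:
        raise TwitterAuthError('Bad status code: {}'.format(r.status_code))
    resp = r.json()
    if resp.get('token_type') != 'bearer':
        raise TwitterAuthError(
            'Bad token type: {}'.format(resp.get('token_type')))
    return resp['access_token']
```

Refreshing the token then amounts to calling get_bearer_token(credentials) again and replacing the stored value.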

More APIs: Finding users

Relevant documentation

There are many APIs available; let’s explore one more. Let’s suppose we want to learn about the user @NotaBubble that the last tweet we found was in response to. Using the endpoint linked above we can retrieve information about individual users. First, let’s get the user id and screen name of @NotaBubble.

>>> tweet['in_reply_to_user_id']
3294648321
>>> tweet['in_reply_to_screen_name']
'NotaBubble'

With this we can craft a request to get this user’s information.

params = {
    'user_id': tweet['in_reply_to_user_id'],
    'screen_name': tweet['in_reply_to_screen_name']
}

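# API calls made after authentication send the bearer token,
# not the Basic credentials used to obtain it.
headers = {'Authorization': 'Bearer {}'.format(bearer_token)}
user_endpoint = 'https://api.twitter.com/1.1/users/show.json'
r = requests.get(user_endpoint, params=params, headers=headers)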

if r.status_code != 200:
    raise RuntimeError('Bad status code: {}'.format(r.status_code))

user = r.json()

When I run this code, I get the following result.

>>> import json
>>> print(json.dumps(user, indent=4))
{
    "created_at": "Sat May 23 01:24:46 +0000 2015",
    "profile_text_color": "000000",
    "geo_enabled": false,
    "has_extended_profile": false,
    "profile_sidebar_border_color": "000000",
    "profile_background_tile": false,
    "entities": {
        "url": {
            "urls": [
                {
                    "display_url": "leadersinvestmentclub.com/the-team/",
                    "indices": [
                        0,
                        22
                    ],
                    "url": "http://t.co/ucbPJWbVpw",
                    "expanded_url": "http://leadersinvestmentclub.com/the-team/"
                }
            ]
        },
        "description": {
            "urls": []
        }
    },
    "verified": false,
    "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
    "utc_offset": null,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3294648321/1432344426",
    "profile_use_background_image": false,
    "url": "http://t.co/ucbPJWbVpw",
    "default_profile_image": false,
    "id": 3294648321,
    "notifications": null,
    "profile_location": null,
    "is_translation_enabled": false,
    "lang": "en",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
    "profile_sidebar_fill_color": "000000",
    "time_zone": null,
    "contributors_enabled": false,
    "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
    "id_str": "3294648321",
    "profile_background_color": "000000",
    "location": "",
    "description": "F+ rated analyst at Pumper Joffrey specializing in Not A Bubble",
    "profile_image_url": "http://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
    "protected": false,
    "profile_link_color": "3B94D9",
    "statuses_count": 1642,
    "followers_count": 94,
    "favourites_count": 37,
    "name": "Gene Shmunster",
    "screen_name": "NotaBubble",
    "following": null,
    "listed_count": 5,
    "is_translator": false,
    "status": {
        "created_at": "Mon Jan 11 17:51:13 +0000 2016",
        "coordinates": null,
        "in_reply_to_status_id_str": "686593687942467585",
        "retweeted": false,
        "favorite_count": 0,
        "id_str": "686606303578333184",
        "entities": {
            "user_mentions": [
                {
                    "indices": [
                        0,
                        15
                    ],
                    "name": "G Hawkins",
                    "id": 260559365,
                    "screen_name": "FilmProfessor9",
                    "id_str": "260559365"
                }
            ],
            "urls": [],
            "symbols": [],
            "hashtags": []
        },
        "geo": null,
        "text": "@FilmProfessor9 glad to see Gopro climbed some during my lunch... sell off was getting ridiculous.  Now at $15.4 again.  Yeesh!",
        "in_reply_to_user_id_str": "260559365",
        "lang": "en",
        "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
        "in_reply_to_screen_name": "FilmProfessor9",
        "contributors": null,
        "in_reply_to_status_id": 686593687942467585,
        "place": null,
        "id": 686606303578333184,
        "retweet_count": 0,
        "in_reply_to_user_id": 260559365,
        "truncated": false,
        "favorited": false
    },
    "follow_request_sent": null,
    "friends_count": 226,
    "default_profile": false
}
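Since the project ultimately stores what it retrieves in a database, here is one way a response like the one above might be reduced to a few columns and saved. This is only a sketch: the choice of SQLite, the users table and its columns, and the function names user_row and save_user are all assumptions made for illustration. The field names and the created_at format are taken from the output above.

```python
import sqlite3
from datetime import datetime


def user_row(user):
    """Pick a few fields from a users/show response for storage."""
    # created_at looks like 'Sat May 23 01:24:46 +0000 2015'.
    created = datetime.strptime(user['created_at'],
                                '%a %b %d %H:%M:%S %z %Y')
    return (user['id'], user['screen_name'], user['name'],
            user['followers_count'], created.isoformat())


def save_user(conn, user):
    """Insert or update one user in a (hypothetical) users table."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS users ('
        'id INTEGER PRIMARY KEY, screen_name TEXT, name TEXT, '
        'followers_count INTEGER, created_at TEXT)'
    )
    conn.execute('INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?, ?)',
                 user_row(user))
    conn.commit()
```

For example, save_user(sqlite3.connect('twitter.db'), user) would store the user retrieved above; the file name twitter.db is likewise just a placeholder. Using the id as the primary key means saving the same user twice updates the row instead of duplicating it.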