Twitter data¶

Our goal is to create a program (daemon) that runs at a fixed interval and queries Twitter for tweets matching a given search query. We then want to store these tweets in a database for later analysis.

Goals¶

The program should run at a fixed interval and retrieve recent tweets from Twitter matching a certain search query. It should be able to:

Retrieve tweets from Twitter matching any search criteria
Store retrieved tweets in a database
Handle network failures gracefully
Obey Twitter rate limits
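
As a rough illustration of the "handle network failures gracefully" goal, the sketch below retries a call with exponential backoff. The name call_with_retries and its parameters are hypothetical and not part of the project. It assumes network errors surface as OSError, which covers the built-in ConnectionError and, in the requests library, RequestException.

```python
import time


def call_with_retries(fetch, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off exponentially on OSError."""
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:
            if attempt == retries - 1:
                raise  # out of attempts; let the caller decide what to do
            # Wait 1x, 2x, 4x, ... base_delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

Passing sleep as a parameter is a design choice that lets the backoff schedule be tested without actually waiting.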

Introduction¶

Luckily for us, Twitter offers an official API through which we can interact with their databases. To begin you will need to create an account with Twitter; we will use this account to request an API key for our application. You can get started by cloning the orie-5270-twitter.git repository.

Documentation¶

The Twitter API is well documented. The documentation home page is available at https://dev.twitter.com/overview/documentation. Interesting sections include authorizing your application and the actual REST API.

Requesting an API key¶

Log into your Twitter account, navigate to https://apps.twitter.com, and click the “Create New App” button. Once created you can click on your project to manage your API keys. It is very important that you keep your API keys secret. If you believe you have compromised your API key you need to go to this page and revoke the key immediately.

First steps¶

Your first goal should be to create a program that can request a bearer token, which your program can then use to make subsequent API calls. The previous link details the steps that need to be taken to achieve this; the following sections follow along with this documentation.

Throughout this section we will use the same keys as in the documentation, namely we assume we have

API_KEY = 'xvz1evFS4wEEPTGEFPHBog'
API_SECRET_KEY = 'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg'

You will want to proceed with the keys for your application, although you should check to make sure your code correctly encodes these keys.

Encoding the consumer key and secret key¶

Relevant documentation

We join the API key and API secret key with a colon to create the bearer credentials.

bearer_credentials = '{}:{}'.format(API_KEY, API_SECRET_KEY)

We then base64-encode these credentials.

import base64
encoded = base64.b64encode(bytes(bearer_credentials, 'utf-8'))
credentials = str(encoded, 'utf-8')

Note that b64encode operates on bytes, so we first encode the bearer_credentials string as UTF-8 data, and then decode the base64 result back to a standard Python string.
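
One way to check the encoding is a round trip: decode the result and confirm it matches the joined keys. The sketch below wraps the steps above in a hypothetical helper, encode_credentials. It also percent-encodes each key before joining, on the assumption that the linked documentation asks for this; for keys made only of letters and digits that step changes nothing.

```python
import base64
from urllib.parse import quote


def encode_credentials(api_key, api_secret_key):
    """Return the base64 string used in the 'Authorization: Basic ...' header."""
    # Percent-encode each part first; for alphanumeric keys this is a no-op.
    joined = '{}:{}'.format(quote(api_key, safe=''),
                            quote(api_secret_key, safe=''))
    return base64.b64encode(joined.encode('utf-8')).decode('ascii')


# Round-trip check using the sample keys from this section.
credentials = encode_credentials('xvz1evFS4wEEPTGEFPHBog',
                                 'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg')
decoded = base64.b64decode(credentials).decode('utf-8')
```

Here decoded should equal the two sample keys joined by a colon; the same check works with the keys for your own application.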

Obtaining a bearer token¶

Relevant documentation

Once you have constructed your credentials from above, you are ready to request a bearer token from Twitter. Using the requests library, this can be done as follows.

import requests

headers = {
    'Authorization': 'Basic {}'.format(credentials),
    'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
}
data = 'grant_type=client_credentials'
auth_endpoint = 'https://api.twitter.com/oauth2/token'
r = requests.post(auth_endpoint, headers=headers, data=data)

The variable r is a response object that we should inspect for our result. First, it should return the 200 status code (success).

if r.status_code != 200:
    raise RuntimeError('Bad status code: {}'.format(r.status_code))

You should of course use a more reasonable exception type, since your application will need to handle errors gracefully. Assuming you received the correct response, you now need to verify you were given a bearer token. We convert the content of the response to its JSON representation.

resp = r.json()
if resp['token_type'] != 'bearer':
    raise RuntimeError('Bad token type: {}'.format(resp['token_type']))

Again, use a better exception type. Finally, we can get our bearer token.

bearer_token = resp['access_token']

This token should be saved in your program for use in future API calls. Note that this token will need to be refreshed periodically, so the entire process of obtaining a bearer token should be reproducible with a single function call.
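
A minimal sketch of such a function follows, combining the steps from this section. The names fetch_bearer_token and TwitterAuthError are hypothetical. The http argument is assumed to be the requests module or a requests.Session, and is passed in so the function can be exercised without network access.

```python
import base64

AUTH_ENDPOINT = 'https://api.twitter.com/oauth2/token'


class TwitterAuthError(Exception):
    """Raised when the token endpoint does not return a usable bearer token."""


def fetch_bearer_token(http, api_key, api_secret_key):
    """Request a bearer token; http is requests or a requests.Session."""
    raw = '{}:{}'.format(api_key, api_secret_key).encode('utf-8')
    headers = {
        'Authorization': 'Basic {}'.format(base64.b64encode(raw).decode('ascii')),
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
    }
    r = http.post(AUTH_ENDPOINT, headers=headers,
                  data='grant_type=client_credentials')
    if r.status_code != 200:
        raise TwitterAuthError('Bad status code: {}'.format(r.status_code))
    resp = r.json()
    # Use .get so a malformed response raises our error, not a KeyError.
    if resp.get('token_type') != 'bearer':
        raise TwitterAuthError('Bad token type: {}'.format(resp.get('token_type')))
    return resp['access_token']
```

With the requests library imported, a call would look like bearer_token = fetch_bearer_token(requests, API_KEY, API_SECRET_KEY).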

Performing a search¶

Relevant documentation

Now that you have a bearer token, you can start to query the Twitter API. To start, actually go to twitter.com and use the search bar to test your search query. Let’s try searching for ‘$UBER’, the stock ticker symbol for Uber. You should see a variety of tweets related to Uber stock data. Feel free to modify your search term to try to get more interesting results. Once you are confident your search term returns results you’d like, let’s have our application run this search.

import requests

headers = {
    'Authorization': 'Bearer {}'.format(bearer_token),
}

params = {
    'q': '$UBER',
    'count': 3,
    'result_type': 'recent'
}

search_endpoint = 'https://api.twitter.com/1.1/search/tweets.json'
r = requests.get(search_endpoint, params=params, headers=headers)

if r.status_code != 200:
    raise RuntimeError('Bad status code: {}'.format(r.status_code))

tweets = r.json()

The variable tweets will now contain (up to) 3 tweets. Below I’ve listed the first tweet I get when I execute this search.

>>> import json
>>> tweet = tweets['statuses'][0]
>>> print(json.dumps(tweet, indent=4))
{
    "place": {
        "country": "United States",
        "name": "Manhattan",
        "id": "01a9a39529b27f36",
        "country_code": "US",
        "contained_within": [],
        "url": "https://api.twitter.com/1.1/geo/id/01a9a39529b27f36.json",
        "bounding_box": {
            "type": "Polygon",
            "coordinates": [
                [
                    [
                        -74.026675,
                        40.683935
                    ],
                    [
                        -73.910408,
                        40.683935
                    ],
                    [
                        -73.910408,
                        40.877483
                    ],
                    [
                        -74.026675,
                        40.877483
                    ]
                ]
            ]
        },
        "attributes": {},
        "place_type": "city",
        "full_name": "Manhattan, NY"
    },
    "in_reply_to_status_id_str": null,
    "lang": "en",
    "in_reply_to_user_id": 3294648321,
    "entities": {
        "hashtags": [],
        "user_mentions": [
            {
                "name": "Gene Shmunster",
                "screen_name": "NotaBubble",
                "indices": [
                    0,
                    11
                ],
                "id": 3294648321,
                "id_str": "3294648321"
            }
        ],
        "symbols": [
            {
                "indices": [
                    37,
                    42
                ],
                "text": "UBER"
            }
        ],
        "urls": []
    },
    "favorite_count": 0,
    "contributors": null,
    "in_reply_to_user_id_str": "3294648321",
    "retweeted": false,
    "is_quote_status": false,
    "text": "@NotaBubble  62 Billion valuation on $UBER ?  Talk about Bubbles. They need to talk it up so that they can raise even more cash via idiots.",
    "retweet_count": 0,
    "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
    "in_reply_to_status_id": null,
    "favorited": false,
    "geo": null,
    "coordinates": null,
    "created_at": "Mon Jan 11 17:01:06 +0000 2016",
    "id": 686593687942467585,
    "metadata": {
        "result_type": "recent",
        "iso_language_code": "en"
    },
    "in_reply_to_screen_name": "NotaBubble",
    "id_str": "686593687942467585",
    "user": {
        "follow_request_sent": null,
        "screen_name": "FilmProfessor9",
        "listed_count": 27,
        "has_extended_profile": false,
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/1371916509/zaustin_normal.jpg",
        "lang": "en",
        "profile_image_url": "http://pbs.twimg.com/profile_images/1371916509/zaustin_normal.jpg",
        "name": "G Hawkins",
        "utc_offset": null,
        "entities": {
            "description": {
                "urls": []
            }
        },
        "geo_enabled": true,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/260559365/1407938044",
        "url": null,
        "description": "Former Wall St Bond Trader. Now Teach & Trade my own Money. MIT PhD. Invest in Tech & Growth: $AAPL $FB $SBUX $DIS #PinkFloyd #FFNOW #PeteRose #StarTrek $NFLX",
        "profile_sidebar_border_color": "4D044D",
        "protected": false,
        "followers_count": 373,
        "time_zone": null,
        "profile_link_color": "709917",
        "is_translation_enabled": false,
        "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/243623269/barcino_245.JPG",
        "verified": false,
        "following": null,
        "favourites_count": 3990,
        "statuses_count": 4747,
        "default_profile_image": false,
        "profile_background_tile": true,
        "default_profile": false,
        "profile_sidebar_fill_color": "080508",
        "profile_use_background_image": true,
        "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/243623269/barcino_245.JPG",
        "friends_count": 864,
        "profile_background_color": "030003",
        "created_at": "Fri Mar 04 03:24:18 +0000 2011",
        "id": 260559365,
        "contributors_enabled": false,
        "is_translator": false,
        "notifications": null,
        "location": "New York City ",
        "profile_text_color": "FAF0FA",
        "id_str": "260559365"
    },
    "truncated": false
}

Obviously there is a huge amount of information here, which is great from a machine learning perspective. Scanning the document we can see that the content can be found with

>>> tweet['text']
'@NotaBubble  62 Billion valuation on $UBER ?  Talk about Bubbles. They need to talk it up so that they can raise even more cash via idiots.'

Other interesting fields are ‘retweeted’, ‘in_reply_to_user_id’, ‘place’, and ‘created_at’.
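
To connect this to the goal of storing tweets in a database, here is a minimal sketch using SQLite from the Python standard library. The table layout and the names open_store and store_tweets are assumptions made for illustration; the project may call for a different database or schema. Keying the table on the tweet id means that re-running a search does not create duplicate rows.

```python
import json
import sqlite3


def open_store(path=':memory:'):
    """Open (or create) the database and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS tweets ('
        ' id INTEGER PRIMARY KEY,'
        ' created_at TEXT, screen_name TEXT, text TEXT, raw TEXT)'
    )
    return conn


def _count(conn):
    return conn.execute('SELECT COUNT(*) FROM tweets').fetchone()[0]


def store_tweets(conn, statuses):
    """Insert tweets, skipping ids already stored; return the number added."""
    rows = [
        # Keep a few handy columns plus the full JSON for later analysis.
        (t['id'], t['created_at'], t['user']['screen_name'], t['text'],
         json.dumps(t))
        for t in statuses
    ]
    before = _count(conn)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            'INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)', rows)
    return _count(conn) - before
```

With the search result from above, a call would look like store_tweets(open_store('tweets.db'), tweets['statuses']), where the file name is a placeholder.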

More APIs: Finding users¶

Relevant documentation

There are many APIs available; let’s explore one more. Let’s suppose we want to learn about the user @NotaBubble, to whom the last tweet we found was a reply. Using the endpoint linked above we can retrieve information about individual users. First, let’s get the user ID and screen name of @NotaBubble.

>>> tweet['in_reply_to_user_id']
3294648321
>>> tweet['in_reply_to_screen_name']
'NotaBubble'

With this we can craft a request to get this user’s information.

params = {
    'user_id': tweet['in_reply_to_user_id'],
    'screen_name': tweet['in_reply_to_screen_name']
}

user_endpoint = 'https://api.twitter.com/1.1/users/show.json'
r = requests.get(user_endpoint, params=params, headers=headers)

if r.status_code != 200:
    raise RuntimeError('Bad status code: {}'.format(r.status_code))

user = r.json()

When I run this code, I get the following result.

>>> print(json.dumps(user, indent=4))
{
    "created_at": "Sat May 23 01:24:46 +0000 2015",
    "profile_text_color": "000000",
    "geo_enabled": false,
    "has_extended_profile": false,
    "profile_sidebar_border_color": "000000",
    "profile_background_tile": false,
    "entities": {
        "url": {
            "urls": [
                {
                    "display_url": "leadersinvestmentclub.com/the-team/",
                    "indices": [
                        0,
                        22
                    ],
                    "url": "http://t.co/ucbPJWbVpw",
                    "expanded_url": "http://leadersinvestmentclub.com/the-team/"
                }
            ]
        },
        "description": {
            "urls": []
        }
    },
    "verified": false,
    "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
    "utc_offset": null,
    "profile_banner_url": "https://pbs.twimg.com/profile_banners/3294648321/1432344426",
    "profile_use_background_image": false,
    "url": "http://t.co/ucbPJWbVpw",
    "default_profile_image": false,
    "id": 3294648321,
    "notifications": null,
    "profile_location": null,
    "is_translation_enabled": false,
    "lang": "en",
    "profile_image_url_https": "https://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
    "profile_sidebar_fill_color": "000000",
    "time_zone": null,
    "contributors_enabled": false,
    "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
    "id_str": "3294648321",
    "profile_background_color": "000000",
    "location": "",
    "description": "F+ rated analyst at Pumper Joffrey specializing in Not A Bubble",
    "profile_image_url": "http://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
    "protected": false,
    "profile_link_color": "3B94D9",
    "statuses_count": 1642,
    "followers_count": 94,
    "favourites_count": 37,
    "name": "Gene Shmunster",
    "screen_name": "NotaBubble",
    "following": null,
    "listed_count": 5,
    "is_translator": false,
    "status": {
        "created_at": "Mon Jan 11 17:51:13 +0000 2016",
        "coordinates": null,
        "in_reply_to_status_id_str": "686593687942467585",
        "retweeted": false,
        "favorite_count": 0,
        "id_str": "686606303578333184",
        "entities": {
            "user_mentions": [
                {
                    "indices": [
                        0,
                        15
                    ],
                    "name": "G Hawkins",
                    "id": 260559365,
                    "screen_name": "FilmProfessor9",
                    "id_str": "260559365"
                }
            ],
            "urls": [],
            "symbols": [],
            "hashtags": []
        },
        "geo": null,
        "text": "@FilmProfessor9 glad to see Gopro climbed some during my lunch... sell off was getting ridiculous.  Now at $15.4 again.  Yeesh!",
        "in_reply_to_user_id_str": "260559365",
        "lang": "en",
        "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
        "in_reply_to_screen_name": "FilmProfessor9",
        "contributors": null,
        "in_reply_to_status_id": 686593687942467585,
        "place": null,
        "id": 686606303578333184,
        "retweet_count": 0,
        "in_reply_to_user_id": 260559365,
        "truncated": false,
        "favorited": false
    },
    "follow_request_sent": null,
    "friends_count": 226,
    "default_profile": false
}
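
The search and user lookups above repeat the same request and status check, so they could share one helper that also raises a more specific exception than RuntimeError. The sketch below is one possible shape; get_json, TwitterAPIError and seconds_until_reset are hypothetical names. It assumes that a rate-limited request returns status 429 and that the x-rate-limit-reset response header holds the reset time in epoch seconds, which you should confirm against the rate limit documentation. As before, http is assumed to be the requests module or a requests.Session.

```python
import time


class TwitterAPIError(Exception):
    """Raised for any non-200 response from the API."""

    def __init__(self, status_code, wait_seconds=0.0):
        super().__init__('Bad status code: {}'.format(status_code))
        self.status_code = status_code
        self.wait_seconds = wait_seconds  # suggested pause before retrying


def seconds_until_reset(headers, now=None):
    """Return seconds until the rate limit window resets (0 if unknown)."""
    reset = headers.get('x-rate-limit-reset')
    if reset is None:
        return 0.0  # header absent: nothing to base a wait on
    now = time.time() if now is None else now
    return max(0.0, float(reset) - now)


def get_json(http, endpoint, bearer_token, params):
    """GET an API endpoint and return its decoded JSON body."""
    headers = {'Authorization': 'Bearer {}'.format(bearer_token)}
    r = http.get(endpoint, params=params, headers=headers)
    if r.status_code == 429:
        # Rate limited: tell the caller how long to wait.
        raise TwitterAPIError(429, seconds_until_reset(r.headers))
    if r.status_code != 200:
        raise TwitterAPIError(r.status_code)
    return r.json()
```

The user lookup then becomes user = get_json(requests, user_endpoint, bearer_token, params), and a caller that catches TwitterAPIError can sleep for wait_seconds before trying again.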
