Twitter data
============

Our goal is to create a program (daemon) that runs at a fixed interval and
queries Twitter for tweets matching a given search query. We then want to
store these tweets in a database for later analysis.

Goals
-----

The goal of this project is to create a program that runs at a fixed interval
and retrieves recent tweets from Twitter matching a certain search query. The
program should be able to

- Retrieve tweets from Twitter matching any search criteria
- Store retrieved tweets in a database
- Handle network failures gracefully
- Obey Twitter rate limits

Introduction
------------

Luckily for us, Twitter offers an official API through which we can interact
with their databases. To begin you will need to create an account with
Twitter; we will use this account to request an API key for our application.

You can get started by cloning the `orie-5270-twitter.git `_ repository.

Documentation
-------------

The Twitter API is well documented. The documentation home page is available
at `https://dev.twitter.com/overview/documentation `_. Interesting sections
include `authorizing your application `_ and the actual `REST API `_.

Requesting an API key
---------------------

Log into your Twitter account, navigate to `https://apps.twitter.com `_, and
click the "Create New App" button. Once the app is created, you can click on
your project to manage your API keys.

`It is very important that you keep your API keys secret.` If you believe your
API key has been compromised, go to this page and revoke the key immediately.

First steps
-----------

Your first goal should be to create a program that can request a bearer token,
which your program can then use to make subsequent API calls. The linked
documentation details the steps that need to be taken to achieve this; the
following sections follow along with it.

Throughout this section we will use the same keys as in the documentation,
namely we assume we have

.. code-block:: python

    API_KEY = 'xvz1evFS4wEEPTGEFPHBog'
    API_SECRET_KEY = 'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg'

You will want to proceed with the keys for your own application, although you
should first check that your code correctly encodes these example keys.

Encoding the consumer key and secret key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Relevant documentation `_

We join the API key and API secret key with a colon to create the `bearer
credentials`.

.. code-block:: python

    bearer_credentials = '{}:{}'.format(API_KEY, API_SECRET_KEY)

We then base64_ encode these credentials.

.. code-block:: python

    import base64

    # b64encode works on bytes, so encode the string as UTF-8 first
    encoded = base64.b64encode(bearer_credentials.encode('utf-8'))
    # decode the base64 bytes back into a regular Python string
    credentials = encoded.decode('utf-8')

Note that we first encode the ``bearer_credentials`` string as `UTF-8`_ data,
base64 encode the resulting bytes, and then decode the result back to a
standard Python string.

.. _base64: https://en.wikipedia.org/wiki/Base64
.. _utf-8: https://en.wikipedia.org/wiki/UTF-8
.. _json: https://en.wikipedia.org/wiki/JSON

Obtaining a bearer token
~~~~~~~~~~~~~~~~~~~~~~~~

`Relevant documentation `_

Once you have constructed your credentials as above, you are ready to request
a bearer token from Twitter. Using the ``requests`` library, this can be done
as follows.

.. code-block:: python

    import requests

    headers = {
        'Authorization': 'Basic {}'.format(credentials),
        'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
    }
    data = 'grant_type=client_credentials'
    auth_endpoint = 'https://api.twitter.com/oauth2/token'

    r = requests.post(auth_endpoint, headers=headers, data=data)

The variable ``r`` is a ``response`` object that we should inspect for our
result. First, it should return the 200 status code (success).

.. code-block:: python

    if r.status_code != 200:
        raise RuntimeError('Bad status code: {}'.format(r.status_code))

You should of course use a more specific exception type, since your
application will need to handle errors gracefully.

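
One way to follow this advice is sketched below. The names
``TwitterAPIError`` and ``check_status`` are hypothetical and do not come from
Twitter or from ``requests``; the sketch only assumes that the response object
exposes ``status_code`` and ``text`` attributes, as ``requests`` responses do.

.. code-block:: python

    class TwitterAPIError(Exception):
        """Hypothetical exception for unexpected Twitter API responses."""

        def __init__(self, status_code, message=''):
            super().__init__(
                'Bad status code {}: {}'.format(status_code, message))
            # keep the status code so callers can react to it
            self.status_code = status_code


    def check_status(response):
        """Return the response unchanged, or raise if it is not a 200."""
        if response.status_code != 200:
            raise TwitterAPIError(response.status_code, response.text)
        return response

Calling ``check_status(r)`` after each request keeps the status check in one
place, and callers can catch ``TwitterAPIError`` and inspect its
``status_code`` attribute to decide whether to retry or give up.
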
Assuming you received the correct response, you now need to verify you were
given a bearer token. We convert the content of the response to its JSON_
representation.

.. code-block:: python

    resp = r.json()
    if resp['token_type'] != 'bearer':
        raise RuntimeError('Bad token type: {}'.format(resp['token_type']))

Again, use a better exception type. Finally, we can get our bearer token.

.. code-block:: python

    bearer_token = resp['access_token']

This token should be saved in your program to be used in future API calls.
Note that this token will need to be refreshed periodically, so the entire
process of obtaining a bearer token should be reproducible with a single
function call.

Performing a search
~~~~~~~~~~~~~~~~~~~

`Relevant documentation `_

Now that you have a bearer token, you can start to query the Twitter API. To
start, go to `twitter.com `_ and use the search bar to test your search query.
Let's try searching for '$UBER', the stock ticker symbol for `Uber `_. You
should see a variety of tweets related to Uber stock data. Feel free to modify
your search term to try to get more interesting results.

Once you are confident your search term returns the results you'd like, let's
have our application run this search.

.. code-block:: python

    import requests

    # application-only requests authenticate with the bearer token
    headers = {
        'Authorization': 'Bearer {}'.format(bearer_token),
    }
    params = {
        'q': '$UBER',
        'count': 3,
        'result_type': 'recent'
    }
    search_endpoint = 'https://api.twitter.com/1.1/search/tweets.json'

    r = requests.get(search_endpoint, params=params, headers=headers)
    if r.status_code != 200:
        raise RuntimeError('Bad status code: {}'.format(r.status_code))

    tweets = r.json()

The variable ``tweets`` now holds the decoded response; its ``statuses`` entry
contains (up to) 3 tweets. Below I've listed the first tweet I get when I
execute this search.

.. code-block:: python

    >>> import json
    >>> tweet = tweets['statuses'][0]
    >>> print(json.dumps(tweet, indent=4))
    {
        "place": {
            "country": "United States",
            "name": "Manhattan",
            "id": "01a9a39529b27f36",
            "country_code": "US",
            "contained_within": [],
            "url": "https://api.twitter.com/1.1/geo/id/01a9a39529b27f36.json",
            "bounding_box": {
                "type": "Polygon",
                "coordinates": [
                    [
                        [
                            -74.026675,
                            40.683935
                        ],
                        [
                            -73.910408,
                            40.683935
                        ],
                        [
                            -73.910408,
                            40.877483
                        ],
                        [
                            -74.026675,
                            40.877483
                        ]
                    ]
                ]
            },
            "attributes": {},
            "place_type": "city",
            "full_name": "Manhattan, NY"
        },
        "in_reply_to_status_id_str": null,
        "lang": "en",
        "in_reply_to_user_id": 3294648321,
        "entities": {
            "hashtags": [],
            "user_mentions": [
                {
                    "name": "Gene Shmunster",
                    "screen_name": "NotaBubble",
                    "indices": [
                        0,
                        11
                    ],
                    "id": 3294648321,
                    "id_str": "3294648321"
                }
            ],
            "symbols": [
                {
                    "indices": [
                        37,
                        42
                    ],
                    "text": "UBER"
                }
            ],
            "urls": []
        },
        "favorite_count": 0,
        "contributors": null,
        "in_reply_to_user_id_str": "3294648321",
        "retweeted": false,
        "is_quote_status": false,
        "text": "@NotaBubble 62 Billion valuation on $UBER ? Talk about Bubbles. They need to talk it up so that they can raise even more cash via idiots.",
        "retweet_count": 0,
        "source": "Twitter Web Client",
        "in_reply_to_status_id": null,
        "favorited": false,
        "geo": null,
        "coordinates": null,
        "created_at": "Mon Jan 11 17:01:06 +0000 2016",
        "id": 686593687942467585,
        "metadata": {
            "result_type": "recent",
            "iso_language_code": "en"
        },
        "in_reply_to_screen_name": "NotaBubble",
        "id_str": "686593687942467585",
        "user": {
            "follow_request_sent": null,
            "screen_name": "FilmProfessor9",
            "listed_count": 27,
            "has_extended_profile": false,
            "profile_image_url_https": "https://pbs.twimg.com/profile_images/1371916509/zaustin_normal.jpg",
            "lang": "en",
            "profile_image_url": "http://pbs.twimg.com/profile_images/1371916509/zaustin_normal.jpg",
            "name": "G Hawkins",
            "utc_offset": null,
            "entities": {
                "description": {
                    "urls": []
                }
            },
            "geo_enabled": true,
            "profile_banner_url": "https://pbs.twimg.com/profile_banners/260559365/1407938044",
            "url": null,
            "description": "Former Wall St Bond Trader. Now Teach & Trade my own Money. MIT PhD . Invest in Tech & Growth: $AAPL $FB $SBUX $DIS #PinkFloyd #FFNOW #PeteRose #StarTrek $NFLX ",
            "profile_sidebar_border_color": "4D044D",
            "protected": false,
            "followers_count": 373,
            "time_zone": null,
            "profile_link_color": "709917",
            "is_translation_enabled": false,
            "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/243623269/barcino_245.JPG",
            "verified": false,
            "following": null,
            "favourites_count": 3990,
            "statuses_count": 4747,
            "default_profile_image": false,
            "profile_background_tile": true,
            "default_profile": false,
            "profile_sidebar_fill_color": "080508",
            "profile_use_background_image": true,
            "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/243623269/barcino_245.JPG",
            "friends_count": 864,
            "profile_background_color": "030003",
            "created_at": "Fri Mar 04 03:24:18 +0000 2011",
            "id": 260559365,
            "contributors_enabled": false,
            "is_translator": false,
            "notifications": null,
            "location": "New York City ",
            "profile_text_color": "FAF0FA",
            "id_str": "260559365"
        },
        "truncated": false
    }

Obviously there is a huge amount of information here, which is great from a
machine learning perspective. Scanning the document we can see that the
content of the tweet can be found with

.. code-block:: python

    >>> tweet['text']
    '@NotaBubble 62 Billion valuation on $UBER ? Talk about Bubbles. They need to talk it up so that they can raise even more cash via idiots.'

Other interesting fields are ``retweeted``, ``in_reply_to_user_id``,
``place``, and ``created_at``.

More APIs: Finding users
~~~~~~~~~~~~~~~~~~~~~~~~

`Relevant documentation `_

There are `many `_ APIs available; let's explore one more. Suppose we want to
learn about the user ``@NotaBubble``, to whom the tweet we just found was a
reply. Using the endpoint linked above we can retrieve information about
individual users. First, let's get the user id and screen name of
``@NotaBubble``.

.. code-block:: python

    >>> tweet['in_reply_to_user_id']
    3294648321
    >>> tweet['in_reply_to_screen_name']
    'NotaBubble'

With this we can craft a request to get this user's information.

.. code-block:: python

    # reuses the ``headers`` (with the bearer token) from the search request
    params = {
        'user_id': tweet['in_reply_to_user_id'],
        'screen_name': tweet['in_reply_to_screen_name']
    }
    user_endpoint = 'https://api.twitter.com/1.1/users/show.json'

    r = requests.get(user_endpoint, params=params, headers=headers)
    if r.status_code != 200:
        raise RuntimeError('Bad status code: {}'.format(r.status_code))

    user = r.json()

When I run this code, I get the following result.

.. code-block:: python

    >>> print(json.dumps(user, indent=4))
    {
        "created_at": "Sat May 23 01:24:46 +0000 2015",
        "profile_text_color": "000000",
        "geo_enabled": false,
        "has_extended_profile": false,
        "profile_sidebar_border_color": "000000",
        "profile_background_tile": false,
        "entities": {
            "url": {
                "urls": [
                    {
                        "display_url": "leadersinvestmentclub.com/the-team/",
                        "indices": [
                            0,
                            22
                        ],
                        "url": "http://t.co/ucbPJWbVpw",
                        "expanded_url": "http://leadersinvestmentclub.com/the-team/"
                    }
                ]
            },
            "description": {
                "urls": []
            }
        },
        "verified": false,
        "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
        "utc_offset": null,
        "profile_banner_url": "https://pbs.twimg.com/profile_banners/3294648321/1432344426",
        "profile_use_background_image": false,
        "url": "http://t.co/ucbPJWbVpw",
        "default_profile_image": false,
        "id": 3294648321,
        "notifications": null,
        "profile_location": null,
        "is_translation_enabled": false,
        "lang": "en",
        "profile_image_url_https": "https://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
        "profile_sidebar_fill_color": "000000",
        "time_zone": null,
        "contributors_enabled": false,
        "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
        "id_str": "3294648321",
        "profile_background_color": "000000",
        "location": "",
        "description": "F+ rated analyst at Pumper Joffrey specializing in Not A Bubble",
        "profile_image_url": "http://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
        "protected": false,
        "profile_link_color": "3B94D9",
        "statuses_count": 1642,
        "followers_count": 94,
        "favourites_count": 37,
        "name": "Gene Shmunster",
        "screen_name": "NotaBubble",
        "following": null,
        "listed_count": 5,
        "is_translator": false,
        "status": {
            "created_at": "Mon Jan 11 17:51:13 +0000 2016",
            "coordinates": null,
            "in_reply_to_status_id_str": "686593687942467585",
            "retweeted": false,
            "favorite_count": 0,
            "id_str": "686606303578333184",
            "entities": {
                "user_mentions": [
                    {
                        "indices": [
                            0,
                            15
                        ],
                        "name": "G Hawkins",
                        "id": 260559365,
                        "screen_name": "FilmProfessor9",
                        "id_str": "260559365"
                    }
                ],
                "urls": [],
                "symbols": [],
                "hashtags": []
            },
            "geo": null,
            "text": "@FilmProfessor9 glad to see Gopro climbed some during my lunch... sell off was getting ridiculous. Now at $15.4 again. Yeesh!",
            "in_reply_to_user_id_str": "260559365",
            "lang": "en",
            "source": "Twitter Web Client",
            "in_reply_to_screen_name": "FilmProfessor9",
            "contributors": null,
            "in_reply_to_status_id": 686593687942467585,
            "place": null,
            "id": 686606303578333184,
            "retweet_count": 0,
            "in_reply_to_user_id": 260559365,
            "truncated": false,
            "favorited": false
        },
        "follow_request_sent": null,
        "friends_count": 226,
        "default_profile": false
    }

.. LocalWords: API