Twitter data
============

Our goal is to create a program (a daemon) that runs at a fixed
interval and queries Twitter for tweets matching a given search
query. We then want to store these tweets in a database for later
analysis.


Goals
-----

The program should run at a fixed interval and retrieve recent tweets
from Twitter matching a certain search query. It should be able to:

- Retrieve tweets from Twitter matching any search criteria

- Store retrieved tweets in a database

- Handle network failures gracefully

- Obey Twitter rate limits
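
As a rough sketch of the overall shape such a program might take, the
loop below calls a task at a fixed interval and keeps going when a
single attempt fails. The names ``run_periodically`` and ``task`` are
hypothetical, and a real daemon would also need logging configuration
and a clean way to shut down.

.. code-block:: python

   import logging
   import time

   logger = logging.getLogger(__name__)


   def run_periodically(task, interval_seconds, max_runs=None,
                        sleep=time.sleep):
       """Call ``task()`` every ``interval_seconds`` seconds.

       ``max_runs`` limits the number of calls (``None`` runs forever).
       ``sleep`` is injectable so the loop can be tested without waiting.
       Returns the number of calls that completed without raising.
       """
       runs = 0
       successes = 0
       while max_runs is None or runs < max_runs:
           try:
               task()
               successes += 1
           except Exception:
               # One failed attempt (for example a network error) should
               # not stop the daemon; log it and try again next cycle.
               logger.exception('Task failed; will retry next cycle')
           runs += 1
           if max_runs is None or runs < max_runs:
               sleep(interval_seconds)
       return successes

Here ``task`` would be a function that performs one search and stores
the results; choosing ``interval_seconds`` conservatively is one simple
way to stay under the rate limits.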


Introduction
------------

Luckily for us, Twitter offers an official API through which we can
interact with their databases. To begin, you will need to create an
account with Twitter; we will use this account to request an API key
for our application. You can get started by cloning the
`orie-5270-twitter.git <orie-5270-twitter.git>`_ repository.


Documentation
-------------

The Twitter API is well documented. The documentation home page is
available at `https://dev.twitter.com/overview/documentation
<https://dev.twitter.com/overview/documentation>`_. Interesting
sections include `authorizing your application
<https://dev.twitter.com/oauth>`_ and the actual `REST API
<https://dev.twitter.com/rest/public>`_.


Requesting an API key
---------------------

Log into your Twitter account, navigate to
`https://apps.twitter.com <https://apps.twitter.com>`_, and click the
"Create New App" button. Once created you can click on your project to
manage your API keys. **It is very important that you keep your API
keys secret.** If you believe an API key has been compromised, return
to this page and revoke it immediately.


First steps
-----------

Your first goal should be to create a program that can request a
bearer token, which can be used by your program to make subsequent API
calls. The `application-only authentication documentation
<https://dev.twitter.com/oauth/application-only>`_ details the steps
that need to be taken to achieve this; the following sections follow
along with this documentation.

Throughout this section we will use the same keys as in the
documentation; namely, we assume we have:

.. code-block:: python

   API_KEY = 'xvz1evFS4wEEPTGEFPHBog'
   API_SECRET_KEY = 'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg'

You will want to proceed with the keys for your own application, but
it is worth first checking that your code correctly encodes these
example keys.


Encoding the consumer key and secret key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Relevant documentation <https://dev.twitter.com/oauth/application-only>`_

We join the API key and API secret key with a colon to create the
`bearer credentials`.

.. code-block:: python

   bearer_credentials = '{}:{}'.format(API_KEY, API_SECRET_KEY)

We then base64_ encode these credentials:

.. code-block:: python

   import base64
   encoded = base64.b64encode(bytes(bearer_credentials, 'utf-8'))
   credentials = str(encoded, 'utf-8')

Note that ``b64encode`` operates on bytes: we first encode the
``bearer_credentials`` string as `UTF-8`_ data, base64 encode those
bytes, and then decode the result back to a standard Python string.
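
To check your encoding, the two steps above can be wrapped in one
helper and run on the example keys. The function name
``encode_credentials`` is our own choice; the expected string is simply
what the example keys from this section encode to.

.. code-block:: python

   import base64


   def encode_credentials(api_key, api_secret_key):
       """Return the base64-encoded ``key:secret`` pair as a ``str``."""
       bearer_credentials = '{}:{}'.format(api_key, api_secret_key)
       encoded = base64.b64encode(bearer_credentials.encode('utf-8'))
       return encoded.decode('utf-8')


   # Sanity check using the example keys from this section.
   expected = ('eHZ6MWV2RlM0d0VFUFRHRUZQSEJvZzpMOHFxOVBaeVJnNmllS0dF'
               'S2hab2xHQzB2SldMdzhpRUo4OERSZHlPZw==')
   result = encode_credentials(
       'xvz1evFS4wEEPTGEFPHBog',
       'L8qq9PZyRg6ieKGEKhZolGC0vJWLw8iEJ88DRdyOg')
   assert result == expected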

.. _base64: https://en.wikipedia.org/wiki/Base64

.. _utf-8: https://en.wikipedia.org/wiki/UTF-8

.. _json: https://en.wikipedia.org/wiki/JSON


Obtaining a bearer token
~~~~~~~~~~~~~~~~~~~~~~~~

`Relevant documentation <https://dev.twitter.com/oauth/application-only>`_

Once you have constructed your credentials from above, you are ready
to request a bearer token from Twitter. Using the ``requests``
library, this can be done as follows.

.. code-block:: python

   import requests

   headers = {
       'Authorization': 'Basic {}'.format(credentials),
       'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8'
   }
   data = 'grant_type=client_credentials'
   auth_endpoint = 'https://api.twitter.com/oauth2/token'
   r = requests.post(auth_endpoint, headers=headers, data=data)

The variable ``r`` is a ``Response`` object that we should inspect for
our result. First, it should have status code 200 (success).

.. code-block:: python

   if r.status_code != 200:
       raise RuntimeError('Bad status code: {}'.format(r.status_code))

You should of course use a more reasonable exception type, since your
application will need to handle errors gracefully. Assuming you
received the correct response, you now need to verify you were given a
bearer token. We convert the content of the response to its JSON_
representation.

.. code-block:: python

   resp = r.json()
   if resp['token_type'] != 'bearer':
       raise RuntimeError('Bad token type: {}'.format(resp['token_type']))

Again, use a better exception type. Finally, we can get our bearer
token.

.. code-block:: python
                
   bearer_token = resp['access_token']

This token should be saved in your program to be used in future API
calls. Note that this token will need to be refreshed periodically, so
the entire process of obtaining a bearer token should be reproducible
with a single function call.
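
One possible way to package the steps of this section into a single
call is sketched below. The names ``TwitterAuthError`` and
``get_bearer_token`` are hypothetical, and the ``post`` parameter
(which falls back to ``requests.post``) exists only so the function can
be exercised without network access.

.. code-block:: python

   import base64


   class TwitterAuthError(Exception):
       """Raised when a bearer token could not be obtained."""


   def get_bearer_token(api_key, api_secret_key, post=None):
       """Request an application-only bearer token; return it as a str."""
       if post is None:
           # Imported lazily so a stand-in ``post`` can be used in tests.
           import requests
           post = requests.post

       raw = '{}:{}'.format(api_key, api_secret_key).encode('utf-8')
       credentials = base64.b64encode(raw).decode('utf-8')
       headers = {
           'Authorization': 'Basic {}'.format(credentials),
           'Content-Type':
               'application/x-www-form-urlencoded;charset=UTF-8',
       }
       r = post('https://api.twitter.com/oauth2/token',
                headers=headers, data='grant_type=client_credentials')

       if r.status_code != 200:
           raise TwitterAuthError(
               'Bad status code: {}'.format(r.status_code))
       resp = r.json()
       if resp.get('token_type') != 'bearer':
           raise TwitterAuthError(
               'Bad token type: {}'.format(resp.get('token_type')))
       return resp['access_token']

With this in place, ``bearer_token = get_bearer_token(API_KEY,
API_SECRET_KEY)`` can be called again whenever a new token is needed,
and callers can catch ``TwitterAuthError`` specifically.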


Performing a search
~~~~~~~~~~~~~~~~~~~

`Relevant documentation <https://dev.twitter.com/rest/public/search>`_

Now that you have a bearer token, you can start to query the Twitter
API. To start, actually go to `twitter.com <https://twitter.com>`_ and
use the search bar to test your search query. Let's try searching for
'$UBER', the stock ticker symbol for `Uber <https://uber.com>`_. You
should see a variety of tweets related to Uber stock data. Feel free
to modify your search term to try to get more interesting
results. Once you are confident your search term returns results you'd
like, let's have our application run this search.

.. code-block:: python

   import requests

   headers = {
       'Authorization': 'Bearer {}'.format(bearer_token),
   }

   params = {
       'q': '$UBER',
       'count': 3,
       'result_type': 'recent'
   }

   search_endpoint = 'https://api.twitter.com/1.1/search/tweets.json'
   r = requests.get(search_endpoint, params=params, headers=headers)

   if r.status_code != 200:
       raise RuntimeError('Bad status code: {}'.format(r.status_code))

   tweets = r.json()

The variable ``tweets`` is now a dictionary whose ``statuses`` entry
holds a list of (up to) 3 tweets. Below I've listed the first tweet I
get when I execute this search.

.. code-block:: python

   >>> import json
   >>> tweet = tweets['statuses'][0]
   >>> print(json.dumps(tweet, indent=4))
   {
       "place": {
           "country": "United States",
           "name": "Manhattan",
           "id": "01a9a39529b27f36",
           "country_code": "US",
           "contained_within": [],
           "url": "https://api.twitter.com/1.1/geo/id/01a9a39529b27f36.json",
           "bounding_box": {
               "type": "Polygon",
               "coordinates": [
                   [
                       [
                           -74.026675,
                           40.683935
                       ],
                       [
                           -73.910408,
                           40.683935
                       ],
                       [
                           -73.910408,
                           40.877483
                       ],
                       [
                           -74.026675,
                           40.877483
                       ]
                   ]
               ]
           },
           "attributes": {},
           "place_type": "city",
           "full_name": "Manhattan, NY"
       },
       "in_reply_to_status_id_str": null,
       "lang": "en",
       "in_reply_to_user_id": 3294648321,
       "entities": {
           "hashtags": [],
           "user_mentions": [
               {
                   "name": "Gene Shmunster",
                   "screen_name": "NotaBubble",
                   "indices": [
                       0,
                       11
                   ],
                   "id": 3294648321,
                   "id_str": "3294648321"
               }
           ],
           "symbols": [
               {
                   "indices": [
                       37,
                       42
                   ],
                   "text": "UBER"
               }
           ],
           "urls": []
       },
       "favorite_count": 0,
       "contributors": null,
       "in_reply_to_user_id_str": "3294648321",
       "retweeted": false,
       "is_quote_status": false,
       "text": "@NotaBubble  62 Billion valuation on $UBER ?  Talk about Bubbles. They need to talk it up so that they can raise even more cash via idiots.",
       "retweet_count": 0,
       "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
       "in_reply_to_status_id": null,
       "favorited": false,
       "geo": null,
       "coordinates": null,
       "created_at": "Mon Jan 11 17:01:06 +0000 2016",
       "id": 686593687942467585,
       "metadata": {
           "result_type": "recent",
           "iso_language_code": "en"
       },
       "in_reply_to_screen_name": "NotaBubble",
       "id_str": "686593687942467585",
       "user": {
           "follow_request_sent": null,
           "screen_name": "FilmProfessor9",
           "listed_count": 27,
           "has_extended_profile": false,
           "profile_image_url_https": "https://pbs.twimg.com/profile_images/1371916509/zaustin_normal.jpg",
           "lang": "en",
           "profile_image_url": "http://pbs.twimg.com/profile_images/1371916509/zaustin_normal.jpg",
           "name": "G Hawkins",
           "utc_offset": null,
           "entities": {
               "description": {
                   "urls": []
               }
           },
           "geo_enabled": true,
           "profile_banner_url": "https://pbs.twimg.com/profile_banners/260559365/1407938044",
           "url": null,
           "description": "Former Wall St Bond Trader. Now Teach & Trade my own Money. MIT PhD. Invest in Tech & Growth: $AAPL $FB $SBUX $DIS #PinkFloyd #FFNOW #PeteRose #StarTrek $NFLX",
           "profile_sidebar_border_color": "4D044D",
           "protected": false,
           "followers_count": 373,
           "time_zone": null,
           "profile_link_color": "709917",
           "is_translation_enabled": false,
           "profile_background_image_url": "http://pbs.twimg.com/profile_background_images/243623269/barcino_245.JPG",
           "verified": false,
           "following": null,
           "favourites_count": 3990,
           "statuses_count": 4747,
           "default_profile_image": false,
           "profile_background_tile": true,
           "default_profile": false,
           "profile_sidebar_fill_color": "080508",
           "profile_use_background_image": true,
           "profile_background_image_url_https": "https://pbs.twimg.com/profile_background_images/243623269/barcino_245.JPG",
           "friends_count": 864,
           "profile_background_color": "030003",
           "created_at": "Fri Mar 04 03:24:18 +0000 2011",
           "id": 260559365,
           "contributors_enabled": false,
           "is_translator": false,
           "notifications": null,
           "location": "New York City ",
           "profile_text_color": "FAF0FA",
           "id_str": "260559365"
       },
       "truncated": false
   }

There is a huge amount of information here, which is great from a
machine learning perspective. Scanning the output, we can see that the
content of the tweet can be found with:

.. code-block:: python

   >>> tweet['text']
   '@NotaBubble  62 Billion valuation on $UBER ?  Talk about Bubbles. They need to talk it up so that they can raise even more cash via idiots.'

Other interesting fields are ``retweeted``, ``in_reply_to_user_id``,
``place``, and ``created_at``.
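
Since the goal is to store tweets in a database, one possible approach
is to pull a few of these fields into a flat row and insert it into
SQLite. This is only a sketch: the table layout and the helper names
``tweet_to_row`` and ``store_tweets`` are assumptions of ours, not
something the Twitter API prescribes.

.. code-block:: python

   import sqlite3
   from datetime import datetime

   # Format of ``created_at``, e.g. "Mon Jan 11 17:01:06 +0000 2016".
   CREATED_AT_FORMAT = '%a %b %d %H:%M:%S %z %Y'


   def tweet_to_row(tweet):
       """Extract a few fields from a tweet dictionary as a tuple."""
       created = datetime.strptime(tweet['created_at'], CREATED_AT_FORMAT)
       place = tweet.get('place') or {}  # ``place`` may be null
       return (
           tweet['id'],
           created.isoformat(),
           tweet['text'],
           tweet.get('in_reply_to_user_id'),
           int(bool(tweet.get('retweeted'))),
           place.get('full_name'),
       )


   def store_tweets(conn, tweets):
       """Insert tweets into the ``tweets`` table, skipping known ids."""
       conn.execute(
           'CREATE TABLE IF NOT EXISTS tweets ('
           'id INTEGER PRIMARY KEY, created_at TEXT, text TEXT, '
           'in_reply_to_user_id INTEGER, retweeted INTEGER, place TEXT)'
       )
       rows = [tweet_to_row(t) for t in tweets]
       # The tweet id is the primary key, so running the same search
       # twice does not create duplicate rows.
       conn.executemany(
           'INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?, ?)', rows)
       conn.commit()

For example, ``store_tweets(sqlite3.connect('tweets.db'),
tweets['statuses'])`` would save the search results from above, where
the file name ``tweets.db`` is arbitrary.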


More APIs: Finding users
~~~~~~~~~~~~~~~~~~~~~~~~

`Relevant documentation <https://dev.twitter.com/rest/reference/get/users/show>`_

There are `many <https://dev.twitter.com/rest/public>`_ APIs
available; let's explore one more. Suppose we want to learn about the
user ``@NotaBubble`` that the tweet we just found was in reply to.
Using the endpoint linked above we can retrieve information about
individual users. First, let's get the user id and screen name of
``@NotaBubble``.

.. code-block:: python

   >>> tweet['in_reply_to_user_id']
   3294648321
   >>> tweet['in_reply_to_screen_name']
   'NotaBubble'

With this we can craft a request to get this user's information.

.. code-block:: python

   params = {
       'user_id': tweet['in_reply_to_user_id'],
       'screen_name': tweet['in_reply_to_screen_name']
   }

   user_endpoint = 'https://api.twitter.com/1.1/users/show.json'
   r = requests.get(user_endpoint, params=params, headers=headers)

   if r.status_code != 200:
       raise RuntimeError('Bad status code: {}'.format(r.status_code))

   user = r.json()

When I run this code, I get the following result.

.. code-block:: python

   >>> print(json.dumps(user, indent=4))
   {
       "created_at": "Sat May 23 01:24:46 +0000 2015",
       "profile_text_color": "000000",
       "geo_enabled": false,
       "has_extended_profile": false,
       "profile_sidebar_border_color": "000000",
       "profile_background_tile": false,
       "entities": {
           "url": {
               "urls": [
                   {
                       "display_url": "leadersinvestmentclub.com/the-team/",
                       "indices": [
                           0,
                           22
                       ],
                       "url": "http://t.co/ucbPJWbVpw",
                       "expanded_url": "http://leadersinvestmentclub.com/the-team/"
                   }
               ]
           },
           "description": {
               "urls": []
           }
       },
       "verified": false,
       "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
       "utc_offset": null,
       "profile_banner_url": "https://pbs.twimg.com/profile_banners/3294648321/1432344426",
       "profile_use_background_image": false,
       "url": "http://t.co/ucbPJWbVpw",
       "default_profile_image": false,
       "id": 3294648321,
       "notifications": null,
       "profile_location": null,
       "is_translation_enabled": false,
       "lang": "en",
       "profile_image_url_https": "https://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
       "profile_sidebar_fill_color": "000000",
       "time_zone": null,
       "contributors_enabled": false,
       "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
       "id_str": "3294648321",
       "profile_background_color": "000000",
       "location": "",
       "description": "F+ rated analyst at Pumper Joffrey specializing in Not A Bubble",
       "profile_image_url": "http://pbs.twimg.com/profile_images/601921960986054658/hUhFx0hW_normal.jpg",
       "protected": false,
       "profile_link_color": "3B94D9",
       "statuses_count": 1642,
       "followers_count": 94,
       "favourites_count": 37,
       "name": "Gene Shmunster",
       "screen_name": "NotaBubble",
       "following": null,
       "listed_count": 5,
       "is_translator": false,
       "status": {
           "created_at": "Mon Jan 11 17:51:13 +0000 2016",
           "coordinates": null,
           "in_reply_to_status_id_str": "686593687942467585",
           "retweeted": false,
           "favorite_count": 0,
           "id_str": "686606303578333184",
           "entities": {
               "user_mentions": [
                   {
                       "indices": [
                           0,
                           15
                       ],
                       "name": "G Hawkins",
                       "id": 260559365,
                       "screen_name": "FilmProfessor9",
                       "id_str": "260559365"
                   }
               ],
               "urls": [],
               "symbols": [],
               "hashtags": []
           },
           "geo": null,
           "text": "@FilmProfessor9 glad to see Gopro climbed some during my lunch... sell off was getting ridiculous.  Now at $15.4 again.  Yeesh!",
           "in_reply_to_user_id_str": "260559365",
           "lang": "en",
           "source": "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
           "in_reply_to_screen_name": "FilmProfessor9",
           "contributors": null,
           "in_reply_to_status_id": 686593687942467585,
           "place": null,
           "id": 686606303578333184,
           "retweet_count": 0,
           "in_reply_to_user_id": 260559365,
           "truncated": false,
           "favorited": false
       },
       "follow_request_sent": null,
       "friends_count": 226,
       "default_profile": false
   }
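
Both requests above treat any non-200 status as fatal. To obey rate
limits, one option is to inspect the rate-limit headers returned with
each response and wait until the current window resets. The helper
below is a sketch that assumes the response carries the
``x-rate-limit-remaining`` and ``x-rate-limit-reset`` headers described
in Twitter's rate-limiting documentation; the function name
``seconds_until_allowed`` is hypothetical.

.. code-block:: python

   import time


   def seconds_until_allowed(headers, now=None):
       """Return how many seconds to wait before the next request.

       ``headers`` is a mapping of response headers (for example
       ``r.headers`` from ``requests``). Returns 0 if requests remain
       in the current window or if the headers are missing.
       """
       if now is None:
           now = time.time()
       try:
           remaining = int(headers['x-rate-limit-remaining'])
           # The reset time is given as a Unix timestamp in seconds.
           reset_at = int(headers['x-rate-limit-reset'])
       except (KeyError, ValueError):
           return 0
       if remaining > 0:
           return 0
       return max(0, reset_at - now)

For example, calling ``time.sleep(seconds_until_allowed(r.headers))``
before the next request would pause until the window resets.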
   
..  LocalWords:  API