Create your own dataset by consuming the Twitter API
Many tutorials assume you already have a data set. Often that is not the case, and you can’t take advantage of the tutorial because you have no data to play along with. To comply with social networks’ Terms and Conditions you can’t publish your data sets, but you can create your own! Just follow these few commands.
## OAuth
Arguably, authenticating with OAuth takes a lot more heavy lifting than other methods. To skip the boilerplate we can use Curlicue, a small wrapper script that invokes curl with the necessary OAuth headers.
Just download Curlicue and install it on your system. On my Mac I cloned the git repo and ran install:
git clone git@github.com:decklin/curlicue.git
cd curlicue
install curlicue curl-encode curlicue-setup contrib/twitpull /usr/local/bin
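Before moving on, it can help to confirm that the scripts actually landed on your `PATH`. The following is a minimal sketch; `check_installed` is a hypothetical helper name, not part of Curlicue, and it only relies on the standard `command -v` builtin.

```shell
# Hypothetical helper: report every command that cannot be found on PATH.
check_installed() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$cmd" >&2
            missing=1
        fi
    done
    # Non-zero exit status if at least one command was not found.
    return "$missing"
}

# The four scripts installed above, plus curl itself, which Curlicue wraps.
check_installed curl curlicue curl-encode curlicue-setup twitpull \
    || echo 'Re-run the install step or check your PATH.' >&2
```

If nothing is printed, all five commands were found.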
## Setup
You need credentials for a Twitter application. Create your own application on Twitter’s developer site, then run the setup:
curlicue-setup \
'https://api.twitter.com/oauth/request_token' \
'https://api.twitter.com/oauth/authorize?oauth_token=$oauth_token' \
'https://api.twitter.com/oauth/access_token' \
~/.credentials
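The credentials file now holds your OAuth tokens, so it is worth making it readable by your user only. This is a small sketch rather than anything Curlicue requires; `secure_credentials` is a hypothetical helper name, and the path is the one used in the setup command.

```shell
# Hypothetical helper: restrict a credentials file to owner read/write (mode 600).
secure_credentials() {
    file=$1
    if [ ! -f "$file" ]; then
        printf 'no such file: %s\n' "$file" >&2
        return 1
    fi
    chmod 600 "$file"
}

# Same path that was passed to curlicue-setup.
secure_credentials "$HOME/.credentials" || echo 'Run curlicue-setup first.' >&2
```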
## A Few Examples
Below are a few examples of how to query Twitter:
## Tweets Sent by Me
twitpull -f ~/.credentials \
statuses/user_timeline \
screen_name=arjones \
count=200 \
include_rts=1 > arjones.json
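If you want the same pull for several accounts, the command can be wrapped in a small function. This is only a sketch: `pull_timeline` is a hypothetical helper, it uses exactly the `twitpull` arguments shown above, and it assumes the credentials live in `~/.credentials` unless a second argument says otherwise.

```shell
# Hypothetical helper: save one account's recent tweets to <screen_name>.json.
# Usage: pull_timeline SCREEN_NAME [CREDENTIALS_FILE]
pull_timeline() {
    screen_name=$1
    credentials=${2:-"$HOME/.credentials"}
    twitpull -f "$credentials" \
        statuses/user_timeline \
        screen_name="$screen_name" \
        count=200 \
        include_rts=1 > "$screen_name.json"
}
```

Calling `pull_timeline arjones` would then write `arjones.json` in the current directory, just like the command above.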
## Tweets Mentioning Me
twitpull -f ~/.credentials \
statuses/mentions_timeline \
count=200 \
contributor_details=true \
include_rts=1 > arjones-mentions.json
## Searching Hashtags
twitpull -f ~/.credentials \
search/tweets \
'q=#scala OR #data' \
count=100 \
include_entities=true > my-dataset.json
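The `q` value above contains `#` and spaces, which have to be percent-encoded before they can travel in a URL. The example passes them raw, so twitpull presumably takes care of that for you. The sketch below only shows what the encoded value looks like, in case you ever need to build the request yourself. `urlencode` is a hypothetical helper and assumes ASCII input.

```shell
# Hypothetical helper: percent-encode every character except the
# RFC 3986 unreserved set (letters, digits, '.', '_', '~', '-').
# Assumes ASCII input. The function body runs in a subshell so its
# variables do not leak into the caller.
urlencode() (
    input=$1
    output=''
    while [ -n "$input" ]; do
        rest=${input#?}            # everything after the first character
        char=${input%"$rest"}      # the first character itself
        case $char in
            [A-Za-z0-9._~-]) output="$output$char" ;;
            *) output="$output$(printf '%%%02X' "'$char")" ;;
        esac
        input=$rest
    done
    printf '%s\n' "$output"
)

urlencode '#scala OR #data'
```

The `#` characters come out as `%23` and the spaces as `%20`.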
## Consuming Streaming API
If you want to consume Twitter’s Streaming API, the official client, hbc, is available on GitHub.