
Mutable Ideas


Twitter JSON Manipulation

Today I had to quickly find the most frequent hashtags in my smallish dataset of tweets. After some research I found an awesome shell tool for manipulating JSON: jq, a sort of grep+sed+awk for JSON. With jq everything else was simple; I just piped a few commands together:

    $ jq -r '.entities.hashtags[].text' tweets.json |
        sort | uniq -c | sort -nr | head -10

An alternative extracts the hashtags from the tweet text itself and lowercases them first, so that variants such as #Data and #data are counted together:

    $ jq -r '.text' tweets.json |       # select the text field of each tweet
        tr '[:upper:]' '[:lower:]' |    # convert the text to lower case
        grep -oE '#[0-9a-z_]+' |        # extract each hashtag
        sort | uniq -c |                # count each distinct hashtag
        sort -nr | head -10             # sort by frequency, descending, and keep the top 10

A couple of minutes later, the output was:
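To try the counting stage without a Twitter dump at hand: everything after the jq step works on plain text, one tweet per line. Below is a minimal sketch assuming a bash-like shell with the standard tr, grep, sort, uniq and head utilities; the function name `count_hashtags` and the sample tweet texts are hypothetical, made up for illustration. It also shows why lowercasing matters: #Data, #data and #DATA collapse into a single count.

```shell
# Hypothetical helper (not from the original post): reads one tweet text
# per line on stdin and prints the N most frequent hashtags (default 10)
# as "count #hashtag" lines.
count_hashtags() {
  local top="${1:-10}"
  tr '[:upper:]' '[:lower:]' |   # fold case so #Data and #data match
    grep -oE '#[0-9a-z_]+' |     # print each hashtag on its own line
    sort | uniq -c |             # count each distinct hashtag
    sort -nr |                   # most frequent first
    head -n "$top"               # keep only the top N
}

# Made-up tweet texts standing in for the output of: jq -r '.text' tweets.json
printf '%s\n' \
  'Loving #jq for #Data wrangling' \
  'More #data, more problems' \
  'Nothing to see here' \
  '#DATA #jq #shell' |
  count_hashtags 3
# Expected output (uniq -c left-pads the counts):
#   3 #data
#   2 #jq
#   1 #shell
```

With real data, the function would take the place of everything after the jq stage, e.g. `jq -r '.text' tweets.json | count_hashtags 10`.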