Creating a beautiful tagcloud from hashtags
Although the tagcloud is a somewhat outdated and often criticized visualization format, I have no doubt it can be useful sometimes. And if you can create one with only a few keystrokes, that is pretty sweet. Below I’ll show a technique for extracting Twitter #hashtags, but you can apply it to virtually any text source.
Running the command below on your Twitter data extracts the 100 most frequent hashtags. Afterwards, edit the file manually to remove irrelevant or overly frequent hashtags.
$ cat tweets.json | \
jq -r '.entities.hashtags[].text' | tr 'A-Z' 'a-z' | \
sort | uniq -c | sort -nr | \
head -100 | awk '{print $2 ":" $1}' \
> hashtags.txt
You may see error messages like jq: error: Cannot iterate over null. This happens because some tweets don’t contain any hashtags, so jq throws an error when it tries to extract the text field. More about jq in this post.
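If the error messages get in the way, one option is jq's optional iterator `[]?`, which emits nothing instead of raising an error when the value is null. Here is a minimal sketch, assuming a hypothetical sample_tweets.json with one JSON record per line (the sample records are made up for illustration):

```shell
# Two sample records standing in for real tweet data (hypothetical):
# the second one has no "entities" object at all.
printf '%s\n' \
  '{"entities":{"hashtags":[{"text":"Go"},{"text":"R"}]}}' \
  '{"delete":{"status":{"id":1}}}' > sample_tweets.json

# `[]?` behaves like `[]` but silently skips inputs that cannot be
# iterated (such as null), so the second record produces no output.
jq -r '.entities.hashtags[]?.text' sample_tweets.json | tr 'A-Z' 'a-z'
```

The rest of the pipeline (sort, uniq, head, awk) should work unchanged after this step.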
The hashtags.txt file will look like:
go:82263
r:76387
javascript:66695
c:60863
php:43428
java:29608
css:28545
python:22974
html5:22013
ruby:21729
...
...
...
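The manual cleanup mentioned earlier can also be scripted if you already know which tags to drop. This is only a sketch: the file names sample_hashtags.txt and hashtags.filtered.txt and the choice of tags to remove (r and c) are assumptions for illustration.

```shell
# Hypothetical sample in the same tag:count format as hashtags.txt.
printf '%s\n' 'go:82263' 'r:76387' 'javascript:66695' 'c:60863' \
  > sample_hashtags.txt

# Drop every line whose tag (the part before ":") is exactly r or c.
# Anchoring with ^ and the trailing ":" avoids matching tags such as "ruby".
grep -vE '^(r|c):' sample_hashtags.txt > hashtags.filtered.txt

cat hashtags.filtered.txt
```

Writing to a separate file keeps the original counts around in case you want to try a different set of exclusions.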
Now go to Wordle Advanced and paste the contents of this file. Save as PNG and you’re done!
If you prefer a more Pythonic way, I found an excellent tutorial: A Wordcloud in Python.