Often we have to work with JSON data sets, but now and then data comes in CSV format. I received a great tip from @diegodellera, who told me about textql: execute SQL against structured text like CSV or TSV.
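As a rough sketch of what querying a CSV with textql looks like: the sample CSV below is made up, and the `-header` and `-sql` flags are taken from textql's README, so double-check them against your installed version.

```shell
# Build a tiny, made-up sample CSV to query.
cat > people.csv <<'EOF'
name,age,city
ana,34,lisbon
bruno,28,porto
carla,41,lisbon
EOF

# With textql installed, something like this runs SQL straight on the CSV
# (flag names per the textql README; verify against your version):
#   textql -header -sql "SELECT city, count(*) FROM people GROUP BY city" people.csv
```

The table name textql uses is derived from the file name, so the query above refers to `people`.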
I took the TRACK B: Advanced Apache Spark Workshop and I can say it was really great to learn more about Spark internals and its libraries. The Databricks team was awesome. All slides and training material are already online: Spark Summit 2014 Training.
Following the Spark Summit 2014 - Day 2
My personal notes during Spark Summit 2014.
Following the Spark Summit 2014 - Afternoon talks notes
Today I had to quickly find the most frequent hashtags in my smallish dataset. After some research I found an awesome shell tool for manipulating JSON: jq, a grep+sed+awk for JSON.
With jq everything else was simple: just pipe a few commands together:
$ cat tweets.json |
    jq '.text' |                 # select the text field on my JSON
    tr 'A-Z' 'a-z' |             # convert text to lower case
    egrep -o '#[0-9a-z_]+' |     # select the hashtags
    sort | uniq -c |             # count the number of distinct hashtags
    sort -nr | head -10          # reverse sort by frequency and get the top 10

An alternative is to read the hashtags jq already parsed from the tweet entities:

$ cat tweets.json |
    jq -r '.entities.hashtags[].text' |
    sort | uniq -c |
    sort -nr | head -10

A couple of minutes later, the output was:
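The counting stage of the pipeline above can be tried without jq or a real tweet dump: the sketch below feeds a few made-up tweet texts straight into the same tr/grep/sort/uniq chain, so it only needs coreutils and grep.

```shell
# Minimal self-contained sketch of the hashtag-counting stage.
# The three sample tweets are made up for illustration.
printf '%s\n' \
  'Loving #Spark at the summit! #BigData' \
  'More #spark internals #BIGDATA' \
  'Shell tricks with #jq' |
  tr 'A-Z' 'a-z' |            # normalize case so #Spark and #spark match
  grep -oE '#[0-9a-z_]+' |    # pull out just the hashtags, one per line
  sort | uniq -c |            # count each distinct hashtag
  sort -nr | head -10         # most frequent first, top 10
```

This prints each hashtag with its count, most frequent first: #spark and #bigdata twice each, #jq once. Lower-casing before counting is what makes #Spark and #spark collapse into one bucket.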
It is a collection of programs for processing delimited-text data through the command line or using shell scripts.