Tag: bash
We run several processes that may take hours to complete, and it is nice to be notified on a Slack channel when those processes finish correctly. Using Slack's Incoming Webhooks API, a small bash script, and a couple of tricks, it is really simple!
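The idea can be sketched in a few lines. This is a minimal, hypothetical version: the webhook URL is a placeholder (you get a real one when you create an Incoming Webhook for your channel), and `slack_payload` / `notify_slack` are illustrative names, not functions from the post.

```shell
#!/usr/bin/env bash
# Placeholder URL -- replace with the Incoming Webhook URL Slack gives you.
WEBHOOK_URL="${WEBHOOK_URL:-https://hooks.slack.com/services/XXX/YYY/ZZZ}"

# Build the minimal JSON payload Incoming Webhooks accept: a "text" field.
# (Messages containing double quotes would need extra escaping.)
slack_payload() {
  printf '{"text": "%s"}' "$1"
}

# POST the payload to the webhook endpoint.
notify_slack() {
  curl -s -X POST -H 'Content-Type: application/json' \
       --data "$(slack_payload "$1")" "$WEBHOOK_URL"
}

# Usage (only fires if WEBHOOK_URL points at a real hook):
#   ./hours_long_job.sh && notify_slack "job finished correctly" \
#                       || notify_slack "job FAILED"
```

Chaining the notification with `&&`/`||` is the trick that reports success and failure separately.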
Although the tag cloud seems a somewhat outdated and often-criticized visualization format, I have no doubt it can be useful sometimes. And if you can create one with only a few keystrokes, it is pretty sweet. Below I'll show the technique for extracting Twitter #hashtags, but you can apply it to virtually any text source.
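The "few keystrokes" boil down to a classic Unix pipeline. This sketch uses a made-up sample file; on real data you would point it at your tweet dump, and the ranked list feeds any tag-cloud generator.

```shell
# Made-up sample standing in for a dump of tweets.
cat > tweets.txt <<'EOF'
Loving #bash and #jq today
More #bash tricks
#data #bash #jq
EOF

# Pull out hashtags, normalize case, and rank by frequency;
# the counts drive the word sizes in the tag cloud.
grep -oE '#[[:alnum:]_]+' tweets.txt \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn
```

On the sample above, `#bash` comes out on top with three occurrences.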
Most of our data is stored on MySQL and Cassandra; MySQL was the primary data store when we started the company. Currently our MySQL workload runs on AWS RDS, and we would like to give Microsoft Azure a try. This post documents a few tricks we learned to reduce the total time of dumping, transferring, and restoring. Hope it can help you too.
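One such trick, sketched here with placeholder hosts and credentials (none of these names are from the post): compress the dump in-flight and stream it straight into the target server, so nothing hits the disk twice and the transfer moves far fewer bytes.

```shell
# Hypothetical sketch: SRC_HOST, DST_HOST, DB_USER, DB_PASS and DB_NAME
# are placeholders. --single-transaction avoids locking InnoDB tables,
# --quick streams rows instead of buffering whole tables in memory.
dump_and_restore() {
  mysqldump --single-transaction --quick \
            -h "$SRC_HOST" -u "$DB_USER" -p"$DB_PASS" "$DB_NAME" \
    | gzip -c \
    | ssh "$DST_HOST" "gunzip -c | mysql -u $DB_USER -p$DB_PASS $DB_NAME"
}
```

A single pipeline like this overlaps the dump, the network transfer, and the restore instead of running them as three serial steps.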
Several tutorials assume you own a data set. Often that is not the case, and you just can't take advantage of the tutorial because you don't have data to play along with. To comply with social networks' Terms and Conditions you can't publish your data sets, but you can create your own! Follow along with these few commands.
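As a hypothetical sketch of what "creating your own" can look like (not the post's exact commands): with `twurl`, Twitter's command-line client, configured with your own API credentials, you can append search results to a local file; the endpoint and query below are illustrative examples.

```shell
# Hypothetical sketch, assuming twurl is installed and authorized.
# The endpoint path and the #bash query are examples only.
collect_tweets() {
  twurl "/1.1/search/tweets.json?q=%23bash&count=100" >> my_dataset.json
}

# Usage: run collect_tweets periodically (e.g. from cron) and the
# file grows into a data set that is yours to experiment with.
```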
This post shows how to create heatmaps of conversations taking place on Twitter. It is a proof-of-concept technique to learn more about our current datasets; this knowledge will later be applied to the product development cycle. My objective here is to share a simple way to create a quick visualization and be able to make an internal demo.
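The first step of such a heatmap is getting coordinates out of the tweets. A minimal sketch, on made-up sample data: keep only geotagged tweets and emit `lat,lon` CSV lines that any mapping/heatmap library can consume.

```shell
# Made-up sample: one tweet JSON object per line, Twitter-style
# "coordinates" field ([longitude, latitude] order).
cat > tweets.json <<'EOF'
{"coordinates":{"coordinates":[-58.38,-34.60]},"text":"hola"}
{"coordinates":null,"text":"no geo"}
{"coordinates":{"coordinates":[-56.16,-34.90]},"text":"hi"}
EOF

# Drop tweets without geodata and swap to lat,lon for the heatmap input.
jq -r 'select(.coordinates != null)
       | .coordinates.coordinates
       | "\(.[1]),\(.[0])"' tweets.json > points.csv
```

From `points.csv` on, any heatmap tool (a JavaScript map library, a plotting package) can render the density of conversations.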
Working with JSON datasets is a really common task nowadays; almost any API outputs information in this format. Yet it is still complex to manipulate compared with plain text combined with common Unix commands like cut, awk, sed, etc. To close this gap, jq was developed with exactly this paradigm in mind: jq is like sed for JSON data. This post will walk through the details of how to select fields (projection), flatten arrays, filter JSON documents based on a field value, and convert JSON to CSV/TSV.
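A taste of each of those four operations, on a made-up two-record sample (the data and field names are illustrative, not from the post):

```shell
# Sample: one JSON object per line.
cat > sample.json <<'EOF'
{"name":"ana","age":30,"tags":["dev","ops"]}
{"name":"bob","age":25,"tags":["dev"]}
EOF

# 1. Projection: keep only some fields.
jq '{name, age}' sample.json

# 2. Flatten arrays: one tag per output line.
jq -r '.tags[]' sample.json

# 3. Filter on a field value.
jq 'select(.age > 26)' sample.json

# 4. Convert to TSV (use @csv for CSV).
jq -r '[.name, .age] | @tsv' sample.json
```

The `-r` flag emits raw strings instead of JSON-quoted ones, which is what you want when piping into cut, awk, and friends.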
Today I launched a Spark job that was taking too long to complete, and I had forgotten to start it through screen, so I needed to find a way to keep it running after disconnecting my terminal from the cluster.
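For an already-running foreground job, the standard shell escape hatch looks like this (the job command below is illustrative, not the post's exact invocation):

```shell
# Rescue a job that is ALREADY running in the foreground:
#   Ctrl-Z          suspend it
#   bg              resume it in the background
#   disown -h %1    detach it so the SIGHUP sent on logout does not kill it

# Preventive version for next time: start it hangup-proof from the beginning.
# (spark-submit my_job.py is an illustrative command.)
nohup spark-submit my_job.py > my_job.log 2>&1 &
echo "detached as PID $!"
```

`nohup` plus the output redirection means the job survives the terminal closing and its logs keep accumulating in `my_job.log`.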
Often we have to work with JSON data sets, but now and then data comes in CSV format. I received a great tip from @diegodellera, who told me about textql: execute SQL against structured text like CSV or TSV.
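A minimal sketch of the idea, on a made-up CSV (textql loads the file into an in-memory SQLite table named after the file, so `people.csv` becomes table `people`):

```shell
# Made-up sample data.
cat > people.csv <<'EOF'
name,age
ana,30
bob,25
EOF

# -header treats the first row as column names; -sql is the query to run.
# Guarded so the sketch is a no-op where textql is not installed.
if command -v textql >/dev/null; then
  textql -header -sql "select name from people where age > 26" people.csv
fi
```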
To maintain sessions on remote servers, as recommended by a friend, I started to use tmux.
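The round trip that makes tmux worth adopting can be sketched in a few commands; the session name "etl" is an example, and the detached form below is the scriptable equivalent of interactively running `tmux new -s etl` and pressing Ctrl-b d to detach.

```shell
# Guarded so the sketch is a no-op where tmux is not installed.
if command -v tmux >/dev/null; then
  tmux new-session -d -s etl 'sleep 120'  # start a named session running a job
  tmux ls                                 # after reconnecting: list live sessions
  # tmux attach -t etl                    # reattach (needs an interactive terminal)
  tmux kill-session -t etl                # clean up this demo session
fi
```

The point is that the session, and everything running inside it, survives the SSH connection dropping.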