A few days ago I received an email from a student at Universidad Tecnológica Nacional asking for advice on what skills he needed to acquire to be hired as a Big Data Engineer. I felt it was something worth writing about, and hopefully it can generate a healthy debate and help more people.
ABSTRACT: Apache Spark is a new distributed processing framework for big data, written in Scala with wrappers for Python and Java, which has been drawing a lot of attention from the community for its power, ease of use, and processing speed. It is already being called the replacement for Apache Hadoop.
Working with JSON datasets is a really common task nowadays; almost any API will output information in this format. Yet JSON is still complex to manipulate when compared with plain text combined with common unix commands like cut, awk, sed, etc.
To reduce this gap, jq was developed with exactly this paradigm in mind: jq is like sed for JSON data. This post will walk through the details of how to select fields (projection), flatten arrays, filter JSON documents based on a field value, and convert JSON to CSV/TSV.
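To give a flavor of what the post covers, here is a minimal sketch of each operation in jq; the file name (users.json) and the fields (name, country, tags) are made up for illustration:

```bash
# projection: keep only the name field of each object
jq '.[] | .name' users.json

# flatten an array field into one value per line
jq '.[] | .tags[]' users.json

# filter objects based on a field value
jq '.[] | select(.country == "AR")' users.json

# convert to CSV; -r prints raw strings instead of JSON-quoted ones
jq -r '.[] | [.name, .country] | @csv' users.json

# same idea for TSV
jq -r '.[] | [.name, .country] | @tsv' users.json
```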
ABSTRACT: Working with big volumes of data is a complicated task, but it’s even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS.
2014-11-15 EDIT: Check the Talk recording and Slides here
On November 13 I will be sharing the stage with Socialmetrix’s Solutions Architect Sebastian Montini at Amazon AWS re:Invent. We will talk about our experience developing Socialmetrix’s big data and real-time infrastructure, our architecture’s evolution, and lessons learned.
This talk was presented at the Maestría en Explotación de Datos y Descubrimiento del Conocimiento, as part of its 10th anniversary, under the theme Hablemos de Big Data (Big Data Talks).
A small collection of tips & tricks I have learned working with Spark so far; I hope it helps you as well. If you have more tricks, please let me know!
Today I launched a Spark job that was taking too long to complete, and I had forgotten to start it through screen, so I needed to find a way to keep it running after disconnecting my terminal from the cluster.
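The post walks through the actual fix; as a rough sketch of one common approach in bash (not necessarily the exact steps from the post), you can suspend the running job, resume it in the background, and detach it from the shell so it survives the logout. The jar name, class, and log file below are hypothetical:

```bash
# the job is still running in the foreground of the SSH session:
# 1. suspend it with Ctrl-Z
# 2. resume it in the background
bg %1
# 3. tell the shell not to send it SIGHUP when the session closes
disown -h %1

# for future runs, start it detached from the terminal in the first place
nohup spark-submit --class com.example.MyJob my-job.jar > job.log 2>&1 &
```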
The book “DATA + DESIGN | a simple introduction to preparing and visualizing information” is an excellent reference for creating visualizations for several types of data; it guides you through simple and complex data with very clear dos and don’ts. On top of all that, it is free.
An introduction (in Spanish) to Apache Spark, the in-memory distributed processing framework: Spark basics, RDDs, and a demo of the SparkSQL and Spark Streaming libraries.
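As a companion to that talk description, here is a minimal Scala sketch of the kind of thing the demo covers: an RDD word count plus a Spark SQL query over JSON. It uses the classic SparkContext/SQLContext API; the file paths, app name, and table/field names are made up for illustration, and Spark Streaming is left out for brevity.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkIntroDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("spark-intro-demo").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // RDD basics: load a (hypothetical) text file and run a word count
    val lines  = sc.textFile("data/sample.txt")
    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.take(10).foreach(println)

    // Spark SQL: load a (hypothetical) JSON file as a DataFrame and query it
    val sqlContext = new SQLContext(sc)
    val people = sqlContext.read.json("data/people.json")
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 21").show()

    sc.stop()
  }
}
```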