Mutable Ideas

Tag: bigdata

A data science toolkit inside a Docker image: build it once, run everywhere

If you have never heard of Jupyter Notebook, I highly recommend checking it out. It has been my primary platform for building reports and data-driven case studies. In this post I’d like to show how I create a simple, isolated environment with a Bash script and Docker to run JupyterLab. Jupyter Notebook recently received a major overhaul and became JupyterLab; it is currently in beta, but the new platform looks fresh and very powerful.

Interview: Big Data and Data Privacy @ Radio Univ. Nacional de La Plata

That morning I had the opportunity to talk with Marcos Clavelino, host of the program “Vueltas en el Aire” on Radio Universidad Nacional de La Plata, about Big Data, privacy, and the case of the alleged leak of user information by Facebook. Listen to the full interview here.

Reasons to fall in love with Postgres

I’ve been working in the analytics/big data field for 10+ years, and during this time I’ve worked mostly with MySQL, MongoDB, Redis and Cassandra. Only a couple of years ago did I start to really pay attention to Postgres, and my regret is not getting into it earlier… In this post I try to enumerate a few features I’m using and why I think you should try it too, before jumping into the architectural and operational complexity of multiple NoSQL systems.

Installing Datastax Analytics (Cassandra and Spark) with Azure Templates

Last week I had the opportunity to share Socialmetrix’s experience installing and configuring Datastax Analytics clusters on Azure. Datastax provides a commercial solution as a bundle that integrates Cassandra, Spark and Solr. The talks were given at the Argentina Big Data Meetup, hosted by Jampp, and at the Nardoz Meetup, hosted by Medallia.

Where to Find Datasets to Learn Big Data & Data Science

Sometimes you just need data to learn how an algorithm works, to run a stress test, or just to have an excuse to spin up several machines in a cluster and see how it crushes the data. More often than not, it is incredibly hard to obtain data, and a few colleagues I’ve talked to have had a similar problem, so this post is a collection of links and references to datasets that I know have been open-sourced. Please contribute =)

Making Hadoop 2.6 + Spark-Cassandra driver play nice together

We have been using a Spark Standalone deployment for more than a year now, but recently I tried Azure’s HDInsight, which runs on Hadoop 2.6 (YARN deployment).

After provisioning the servers, all the small tests worked fine: I was able to run Spark-Shell and to read from and write to Blob Storage. Then I tried to write to a Datastax Cassandra cluster, which constantly returned an error message: Exception in thread "main" java.io.IOException: Failed to open native connection to Cassandra at {10.0.1.4}:9042

Workshop Summary: Introduction to Application Development for Big Data

In August, Juan Pampliega and I were invited to put together a Big Data workshop at Espacio Fundación Telefonica as a complement to the “Big Bang Data” exhibition. This post is a summary of the event, with reading references for those who did not have the opportunity to attend.

Vagrant + Spark + Zeppelin: a toolbox for the Data Analyst (or Data Scientist)

Recently I built an environment to help me teach Apache Spark. My initial thought was to use Docker, but I found some issues, especially when using older machines, so to avoid more blockers I decided to build a Vagrant image and complement the package with Apache Zeppelin as the UI. This Vagrant image is built on Debian Jessie, with Oracle Java, Apache Spark 1.4.1 and Zeppelin (from the master branch).