These are the posts and videos that I consider key to understanding Big Data architectures for analytics:
Tag: bigdata
ABSTRACT: Apache Spark is a new distributed processing framework for big data, written in Scala with wrappers for Python and Java, that has been drawing a lot of attention from the community for its power, ease of use, and processing speed. It is already being called the replacement for Apache Hadoop.
ABSTRACT: Working with big volumes of data is a complicated task, but it’s even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS.
Hadoop handles small files poorly: the framework is heavyweight and was not designed to work with large numbers of small files.
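To illustrate the kind of mitigation involved, here is a minimal sketch (not taken from the post; file layout, function names, and the 64 MB target size are assumptions) that concatenates many small JSON files into a few large newline-delimited files, so Hadoop reads a handful of big inputs instead of thousands of tiny ones:

```python
import json
import os
from pathlib import Path


def batch_small_json(src_dir: str, dst_dir: str,
                     target_bytes: int = 64 * 1024 * 1024) -> int:
    """Concatenate many small JSON files into large newline-delimited
    JSON files of roughly `target_bytes` each. Returns the number of
    batch files written. (Hypothetical helper, for illustration.)"""
    os.makedirs(dst_dir, exist_ok=True)
    batch, size, n_batches = [], 0, 0
    for path in sorted(Path(src_dir).glob("*.json")):
        record = path.read_text(encoding="utf-8").strip()
        json.loads(record)  # validate each record before batching
        batch.append(record)
        size += len(record) + 1  # +1 for the newline separator
        if size >= target_bytes:
            _flush(batch, dst_dir, n_batches)
            batch, size, n_batches = [], 0, n_batches + 1
    if batch:  # write whatever remains as the final batch
        _flush(batch, dst_dir, n_batches)
        n_batches += 1
    return n_batches


def _flush(batch, dst_dir, idx):
    out = Path(dst_dir) / f"batch-{idx:05d}.jsonl"
    out.write_text("\n".join(batch) + "\n", encoding="utf-8")
```

The same idea applies whether the consolidation happens before upload or in a pre-processing step on the cluster; the point is simply to trade many tiny inputs for a few splittable large ones.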
We process social signals, which means tons of small JSON payloads from the Twitter, Facebook, and Google Plus APIs. To improve overall performance we mainly use two techniques: