These are the posts and videos that I consider key to understanding Big Data architectures for analytics:
Tag: bigdata
ABSTRACT: Apache Spark is a new distributed processing framework for big data, written in Scala with wrappers for Python and Java, that has been drawing a lot of attention from the community for its power, ease of use, and processing speed. It is already being called the replacement for Apache Hadoop.
ABSTRACT: Working with big volumes of data is a complicated task, but it’s even harder if you have to do everything in real time and try to figure it all out yourself. This session will use practical examples to discuss architectural best practices and lessons learned when solving real-time social media analytics, sentiment analysis, and data visualization decision-making problems with AWS.
Hadoop handles small files poorly: the framework is heavyweight and was not designed to work with large numbers of small files.
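To illustrate the kind of mitigation involved, here is a minimal sketch (not taken from the post; file layout, function names, and the 64 MB target size are assumptions) that concatenates many small JSON files into a few large newline-delimited files, so Hadoop reads a handful of big inputs instead of thousands of tiny ones:

```python
import json
import os
from pathlib import Path


def batch_small_json(src_dir: str, dst_dir: str,
                     target_bytes: int = 64 * 1024 * 1024) -> int:
    """Concatenate many small JSON files into large newline-delimited
    JSON files of roughly `target_bytes` each. Returns the number of
    batch files written. (Hypothetical helper, for illustration.)"""
    os.makedirs(dst_dir, exist_ok=True)
    batch, size, n_batches = [], 0, 0
    for path in sorted(Path(src_dir).glob("*.json")):
        record = path.read_text(encoding="utf-8").strip()
        json.loads(record)  # validate each record before batching
        batch.append(record)
        size += len(record) + 1  # +1 for the newline separator
        if size >= target_bytes:
            _flush(batch, dst_dir, n_batches)
            batch, size, n_batches = [], 0, n_batches + 1
    if batch:  # write whatever remains as the final batch
        _flush(batch, dst_dir, n_batches)
        n_batches += 1
    return n_batches


def _flush(batch, dst_dir, idx):
    out = Path(dst_dir) / f"batch-{idx:05d}.jsonl"
    out.write_text("\n".join(batch) + "\n", encoding="utf-8")
```

The same idea applies whether the consolidation happens before upload or in a pre-processing step on the cluster; the point is simply to trade many tiny inputs for a few splittable large ones.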
We process social signals, which means tons of small JSON payloads from the Twitter, Facebook, and Google Plus APIs. To improve overall performance we mainly use two techniques: