
Mutable Ideas

Tag: data

Data science approach to organizing my playlist

A couple of years ago I created a Spotify playlist where I added every track I liked, as the main repository of things I’d like to listen to, no matter what mood I was in when I added the song. As time went by, this playlist became less enjoyable to listen to because of the abrupt changes in rhythm: it jumps from a Metal song straight into Bossa Nova, which is very annoying. This post covers a few data science approaches I applied to organize this playlist, what worked and what didn’t.
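The excerpt doesn’t name the approaches, but for a flavor of what organizing a playlist “by data” can look like, here is a hedged sketch (not necessarily the post’s method): cluster tracks on audio features such as tempo and energy, then split the playlist by cluster. The feature values below are made up; in practice they could come from Spotify’s audio-features endpoint via a client library such as spotipy.

```python
# Illustrative sketch only -- the feature matrix is hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# One row per track, columns = (tempo, energy, danceability)
features = np.array([
    [180.0, 0.95, 0.40],   # fast and aggressive
    [ 75.0, 0.30, 0.70],   # calm and danceable
    [ 82.0, 0.35, 0.65],
    [170.0, 0.90, 0.45],
])

# Scale the features, then group similar tracks together
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(features)
)
print(labels)  # tracks sharing a label would go to the same sub-playlist
```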

Where to Find Datasets to Learn Big Data & Data Science

Sometimes you just need data to learn how an algorithm works, to run a stress test, or simply to have an excuse to spin up several machines in a cluster and watch them crunch the data. More often than not, it is incredibly hard to obtain data, and a few colleagues I’ve talked to have had the same problem, so this post is a collection of links and references to datasets I know have been open sourced. Please contribute =)

Reading compressed data with Spark using unknown file extensions

This post could also be called Reading .gz.tmp files with Spark. At Socialmetrix we have several pipelines writing logs to AWS S3, and sometimes Apache Flume fails at the last phase, renaming the final archive from .gz.tmp to .gz, so those files cannot be read through the SparkContext.textFile API. This post presents our workaround to process those files.
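The post’s exact code isn’t shown in this excerpt, so the sketch below illustrates one possible workaround, assuming PySpark and a placeholder S3 path: textFile picks the decompression codec from the file extension, so .gz.tmp files can instead be read as raw bytes with binaryFiles and gunzipped by hand.

```python
# Workaround sketch (assumed, not necessarily the post's exact code).
import gzip
import io
from pyspark import SparkContext

sc = SparkContext(appName="read-gz-tmp")

def gunzip_lines(path_and_bytes):
    # binaryFiles yields (path, raw_bytes); decompress manually because the
    # ".gz.tmp" extension stops Spark from selecting the gzip codec itself.
    _path, raw = path_and_bytes
    with gzip.GzipFile(fileobj=io.BytesIO(bytes(raw))) as gz:
        return gz.read().decode("utf-8").splitlines()

# Placeholder path -- point it at the bucket/prefix holding the .gz.tmp files
lines = sc.binaryFiles("s3n://my-bucket/logs/*.gz.tmp").flatMap(gunzip_lines)
print(lines.take(5))
```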

Vagrant + Spark + Zeppelin a toolbox to the Data Analyst (or Data Scientist)

Recently I built an environment to help me teach Apache Spark. My initial thought was to use Docker, but I ran into some issues, especially on older machines, so to avoid further blockers I decided to build a Vagrant image instead and complement the package with Apache Zeppelin as the UI. This Vagrant box builds on Debian Jessie, with Oracle Java, Apache Spark 1.4.1 and Zeppelin (built from the master branch).
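For orientation, here is a minimal Vagrantfile sketch of what such a box might look like. The base box name, forwarded port, memory, and download URL are assumptions, and it installs OpenJDK for brevity where the post uses Oracle Java; the Zeppelin build from master is omitted.

```ruby
# Minimal sketch, not the post's actual Vagrantfile.
Vagrant.configure("2") do |config|
  config.vm.box = "debian/jessie64"                            # Debian Jessie base box (assumed)
  config.vm.network "forwarded_port", guest: 8080, host: 8080  # Zeppelin web UI

  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096
  end

  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y openjdk-7-jdk wget
    # Spark 1.4.1 download URL assumed; pick the Hadoop build you need
    wget -q https://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
    tar -xzf spark-1.4.1-bin-hadoop2.6.tgz -C /opt
    # Zeppelin would be cloned and built from the master branch here (omitted)
  SHELL
end
```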