Sometimes production code misbehaves, and it’s hard to replicate the same conditions in a test/stage environment. We keep almost all ports on our server closed (as it should be), so IMHO the best option is to open an SSH tunnel.
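To make the tunnel idea concrete, here is a minimal sketch that only builds the `ssh` command line for a local port forward. The host name `prod.example.com`, the user `deploy`, and port `5005` are hypothetical placeholders, and `build_tunnel_command` is a helper name invented for this illustration, not something from the original setup.

```python
import shlex


def build_tunnel_command(ssh_host, remote_port, local_port,
                         ssh_user=None, target_host="localhost"):
    """Return the argv list for an SSH local port forward.

    All values passed in are assumptions for illustration; adapt them
    to your own server and service.
    """
    destination = f"{ssh_user}@{ssh_host}" if ssh_user else ssh_host
    # local_port on this machine is forwarded to target_host:remote_port,
    # resolved from the point of view of the SSH server.
    forward = f"{local_port}:{target_host}:{remote_port}"
    # -N: do not execute a remote command, just forward ports
    # -L: set up a local port forward
    return ["ssh", "-N", "-L", forward, destination]


if __name__ == "__main__":
    # Hypothetical example: reach a service listening on port 5005
    # of the server through local port 5005.
    cmd = build_tunnel_command("prod.example.com", 5005, 5005, ssh_user="deploy")
    print(shlex.join(cmd))
```

Running the printed command keeps the port closed to the outside world: only someone with SSH access can reach the service, through `localhost` on their own machine.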
This is an actual deadlock situation!
I found this report very clear and straightforward for easing the migration from Eclipse to IntelliJ IDEA, or, as stated by its creators:
… report is to show Eclipse users, specifically, how to get started using IDEA faster and with less headaches …
I hit some bumps trying to install mysql-python on Mac OS X. I didn’t have MySQL installed on my computer because I use Vagrant to keep a specific development environment for each project I’m working on.
When I installed Hive using Homebrew, it installed the latest version (0.12.0 at the time), but when I work with Amazon EMR, their AMI comes with Hive 0.11.x. So I had to downgrade Hive on my local computer.
Sometimes the dataset I had to handle just wasn’t that big, nor was the original data available in any medium other than MySQL/RDS. To link Hive to other storage systems, a StorageHandler is necessary.
The Spark Summit will be held from June 30 to July 2. I’m really excited to attend this event and learn more about this awesome framework and its community.
With a fast-growing community of 20+ companies contributing to the project, Spark Summits foster connections between those related to and interested in the project.
Hadoop is really bad at handling small files: the framework is heavy and was not designed to work with them.
We process social signals, which means tons of small JSON documents from the Twitter, Facebook, and Google Plus APIs. To improve overall performance, we mainly use two techniques:
A collection of problems (and solutions) that we’ve faced while implementing DynamoDB and Hive.