Keeping a Spark job from dying when disconnecting from the shell
Today I launched a Spark job that was taking too long to complete, and I had forgotten to start it through screen, so I needed to find a way to keep it running after disconnecting my terminal from the cluster.
$ spark-submit ....
14/08/29 23:57:32 INFO TaskSetManager: Starting task 1.0:3303 as TID 11603 on executor 0: ip-xxxxx.ec2.internal (PROCESS_LOCAL)
14/08/29 23:57:32 INFO TaskSetManager: Serialized task 1.0:3303 as 2721 bytes in 0 ms
14/08/29 23:57:32 INFO TaskSetManager: Finished TID 11596 in 7724 ms on ip-xxxxx.ec2.internal (progress: 3298/4150)
14/08/29 23:57:32 INFO DAGScheduler: Completed ShuffleMapTask(1, 3296)
Here I suspended the job by pressing CTRL+Z, which stops the process and hands control back to the shell:
$ jobs
[1] suspended java
$ disown %1
Running disown with % plus the job number marks the job so it will not be sent the shell's SIGHUP (which would kill it) when I leave the main terminal.
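Putting the steps together, this is the whole sequence as a minimal sketch. The bg %1 step is my addition here: it resumes the suspended job in the background so it actually keeps running after being disowned (the spark-submit arguments are elided as above):

$ spark-submit ....        # job running in the foreground
^Z                         # CTRL+Z suspends it and returns the prompt
$ jobs                     # the job now shows up as stopped
[1]  + suspended  java
$ bg %1                    # resume it in the background (my addition)
$ disown %1                # tell the shell not to SIGHUP it on exit
$ exit                     # safe to log out now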
After logging out, I opened another terminal to make sure everything worked as expected, and voilà :)
$ ps aux | grep spark
spark 9643 1.6 6.6 2038688 506744 ? Sl 19:33 4:25 /usr/lib/jvm/java-1.7.0/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.1-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3...
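As an extra sanity check (my addition): the ? in the TTY column above already shows the process has no controlling terminal, and with the original shell gone the job should have been re-parented to init, so its parent PID reports as 1:

$ ps -o ppid= -p 9643
    1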
I could also check the Spark UI and see my job slowly progressing …
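For next time: starting the job through screen (as I should have) or under nohup avoids all of this. A minimal sketch, with the spark-submit arguments elided as above and the log file name being an arbitrary choice of mine:

$ nohup spark-submit .... > spark-job.log 2>&1 &   # survives logout, output goes to the log
$ tail -f spark-job.log                            # follow progress; the terminal can be closed anytime

Or inside screen:

$ screen -S spark          # start a named screen session
$ spark-submit ....        # launch the job inside it
# detach with CTRL+A d, log out, and later reattach with: screen -r spark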