
Keeping a Spark job from dying when disconnecting from the shell

Today I launched a Spark job that was taking too long to complete, and since I had forgotten to start it through screen, I needed to find a way to keep it running after disconnecting my terminal from the cluster.

$ spark-submit .... 

14/08/29 23:57:32 INFO TaskSetManager: Starting task 1.0:3303 as TID 11603 on executor 0: ip-xxxxx.ec2.internal (PROCESS_LOCAL)
14/08/29 23:57:32 INFO TaskSetManager: Serialized task 1.0:3303 as 2721 bytes in 0 ms
14/08/29 23:57:32 INFO TaskSetManager: Finished TID 11596 in 7724 ms on ip-xxxxx.ec2.internal (progress: 3298/4150)
14/08/29 23:57:32 INFO DAGScheduler: Completed ShuffleMapTask(1, 3296)
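
In hindsight, starting the job under nohup or inside a screen session would have avoided the problem entirely. A minimal sketch, assuming a generic spark-submit invocation and a hypothetical log file name:

$ nohup spark-submit ... > spark-job.log 2>&1 &   # immune to SIGHUP; output goes to the log file
$ screen -S spark-job                             # or: open a named screen session and run spark-submit inside it

Since the job was already running, though, I had to detach it after the fact.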

Here I suspended the job by pressing CTRL+Z. This stops the process and hands control back to the shell:

$  jobs
[1]  suspended java 
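
One detail worth noting: a job suspended with CTRL+Z stays stopped until it is resumed, so bg is what actually puts it back to work in the background (%1 is the job number reported by jobs):

$ bg %1    # resume the suspended job in the background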

$  disown %1

Running disown with the job number (%1) removes the job from the shell's job table, so it won't be sent SIGHUP (and killed) when I leave the main terminal.
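
If you prefer to keep the job listed under jobs, bash's disown also accepts a -h flag, which only marks the job so the shell skips it when forwarding SIGHUP at logout:

$ disown -h %1    # keep the job in the job table, but don't send it SIGHUP on logout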

After logging out, I opened another terminal to make sure everything had worked as expected, and voilà :)

$ ps aux | grep spark

spark      9643  1.6  6.6 2038688 506744 ?      Sl   19:33   4:25 /usr/lib/jvm/java-1.7.0/bin/java -cp :::/root/ephemeral-hdfs/conf:/root/spark/conf:/root/spark/lib/spark-assembly-1.0.1-hadoop1.0.4.jar:/root/spark/lib/datanucleus-api-jdo-3...
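
The STAT column of ps is another quick sanity check: an S or R state means the process is sleeping or running normally, while T would mean it is still suspended. Using the PID from the output above:

$ ps -o pid,stat,etime -p 9643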

I could also check the Spark UI and see my job slowly progressing…
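
By default the driver's web UI listens on port 4040 of the machine running the driver (the standalone master UI is on 8080), so a quick check from another box could look like this, reusing the masked hostname from the logs and assuming default ports:

$ curl -sI http://ip-xxxxx.ec2.internal:4040 | head -1    # an HTTP 200 status line means the UI is still serving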

## References

How do I put an already running process under nohup