Thursday, July 7, 2016

Unstuck Spark/Zeppelin Jobs on Amazon EMR

Apache Zeppelin and Apache Spark are a perfect match. From a single console, you can do all of the following:

  • Data Ingestion
  • Data Discovery
  • Data Analytics
  • Data Visualization & Collaboration
As Zeppelin is still under incubation, its error handling is not yet rock solid. I have often seen Spark jobs get stuck for a long time. Usually, restarting the Spark interpreter does the trick. However, there are times when this simple trick won't work and the only way out is to restart the Zeppelin daemon. In a terminal session on the Amazon EMR master node, run the following:
  1. /usr/lib/zeppelin/bin/zeppelin-daemon.sh stop
  2. /usr/lib/zeppelin/bin/zeppelin-daemon.sh start
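The two steps above can be wrapped in a small helper so the restart is a single command. This is only a sketch: the function name restart_zeppelin and the ZEPPELIN_DAEMON override variable are my own inventions for illustration, and the default path is simply the one used in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop, then start, the Zeppelin daemon.
# ZEPPELIN_DAEMON is an assumed override variable; the default path is the one from this post.

restart_zeppelin() {
  local daemon="${ZEPPELIN_DAEMON:-/usr/lib/zeppelin/bin/zeppelin-daemon.sh}"

  # Bail out early if the script is missing or not executable.
  if [ ! -x "$daemon" ]; then
    echo "zeppelin-daemon.sh not found at $daemon" >&2
    return 1
  fi

  # Carry on even if stop reports a failure (for example, the daemon was already down).
  "$daemon" stop || true
  "$daemon" start
}
```

After sourcing the file, run restart_zeppelin. Depending on how the cluster is set up, it may need to be run with elevated privileges.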
If you wish to execute the scripts as the zeppelin account, which has a nologin shell, run the following instead (su -s overrides the account's login shell):
  1. sudo su -s /bin/bash -c '/usr/lib/zeppelin/bin/zeppelin-daemon.sh stop' zeppelin
  2. sudo su -s /bin/bash -c '/usr/lib/zeppelin/bin/zeppelin-daemon.sh start' zeppelin
If you encounter this Java connection error: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method), it's probably because Zeppelin starts the Spark interpreter in a separate process and then cannot connect to it. To work around it:

  1. Edit /etc/spark/conf/spark-defaults.conf
  2. Comment out the following line and restart Zeppelin

#spark.driver.memory              5g
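
If you would rather not edit the file by hand, the same change can be made with sed. This is a minimal sketch: the function name comment_out_driver_memory is hypothetical, and the default path is the one given above. It keeps a .bak copy of the file and leaves lines that are already commented out untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out spark.driver.memory in spark-defaults.conf.
# The default path is the one from this post; pass another path as the first argument.

comment_out_driver_memory() {
  local conf="${1:-/etc/spark/conf/spark-defaults.conf}"

  # -i.bak edits in place and keeps a backup next to the file.
  # The pattern only matches an uncommented spark.driver.memory key followed by
  # whitespace or '=', so keys such as spark.driver.memoryOverhead are not affected.
  sed -i.bak -E 's/^([[:space:]]*)(spark\.driver\.memory([[:space:]]|=))/\1#\2/' "$conf"
}
```

The config file is probably owned by root on the cluster, so the function likely has to run in a root shell. Remember to restart Zeppelin afterwards.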
Reference: http://stackoverflow.com/questions/32735645/hello-world-in-zeppelin-failed