Monday, August 15, 2016

Install IPython (Jupyter) Notebook on Amazon EMR

  1. Use the bootstrap script on this link to install IPython Notebook:
  2. Even once the IPython server is running, it is not yet integrated with Spark. Follow the instructions in this blog post:
  3. Create the initial SparkContext and SQLContext as follows:

from pyspark import SparkContext
from pyspark.sql import SQLContext

# 'local' is the master URL (single local JVM); 'pyspark' is the application name
sc = SparkContext('local', 'pyspark')
sqlContext = SQLContext(sc)

Friday, August 12, 2016

MySQL Driver Error in Apache Spark

I was following the Spark example to load data from a MySQL database. See ""

There was an error upon executing:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 233, ip-172-22-11-249.ap-southeast-1.compute.internal): java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver

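To force Spark to load the "com.mysql.jdbc.Driver" class, add the "driver" option to the JDBC reader, as in the second line below:
val df = sqlContext.read.format("jdbc")
  .option("driver", "com.mysql.jdbc.Driver") // register the MySQL driver explicitly
  .option("url", url)
  .option("dbtable", "people")
  .load()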
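The same driver option can be passed from PySpark. Here is a minimal sketch, assuming a `sqlContext` like the one created in the notebook post; the helper names `mysql_jdbc_options` and `load_table` are my own, and the JDBC URL in the usage note is a placeholder, not a real endpoint.

```python
def mysql_jdbc_options(url, table, driver="com.mysql.jdbc.Driver"):
    """Build the option map for Spark's JDBC data source.

    Setting "driver" explicitly makes Spark load that class instead of
    relying on it being discovered automatically.
    """
    return {"url": url, "dbtable": table, "driver": driver}


def load_table(sql_context, url, table):
    """Read a MySQL table into a DataFrame through the given SQLContext."""
    reader = sql_context.read.format("jdbc")
    return reader.options(**mysql_jdbc_options(url, table)).load()
```

In a notebook this would be called as `df = load_table(sqlContext, "jdbc:mysql://<host>:3306/<database>", "people")`, with the host and database filled in for your own cluster.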

Wednesday, August 10, 2016

Install New Interpreter in Zeppelin 0.6.x

In Zeppelin 0.6.x, you can install new interpreters as follows:

  • List all available interpreters: 
  1. /usr/lib/zeppelin/bin/ --list
  • Install specific interpreters: 
  1. /usr/lib/zeppelin/bin/ --name jdbc,hbase,postgresql

Friday, August 5, 2016

IAM Errors when Creating Amazon EMR

Whenever I launched an Amazon EMR cluster, I got errors about missing permissions in the EMR_EC2_DefaultRole. After some searching on the support forum, I found that the default EMR role may not be created automatically for you. Hence, I removed the old default role and created a new one as follows:
  1. Create the default roles: 
    • aws emr create-default-roles
  2. Create instance profile: 
    • aws iam create-instance-profile --instance-profile-name EMR_EC2_DefaultRole
  3. Verify that the instance profile exists but doesn't have any roles:
    • aws iam get-instance-profile --instance-profile-name EMR_EC2_DefaultRole
  4. Add the role using:
    • aws iam add-role-to-instance-profile --instance-profile-name EMR_EC2_DefaultRole --role-name EMR_EC2_DefaultRole
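The check in step 3 can also be scripted rather than read by eye. Below is a minimal sketch, assuming the JSON layout that `aws iam get-instance-profile` prints (an `InstanceProfile` object containing a `Roles` list); the helper name `profile_role_names` and the trimmed sample output are made up for illustration.

```python
import json


def profile_role_names(profile_json):
    """Return the names of the roles attached to an instance profile.

    `profile_json` is the JSON text printed by
    `aws iam get-instance-profile`.
    """
    profile = json.loads(profile_json)["InstanceProfile"]
    return [role["RoleName"] for role in profile.get("Roles", [])]


# Illustrative, trimmed-down output for a profile that has no role yet (step 3).
sample_output = """
{
  "InstanceProfile": {
    "InstanceProfileName": "EMR_EC2_DefaultRole",
    "Roles": []
  }
}
"""

print(profile_role_names(sample_output))  # prints []
```

After step 4, running the same helper on fresh output should give a list that contains EMR_EC2_DefaultRole.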