Tuesday, May 31, 2016

Multiple JSON Configurations for Amazon EMR cluster

You can supply multiple JSON configurations when you launch a new Amazon EMR cluster. In this example, I want to configure Spark to use dynamic allocation of executors and to store Zeppelin notebooks in S3. Replace the placeholder values below (such as your-s3-bucket) to match your S3 bucket location. For the following example, create the folder '/user/notebook' under your-s3-bucket. As you create new Zeppelin notebooks, you'll see new note.json files appear under that S3 folder.
[
    {
        "classification": "spark-defaults",
        "properties": {
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.dynamicAllocation.enabled": "true"
        },
        "configurations": []
    },
    {
        "classification": "zeppelin-env",
        "properties": {},
        "configurations": [
            {
                "classification": "export",
                "properties": {
                    "ZEPPELIN_NOTEBOOK_S3_BUCKET": "your-s3-bucket",
                    "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                    "ZEPPELIN_NOTEBOOK_USER": "user"
                }
            }
        ]
    }
]
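
If you launch clusters against more than one bucket, it can help to generate the JSON above rather than edit it by hand. The following is a minimal sketch in Python using only the standard library. The function names build_emr_configurations and notebook_prefix are hypothetical helpers introduced here for illustration, and the S3 layout returned by notebook_prefix is an assumption based on the '/user/notebook' folder described above.

```python
import json


def build_emr_configurations(bucket, user="user"):
    """Return the two configuration objects from this post with the
    S3 bucket and Zeppelin notebook user filled in.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return [
        {
            # Spark: Kryo serializer plus dynamic allocation of executors.
            "classification": "spark-defaults",
            "properties": {
                "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
                "spark.dynamicAllocation.enabled": "true",
            },
            "configurations": [],
        },
        {
            # Zeppelin: environment variables are set through a nested
            # "export" classification inside "zeppelin-env".
            "classification": "zeppelin-env",
            "properties": {},
            "configurations": [
                {
                    "classification": "export",
                    "properties": {
                        "ZEPPELIN_NOTEBOOK_S3_BUCKET": bucket,
                        "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                        "ZEPPELIN_NOTEBOOK_USER": user,
                    },
                }
            ],
        },
    ]


def notebook_prefix(bucket, user="user"):
    """Return the S3 folder where notebooks are expected to appear.

    Assumption: the layout is <bucket>/<user>/notebook, matching the
    '/user/notebook' folder created in this post.
    """
    return f"s3://{bucket}/{user}/notebook/"


if __name__ == "__main__":
    # Print JSON that can be pasted into the cluster's software settings.
    print(json.dumps(build_emr_configurations("your-s3-bucket"), indent=4))
    print(notebook_prefix("your-s3-bucket"))
```

Running the script prints the same structure as the JSON above, followed by the folder to create in your bucket before you start the cluster.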