A Networker's Log File

Saturday, August 27, 2016

Setting Up Cisco L2 and L3 Devices with GNS3 (1.5.2) for CCNA/CCNP Preparations

See my LinkedIn post on this topic: https://www.linkedin.com/pulse/setting-up-cisco-l2-l3-devices-gns3-152-ccnaccnp-preparations-yee

Monday, August 15, 2016

Install IPython (Jupyter) Notebook on Amazon EMR

Use the bootstrap action at this link to install IPython Notebook: https://github.com/awslabs/emr-bootstrap-actions/tree/master/ipython-notebook
Although the IPython server will then be running, it is not integrated with Spark. To integrate the two, follow the instructions in this blog post: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
Then create the initial SparkContext and SQLContext in a notebook as follows:

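# Create a SparkContext in local mode; 'pyspark' is the application name
from pyspark import SparkContext
sc = SparkContext('local', 'pyspark')

# Wrap the SparkContext in an SQLContext for DataFrame and SQL operations
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)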

Friday, August 12, 2016

MySQL Driver Error in Apache Spark

I was following the Spark example for loading data from a MySQL database (see http://spark.apache.org/examples.html).

Executing it failed with the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 233, ip-172-22-11-249.ap-southeast-1.compute.internal): java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver

To force Spark to load com.mysql.jdbc.Driver, add the "driver" option to the reader as follows:

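// Placeholder JDBC URL: replace host, port, database and credentials with your own
val url = "jdbc:mysql://yourHost:3306/yourDatabase?user=yourUsername&password=yourPassword"

val df = sqlContext
  .read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "people")
  .option("driver", "com.mysql.jdbc.Driver") // name the JDBC driver class explicitly
  .load()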
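For PySpark users, the same read can be expressed with an options dictionary. The sketch below is a hypothetical helper (mysql_jdbc_options is my own name, not part of Spark) that assembles the JDBC options, including the explicit driver class; the host, port, database and credentials in the usage line are placeholders. It assumes the MySQL Connector/J JAR is already on the Spark classpath, since the "driver" option only names the class to load.

```python
def mysql_jdbc_options(host, port, database, table, user, password,
                       driver="com.mysql.jdbc.Driver"):
    """Hypothetical helper: build the options for a Spark JDBC read from MySQL.

    The explicit "driver" entry tells Spark which JDBC driver class to load
    and register, which avoids "Did not find registered driver" errors.
    """
    return {
        "url": "jdbc:mysql://{}:{}/{}".format(host, port, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


if __name__ == "__main__":
    # Placeholder connection details
    opts = mysql_jdbc_options("db.example.com", 3306, "test", "people",
                              "alice", "secret")
    print(opts["url"])  # jdbc:mysql://db.example.com:3306/test
    # With a live SQLContext (not run here):
    # df = sqlContext.read.format("jdbc").options(**opts).load()
```

Passing the dictionary through DataFrameReader.options(**opts) keeps the connection settings in one place, so the driver option cannot be forgotten on a later read.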

Wednesday, August 10, 2016

Install New Interpreter in Zeppelin 0.6.x

In Zeppelin 0.6.x, you can install additional interpreters as follows:

List all available interpreters:

/usr/lib/zeppelin/bin/install-interpreter.sh --list

Install specific interpreters by passing a comma-separated list of names:

/usr/lib/zeppelin/bin/install-interpreter.sh --name jdbc,hbase,postgresql
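If you script cluster setup in Python, the two commands can be wrapped with the standard subprocess module. This is a minimal sketch with hypothetical helper names, assuming the EMR path /usr/lib/zeppelin/bin/install-interpreter.sh used above. By default it only prints the command lines (a dry run), so they can be checked before anything is executed.

```python
import subprocess

INSTALLER = "/usr/lib/zeppelin/bin/install-interpreter.sh"  # EMR path used above


def list_command():
    """Command that lists all available interpreters."""
    return [INSTALLER, "--list"]


def install_command(names):
    """Command that installs the given interpreters, passed comma-separated."""
    if not names:
        raise ValueError("at least one interpreter name is required")
    return [INSTALLER, "--name", ",".join(names)]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)


if __name__ == "__main__":
    run(list_command())
    run(install_command(["jdbc", "hbase", "postgresql"]))
```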

Friday, August 5, 2016

IAM Errors when Creating Amazon EMR

Whenever I launched an Amazon EMR cluster, I got errors about missing permissions in EMR_EC2_DefaultRole. After some searching on the support forum, I found that the default EMR roles may not be created automatically for you. Hence, I removed the old default role and created a new one as follows:

Create the default roles:

aws emr create-default-roles

Create the instance profile:

aws iam create-instance-profile --instance-profile-name EMR_EC2_DefaultRole

Verify that the instance profile exists but has no roles attached:

aws iam get-instance-profile --instance-profile-name EMR_EC2_DefaultRole

Add the role to the instance profile:

aws iam add-role-to-instance-profile --instance-profile-name EMR_EC2_DefaultRole --role-name EMR_EC2_DefaultRole
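To check the outcome of these steps in a script, you can parse the JSON that aws iam get-instance-profile prints. The sketch below is hypothetical (the function names are mine) and relies only on the InstanceProfile.Roles[].RoleName fields of that output; the sample document in it is abbreviated and illustrative, not real output from my account.

```python
import json


def attached_roles(get_instance_profile_output):
    """Return the role names attached to an instance profile.

    Expects the JSON text printed by:
        aws iam get-instance-profile --instance-profile-name EMR_EC2_DefaultRole
    """
    doc = json.loads(get_instance_profile_output)
    return [role["RoleName"] for role in doc["InstanceProfile"].get("Roles", [])]


def profile_has_role(get_instance_profile_output, role_name="EMR_EC2_DefaultRole"):
    """True if role_name is attached to the instance profile."""
    return role_name in attached_roles(get_instance_profile_output)


if __name__ == "__main__":
    # Abbreviated, illustrative output after add-role-to-instance-profile
    sample = json.dumps({
        "InstanceProfile": {
            "InstanceProfileName": "EMR_EC2_DefaultRole",
            "Roles": [{"RoleName": "EMR_EC2_DefaultRole"}],
        }
    })
    print(profile_has_role(sample))  # True
```

Before the add-role-to-instance-profile step, the Roles list should be empty and profile_has_role returns False; afterwards it should return True.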

Thursday, July 7, 2016

Unstuck Spark/Zeppelin Jobs on Amazon EMR

Apache Zeppelin and Apache Spark are a perfect match. You can do all of the following in one console:

Data Ingestion
Data Discovery
Data Analytics
Data Visualization & Collaboration

As Zeppelin is still incubating, its error handling is not yet rock solid. I have often seen Spark jobs stuck for a long time. Usually, restarting the Spark interpreter does the trick. However, sometimes this simple trick won't work and the only way out is to restart the Zeppelin daemon. On the EMR master node, run the following:

/usr/lib/zeppelin/bin/zeppelin-daemon.sh stop
/usr/lib/zeppelin/bin/zeppelin-daemon.sh start

If you wish to run the script as the zeppelin user, which has a nologin shell, execute the following instead:

sudo su -s /bin/bash -c '/usr/lib/zeppelin/bin/zeppelin-daemon.sh stop' zeppelin
sudo su -s /bin/bash -c '/usr/lib/zeppelin/bin/zeppelin-daemon.sh start' zeppelin
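The stop/start pair can also be wrapped in a small Python helper, for example for a maintenance script. This is a hypothetical sketch: it runs the daemon script as the zeppelin user via sudo su -s /bin/bash -c ... zeppelin, the usual idiom for an account with a nologin shell, always stops before starting, and defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

DAEMON = "/usr/lib/zeppelin/bin/zeppelin-daemon.sh"  # EMR path used above


def daemon_command(action, as_user=None):
    """Build a zeppelin-daemon.sh command line for the given action.

    With as_user set, the script runs as that user through
    `sudo su -s /bin/bash -c ... <user>`; -s /bin/bash supplies a shell
    because the account itself has a nologin shell.
    """
    if as_user is None:
        return [DAEMON, action]
    return ["sudo", "su", "-s", "/bin/bash", "-c",
            "{} {}".format(DAEMON, action), as_user]


def restart_zeppelin(as_user=None, dry_run=True):
    """Stop and then start the Zeppelin daemon; return the commands used.

    With dry_run=True (the default) the commands are only printed.
    """
    commands = [daemon_command(action, as_user) for action in ("stop", "start")]
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.check_call(cmd)
    return commands


if __name__ == "__main__":
    restart_zeppelin(as_user="zeppelin")
```

On the master node, restart_zeppelin(as_user="zeppelin", dry_run=False) would actually execute the two commands; subprocess.check_call raises if the stop step fails, so start is not attempted after a failed stop.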

If you encounter the Java connection error "java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method)", it's probably because Zeppelin starts the Spark interpreter in a separate process and cannot connect to it.

Edit /etc/spark/conf/spark-defaults.conf, comment out the following line, and restart Zeppelin:

#spark.driver.memory              5g
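If you need to apply this edit on several clusters, it can be scripted. The sketch below is a hypothetical helper that comments out one property in the text of spark-defaults.conf; it assumes whitespace-separated "key value" lines, as in the EMR file shown above, and does not handle "key=value" lines. To use it, you would read the file, write the result back with root privileges (keeping a backup), and then restart Zeppelin.

```python
def comment_out_property(conf_text, key):
    """Return conf_text with every active `key value` line commented out.

    Lines whose first whitespace-separated field is exactly `key` get a
    leading '#'; lines that are already commented are left unchanged.
    """
    out = []
    for line in conf_text.splitlines(keepends=True):
        fields = line.split()
        if fields and fields[0] == key:
            line = "#" + line
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    before = "spark.master yarn\nspark.driver.memory              5g\n"
    print(comment_out_property(before, "spark.driver.memory"), end="")
```

Because an already commented line starts with "#spark.driver.memory", running the helper twice leaves the file unchanged.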

Reference: http://stackoverflow.com/questions/32735645/hello-world-in-zeppelin-failed

Tuesday, May 31, 2016

Multiple JSON Configurations for Amazon EMR cluster

You can pass multiple JSON configurations when you launch a new Amazon EMR cluster. In this example, I configure Spark to use dynamic allocation of executors and Zeppelin to store its notebooks on S3. Replace your-s3-bucket below with the name of your own S3 bucket, and create the folder '/user/notebook' in that bucket (the 'user' part matches the ZEPPELIN_NOTEBOOK_USER value). As you create new Zeppelin notebooks, you'll see a new note.json for each one under that S3 folder.

[
    {
        "classification": "spark-defaults",
        "properties": {
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.dynamicAllocation.enabled": "true"
        },
        "configurations": []
    },
    {
        "classification": "zeppelin-env",
        "properties": {},
        "configurations": [
            {
                "classification": "export",
                "properties": {
                    "ZEPPELIN_NOTEBOOK_S3_BUCKET": "your-s3-bucket",
                    "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                    "ZEPPELIN_NOTEBOOK_USER": "user"
                }
            }
        ]
    }
]
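If you launch clusters for different buckets, generating this file avoids hand-editing the bucket name each time. The sketch below is a hypothetical generator (emr_configurations is my own name) that builds the same two configuration objects and prints them as JSON; the bucket and notebook user are parameters.

```python
import json


def emr_configurations(bucket, notebook_user="user"):
    """Build the EMR configuration list above for a given S3 bucket."""
    spark_defaults = {
        "classification": "spark-defaults",
        "properties": {
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.dynamicAllocation.enabled": "true",
        },
        "configurations": [],
    }
    zeppelin_env = {
        "classification": "zeppelin-env",
        "properties": {},
        "configurations": [
            {
                # Nested "export" classification sets Zeppelin env variables
                "classification": "export",
                "properties": {
                    "ZEPPELIN_NOTEBOOK_S3_BUCKET": bucket,
                    "ZEPPELIN_NOTEBOOK_STORAGE":
                        "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
                    "ZEPPELIN_NOTEBOOK_USER": notebook_user,
                },
            }
        ],
    }
    return [spark_defaults, zeppelin_env]


if __name__ == "__main__":
    print(json.dumps(emr_configurations("your-s3-bucket"), indent=4))
```

Redirect the output to a file such as configurations.json (a hypothetical name) and pass it with --configurations file://configurations.json to aws emr create-cluster, together with your usual create-cluster options.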