A Networker's Log File
I have a wide range of interests in IT, including Hyper-V private cloud, Remote Desktop Services, server clustering, PKI, network security, routing & switching, enterprise network management, MPLS VPN on enterprise networks, etc. I started this blog for my own quick reference and to share technical knowledge with our team members.
Saturday, August 27, 2016
Setting Up Cisco L2 & L3 Devices on GNS3 1.5.2 for CCNA/CCNP Preparations
See my LinkedIn post on this topic: https://www.linkedin.com/pulse/setting-up-cisco-l2-l3-devices-gns3-152-ccnaccnp-preparations-yee
Monday, August 15, 2016
Install iPython (Jupyter) Notebook on Amazon EMR
- Use the bootstrap script at this link to install iPython Notebook: https://github.com/awslabs/emr-bootstrap-actions/tree/master/ipython-notebook
- Although the iPython server is then running, it's not yet integrated with Spark. Follow the instructions in this blog post: https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
- Create the initial SparkContext and SQLContext as follows:
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Create a local SparkContext named 'pyspark'
sc = SparkContext('local', 'pyspark')

# Wrap the SparkContext in a SQLContext for DataFrame and SQL support
sqlContext = SQLContext(sc)
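To verify that both contexts are wired up, you can build a tiny DataFrame from local data; a minimal sketch continuing from the code above (the sample rows and column names are placeholders):

# Create a two-row DataFrame and print it to confirm the SQLContext works
df = sqlContext.createDataFrame([(1, 'alice'), (2, 'bob')], ['id', 'name'])
df.show()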
Labels:
amazon emr,
apache spark
Friday, August 12, 2016
MySQL Driver Error in Apache Spark
I was following the Spark example to load data from a MySQL database (see http://spark.apache.org/examples.html) and hit this error upon executing it:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 4 times, most recent failure: Lost task 0.3 in stage 20.0 (TID 233, ip-172-22-11-249.ap-southeast-1.compute.internal): java.lang.IllegalStateException: Did not find registered driver with class com.mysql.jdbc.Driver
To force Spark to load com.mysql.jdbc.Driver, add the driver option as shown below:
val df = sqlContext
  .read
  .format("jdbc")
  .option("url", url)
  .option("dbtable", "people")
  .option("driver", "com.mysql.jdbc.Driver")  // force Spark to register the MySQL driver
  .load()
Labels:
apache spark
Wednesday, August 10, 2016
Install New Interpreter in Zeppelin 0.6.x
In Zeppelin 0.6.x, you can install new interpreters as follows:
- List all available interpreters:
/usr/lib/zeppelin/bin/install-interpreter.sh --list
- Install the specific interpreters you need:
/usr/lib/zeppelin/bin/install-interpreter.sh --name jdbc,hbase,postgresql
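Newly installed interpreters are only picked up after the Zeppelin daemon restarts, so restart it once the installation finishes:
- sudo /usr/lib/zeppelin/bin/zeppelin-daemon.sh restart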
Labels:
amazon web services,
apache zeppelin
Friday, August 5, 2016
IAM Errors when Creating Amazon EMR
I kept getting errors related to missing permissions on the EMR_EC2_DefaultRole whenever I launched an Amazon EMR cluster. After some searching on the support forum, it turns out the default EMR roles may not be created automatically for you. Hence, I removed the old default role and created a new one as follows:
- Create the default roles:
- aws emr create-default-roles
- Create instance profile:
- aws iam create-instance-profile --instance-profile-name EMR_EC2_DefaultRole
- Verify that the instance profile exists but doesn't have any roles attached:
- aws iam get-instance-profile --instance-profile-name EMR_EC2_DefaultRole
- Add the role to the instance profile:
- aws iam add-role-to-instance-profile --instance-profile-name EMR_EC2_DefaultRole --role-name EMR_EC2_DefaultRole
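Finally, re-run the get-instance-profile command from above; the Roles list in the output should now include EMR_EC2_DefaultRole:
- aws iam get-instance-profile --instance-profile-name EMR_EC2_DefaultRole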
Labels:
amazon web services,
hadoop
Thursday, July 7, 2016
Unsticking Spark/Zeppelin Jobs on Amazon EMR
Apache Zeppelin + Apache Spark is a perfect match. Basically, you can do the following in one console:
- Data Ingestion
- Data Discovery
- Data Analytics
- Data Visualization & Collaboration
As Zeppelin is still under incubation, its error handling is not yet rock solid. Often, I have experienced Spark jobs being stuck for a long time. Usually, restarting the Spark interpreter does the trick. However, there are times when this simple trick won't work and the only way out is to restart the Zeppelin daemon. On the Amazon EMR master node, do the following:
- /usr/lib/zeppelin/bin/zeppelin-daemon.sh stop
- /usr/lib/zeppelin/bin/zeppelin-daemon.sh start
If you wish to execute the scripts as the zeppelin account, which has a nologin shell, execute the following instead:
- sudo -s /bin/bash -c '/usr/lib/zeppelin/bin/zeppelin-daemon.sh stop' zeppelin
- sudo -s /bin/bash -c '/usr/lib/zeppelin/bin/zeppelin-daemon.sh start' zeppelin
If you encounter this Java connection error: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method), it's probably because Zeppelin starts the Spark interpreter in a different process. To fix it:
- Edit /etc/spark/conf/spark-defaults.conf
- Comment out the spark.driver.memory line and restart Zeppelin; it should look like this afterwards:
#spark.driver.memory 5g
Reference: http://stackoverflow.com/questions/32735645/hello-world-in-zeppelin-failed
Labels:
amazon web services,
apache spark,
apache zeppelin
Tuesday, May 31, 2016
Multiple JSON Configurations for Amazon EMR cluster
You can supply multiple JSON configurations when launching a new Amazon EMR cluster. In this example, I configure Spark to use dynamic allocation of executors and store Zeppelin notebooks on S3. Replace 'your-s3-bucket' below with your own S3 bucket name, and create the folder '/user/notebook' under it. You'll see a new note.json under that S3 folder each time you create a new Zeppelin notebook.
[ { "classification":"spark-defaults", "properties": { "spark.serializer":"org.apache.spark.serializer.KryoSerializer", "spark.dynamicAllocation.enabled":"true"}, "configurations":[] }, { "configurations":[ { "classification":"export", "properties":{ "ZEPPELIN_NOTEBOOK_S3_BUCKET":"your-s3-bucket", "ZEPPELIN_NOTEBOOK_STORAGE":"org.apache.zeppelin.notebook.repo.S3NotebookRepo", "ZEPPELIN_NOTEBOOK_USER":"user"} } ], "classification":"zeppelin-env", "properties":{ } } ]
Labels:
amazon web services,
big data