Hadoop – GC overhead limit exceeded error

In our Hadoop setup, we ended up with more than 1 million files in a single folder.  The folder had so many files that any hdfs dfs command on it, such as -ls or -copyToLocal, failed with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:2367)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
        at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
        at java.lang.StringBuffer.append(StringBuffer.java:237)
        at java.net.URI.appendAuthority(URI.java:1852)
        at java.net.URI.appendSchemeSpecificPart(URI.java:1890)
        at java.net.URI.toString(URI.java:1922)
        at java.net.URI.<init>(URI.java:749)
        at org.apache.hadoop.fs.Path.initialize(Path.java:203)
        at org.apache.hadoop.fs.Path.<init>(Path.java:116)
        at org.apache.hadoop.fs.Path.<init>(Path.java:94)
        at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:230)
        at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:263)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:732)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
        at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
        at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
        at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
        at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
        at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
        at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
        at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)

After doing some research, we added the following environment variable to update the Hadoop runtime options.

export HADOOP_OPTS="-XX:-UseGCOverheadLimit"

Adding this option fixed the GC error, but the command then started failing with the following error about a lack of Java heap space.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1351)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
        at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy15.getListing(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:724)
        at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
        at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
        at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
        at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
        at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
        at org.apache.hadoop.fs.shell.CommandWithDestination.recursePath(CommandWithDestination.java:291)
        at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
        at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
        at org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:243)
        at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
        at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
        at org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:220)
        at org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
        at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

We modified the above export and tried the following instead.  Note that instead of HADOOP_OPTS, we needed to set HADOOP_CLIENT_OPTS to fix this error. This is because all the hadoop shell commands run as clients.  HADOOP_OPTS is used to modify the actual Hadoop runtime, while HADOOP_CLIENT_OPTS modifies the runtime of the Hadoop command-line client.

export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"
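To make this setting stick for every shell session, one option is to append it to hadoop-env.sh. This is a minimal sketch: the hadoop-env.sh path and the /user/hadoopuser/bigfolder path below are assumptions for illustration, so adjust them for your cluster.

# Persist the client JVM options so every hadoop/hdfs invocation picks them up
echo 'export HADOOP_CLIENT_OPTS="-XX:-UseGCOverheadLimit -Xmx4096m"' \
  >> /home/hadoopuser/hadoop/etc/hadoop/hadoop-env.sh

# Re-run the command that used to fail and confirm the listing now completes
hdfs dfs -ls /user/hadoopuser/bigfolder | tail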


How to fix NameNode – SafeModeException

When you try to do any HDFS operation, you get the following exception:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): 
Cannot create directory /user/hadoopuser/dir/in. 
Name node is in safe mode.

What is Safe Mode in Hadoop?

Safe Mode in Hadoop is a maintenance state of the NameNode.  During Safe Mode, the HDFS cluster is read-only and does not allow any changes; it also does not replicate or delete blocks.

When the NameNode starts, it automatically enters Safe Mode and performs the following initialization tasks:

  • Loads the file system namespace from the last saved fsimage.
  • Loads the edits log file.
  • Applies the edits log changes to the fsimage and creates a new, up-to-date file system namespace.
  • Receives block reports containing block location information from all DataNodes.

To leave Safe Mode, the NameNode must collect block reports for at least a configurable threshold percentage of blocks, and those blocks must satisfy the minimum replication condition. Even though this threshold may be reached quickly, Safe Mode is extended for a configurable amount of time to make sure the remaining DataNodes check in before the NameNode starts replicating missing blocks or deleting over-replicated blocks. After this block replication maintenance activity completes, the NameNode leaves Safe Mode automatically.
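Both the block threshold and the extension window are ordinary HDFS configuration properties, so you can inspect them on your own cluster. A quick check from the shell is sketched below; the default values mentioned in the comments are an assumption about a typical install.

# Percentage of blocks that must report before safe mode can end (default is typically 0.999f)
hdfs getconf -confKey dfs.namenode.safemode.threshold-pct
# Extra time, in milliseconds, that safe mode is extended after the threshold is met (typically 30000)
hdfs getconf -confKey dfs.namenode.safemode.extension

# In a script, this blocks until the NameNode has left safe mode on its own
hdfs dfsadmin -safemode wait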


You can check whether your Hadoop cluster is in Safe Mode by running the following command:

hdfs dfsadmin -safemode get

If you just restarted your cluster, give it ample time to recover from Safe Mode; how long this takes varies with the size of your cluster.  If it stays stuck in that state, you can force it out with the following command:

hdfs dfsadmin -safemode leave

How to parse argument parameters in bash shell?

This script will help you understand how to parse argument parameters in a bash shell script.

Let's start by declaring some default values for the parameters. This handles the case in which the argument parameters are not passed to the script.

#!/bin/sh
export ALPHA="DEFAULT_ALPHA"
export BETA="DEFAULT_BETA"


Next we add a case statement and handle each argument explicitly.  When we find a match on a parameter name, we set the corresponding environment variable to the next token from the input.  The shift operation then removes the argument name and its value from the argument list, and processing proceeds to the next argument parameter.

while true ; do
  case "$1" in
    --alpha) export ALPHA="$2" ; shift 2 ;;
    --beta) export BETA="$2" ; shift 2 ;;
    *) break ;;
  esac
done

Next, let's print the values once parsing is done, and store the above script in a file called bash_command_parsing.sh.  Change the mode on this script so that it can be run as an executable.

echo "GOT ALPHA $ALPHA"
echo "GOT BETA $BETA"


Let's run the script. First we run it without passing any arguments, and then we pass it both arguments.

$> ./bash_command_parsing.sh
GOT ALPHA DEFAULT_ALPHA
GOT BETA DEFAULT_BETA
$> ./bash_command_parsing.sh --alpha 0.4 --beta 0.2
GOT ALPHA 0.4
GOT BETA 0.2


Note that the argument parsing is very unforgiving in this example, and it bails out as soon as it encounters any unhandled parameter.  In the first example below, the script parses the first argument correctly, then bails out as soon as it sees an unhandled parameter.  In the second example, the first parameter itself is unhandled, so the script does not even try to parse the second parameter.  A stricter variant that reports unknown options is shown after the output.

$> ./bash_command_parsing.sh --alpha 0.4 --beta1 0.2
GOT ALPHA 0.4
GOT BETA DEFAULT_BETA
$> ./bash_command_parsing.sh --alpha1 0.4 --beta 0.2
GOT ALPHA DEFAULT_ALPHA
GOT BETA DEFAULT_BETA
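If you would rather fail loudly than silently fall back to the default values, a small variation of the same loop can report unknown options. This is just a sketch of the idea, not part of the original script:

while true ; do
  case "$1" in
    --alpha) export ALPHA="$2" ; shift 2 ;;
    --beta) export BETA="$2" ; shift 2 ;;
    # Anything that still looks like an option is unknown: report it and stop
    -*) echo "Unknown option: $1" >&2 ; exit 1 ;;
    *) break ;;
  esac
done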

VM warning: Insufficient space for shared memory file

While we were experimenting with MapReduce programs on our Hadoop cluster, we started noticing the following errors.

Java HotSpot(TM) 64-Bit Server VM warning: Insufficient space for shared memory file:
/tmp/hsperfdata_hdfs/28099
Try using the -Djava.io.tmpdir= option to select an alternate temp location.

Exception in thread "main" java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:80)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:107)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:81)
at org.apache.hadoop.util.RunJar.run(RunJar.java:209)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

At first glance it seemed as if the disk was full and that was causing the jobs to fail.  Further analysis showed that the /tmp directory was mounted with only 300 MB of allocated space.  Remounting /tmp with 2 GB of space solved the problem.

sudo mount -o remount,size=2G /tmp
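If resizing /tmp is not an option on your machines, the warning itself suggests pointing the JVM at a different temp location. A rough sketch, assuming /data/tmp sits on a disk with enough free space:

# Create a scratch directory on a larger disk and point the client JVM at it
mkdir -p /data/tmp
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Djava.io.tmpdir=/data/tmp"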

Install a Multi Node Hadoop Cluster on Ubuntu 14.04

This article is about a multi-node installation of a Hadoop cluster.  You need a minimum of 2 Ubuntu machines or virtual images to complete a multi-node installation.  If you just want to try out a single-node cluster, follow this article on Installing Hadoop on Ubuntu 14.04.

I used Hadoop stable version 2.6.0 for this article and did this setup on a 3-node cluster.  For simplicity, I will designate one node as the master and 2 nodes as slaves (slave-1 and slave-2). Make sure all slave nodes are reachable from the master node.  To avoid any unreachable-host errors, add the slave hostnames and IP addresses to the /etc/hosts file; similarly, the slave nodes should be able to resolve the master hostname (see the sample entries below).
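A minimal sketch of the /etc/hosts entries to add on every node; the IP addresses below are placeholders for your own network:

192.168.1.10   master
192.168.1.11   slave-1
192.168.1.12   slave-2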

Installing Java on Master and Slaves

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
# Update Java runtime
$ sudo update-java-alternatives -s java-7-oracle

Disable IPv6

As of now, Hadoop does not support IPv6 and is tested to work only on IPv4 networks.  If you are using IPv6, you need to switch the Hadoop host machines to use IPv4.  The Hadoop wiki provides a one-liner to disable IPv6.  If you are not using IPv6, skip this step:

sudo sed -i 's/net.ipv6.bindv6only\ =\ 1/net.ipv6.bindv6only\ =\ 0/' \
/etc/sysctl.d/bindv6only.conf && sudo invoke-rc.d procps restart
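To confirm the change took effect, you can read the setting back. This check is an addition of mine; it should print 0 after the restart:

sysctl net.ipv6.bindv6only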

Setting up a Hadoop User

Hadoop talks to other nodes in the cluster using password-less SSH.  Having Hadoop run under a specific user context makes it easy to distribute the SSH keys around the Hadoop cluster.  Let's create a user hadoopuser on the master as well as on the slave nodes.

# Create hadoopgroup
$ sudo addgroup hadoopgroup
# Create hadoopuser user
$ sudo adduser --ingroup hadoopgroup hadoopuser

Our next step is to generate an SSH key for password-less login between the master and slave nodes.  Run the following commands only on the master node, and repeat the last two commands for each slave node.  Password-less SSH must be working before you can proceed with the further steps.

# Login as hadoopuser
$ su - hadoopuser
#Generate a ssh key for the user
$ ssh-keygen -t rsa -P ""
#Authorize the key to enable password less ssh 
$ cat /home/hadoopuser/.ssh/id_rsa.pub >> /home/hadoopuser/.ssh/authorized_keys
$ chmod 600 /home/hadoopuser/.ssh/authorized_keys
#Copy this key to slave-1 to enable password less ssh 
$ ssh-copy-id -i ~/.ssh/id_rsa.pub slave-1
#Make sure you can do a password less ssh using following command.
$ ssh slave-1

Download and Install Hadoop binaries on Master and Slave nodes

Pick the best mirror site to download the binaries from Apache Hadoop, and download stable/hadoop-2.6.0.tar.gz for your installation.  Do this step on the master and on every slave node.  Alternatively, you can download the file once and then distribute it to each slave node using the scp command.

$ cd /home/hadoopuser
$ wget http://www.webhostingjams.com/mirror/apache/hadoop/core/stable/hadoop-2.6.0.tar.gz
$ tar xvf hadoop-2.6.0.tar.gz
$ mv hadoop-2.6.0 hadoop

Setup Hadoop Environment on Master and Slave Nodes

Copy and paste the following lines into your .bashrc file under /home/hadoopuser. Do this step on the master and on every slave node.

# Set HADOOP_HOME
export HADOOP_HOME=/home/hadoopuser/hadoop
# Set JAVA_HOME 
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
# Add Hadoop bin and sbin directories to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
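As a quick sanity check (my addition, not part of the original steps), reload the profile and confirm that the Hadoop binaries resolve on the new PATH:

# Reload .bashrc in the current shell and print the Hadoop version
source ~/.bashrc
hadoop version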

Update hadoop-env.sh on Master and Slave Nodes

Update JAVA_HOME in /home/hadoopuser/hadoop/etc/hadoop/hadoop-env.sh to the following. Do this step on the master and on every slave node.

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

Common Terminologies
Before we get into the configuration details, let's discuss some of the basic terminology used in Hadoop.

  • Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. If you compare HDFS to traditional storage structures (e.g. FAT, NTFS), the NameNode is analogous to a directory node structure, and the DataNodes are analogous to the actual file storage blocks.
  • Hadoop YARN: A framework for job scheduling and cluster resource management.
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Update Configuration Files
Add/update core-site.xml on the master and slave nodes with the following options.  The master and slave nodes should all use the same value for the fs.defaultFS property, and it should point to the master node only.

  /home/hadoopuser/hadoop/etc/hadoop/core-site.xml (Other Options)
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoopuser/tmp</value>
  <description>Temporary Directory.</description>
</property>

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:54310</value>
  <description>Use HDFS as file storage engine</description>
</property>


Add/update mapred-site.xml on the master node only with the following options.

  /home/hadoopuser/hadoop/etc/hadoop/mapred-site.xml (Other Options)
<property>
 <name>mapreduce.jobtracker.address</name>
 <value>master:54311</value>
 <description>The host and port that the MapReduce job tracker runs
  at. If “local”, then jobs are run in-process as a single map
  and reduce task.
</description>
</property>
<property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 <description>The framework for running mapreduce jobs</description>
</property>


Add/update hdfs-site.xml on the master and slave nodes. We will add the following three entries to the file.

  • dfs.replication – Here I am using a replication factor of 2, which means that for every block stored in HDFS there will be one redundant copy on some other node in the cluster.
  • dfs.namenode.name.dir – This directory is used by the NameNode to store its metadata files.  I manually created the directory /hadoop-data/hadoopuser/hdfs/namenode on the master and slave nodes and used that location for this configuration.
  • dfs.datanode.data.dir – This directory is used by the DataNode to store HDFS data blocks.  I manually created the directory /hadoop-data/hadoopuser/hdfs/datanode on the master and slave nodes and used that location for this configuration.
  /home/hadoopuser/hadoop/etc/hadoop/hdfs-site.xml (Other Options)
<property>
 <name>dfs.replication</name>
 <value>2</value>
 <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
 </description>
</property>
<property>
 <name>dfs.namenode.name.dir</name>
 <value>/hadoop-data/hadoopuser/hdfs/namenode</value>
 <description>Determines where on the local filesystem the DFS name node should store the name table(fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
 </description>
</property>
<property>
 <name>dfs.datanode.data.dir</name>
 <value>/hadoop-data/hadoopuser/hdfs/datanode</value>
 <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
 </description>
</property>


Add yarn-site.xml on the master and slave nodes.  This file is required for a node to work as a YARN node.  The master and slave nodes should all use the same values for the following properties, which should point to the master node only.

  /home/hadoopuser/hadoop/etc/hadoop/yarn-site.xml
<property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
</property>
<property>
 <name>yarn.resourcemanager.scheduler.address</name>
 <value>master:8030</value>
</property> 
<property>
 <name>yarn.resourcemanager.address</name>
 <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>


Add/update the slaves file on the master node only.  Add the hostnames or IP addresses of the master and all slave nodes, one per line.  If the file has an entry for localhost, you can remove it.  This is just a helper file used by the hadoop scripts to start the appropriate services on the master and slave nodes.

  /home/hadoopuser/hadoop/etc/hadoop/slaves
master
slave-1
slave-2

Format the Namenode
Before starting the cluster, we need to format the NameNode. Use the following command only on the master node:

$ hdfs namenode -format

Start the Distributed File System

Run the following command on the master node to start the DFS.

$ /home/hadoopuser/hadoop/sbin/start-dfs.sh

Observe the output to confirm that it starts a DataNode on the slave nodes one by one.   To validate success, run the following commands on the master node and on each slave node.

$ su - hadoopuser
$ jps

The output of this command should list NameNode, SecondaryNameNode, and DataNode on the master node, and DataNode on all slave nodes.  If you don't see the expected output, review the log files listed in the Troubleshooting section.

Start the Yarn MapReduce Job tracker

Run the following command to start the YARN MapReduce framework.

$ /home/hadoopuser/hadoop/sbin/start-yarn.sh

To validate success, run the jps command again on the master node and on the slave nodes. The output should list NodeManager and ResourceManager on the master node, and NodeManager on all slave nodes.  If you don't see the expected output, review the log files listed in the Troubleshooting section.

Review Yarn Web console

If all the services started successfully on all nodes, you should see all of your nodes listed under YARN nodes.  Hit the following URL in your browser to verify:

http://master:8088/cluster/nodes

Let's execute a MapReduce example now

You should be all set to run a MapReduce example now. Run the following command:

$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 30 100

Once the job is submitted, you can validate that it is running on the cluster by accessing the following URL.

http://master:8088/cluster/apps

Troubleshooting
Hadoop writes its logs to the $HADOOP_HOME/logs directory. If you run into any issues with your installation, that should be the first place to look. If you need help with anything else, do leave me a comment.

Feedback and Questions?

If you have any feedback or questions, do leave a comment.

Related Articles

Installing Hadoop on Ubuntu 14.04 ( Single Node Installation)

Hadoop Java HotSpot execstack warning

References

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html


WARNING: UNPROTECTED PRIVATE KEY FILE!

If you copy your .ssh private keys from one machine to another, you might see this error. You probably forgot to set the correct permissions on your private key after copying it into the .ssh directory. It is very important that these files be protected from unauthorized access; only the owner of the key should have access to the key files.

The complete error message:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Permissions 0777 for '/home/sumitc/.ssh/id_rsa' are too open.
It is recommended that your private key files are NOT accessible by others.
This private key will be ignored.
bad permissions: ignore key: /home/sumitc/.ssh/id_rsa

To fix this, you'll need to reset the permissions on the key files back to 600, which means only the owner has read/write access to the key file.

sudo chmod 600 ~/.ssh/id_rsa
sudo chmod 600 ~/.ssh/id_rsa.pub

This should fix the permission error, and you should be able to open an SSH session correctly.
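Before retrying, you can also confirm that the permissions now look right; the private key should show owner-only access (-rw-------):

ls -l ~/.ssh/id_rsa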

Error: [ng:cpws] Can’t copy! Making copies of Window or Scope instances is not supported.

I started getting this error while developing an AngularJS component. The error from AngularJS sort of tells you what the problem is, but not where it is. The error was confusing, and it took me some time to track down its cause.

Error: [ng:cpws] Can't copy! Making copies of Window or Scope instances is not supported.
http://errors.angularjs.org/1.2.16/ng/cpws
at https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:78:12
at copy (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:844:11)
at copy (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:875:28)
at copy (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:858:23)
at copy (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:875:28)
at copy (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:858:23)
at Scope.$get.Scope.$digest (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:12250:47)
at Scope.$get.Scope.$apply (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:12516:24)
at HTMLDivElement. (https://ajax.googleapis.com/ajax/libs/angularjs/1.2.16/angular.js:18626:21)
at HTMLDivElement.m.event.dispatch (https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js:3:8436)

My code was not making any explicit copies of the scope. Rather, it was the AngularJS runtime that was making copies of the parent scope while expanding an ng-repeat tag.  Further digging revealed that I had stored a reference to window.popup on my parent scope.  AngularJS does not support copying the Window object, so as to avoid cyclic references.   Removing this window reference from the scope fixed the error.

How to do end-to-end testing of Websocket Multiplexing in NodeJS

WebSocket-multiplex is a small library on top of SockJS that allows you to do multiplexing over a single SockJS connection. Websocket multiplexing is a great way to exchange data between different server-client modules over a single websocket connection: you get a logical separation of functionality, while at the same time you don't have to unnecessarily open new TCP connections for different modules.

I have used websocket multiplexing extensively, but end-to-end testing of these modules has always been a concern.  You don't always want to run a browser client to do end-to-end testing of all the multiplexed channels.   To overcome this, I wrote a small nodejs module, websocket-multiplex-client, over the client component of the websocket-multiplex library.  This has given me great flexibility to do complete end-to-end testing of my server-side code.

Example end-to-end Unit Test involving Websocket Multiplex

We start by creating a SockJS service and a Multiplex Server.

var sockjs_opts = {sockjs_url: "http://cdn.sockjs.org/sockjs-0.3.min.js"};
var service = sockjs.createServer(sockjs_opts);
var multiplexer = new websocket_mutliplex.MultiplexServer(service);

This multiplex server will be hosted by an express app.  Register SockJS to accept websocket connections on the /multiplex URL of the app server.  For simplicity, we will run the webserver on port 8088.

 var app = express();
 var server = http.createServer(app);
 service.installHandlers(server, {prefix:'/multiplex'});
 server.listen(8088);

The next step is to register channel names and their handlers on the server side. For example, we register two handlers for the channels channel1 and channel2. These handlers will be invoked whenever the client sends any data on those channels.

var channel1 = multiplexer.registerChannel('channel1');
 
 channel1.on('connection', function(conn) {
            conn.on('data', function(data) {
                conn.write('Data received on channel1: ' + data);
            });
 }); 
          

 var channel2 = multiplexer.registerChannel('channel2');
 
 channel2.on('connection', function(conn) {
            conn.on('data', function(data) {
                conn.write('Data received on channel2: ' + data);
            });
 });

To emulate the client side, we will use the sockjs-client-node module, which allows us to emulate a SockJS client connection in NodeJS.  Let's start by opening a websocket connection to our app server.    As soon as the websocket connection is opened, we use websocket-multiplex-client to create a client multiplex handler.

var sockjsClient = new sockjs_client("http://127.0.0.1:8088/multiplex", null, { rtt: 201 });
sockjsClient.onopen = function() {
  console.log("sockjsClient:Open");
  var client = new multiplex_client(sockjsClient);
  // ... the channel registration shown in the next snippet goes here ...
};

The next step is to register our client with channel1 and channel2.  The onmessage handler for a channel will be invoked whenever the server sends a message on that channel.  Similarly, you can send a message to the server-side channel module with channel1.send or channel2.send.

var channel1_client = client.channel('channel1');
 channel1_client.onmessage = function(msg){
     console.log("Received message on channel 1", msg)
 };

 var channel2_client = client.channel('channel2');
 channel2_client.onmessage = function(msg){
     console.log("Received message on channel 2", msg)
 };
 
setTimeout(function() {
   channel1_client.send('Hi to Channel 1'); 
   channel2_client.send('Hello to Channel 2 '); 
  }, 100);

Npm Module Install

You can use websocket-multiplex-client in your nodejs app by installing it with the following command.

npm -g install websocket-multiplex-client

Code and More Documentation

You can find more details about the module on npm at https://www.npmjs.org/package/websocket-multiplex-client.  Code and examples can be downloaded from github at https://github.com/sumitchawla/websocket-multiplex-client

PostgreSQL JSON Add/Delete Functions

Postgres 9.3 has good support for JSON types. There are a lot of functions and operators available for parsing JSON, but it does not provide any functions for modifying JSON: you cannot even perform simple key add or delete operations, so you are pretty much stuck with a read-only copy of the JSON.   There is no direct way to do these operations, but you can use the HSTORE extension to overcome this shortcoming.  Postgres ships this module by default, but you need to make sure it is available as an extension in your schema.   If not, you can install the extension using the following command.

CREATE EXTENSION hstore;

Function to Set a JSON Key Value
You can create the following function to set a key in a JSON object.

CREATE OR REPLACE FUNCTION set_key(json_in json, key_name text, key_value text)
	RETURNS json AS $$
	DECLARE item json;
	DECLARE fields hstore;
BEGIN
  -- Start with an empty hstore
  fields := ''::hstore;

  -- Parse through Input Json and push each key into hstore 
  FOR item IN  SELECT row_to_json(r.*) FROM json_each_text(json_in) AS r
  LOOP
    --RAISE NOTICE 'Parsing Item % %', item->>'key', item->>'value';
    fields := (fields::hstore || hstore(item->>'key', item->>'value'));
  END LOOP;

  -- Set the desired key last, so it overrides any existing value for that key
  fields := fields || hstore(key_name, key_value);

  --RAISE NOTICE 'Result %', hstore_to_json(fields);
  RETURN hstore_to_json(fields);
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER
STRICT;

Example

SELECT set_key(('{"Name":"My Name", "Items" :[{ "Id" : 1, "Name" : "Name 1"}, { "Id" : 2, "Name 2" : "Item2 Name"}]}')::json, 'Id', '2');
-- Result
"{"Id": "2", "Name": "My Name", "Items": "[{ \"Id\" : 1, \"Name\" : \"Name 1\"}, { \"Id\" : 2, \"Name 2\" : \"Item2 Name\"}]"}"

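If you prefer to smoke-test the function from the shell rather than from a SQL console, a psql one-liner works too; the database name mydb below is a placeholder:

psql -d mydb -c "SELECT set_key('{\"Name\":\"My Name\"}'::json, 'Id', '2');"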
Function to Delete a JSON Key Value
You can create the following function to delete a key from a JSON object.

CREATE OR REPLACE FUNCTION remove_key(json_in json, key_name text)
RETURNS json AS $$
DECLARE item json;
DECLARE fields hstore;
BEGIN
  -- Initialize the hstore with desired key being set to NULL
  fields := hstore(key_name,NULL);

  -- Parse through Input Json and push each key into hstore 
  FOR item IN  SELECT row_to_json(r.*) FROM json_each_text(json_in) AS r
  LOOP
   --RAISE NOTICE 'Parsing Item % %', item->>'key', item->>'value';
   fields := (fields::hstore || hstore(item->>'key', item->>'value'));
  END LOOP;
  --RAISE NOTICE 'Result %', hstore_to_json(fields);
  -- Remove the desired key from store
  fields := fields-key_name;
 
  RETURN hstore_to_json(fields);
END;
$$ LANGUAGE plpgsql
SECURITY DEFINER
STRICT;

Example

SELECT remove_key(('{ "Id" : "2" , "Name":"My Name", "Items" :[{ "Id" : 1, "Name" : "Name 1"}, { "Id" : 2, "Name 2" : "Item2 Name"}]}')::json, 'Id');

-- Result
"{"Name": "My Name", "Items": "[{ \"Id\" : 1, \"Name\" : \"Name 1\"}, { \"Id\" : 2, \"Name 2\" : \"Item2 Name\"}]"}"

Postgres Documentation for Handling JSON types:

http://www.postgresql.org/docs/9.3/static/functions-json.html

How to create a Web based File Browser using NodeJS, Express and JQuery Datatables

This week I released a NodeJS module called file-browser.  file-browser is a nodejs utility to quickly create an HTTP-based file share on your machine.  The module is available for download from npmjs at https://www.npmjs.org/package/file-browser, and the code is available on github at https://github.com/sumitchawla/file-browser

How to install
npm -g install file-browser
How to Run

Change directory to the directory you want to browse. Then run the following command in that directory.

  file-browser

You will see the message Please open the link in your browser http://:8088 in your console. Now you can point your browser to your IP; for localhost, access the files at http://127.0.0.1:8088

file-browser supports the following command-line switches for additional functionality.

    -p, --port <port>        Port to run the file-browser. Default value is 8088
    -e, --exclude <exclude>  File extensions to exclude. To exclude multiple extension pass -e multiple times. e.g. ( -e .js -e .cs -e .swp)
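For example, to serve the current directory on a different port while hiding a couple of file types (the values are just an illustration):

file-browser -p 9000 -e .log -e .swp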
Screenshot
(screenshot of the file-browser web UI)

Code 

Let's start by exploring the server-side NodeJS code first.

We start by creating an express app.  Then we add the current working directory (the directory where you run the command) as a public directory for the app; this makes all of its files accessible for browsing.  Next, we add the module directory as a public directory as well.  This is the directory that contains all the client-side javascript and css files for our app.

var app = express();
var dir =  process.cwd();
app.use(express.static(dir)); //current working directory
app.use(express.static(__dirname)); //module directory
var server = http.createServer(app);

The next step is to create a files API in our app.  This API returns the directory listing for either the top-level directory or a requested sub-directory.  It returns metadata about all the files in the requested directory: the file name, the file path, and an IsDirectory flag to differentiate files from directories.  For files we also return the file extension, which is used later to render file-specific icons.

app.get('/files', function(req, res) {
 var currentDir =  dir;
 var query = req.query.path || '';
 if (query) currentDir = path.join(dir, query);
 console.log("browsing ", currentDir);
 fs.readdir(currentDir, function (err, files) {
     if (err) {
        throw err;
      }
      var data = [];
      files
      .forEach(function (file) {
        try {
                //console.log("processing ", file);
                var isDirectory = fs.statSync(path.join(currentDir,file)).isDirectory();
                if (isDirectory) {
                  data.push({ Name : file, IsDirectory: true, Path : path.join(query, file)  });
                } else {
                  var ext = path.extname(file);
                  if(program.exclude && _.contains(program.exclude, ext)) {
                    console.log("excluding file ", file);
                    return;
                  }       
                  data.push({ Name : file, Ext : ext, IsDirectory: false, Path : path.join(query, file) });
                }

        } catch(e) {
          console.log(e); 
        }        
        
      });
      data = _.sortBy(data, function(f) { return f.Name });
      res.json(data);
  });
});
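While developing, you can exercise this endpoint directly with curl before wiring up any UI; the port is the default 8088 and the docs sub-directory is just an example:

# Listing for the start-up directory
curl http://127.0.0.1:8088/files
# Listing for a sub-directory named docs
curl "http://127.0.0.1:8088/files?path=docs"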

Now let's get to the client-side code. We create a template.html file inside our lib directory; this is the main html page that controls all the client-side interaction for our app. We start by including a set of bootstrap and font-awesome css files, plus an app.css to control the layout of our app.  Similarly, we include a set of javascript files from our lib directory along with an app.js file. app.js contains the main interaction logic of our app and is covered in the next section.

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>File Browser</title>
    <link rel="stylesheet" href="/lib/bootstrap.min.css">
    <link rel="stylesheet" href="/lib/font-awesome/css/font-awesome.min.css">
    <link rel="stylesheet" href="/lib/app.css">
  </head>
  <body>
   <div class="panel panel-default mainpanel">
           <div class="panel-heading">
                   File Browser
                   <span class="up">
                    <i class="fa fa-level-up"></i> Up
                   </span> 
           </div>
      <div class="panel-body">
              <table class="linksholder">
              </table>
      </div>

  </div> 
    <script src="/lib/jquery.min.js"></script>
    <script src="/lib/bootstrap.min.js"></script>
    <script src="/lib/datatable/js/jquery.datatables.min.js"></script>
    <script src="/lib/app.js"></script>
  </body>
</html>

This html will be returned when you hit the home url of the app. Let's add the server-side code to do that.

app.get('/', function(req, res) {
 res.redirect('lib/template.html'); 
});

app.js is responsible for interacting with the server and controlling all the rendering logic of the app.  We start by creating a simple jQuery dataTable with default options on the HTML element linksholder.  As soon as the app is loaded, we hit the /files api to get the directory listing for the app directory and populate the datatable with the returned json array of file metadata.

var table = $(".linksholder").dataTable();

  $.get('/files').then(function(data){
      table.fnClearTable();
      table.fnAddData(data);
  });

Now that we have the data from the server and have populated the datatable with it, let's work on some beautification.  To control the UI, we need to modify the dataTable constructor call and pass it some options.

First of all, we add the following aoColumns option to control the table headers and the type of data rendered in the datatable.  We render only one column.  If the data row is a directory, we use a fa-folder icon next to the directory name.  If the data row is a file, we try to find a font-awesome icon for the file type using the Ext value from the metadata, and we also add a link to download the file from the server. The file download path is controlled by data.Path, which contains the full path relative to the startup directory of the app.

"aoColumns": [
          { "sTitle": "", "mData": null, "bSortable": false, "sClass": "head0", "sWidth": "55px",
            "render": function (data, type, row, meta) {
              if (data.IsDirectory) {
                return "<a href='#' target='_blank'><i class='fa fa-folder'></i>&nbsp;" + data.Name +"</a>";
              } else {
                return "<a href='/" + data.Path + "' target='_blank'><i class='fa " + getFileIcon(data.Ext) + "'></i>&nbsp;" + data.Name +"</a>";
              }
            }
          }
        ]

The second option gives directory rows special treatment.  Whenever a directory name is clicked, we need to fetch the contents of that directory from the server, so we attach the following click handler to directory rows.

"fnCreatedRow" :  function( nRow, aData, iDataIndex ) {
          if (!aData.IsDirectory) return;
          var path = aData.Path;
          $(nRow).bind("click", function(e){
             $.get('/files?path='+ path).then(function(data){
              table.fnClearTable();
              table.fnAddData(data);
              currentPath = path;
            });
            e.preventDefault();
          });
        },


Finally, let's create an app.css file and add some css rules to beautify the app.

body { left:10%;width:80%;position:absolute; top:10%; color: black!important;}

.mainpanel { height:90%;min-height:400px;}

.linksholder ul { list-style-type: none; }

.linksholder li { font-size: 16px; line-height: 1.5em;}

.linksholder li .fa { margin-right: 5px; }

.linksholder a { color : black;}

.linksholder .fa { color: black;}

.dataTables_filter label { float: right; font-weight:normal; position: absolute; top : 10px; right: 5px; }

.dataTables_info { margin-top: 10px; background: ghostwhite; }

.up { margin-left: 20px; border:solid 1px black; cursor:pointer; width:100px; padding-left: 10px; padding-right:10px; }

Other AngularJS Articles

Error: [ng:cpws] Can’t copy! Making copies of Window or Scope instances is not supported.

AngularJS: Override default exception handler