Hadoop Installation on Local Machine (Single node Cluster)

The main goal of this tutorial is to get you started with Hadoop by building a simple single node Hadoop cluster, even at home, so that you can start experimenting with the different tools, code, syntax and software related to Hadoop.

Note: Here I am using Ubuntu 12.04 with Apache Hadoop 1.2.1 (the most stable version to date) for running the pseudo-distributed single node cluster.

So now let's get started.

Prerequisites for Hadoop Installation

Java 1.6 + (aka java 6)

Hadoop is written in Java, and for Hadoop 1.2.1 Java 1.6 (aka Java 6) is the recommended version.
So the first thing you need on your machine is Java 1.6. Check whether you already have it installed:

$ java -version

If it is not there you can install the same with the below command:

$ sudo apt-get install openjdk-6-jre

After installation, run java -version again to check that Java is installed properly.

If it prints the Java version details, Java is installed properly on your system. You can find the installed packages under /usr/lib/jvm/
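As a rough sketch of this check, the snippet below reports whether a java binary is on the PATH and lists what is present under /usr/lib/jvm/. The function name check_java is only illustrative and is not part of Java or Hadoop.

```shell
# Report whether a java binary is on the PATH (check_java is an illustrative name).
check_java () {
    if command -v java >/dev/null 2>&1; then
        # java -version writes to stderr, so redirect it to stdout
        java -version 2>&1 | head -n 1
    else
        echo "java not found on PATH"
    fi
}

check_java
# List the installed JVM directories, if the directory exists
ls /usr/lib/jvm/ 2>/dev/null || echo "no /usr/lib/jvm directory"
```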

Adding a dedicated system user

I prefer to have a dedicated system user for Hadoop, and this is also the recommended practice. It helps to separate the Hadoop installation from other software applications and from the other user accounts on the same machine. To create a separate user you can use the below commands:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add a user hduser and a group hadoop to your local machine.
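To confirm that the user and group were created, a small check like the following can help. It is only a sketch: user_in_group is an illustrative helper name, and it relies on the standard id command.

```shell
# Succeeds if user $1 exists and belongs to group $2
# (user_in_group is an illustrative name).
user_in_group () {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

if user_in_group hduser hadoop; then
    echo "hduser is in group hadoop"
else
    echo "hduser is missing or not in group hadoop"
fi
```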

Configuring SSH to localhost

Hadoop requires SSH access to manage its nodes. So for this single node installation of Hadoop we need to configure the SSH access to localhost. We will be creating this access for the hduser we created in the previous step.

$ sudo apt-get install openssh-server 

After the SSH server installation, we have to generate an SSH key for hduser.

$ su - hduser
$ ssh-keygen -t rsa -P ""

Here the second command generates an RSA key pair with an empty passphrase.

Note: An empty passphrase is generally not recommended, but here we leave it empty because we don't want to enter the passphrase every time Hadoop interacts with its nodes.

Now that the key pair is generated, we have to enable SSH access to the local machine with this newly created key. For that, run the below command.

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The final step is to test the SSH setup by connecting to your local machine as hduser. This first connection is also needed to save your local machine's host key fingerprint to the hduser user's known_hosts file. If you have any special SSH configuration for your local machine, like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config

You can test the connection using the command:

$ ssh localhost
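If ssh localhost still asks for a password, one common cause is loose permissions on the .ssh directory, which sshd rejects when its StrictModes option is on (the default). The sketch below tightens them; secure_ssh_dir is an illustrative helper name, not an OpenSSH tool.

```shell
# Restrict an SSH directory to its owner (secure_ssh_dir is an illustrative name).
secure_ssh_dir () {
    dir="$1"
    [ -d "$dir" ] || { echo "no such directory: $dir"; return 1; }
    chmod 700 "$dir"
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}

# Run as hduser; harmless if the permissions are already restricted.
secure_ssh_dir "$HOME/.ssh" || true
```

Afterwards, ssh -o BatchMode=yes localhost true is a handy non-interactive test: with BatchMode it fails instead of prompting when key-based login is not working.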

Hadoop Installation

Download and Extract Hadoop

If you have all the above prerequisites on your machine, you are good to go with the Hadoop installation.
First download Hadoop 1.2.1 (hadoop-1.2.1.tar.gz) from the Apache Hadoop download page and extract it at any location; I kept it at /usr/local. You also need to change the owner of all the files to hduser and the group to hadoop.

$ cd /usr/local
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop

Update $HOME/.bashrc

Add the following lines to the end of the $HOME/.bashrc file of user hduser. If you use a shell other than bash, update the appropriate configuration file instead.

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/openjdk-6-jre

unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

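# Show the first 1000 lines of an LZO-compressed file stored in HDFS (requires lzop)
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Add the Hadoop bin/ directory to PATH so the hadoop command used above is found
export PATH=$PATH:$HADOOP_HOME/bin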

Configuration File Setup

We are now almost done with the Hadoop installation. What remains is to change a few properties in the configuration files provided in Hadoop's conf folder.
But before that, we have to make a directory where the data of the local node cluster will be stored. We will be saving our data on HDFS, and this directory holds it on the local disk.

So let's create the directory and set the required ownership and permissions.

$ sudo mkdir /tmp/hadoop_data
$ sudo chown hduser:hadoop /tmp/hadoop_data
$ sudo chmod 777 /tmp/hadoop_data

Now let's start changing the required configuration files.

Note: You will find all these configuration files inside the hadoop/conf directory of your installation. In my case it is /usr/local/hadoop/conf.


Open the hadoop-env.sh file and set the only environment variable required for a local machine installation: JAVA_HOME. For this you just need to uncomment the below line and point JAVA_HOME at your JDK/JRE directory (use the directory that actually exists under /usr/lib/jvm/ on your machine).

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/openjdk-6-jre
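If you are not sure which directory to use for JAVA_HOME, one way to find it is to follow the java symlink back to the real binary and take the directory two levels above it. This is only a sketch: java_home_for is an illustrative helper name, and readlink -f assumes GNU coreutils, as on Ubuntu.

```shell
# Print the directory two levels above a java binary path, e.g.
# /usr/lib/jvm/demo/bin/java -> /usr/lib/jvm/demo
# (java_home_for is an illustrative name).
java_home_for () {
    dirname "$(dirname "$1")"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the /usr/bin/java symlink chain to the real binary
    java_home_for "$(readlink -f "$(command -v java)")"
fi
```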


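In core-site.xml, in between <configuration> ... </configuration>, put one property for the Hadoop data directory created above (/tmp/hadoop_data) and one for the URI of the default file system, with descriptions like these:

  <description>directory for hadoop data</description>

  <description>data to be put on this URI</description>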
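The two descriptions above point to one property for the data directory and one for a file system URI. Here is a minimal sketch of a matching core-site.xml, written from the shell. It rests on assumptions: that the properties are hadoop.tmp.dir and fs.default.name (the standard Hadoop 1.x names for these settings), and that the NameNode listens on port 54310, which is only a placeholder. write_core_site is an illustrative helper name.

```shell
# Write a minimal core-site.xml into the directory given as $1.
# The property names and the port are assumptions, not taken from the steps above.
write_core_site () {
    cat > "$1/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop_data</value>
    <description>directory for hadoop data</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <!-- 54310 is a placeholder port; use any free port you prefer -->
    <value>hdfs://localhost:54310</value>
    <description>data to be put on this URI</description>
  </property>
</configuration>
EOF
}

# Try it in a scratch directory first and review the result
scratch="$(mktemp -d)"
write_core_site "$scratch"
cat "$scratch/core-site.xml"
```

Once the file looks right, it can be copied into /usr/local/hadoop/conf as hduser.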





Formatting and Starting the Single Node Cluster

If everything has gone well so far, you are done with the installation part. Now we just have to format the namenode and start the cluster.

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

The output should include a message saying that the storage directory has been successfully formatted.

Starting the single node cluster:

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

The above command starts the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker on your local machine, and prints a line for each daemon as it starts.

You can use the jps command to check whether these services are running. Note that jps ships with the JDK, so it may be missing if only the JRE package is installed.

hduser@ubuntu:/usr/local/hadoop$ jps
2246 TaskTracker
1927 JobTracker
1944 DataNode
2091 SecondaryNameNode
2311 Jps
1993 NameNode
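To check the jps output automatically instead of by eye, a small filter like the one below can list any of the five daemons that are not running. This is a sketch; missing_daemons is an illustrative helper name, and it assumes the "PID Name" output format shown above.

```shell
# Read jps output on stdin and print each expected Hadoop 1.x daemon
# that does not appear in it (missing_daemons is an illustrative name).
missing_daemons () {
    running="$(cat)"
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the second column exactly, so that
        # SecondaryNameNode is not mistaken for NameNode
        printf '%s\n' "$running" | awk '{print $2}' | grep -qx "$d" || echo "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

No output means all five daemons were found.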

With this, you are done with the single node installation of Hadoop on your local machine.

Hadoop Web Interfaces 

Hadoop comes with web interfaces which, by default, are available at the following locations.

Namenode: http://localhost:50070/ 

JobTracker: http://localhost:50030/

Task Tracker: http://localhost:50060/
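A quick way to see which of these interfaces respond, without opening a browser, is to probe the three ports with curl. This is only a sketch: check_uis is an illustrative helper name, and it assumes curl is installed.

```shell
# Print "up" or "down" for each Hadoop 1.x web interface on localhost
# (check_uis is an illustrative name).
check_uis () {
    for port in 50070 50030 50060; do
        if curl -s -o /dev/null --max-time 2 "http://localhost:$port/" 2>/dev/null; then
            echo "$port up"
        else
            echo "$port down"
        fi
    done
}

check_uis
```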

Still getting errors? Try this.

Disabling IPv6

To disable IPv6 on Ubuntu, open /etc/sysctl.conf and add the below lines

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
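To check whether the setting is active, you can read the kernel flag directly: a value of 1 means IPv6 is disabled. The sketch below wraps that check; ipv6_status is an illustrative helper name, and the optional argument exists only so the function can be pointed at a different file.

```shell
# Interpret the kernel flag: 1 means IPv6 is disabled, 0 means it is enabled
# (ipv6_status is an illustrative name).
ipv6_status () {
    flag_file="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
    elif [ "$(cat "$flag_file")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

ipv6_status
```

On many systems, sudo sysctl -p reloads /etc/sysctl.conf, so it is worth running it and checking the flag again.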

Note: After disabling IPv6 you have to reboot your computer for the change to take effect. If this doesn't work for you, there are other ways to disable it that you can find online.

So this is the step-by-step procedure for building a single node cluster at home and starting to work on it. I hope you find it helpful.

If anything is unclear, let me know in the comment section and I will be really glad to answer your questions :)

