Hadoop Installation on Local Machine (Single node Cluster)



As requested by many of our visitors and subscribers, here is the single node cluster installation of Hadoop on Ubuntu.

If you are new to Hadoop, you can follow these links to get some background first: What is Hadoop? and the Hadoop tutorial series.

The main goal of this tutorial is to get you working with Hadoop by building a simple single node cluster, even on your home machine, so that you can start experimenting with the different tools, code, syntax and software related to Hadoop.

Note: I am using Ubuntu 12.04 with Apache Hadoop 1.2.1 (the most stable version to date) to run the pseudo-distributed single node cluster.

So now let's get started.

Prerequisite for Hadoop Installation.



Java 1.6 + (aka java 6)


Hadoop 1.x needs Java 1.6 (aka Java 6) or later, so the first thing you need on your machine is Java 1.6. Check whether it is already installed:


$ java -version

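If it is not there, you can install it with the command below. The JDK package is used here rather than the JRE, because the jps tool used later in this tutorial ships with the JDK:

$ sudo apt-get install openjdk-6-jdk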


After installation, run java -version again. If it prints the Java version details, Java is installed properly on your system. You can find the installed packages under /usr/lib/jvm/
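The JAVA_HOME value used later in this tutorial has to point at the real Java directory under /usr/lib/jvm/, and that directory name varies with the package and architecture. A minimal sketch for deriving a candidate from the java binary on your PATH follows. The helper name java_home_from_bin is made up for illustration, and the snippet assumes GNU readlink, which is the default on Ubuntu:

```shell
# Strip the trailing /bin/java (and a /jre component, if present) from a
# resolved java binary path to get a JAVA_HOME candidate.
# java_home_from_bin is a hypothetical helper, not part of Java or Hadoop.
java_home_from_bin() {
    path=${1%/bin/java}
    path=${path%/jre}
    printf '%s\n' "$path"
}

# Resolve the symlink chain behind `java` and print the candidate directory.
# Prints nothing if java is not installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

Whatever directory this prints is the value to use wherever the tutorial sets JAVA_HOME.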


Adding a dedicated system user


I prefer to have a dedicated system user for Hadoop, and this is also the recommended setup. It helps to separate the Hadoop installation from other software applications and from the other user accounts on the same machine. To create a separate user and group, use the commands below:

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add a user hduser and a group hadoop to your local machine.




Configuring SSH to localhost


Hadoop requires SSH access to manage its nodes. So for this single node installation of Hadoop we need to configure the SSH access to localhost. We will be creating this access for the hduser we created in the previous step.

$ sudo apt-get install openssh-server 

After the SSH server is installed, we have to generate an SSH key for hduser.

$ su - hduser
$ ssh-keygen -t rsa -P ""



Here the second command generates an RSA key pair with an empty passphrase.

Note: An empty passphrase is not generally recommended, but we use one here because we don't want to enter a passphrase every time Hadoop interacts with its nodes.

Now that the key pair is generated, we have to enable SSH access to the local machine with this newly created key. For that, run the command below.

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Finally, test the SSH setup by connecting to your local machine as hduser:

$ ssh localhost

This step is also needed to save your local machine’s host key fingerprint to hduser’s known_hosts file. If you have any special SSH configuration for your local machine, such as a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config
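If the login still asks for a password, one common cause is loose permissions: with its default StrictModes setting, the SSH server ignores keys when the .ssh directory or the authorized_keys file is writable by other users. The sketch below tightens them. The helper name fix_ssh_perms is hypothetical, and the directory defaults to $HOME/.ssh:

```shell
# Tighten permissions on an SSH directory and its authorized_keys file.
# fix_ssh_perms is a hypothetical helper; the directory defaults to ~/.ssh.
fix_ssh_perms() {
    dir=${1:-$HOME/.ssh}
    chmod 700 "$dir" || return 1
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}

# Run as hduser. Once the first interactive login has stored the host key,
# a non-interactive check looks like this:
#   fix_ssh_perms
#   ssh -o BatchMode=yes localhost true && echo "passwordless SSH works"
```

BatchMode=yes makes ssh fail instead of prompting, so the check only succeeds when key-based login really works.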


Hadoop Installation



Download and Extract Hadoop


If you have all the above prerequisites on your machine, you are good to go with the Hadoop installation.
First download the Hadoop 1.2.1 tarball (hadoop-1.2.1.tar.gz) from the Apache Hadoop download page and extract it at any location. I kept it at /usr/local, so the commands below assume the tarball has been placed there. You also need to change the owner of all the files to hduser and the group to hadoop.

$ cd /usr/local
$ sudo tar xzf hadoop-1.2.1.tar.gz
$ sudo mv hadoop-1.2.1 hadoop
$ sudo chown -R hduser:hadoop hadoop


Update $HOME/.bashrc


Add the following lines at the end of the $HOME/.bashrc file of user hduser. If you are using a shell other than bash, update the corresponding configuration file instead.

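# Hadoop installation directory
export HADOOP_HOME=/usr/local/hadoop
# Java installation directory; adjust to the actual directory name under /usr/lib/jvm/
export JAVA_HOME=/usr/lib/jvm/openjdk-6-jre

# Convenience aliases for HDFS commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# Show the first 1000 lines of an LZO-compressed file stored in HDFS (requires lzop)
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Make the Hadoop commands available on the PATH
export PATH=$PATH:$HADOOP_HOME/bin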
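After reloading the file (for example with source ~/.bashrc), it is worth confirming that the variables point at a usable installation before going further. The following is a rough sketch, not part of Hadoop itself. The helper name check_hadoop_home is hypothetical, and it only checks that the directory exists and contains an executable bin/hadoop:

```shell
# Report whether a Hadoop home directory looks usable: the directory exists
# and contains an executable bin/hadoop. check_hadoop_home is a hypothetical
# helper; with no argument it falls back to $HADOOP_HOME.
check_hadoop_home() {
    home=${1:-${HADOOP_HOME:-}}
    if [ -z "$home" ] || [ ! -d "$home" ]; then
        echo "HADOOP_HOME is unset or not a directory"
        return 1
    fi
    if [ ! -x "$home/bin/hadoop" ]; then
        echo "no executable bin/hadoop under $home"
        return 1
    fi
    echo "ok: $home"
}
```

If check_hadoop_home reports ok, running hadoop version as hduser should print the installed release.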


Configuration File Setup


We are almost done with the Hadoop installation. What remains is to change a few properties in the configuration files provided in Hadoop's conf folder.
But before that, we have to create a directory where Hadoop will keep its data on this single node cluster, including the data we save on HDFS. Keep in mind that Ubuntu clears /tmp on reboot, so a directory under /tmp suits only throwaway experiments.

So let's create the directory and set the required ownership and permissions.

$ sudo mkdir /tmp/hadoop_data
$ sudo chown hduser:hadoop /tmp/hadoop_data
$ sudo chmod 777 /tmp/hadoop_data

Now let's start changing a few of the required configuration files.

Note: You will find all these configuration files inside the conf directory of your Hadoop installation. In my case it is /usr/local/hadoop/conf.

hadoop-env.sh

Open the hadoop-env.sh file and change the only environment variable required for a local machine installation: JAVA_HOME. You just need to uncomment the line below and set JAVA_HOME to your JDK/JRE directory.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/openjdk-6-jre

core-site.xml

In between <configuration> ... </configuration> put the below code:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop_data</value>
  <description>directory for hadoop data</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description> data to be put on this URI</description>
</property>

mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>...
  </description>
</property>


hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
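A typo in a property name or value is easy to miss in these files, so it can help to read the values back after editing. The sketch below does that with a hypothetical helper named get_prop. It is line-based rather than a real XML parser, and it assumes each <name> and <value> element sits on its own line, as in the snippets above:

```shell
# Print the <value> of a named property from a Hadoop *-site.xml file.
# get_prop is a hypothetical helper: $1 is the file, $2 the property name.
# Prints nothing if the property is not found.
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example calls, assuming the install location used in this tutorial:
#   get_prop /usr/local/hadoop/conf/core-site.xml fs.default.name
#   get_prop /usr/local/hadoop/conf/mapred-site.xml mapred.job.tracker
#   get_prop /usr/local/hadoop/conf/hdfs-site.xml dfs.replication
```

An empty result means the property is missing or misspelled in that file.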


Formatting and Starting the Single Node Cluster.


If everything has worked so far, you are done with the installation part. Now we just have to format the NameNode and start the cluster.

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

The output should include a line reporting that the storage directory has been successfully formatted.


Starting the single node cluster:


hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

During start-up, the script prints a "starting ..., logging to ..." line for each daemon.

The above command starts the NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker on your local machine.

You can use the jps command to see whether these services are running.

hduser@ubuntu:/usr/local/hadoop$ jps
2246 TaskTracker
1927 JobTracker
1944 DataNode
2091 SecondaryNameNode
2311 Jps
1993 NameNode
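With five daemons it is easy to overlook one that failed to start. As a small sketch, the hypothetical helper missing_daemons below reads jps output on standard input and prints the name of every expected daemon that is absent. It compares the second column exactly, so SecondaryNameNode is not mistaken for NameNode:

```shell
# Read `jps` output on stdin and print each expected Hadoop 1.x daemon
# that does not appear in it. missing_daemons is a hypothetical helper.
missing_daemons() {
    input=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Exact match on the process name column (field 2).
        printf '%s\n' "$input" \
            | awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }' \
            || echo "$d"
    done
}

# Usage, as hduser:
#   jps | missing_daemons
```

No output means all five daemons are running. For any name that is printed, the matching log file in the logs directory of the Hadoop installation is the first place to look.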


With that, you are done with the single node installation of Hadoop on your local machine.



Hadoop Web Interfaces 


Hadoop comes with web interfaces, which by default are available at the following locations.

Namenode: http://localhost:50070/ 



JobTracker: http://localhost:50030/



Task Tracker: http://localhost:50060/




Still getting Error! Try this.


Disabling IPV6

To disable IPv6 on Ubuntu, open /etc/sysctl.conf and add the lines below

# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

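Note: After disabling IPv6 you have to reboot your computer for the change to take effect. If this doesn't work for you, there are other methods as well; for example, adding export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true to conf/hadoop-env.sh makes Hadoop prefer IPv4 without disabling IPv6 system-wide.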
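To confirm that the setting took effect, you can read the kernel flag back: on Linux, /proc/sys/net/ipv6/conf/all/disable_ipv6 contains 1 when IPv6 is disabled. Here is a minimal sketch of that check. The helper name ipv6_disabled is hypothetical, and it accepts an alternative file path only to make it easy to try out:

```shell
# Succeeds if the given kernel flag file contains 1 (IPv6 disabled).
# ipv6_disabled is a hypothetical helper; the path defaults to the
# standard Linux procfs entry for the "all" setting.
ipv6_disabled() {
    flag=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

if ipv6_disabled; then
    echo "IPv6 is disabled"
else
    echo "IPv6 is still enabled (or the flag file is missing)"
fi
```

Running sudo sysctl -p reloads /etc/sysctl.conf, which is a quick way to apply the lines and re-run the check.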


So that is the step-by-step procedure for building a single node cluster at home and starting to work on it. Hope you find it helpful.

Let me know in the comment section if you have any doubts, and I will be really glad to answer your questions :)








Find Comments below or Add one

El mehdi Tantaoui said...

Hi,

I've followed all the instructions but at the end http://localhost:50070 does not work for me. Need your help. Thanx

Deepak Kumar said...

@ El mehdi Tantaoui ... Can you elaborate on your problem a bit more? What is the error that you are getting? Have you checked it using jps in the terminal? Are the 5 processes that I mentioned above running?

Anonymous said...

Hi
i ve completed the entire process and when i try to list the processes using jps, i don't see namenode and datanode. I checked whether they are running using ps -ef; it displays datanode as well as namenode! but they re not displayed with jps command. Help me solve it!
Thanks

Shash said...

Awesome tutorial man..

A-shok said...

Sir ...Im using OEL R5 U4 in Vmware and there is possible for installation of hadoop

Rakesh Maurya said...

Hi Deepak
Can you please tell me how to install Hadoop on Windows 7? I found many tutorials but got kind of confused. It would be very helpful if you could provide steps for it.

Thanks

dwguy said...

The move command assumes the Hadoop directory is the older version 1.0.3; shouldn't it be hadoop-1.2.1?
