Set Up and Configure a Single-Node Hadoop Installation
This guide describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
In a single-node cluster, only one DataNode runs, and the NameNode, DataNode, ResourceManager, and NodeManager are all set up on a single machine. This works well for simple, sequential workflows in a small environment, in contrast to large environments where terabytes of data are distributed across hundreds of machines.
In a multi-node cluster, more than one DataNode runs, each on a different machine. Multi-node clusters are what organizations use in practice for analyzing Big Data: when dealing with petabytes of data, it must be distributed across hundreds of machines to be processed.
For now, though, I will show you how to set up and configure a single-node cluster.
- GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
- Windows is also a supported platform, but the following steps are for Linux only.
- Java™ must be installed. Recommended Java versions are described at Hadoop Java Versions.
- ssh must be installed and sshd must be running if you want to use the optional start and stop scripts that manage remote Hadoop daemons. Additionally, it is recommended that pdsh also be installed for better ssh resource management.
If your cluster doesn’t have the requisite software you will need to install it.
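On a Debian/Ubuntu system, for example, the prerequisites can be installed with apt (the package names below are assumptions for that distro family; adjust them for your package manager):

```shell
# Install Java, ssh (client plus sshd), and pdsh
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk ssh pdsh

# Verify that Java is now available
java -version
```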
Prepare to Start the Hadoop Cluster
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors. For this example I will use Hadoop version 3.2.1, which is currently stable. You can download it with the command below:
$ wget https://www-eu.apache.org/dist/hadoop/common/stable/hadoop-3.2.1.tar.gz -O /tmp/hadoop-3.2.1.tar.gz
After you successfully download the distribution file, you will find it in your /tmp folder. Unpack the downloaded Hadoop distribution:
$ tar -xvf /tmp/hadoop-3.2.1.tar.gz -C /tmp
Now you can move the unpacked folder wherever you want, but I suggest
/usr/local/hadoop. This tutorial uses that folder.
$ mv /tmp/hadoop-3.2.1 /usr/local/hadoop
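Optionally (this is a common convenience, not something Hadoop strictly requires), you can add the Hadoop binaries to your shell environment so the `hadoop` commands work from any directory; the paths below assume the /usr/local/hadoop location chosen above:

```shell
# Point HADOOP_HOME at the installation and put its bin/sbin folders on PATH
export HADOOP_HOME=/usr/local/hadoop
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```

Append these two lines to your ~/.bashrc (or equivalent) if you want them to persist across shell sessions.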
In the distribution, edit the file
/usr/local/hadoop/etc/hadoop/hadoop-env.sh to define some parameters, most importantly the JAVA_HOME variable pointing to the root of your Java installation.
If you don’t know your Java installation path, you can find it with this command:
$ dirname $(readlink -f $(which java))|sed 's^/bin^^'
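With that path in hand, the corresponding line in hadoop-env.sh might look like this (the JDK path shown is just an example for OpenJDK 8 on Ubuntu; substitute the output of the command above):

```shell
# In /usr/local/hadoop/etc/hadoop/hadoop-env.sh:
# set JAVA_HOME to the root of your Java installation
# (example path only; yours may differ)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
```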
To verify that Hadoop runs, try the following command:
$ /usr/local/hadoop/bin/hadoop version
This will print the Hadoop version information, confirming that the hadoop script works.
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process.
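As a quick sanity check of this non-distributed (standalone) mode, you can run the grep example from the official Hadoop documentation against the bundled configuration files (the jar name matches the 3.2.1 release used above):

```shell
cd /usr/local/hadoop
mkdir input
cp etc/hadoop/*.xml input

# Run the bundled MapReduce example: search the input files for
# strings matching the given regular expression
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'

# Inspect the result
cat output/*
```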
The single-node Hadoop setup and configuration is now complete.