HADOOP PSEUDO DISTRIBUTED MODE INSTALLATION: Hadoop Pseudo Distributed Mode Cluster

In this blog I have recorded detailed steps, with supporting screenshots, to install and set up a Hadoop cluster in Pseudo Distributed Mode on your 64-bit Windows PC or laptop.

This is a three-step process:
Step 1 – Install VM Player
Step 2 – Set Up Lubuntu Virtual Machine
Step 3 – Install Hadoop

Step 1 – Install VM Player

1. Search Google for VMware Player and download it. You can also download it from this website:
http://vmware-player.joydownload.com

Start the installation by double-clicking the downloaded .exe file.

2. Keep clicking the Next button to complete the installation.

3. Click the VM Player desktop icon to open the VM Player tool.

Step 2 – Set Up Lubuntu Virtual Machine

1. Download the Lubuntu 12.04 image file (lubuntu-12.04-alternate-i386.iso) from http://cdimage.ubuntu.com/lubuntu/releases/12.04/release/

2. In VM Player, click “Create New Virtual Machine”.

3. The New Virtual Machine Wizard will pop up.

4. Select the radio button “Installer disc image file (iso):”, browse to and select the file “lubuntu-12.04-alternate-i386.iso”, and click NEXT.

5. On the next window, make no changes and just click NEXT.

6. On the next window, change the virtual machine name to “HadoopPseudoMode” (or a name of your choice) and click NEXT.

7. On the next window, make no changes and just click NEXT.

8. On the next window, make no changes and just click FINISH.

9. Your Lubuntu VM is ready.

10. Next, we have to install the OS in this VM.

11. Click the “Play Virtual Machine” link.

12. You will get a popup offering to install updates. Do not install them; just click “Remind me later”.

13. The mouse will not work inside the virtual machine until the OS is completely installed, so you have to use the keyboard only. At any point, to release the VM and get back to your Windows machine, use the hotkey CTRL + ALT.

14. On the Language screen, select “English” and press ENTER.

15. On the next screen, just press ENTER.

16. On the Language screen, select “English” and press ENTER.

17. On the next screen, select “India” (or your own country) as the location and press ENTER.

18. On the next “Configure the Keyboard” screen, select “Yes” and press ENTER.

19. On the next “Configure the Keyboard” screen, press the “Y” key on your keyboard.

20. On the next “Configure the Keyboard” screen, press the “W” key on your keyboard.

21. On the next set of screens, select “NO” and keep pressing ENTER until you get the “Configure Keyboard” confirmation screen.

22. Select “Continue” and press ENTER; the OS components will start loading.

23. On the next screen, just select “Continue” and press ENTER.

24. On the next screen, enter the username “adminuser”, select “Continue” and press ENTER. The same screen will pop up again asking you to re-enter the username. Enter “adminuser” again, select “Continue” and press ENTER.

25. On the next screen, enter the password “adminuser”, select “Continue” and press ENTER. The same screen will pop up again asking you to re-enter the password. Enter “adminuser” again, select “Continue” and press ENTER.

26. On the next screen, select “NO” and press ENTER.

27. On the next screen, select “YES” and press ENTER.

28. On the next screen, press ENTER.

29. On the next screen, press ENTER.

30. On the next screen, press ENTER.

31. On the next screen, press ENTER.

32. On the next screen, select “YES” and press ENTER.

33. On the next screen, select “Continue” and press ENTER.

34. The installation will take about 15 minutes. Wait for it to finish.

35. On the next screen, select “YES” and press ENTER.

36. On the next screen, select “YES” and press ENTER.

37. On the next screen, select “Continue” and press ENTER.

38. This completes the OS installation, and the login screen will show up.

39. Log in to the adminuser account; your Lubuntu machine is up and ready.

Step 3 – Install Hadoop

1. We are doing a Pseudo Distributed Mode setup, in which one system hosts the NameNode, Secondary NameNode and DataNode (as well as the JobTracker and TaskTracker for MapReduce). However, each of these daemons runs in its own JVM (Java Virtual Machine).

2. Log in to the Lubuntu machine with the adminuser username and password.

3. From the Menu Bar go to Accessories >> LXTerminal

4. Open LXTerminal to type commands

5. Let’s add a new group called "hadoop" using the following command:

sudo addgroup hadoop

6. You will be prompted for a password. Enter the password of the user you are logged in as, i.e. adminuser. (sudo is used to run a command as the superuser, i.e. the administrator of the system.)

7. Let’s add a new user called hduser in the group hadoop:

sudo adduser --ingroup hadoop hduser

8. You will be asked to enter a password for the new user hduser. Enter the password hduser (or one of your choice, but make sure you remember it).

9. You will be asked to enter a name and work details for hduser. Leave them blank by just pressing ENTER. A confirmation prompt will ask for Y (yes) or N (no); type Y and press ENTER. The new user hduser is created.

10. Let’s give admin (sudo) rights to hduser:

sudo adduser hduser sudo

11. Log out using the Logout option in the menu bar of your VM and log back in with the hduser account.

12. Open the LXTerminal to type commands

13. Let’s install Java 6, since Hadoop is written in Java and runs on the JVM:

sudo apt-get install openjdk-6-jdk

14. You will be prompted for a password. Enter the hduser password; you will then be asked for confirmation. Type Y and press ENTER.

15. Java 6 will be downloaded and installed. This takes around 5 minutes or more, depending on your internet download speed.

16. Next we need to install an SSH server. SSH stands for Secure Shell; it is used to log in remotely from one Linux machine to another and gives access to the shell of the remote machine. Hadoop’s start and stop scripts use ssh to launch the daemons, even when the only host is localhost.

sudo apt-get install openssh-server

17. You might be prompted for a password. Enter the hduser password; you will then be asked for confirmation. Type Y and press ENTER.

18. The SSH server will be downloaded and installed. This takes a few minutes.

19. You can log in to a remote machine using the following command:

ssh <ip-address>

20. If you try ssh localhost now, you will be prompted for a password. We want to make this login password-less. One way of doing that is to use keys. We can generate a key pair using the following command (-P "" sets an empty passphrase):

ssh-keygen -t rsa -P ""

21. It will prompt you for the path in which to save the key. Don’t type anything; just press ENTER to accept the default.

22. This command generates two keys in the /home/hduser/.ssh/ folder: id_rsa and id_rsa.pub.
a) id_rsa is the private key.
b) id_rsa.pub is the public key.
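If you prefer not to answer prompts, steps 20 to 22 can be wrapped in a small shell function. This is a minimal sketch, not part of the original setup: the helper name ensure_rsa_key is hypothetical, and it assumes the default key location for hduser. Passing -f skips the "file in which to save the key" prompt, and -P "" (as in step 20) gives the key an empty passphrase, so ssh will not ask for a passphrase for this key later.

```shell
#!/bin/sh
# Sketch: non-interactive version of steps 20-22.
# Assumption: keys live in the default ~/.ssh/id_rsa location for hduser.
set -eu

# ensure_rsa_key SSH_DIR  (hypothetical helper name)
# Creates SSH_DIR/id_rsa (private) and SSH_DIR/id_rsa.pub (public) with an
# empty passphrase, but only if id_rsa does not exist yet; never overwrites a key.
ensure_rsa_key() {
  local ssh_dir="$1"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"   # sshd expects ~/.ssh not to be writable by others
  if [ ! -f "$ssh_dir/id_rsa" ]; then
    # -P "" : empty passphrase (same as step 20)
    # -f    : output file, so ssh-keygen does not ask where to save the key
    # -q    : quiet, no progress output
    ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa"
  fi
}
```

Usage (as hduser): ensure_rsa_key /home/hduser/.ssh. Because it never overwrites an existing id_rsa, running it twice is harmless. If a key with a passphrase already exists (for example, one created by typing a passphrase at the prompt), delete the old id_rsa and id_rsa.pub first.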

23. To log in to a remote machine, I share my public key with that machine. In our case the "remote" machine is the local machine, so the following command is used:
ssh-copy-id -i /home/hduser/.ssh/id_rsa.pub hduser@localhost

24. You will be prompted for confirmation; type yes and press ENTER. You will then be prompted for a password; enter the hduser password.

25. A message confirming that localhost has been added to the list of known hosts will be shown on screen.

26. Now enter the command below. You should not be prompted for any password, which confirms that the keys are shared properly:
ssh localhost
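On a single machine, what ssh-copy-id achieves in step 23 amounts to appending id_rsa.pub to hduser's ~/.ssh/authorized_keys. The sketch below does that locally, which may help if ssh-copy-id is unavailable or keeps prompting. The helper name authorize_key is hypothetical; the permission values (700 on ~/.ssh, 600 on authorized_keys) are the usual settings that sshd's StrictModes check accepts.

```shell
#!/bin/sh
# Sketch: local equivalent of step 23 (ssh-copy-id ... hduser@localhost).
# Assumption: the key pair from step 20 already exists.
set -eu

# authorize_key PUBKEY_FILE SSH_DIR  (hypothetical helper name)
# Appends the public key to SSH_DIR/authorized_keys unless that exact line is
# already present, then sets permissions that sshd's StrictModes check accepts.
authorize_key() {
  local pub="$1" ssh_dir="$2"
  local auth="$ssh_dir/authorized_keys"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth"
  # -x: whole-line match, -F: treat the key text literally, -q: no output
  if ! grep -qxF "$(cat "$pub")" "$auth"; then
    cat "$pub" >> "$auth"
  fi
  chmod 600 "$auth"
}
```

Usage: authorize_key /home/hduser/.ssh/id_rsa.pub /home/hduser/.ssh. Once localhost is in known_hosts (step 25), ssh -o BatchMode=yes localhost true is a quick check: with BatchMode=yes, ssh fails instead of prompting for a password, so a zero exit status means key-based login works.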

27. The next step is to install Hadoop 1.0.3. You can download the file hadoop-1.0.3.tar.gz on your Windows system from the path below:
https://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz

28. To move the hadoop-1.0.3.tar.gz file from Windows to the Lubuntu system, we can use the WinSCP or FileZilla tool. Download winscp556setup.exe from the path below and install it on the Windows system:
http://winscp.net/download/winscp556setup.exe

29. Keep clicking Next to finish the installation.

30. In WinSCP, the host name is the IP address of the Lubuntu machine. To find it, enter this command in the Lubuntu terminal:
ifconfig

31. The IP address of my Lubuntu machine is 192.168.xx.xxx
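If you would rather not read through the full ifconfig listing, it can be filtered down to the non-loopback IPv4 addresses. This sketch assumes ifconfig's text format: Ubuntu 12.04's net-tools prints lines like "inet addr:192.168.1.5", while newer versions print "inet 192.168.1.5"; the awk filter accepts both. The helper name vm_ipv4_addresses is hypothetical.

```shell
#!/bin/sh
# Sketch: pull the non-loopback IPv4 address(es) out of ifconfig output.
set -eu

# vm_ipv4_addresses  (hypothetical helper name)
# Reads ifconfig output on stdin and prints each non-loopback IPv4 address.
# "inet6" lines are skipped because their first field is not exactly "inet".
vm_ipv4_addresses() {
  awk '$1 == "inet" {
         ip = $2
         sub(/^addr:/, "", ip)      # strip the older "addr:" prefix if present
         if (ip != "127.0.0.1") print ip
       }'
}
```

Usage in the Lubuntu terminal: ifconfig | vm_ipv4_addresses. The printed address is what goes into the WinSCP host name field.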

32. Enter the IP address of your Lubuntu machine in WinSCP and click Login.

33. You will be prompted for a username and password. Enter hduser for both (or the password you chose in step 8).

34. Once connected, the Lubuntu system is shown in the right pane and the Windows system in the left pane. You can drag and drop files between the two systems.

35. Move the hadoop-1.0.3.tar.gz file from the Windows system to the Lubuntu system under the path /home/hduser/downloads.

36. In the Lubuntu VM, go to the folder /home/hduser/downloads. Right-click the hadoop-1.0.3.tar.gz file and click Extract.

37. Rename the extracted folder from hadoop-1.0.3 to hadoop (lowercase: Linux paths are case-sensitive, and the rest of this guide uses /home/hduser/hadoop).

38. Cut and move the hadoop folder to /home/hduser, so that its full path is /home/hduser/hadoop.
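Steps 36 to 38 can also be done from LXTerminal. The sketch below is one way to do it, assuming the tarball is in /home/hduser/downloads as in step 35; the helper name install_hadoop_tarball is hypothetical. It reads the archive's top-level folder name from the tar listing rather than hard-coding hadoop-1.0.3, and refuses to run if /home/hduser/hadoop already exists.

```shell
#!/bin/sh
# Sketch: command-line version of steps 36-38 (extract, rename, move).
# Assumption: the tarball was copied to /home/hduser/downloads in step 35.
set -eu

# install_hadoop_tarball TARBALL DEST_PARENT  (hypothetical helper name)
# Extracts the archive into DEST_PARENT and renames its top-level folder
# (e.g. hadoop-1.0.3) to "hadoop", giving DEST_PARENT/hadoop.
install_hadoop_tarball() {
  local tarball="$1" dest_parent="$2" top
  if [ -e "$dest_parent/hadoop" ]; then
    echo "$dest_parent/hadoop already exists; not overwriting it" >&2
    return 1
  fi
  # Top-level folder name, taken from the first entry of the listing.
  # sed reads the whole listing, so tar is never cut off by a closed pipe.
  top=$(tar -tzf "$tarball" | sed -n '1s|/.*||p')
  if [ -z "$top" ]; then
    echo "unexpected archive layout in $tarball" >&2
    return 1
  fi
  tar -xzf "$tarball" -C "$dest_parent"
  # Lowercase "hadoop": Linux paths are case-sensitive.
  mv "$dest_parent/$top" "$dest_parent/hadoop"
}
```

Usage: install_hadoop_tarball /home/hduser/downloads/hadoop-1.0.3.tar.gz /home/hduser. If the VM has internet access, you could instead fetch the archive inside Lubuntu with wget and the Apache archive URL from step 27, which skips WinSCP entirely.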

39. Now we need to edit the Hadoop configuration files. You will find them in the /home/hduser/hadoop/conf folder.

40. There are four important files in this folder:
hadoop-env.sh
hdfs-site.xml
mapred-site.xml
core-site.xml

41. hadoop-env.sh contains Hadoop environment-related settings. Here we can set things like the Java home, the heap memory size, Hadoop’s classpath, which IP version to use, etc. We will set the Java home in this file. For me, the Java home is /usr/lib/jvm/java-6-openjdk-i386 (the i386 suffix reflects the 32-bit Lubuntu image). Open the file using leafpad.

42. Search for the line # export JAVA_HOME and replace the entire line with the line below (the leading # is removed so that the line takes effect), then save the file.

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386
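Instead of editing hadoop-env.sh in leafpad, the same change can be scripted. In this minimal sketch (helper names hypothetical), derive_java_home follows the /usr/bin/javac symlink chain to the real JDK folder, which is one way to find a path like /usr/lib/jvm/java-6-openjdk-i386 on your own machine, and set_java_home turns the commented-out # export JAVA_HOME line into an active one. It assumes GNU sed and readlink, as shipped with Lubuntu.

```shell
#!/bin/sh
# Sketch: non-GUI version of steps 41-42.
set -eu

# derive_java_home JAVAC_PATH  (hypothetical helper name)
# Resolves symlinks (e.g. /usr/bin/javac -> /etc/alternatives/javac -> .../bin/javac)
# and strips the trailing /bin/javac to get the JDK folder.
derive_java_home() {
  local real
  real=$(readlink -f "$1")
  echo "${real%/bin/javac}"
}

# set_java_home ENV_FILE JAVA_HOME  (hypothetical helper name)
# Replaces a "# export JAVA_HOME=..." (or existing "export JAVA_HOME=...") line
# with an active export; appends one if no such line exists.
# Note: "|" and "&" in JAVA_HOME would need escaping for sed; JDK paths have neither.
set_java_home() {
  local env_file="$1" java_home="$2"
  sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
  grep -q '^export JAVA_HOME=' "$env_file" ||
    echo "export JAVA_HOME=${java_home}" >> "$env_file"
}
```

Usage: set_java_home /home/hduser/hadoop/conf/hadoop-env.sh "$(derive_java_home "$(command -v javac)")"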

43. hdfs-site.xml contains properties related to HDFS (Hadoop Distributed File System). Here we set the replication factor, which is 3 by default. Since we are installing Hadoop on a single machine with only one DataNode, we will set it to 1. Copy the following between the <configuration> and </configuration> tags in the file.

<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>

44. mapred-site.xml contains properties related to MapReduce. Here we set the host (here localhost) and port of the machine on which the JobTracker runs. Copy the following between the <configuration> and </configuration> tags:

<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>

45. core-site.xml contains properties that are common to, or used by, both MapReduce and HDFS. Here fs.default.name sets the host and port of the machine on which the NameNode runs, and hadoop.tmp.dir tells Hadoop where to store files such as the fsimage and data blocks (by default, the HDFS name and data directories are created under it). Copy the following between the <configuration> and </configuration> tags:

<property>
<name>hadoop.tmp.dir</name>
<value>/home/hduser/hadoop_tmp_files</value>
<description>A base for other temporary directories.</description>
</property>

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
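Steps 43 to 45 ask you to paste snippets between the configuration tags. As a cross-check of what the finished files look like, the sketch below writes all three complete files with the same property values. The helper names are hypothetical, the <description> elements are left out because they are documentation only, and it overwrites any existing site files in the conf folder.

```shell
#!/bin/sh
# Sketch: writes complete versions of the three *-site.xml files from steps 43-45.
# WARNING: overwrites any existing file with the same name in CONF_DIR.
set -eu

# write_site_file FILE NAME VALUE [NAME VALUE ...]  (hypothetical helper name)
# Values are written as-is (not XML-escaped); fine for the plain values used here.
write_site_file() {
  local file="$1"
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# write_hadoop_conf [CONF_DIR]  (hypothetical helper name)
write_hadoop_conf() {
  local conf_dir="${1:-/home/hduser/hadoop/conf}"
  write_site_file "$conf_dir/core-site.xml" \
    hadoop.tmp.dir /home/hduser/hadoop_tmp_files \
    fs.default.name hdfs://localhost:54310
  write_site_file "$conf_dir/hdfs-site.xml" dfs.replication 1
  write_site_file "$conf_dir/mapred-site.xml" mapred.job.tracker localhost:54311
}
```

Usage: write_hadoop_conf /home/hduser/hadoop/conf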

46. Open a terminal and format the NameNode with the following command. The NameNode should be formatted only once, before you start using your Hadoop cluster; if you format it again later, you will lose all the data stored in HDFS. Note that the /home/hduser/hadoop/bin/ folder contains all the important scripts to start Hadoop, stop Hadoop, access HDFS, format HDFS, etc.

/home/hduser/hadoop/bin/hadoop namenode -format
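Because formatting a second time wipes HDFS, a small guard can help. This sketch assumes dfs.name.dir is left at its Hadoop 1.x default of ${hadoop.tmp.dir}/dfs/name (so /home/hduser/hadoop_tmp_files/dfs/name with the core-site.xml above) and treats the presence of current/VERSION, which a successful format creates there, as "already formatted". The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: format the NameNode only if it has never been formatted.
# Assumption: dfs.name.dir is the Hadoop 1.x default, ${hadoop.tmp.dir}/dfs/name.
set -eu

# namenode_is_formatted NAME_DIR  (hypothetical helper name)
# Succeeds if NAME_DIR/current/VERSION exists.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# format_namenode_once [HADOOP_HOME] [NAME_DIR]  (hypothetical helper name)
format_namenode_once() {
  local hadoop_home="${1:-/home/hduser/hadoop}"
  local name_dir="${2:-/home/hduser/hadoop_tmp_files/dfs/name}"
  if namenode_is_formatted "$name_dir"; then
    echo "NameNode already formatted in $name_dir; skipping (re-formatting erases HDFS metadata)" >&2
    return 0
  fi
  "$hadoop_home/bin/hadoop" namenode -format
}
```

Usage: format_namenode_once (with no arguments it uses the paths from this guide).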

47. Start Hadoop using the following command:
/home/hduser/hadoop/bin/start-all.sh

48. Check whether Hadoop is running using the command:
jps

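49. It should list the following Java processes (along with Jps itself). If any of them is missing, something went wrong during installation; the log files in /home/hduser/hadoop/logs usually show why.

NameNode
SecondaryNameNode
DataNode
JobTracker
TaskTracker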
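To make the check in steps 48 and 49 less error-prone, the sketch below compares jps output against the five expected daemon names and prints any that are missing. The function name is hypothetical; it only inspects text, so it can also be tried on saved jps output.

```shell
#!/bin/sh
# Sketch: report which Hadoop 1.x daemons are missing from `jps` output.
set -eu

# missing_daemons  (hypothetical helper name)
# Reads jps output on stdin ("<pid> <ClassName>" per line) and prints each
# expected daemon class name that does not appear, one per line, in a fixed order.
missing_daemons() {
  awk '
    { seen[$2] = 1 }
    END {
      n = split("NameNode SecondaryNameNode DataNode JobTracker TaskTracker", want, " ")
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) print want[i]
    }'
}
```

Usage: jps | missing_daemons. Empty output means all five daemons are running; any name printed is a daemon whose log file is worth checking.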

50. Hadoop has a web UI. Open the Chrome browser (or any browser) in your Lubuntu machine and go to localhost:50070. This shows a complete summary of the NameNode and other details, such as the number of live nodes. Since we did a Pseudo Distributed Mode setup, there is only one live node (the DataNode).

Hope this helped. Do reach out to me if you have any questions.

If you found this blog useful, please convey your thanks by posting a comment.

I have posted another blog on how to set up a Hadoop Fully Distributed Mode (Multi Node) cluster. Link to view it: http://hadoopfullydistributedmode.blogspot.in/

18 comments:

Unknown, November 17, 2014 at 12:57 AM
Thanks
Unknown, November 19, 2014 at 2:16 AM
This makes very easy to install Hadoop pseudo distributed mode It is very helpful.Thanks a lot.
Prashant Koti, March 22, 2015 at 7:40 AM
Please let me know if I can install Hadoop on Core2Duo processor with 3-4 GB of RAM.
Unknown, May 23, 2015 at 11:47 PM
Can I use the lubunto in this configuration to install R, Rstudio and PSQL?
Unknown, July 4, 2015 at 9:14 AM
Thanks, its really helpful for Hadoop starters
Unknown, October 19, 2015 at 5:59 PM
Thanks a lot. You really saved me.
i tried sandbox clodera ubuntu and was struggling from 30 hours.
when i lost hpe your blog saved me. Thanks a lot.
if possible plz tell how to run simple word count program in hadoop using same setup
Unknown, October 28, 2015 at 11:09 AM
Thanks alot brother..!
ss, November 28, 2015 at 12:51 PM
I am getting problem in step 26 (2nd half).
26th step is "Now enter below command and you will not be prompted for any password which confirms the keys are shared properly
ssh localhost"

Upto step 25, it is running as per the steps. After 26 it is asking for the ' passphrase for /home/hduser/.ssh/id_rsa'
Plz help me to solve this problem and download Hadoop.

Unknown, April 15, 2017 at 10:39 AM
Great thanks for this useful article. Please I would like to know can I apply the same steps for hadoop 2.7 and java 1.6. I only have physical 4 GB ram. How much memory should I allocate the lubuntu VM?
Unknown, August 10, 2017 at 1:14 PM
{ Step 46 }
Permission denied in namenode -format command. Please reply as soon as possible.
amar, June 4, 2018 at 12:58 AM
nice information

Thursday, October 30, 2014