Hadoop 2. Running on Ubuntu for Single-Node Cluster - 2018
In the previous chapter (Setting up Hadoop on Ubuntu), we set up Hadoop on Ubuntu. Now we want to check that we did it correctly.
$ pwd
/usr/local/hadoop/sbin
$ ls
distribute-exclude.sh    start-all.cmd        stop-all.sh
hadoop-daemon.sh         start-all.sh         stop-balancer.sh
hadoop-daemons.sh        start-balancer.sh    stop-dfs.cmd
hdfs-config.cmd          start-dfs.cmd        stop-dfs.sh
hdfs-config.sh           start-dfs.sh         stop-secure-dns.sh
httpfs.sh                start-secure-dns.sh  stop-yarn.cmd
mr-jobhistory-daemon.sh  start-yarn.cmd       stop-yarn.sh
refresh-namenodes.sh     start-yarn.sh        yarn-daemon.sh
slaves.sh                stop-all.cmd         yarn-daemons.sh
Run the following command:
$ /usr/local/hadoop/sbin/start-all.sh
This will start up a NameNode, a DataNode, a SecondaryNameNode, a ResourceManager, and a NodeManager on our machine (in Hadoop 2 with YARN, the ResourceManager and NodeManager take over the roles of the old JobTracker and TaskTracker). My output looks like this:
$ /usr/local/hadoop/sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
14/02/24 22:17:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-K-PC.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-K-PC.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-K-PC.out
14/02/24 22:18:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-K-PC.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-K-PC.out
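As the deprecation notice above suggests, the same daemons can also be started with the two separate scripts from the sbin listing instead of start-all.sh. A minimal sketch, assuming the same /usr/local/hadoop install location:

$ # start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
$ /usr/local/hadoop/sbin/start-dfs.sh
$ # start the YARN daemons (ResourceManager, NodeManager)
$ /usr/local/hadoop/sbin/start-yarn.sh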
To check whether the expected Hadoop processes are running, we can use jps:
$ jps
5036 SecondaryNameNode
4869 DataNode
4183 ResourceManager
5573 Jps
5256 NodeManager
We can also check with netstat if Hadoop is listening on the configured ports:
$ netstat -plten | grep java
tcp   0  0 0.0.0.0:50010  0.0.0.0:*  LISTEN  1001  61428  4869/java
tcp   0  0 0.0.0.0:50075  0.0.0.0:*  LISTEN  1001  62496  4869/java
tcp   0  0 0.0.0.0:50020  0.0.0.0:*  LISTEN  1001  62313  4869/java
tcp   0  0 0.0.0.0:50090  0.0.0.0:*  LISTEN  1001  62866  5036/java
tcp6  0  0 :::8030        :::*       LISTEN  1001  56753  4183/java
tcp6  0  0 :::8031        :::*       LISTEN  1001  57468  4183/java
tcp6  0  0 :::8032        :::*       LISTEN  1001  57504  4183/java
tcp6  0  0 :::8033        :::*       LISTEN  1001  57544  4183/java
tcp6  0  0 :::8040        :::*       LISTEN  1001  65972  5256/java
tcp6  0  0 :::8042        :::*       LISTEN  1001  67666  5256/java
tcp6  0  0 :::34229       :::*       LISTEN  1001  65960  5256/java
tcp6  0  0 :::8088        :::*       LISTEN  1001  57456  4183/java
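Some of the ports in the netstat output are also web UIs, so another rough check is to request them over HTTP. This is only a sketch, assuming the default Hadoop 2 ports and daemons listening on localhost; each command should print 200 if the UI is up:

$ # ResourceManager web UI (port 8088 in the netstat output above)
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088
$ # DataNode web UI (port 50075)
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50075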
To stop all the daemons running on our machine, run the following command:
$ /usr/local/hadoop/sbin/stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh
14/02/24 22:22:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: no namenode to stop
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
14/02/24 22:22:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
Now it's time to run our first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often words occur. The input is a set of text files, and the output is a set of text files, each line of which contains a word and the number of times it occurred, separated by a tab.
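Once the input is in HDFS (we copy it there below), the WordCount job itself is typically launched from the examples jar that ships with Hadoop. This is only a sketch: the exact jar name depends on the Hadoop version, and the output directory gutenberg-output is a hypothetical name of my choosing; the input path matches the HDFS path used later in this chapter.

$ # run the bundled WordCount example; adjust the jar name to your Hadoop version
$ /usr/local/hadoop/bin/hadoop jar \
    /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount /home/hduser/gutenberg /home/hduser/gutenberg-output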
We will use three ebooks from Project Gutenberg for this example, which can be downloaded from the following links:
- The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
- Ulysses by James Joyce
- The Notebooks of Leonardo Da Vinci
Download each ebook as a Plain Text UTF-8 file and store the files in a local temporary directory of your choice, for example /tmp/gutenberg:
$ ls -l
total 3512
-rw-r--r-- 1 hduser hadoop  661806 Feb 24 21:44 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1540091 Feb 24 21:44 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1391683 Feb 24 21:44 pg5000.txt
We need to restart our Hadoop cluster if it's not running already:
$ /usr/local/hadoop/sbin/start-all.sh
Before we run the actual MapReduce job, we need to copy the files from our local file system to Hadoop's HDFS:
$ pwd
/usr/local/hadoop/bin
$ ls
container-executor  hadoop  hadoop.cmd  hdfs  hdfs.cmd  mapred  mapred.cmd
rcc  test-container-executor  yarn  yarn.cmd
$ hadoop dfs -copyFromLocal /tmp/gutenberg /home/hduser/gutenberg
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

14/02/24 22:59:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
copyFromLocal: Call From K-PC/127.0.1.1 to localhost:54310 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
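As the DEPRECATED warning above indicates, the hdfs command is the preferred front end in Hadoop 2. The non-deprecated equivalent of the copy, using the same paths as above, would look like this:

$ # non-deprecated equivalent of 'hadoop dfs -copyFromLocal'
$ /usr/local/hadoop/bin/hdfs dfs -copyFromLocal /tmp/gutenberg /home/hduser/gutenberg
$ # list what landed in HDFS
$ /usr/local/hadoop/bin/hdfs dfs -ls /home/hduser/gutenberg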
I stopped here because of the connection error. I have to figure out what went wrong; I will come back to it later.
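For reference, "Connection refused" on localhost:54310 usually means the NameNode is not actually running; notice that the jps output earlier listed no NameNode, and stop-all.sh reported "no namenode to stop". A rough troubleshooting sketch follows. The log file name follows the pattern shown in the start-all.sh output, and reformatting the NameNode is only one possible fix; it erases everything currently stored in HDFS.

$ # check the NameNode log for the reason it did not start
$ less /usr/local/hadoop/logs/hadoop-hduser-namenode-K-PC.log
$ # if the log points to an unformatted or corrupted name directory, reformat it
$ # WARNING: this wipes all data currently stored in HDFS
$ /usr/local/hadoop/bin/hdfs namenode -format
$ # restart HDFS and confirm that a NameNode process now shows up in jps
$ /usr/local/hadoop/sbin/start-dfs.sh
$ jps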