We have already started the NameNode, DataNode, ResourceManager and NodeManager, and we are now ready to execute a Hadoop MapReduce job on the single-node (pseudo-distributed) cluster.
We will run the wordcount MapReduce job available in %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar to count the number of occurrences of each word in an input file.
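Conceptually, wordcount tokenizes the input, emits each word with a count of 1, groups identical words together, and sums the counts per word. The same result can be sketched locally with standard Unix text tools (an illustration only, not part of the Hadoop job):

```shell
# Count word frequencies in a sample text -- the same result
# the wordcount MapReduce job produces for its input.
echo "Hello Hadoop Hello MapReduce" |
  tr ' ' '\n' |   # map: one word per line
  sort |          # shuffle: group identical words together
  uniq -c         # reduce: count each group
```

The map/shuffle/reduce comments mirror the phases of the actual MapReduce job.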
- Create a directory in HDFS by executing the following command:
hdfs dfs -mkdir -p InputDir
You should now see the new directory in the Hadoop file system (you can verify with hdfs dfs -ls).
- Create a text file (say 'SampleInput.txt') on the local disk with some content in it.
We will copy this file to the newly created 'InputDir' directory in HDFS by executing the following command:
hdfs dfs -copyFromLocal C:/SampleInput.txt InputDir
You should now see the file in the Hadoop file system (you can verify with hdfs dfs -ls InputDir).
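As an illustration, the sample input file could be created as follows (shown with POSIX shell; on Windows any text editor works just as well, and the content is entirely arbitrary):

```shell
# Create a small sample input file on the local disk.
# The two lines of content here are only an example.
printf "Hello Hadoop\nHello MapReduce\n" > SampleInput.txt
cat SampleInput.txt
```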
- The next step is to run the wordcount MapReduce job provided in %HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar:
yarn jar C:/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount InputDir OutputDir
InputDir is the HDFS directory where the SampleInput.txt file is present. The wordcount program will count the occurrences of each word in the file and write the results to the OutputDir directory (as part files such as part-r-00000, along with a _SUCCESS marker).
- You can see that a new job is created to perform this task; the job can be tracked in the ResourceManager web interface (http://localhost:8088 by default).
- Let’s check whether the wordcount MapReduce application has successfully completed by inspecting the contents of the OutputDir directory. Execute the command:
hdfs dfs -cat OutputDir/*
You should see output similar to the following, showing the frequency of each word in the input file.
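For example, if SampleInput.txt contained the two lines "Hello Hadoop" and "Hello MapReduce" (a hypothetical sample), the output would consist of tab-separated word/count pairs, one per line:

```
Hadoop	1
Hello	2
MapReduce	1
```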
- When you’re done, stop the daemons with the stop scripts in %HADOOP_HOME%\sbin:
stop-yarn.cmd
stop-dfs.cmd