Running Hadoop Job Remotely from a Java Client
I have a VirtualBox VM running HBase and Hadoop in pseudo-distributed mode.
I have modified some simple MapReduce code to count the number of rows in
a given HBase table (based on the HBase MapReduce RowCounter code). When I
compile the modified code into a jar file, transfer it to the VM, and run it
normally via the hadoop command line, everything is great. However, what I
want to be able to do is to run it from my Java client on my Windows
machine (from the Java code, not via an ssh command that executes hadoop
command lines, i.e., hadoop jar). When I try to run it from the Windows
side (Java client), all the necessary connections are made to Hadoop and
HBase on the VM, but I receive a ClassNotFoundException saying that Hadoop
cannot find my Mapper class.
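
For reference, a stripped-down sketch of my client-side submission code is
below. Hostnames, ports, and the table name are placeholders for my actual
VM setup, and the mapper is a simplified stand-in for my modified
RowCounter:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RemoteRowCount {

        // Simplified version of the RowCounter mapper: bump a counter per row.
        static class RowCounterMapper
                extends TableMapper<ImmutableBytesWritable, NullWritable> {
            enum Counters { ROWS }

            @Override
            protected void map(ImmutableBytesWritable key, Result value,
                               Context context) {
                context.getCounter(Counters.ROWS).increment(1);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Point the client at the VM; hostnames/ports are placeholders.
            conf.set("hbase.zookeeper.quorum", "hadoop-vm");
            conf.set("mapred.job.tracker", "hadoop-vm:9001");
            conf.set("fs.default.name", "hdfs://hadoop-vm:9000");

            Job job = new Job(conf, "remote-rowcount");
            job.setJarByClass(RemoteRowCount.class);
            TableMapReduceUtil.initTableMapperJob(
                    "mytable",              // placeholder table name
                    new Scan(),
                    RowCounterMapper.class,
                    ImmutableBytesWritable.class,
                    NullWritable.class,
                    job);
            job.setOutputFormatClass(NullOutputFormat.class);
            job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }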
I have manually copied the jar file onto HDFS and tried to point the Java
client to its location by setting the configuration option
(conf.set("mapred.jar", "hdfs:///RowCountTest.jar");). However, it still
cannot locate the class (I don't know if it is even looking for the jar).
First, do you know what needs to be done in order for Hadoop to recognize
the class file in a jar stored in HDFS when running a job from a remote
client?
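
(For concreteness, the closest mechanism I have come across is the
distributed cache; a minimal sketch of what I mean is below, with a
placeholder path, though I don't know whether this is the intended route:)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HdfsJarOnClasspath {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("fs.default.name", "hdfs://hadoop-vm:9000"); // placeholder

            // Add the jar already sitting in HDFS to the tasks' classpath
            // via the distributed cache; the path is resolved against the
            // default filesystem, i.e. HDFS here.
            DistributedCache.addFileToClassPath(
                    new Path("/RowCountTest.jar"), conf);

            // ... then build and submit the Job with this conf as in the
            // sketch above ...
        }
    }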
Second, do you know if there is any way to "pass" the necessary class
files along with the job to the cluster without pre-loading the jar file?
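
My (unconfirmed) understanding is that mapred.jar normally takes a local,
client-side path, and that the JobClient then copies the jar into the job's
staging area on HDFS itself as part of submission. A sketch of that
variant, with placeholder hostnames and paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;

    public class ShipLocalJar {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "hadoop-vm");   // placeholder
            conf.set("mapred.job.tracker", "hadoop-vm:9001");  // placeholder
            conf.set("fs.default.name", "hdfs://hadoop-vm:9000");

            // Point mapred.jar at the jar on the local (Windows) filesystem;
            // the JobClient should then copy it into the job's staging
            // directory on HDFS during submission.
            conf.set("mapred.jar", "C:/dev/RowCountTest.jar"); // placeholder

            Job job = new Job(conf, "remote-rowcount");
            // Note: setJarByClass only works when the class was itself
            // loaded from a jar, not from an IDE's classes directory, which
            // may be why the Mapper is never found when submitting remotely.
            // ... mapper/table setup as in the first sketch ...
            job.waitForCompletion(true);
        }
    }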