HDFS Client Configs for talking to HA Hadoop NameNodes

One more simple thing that has relatively scarce documentation on the Internet.

As you might know, Hadoop NameNodes finally became highly available (HA) in Hadoop 2.0. HDFS client configuration, which was already a little tedious, became more complicated.

Traditionally, there were two ways to configure an HDFS client (let's stick to Java):

1. Copy over the entire Hadoop config directory with all the XML files, and either place it on your app's classpath or construct a Hadoop Configuration object by manually adding those files.
2. Simply provide the HDFS NameNode URI and let the client do the rest.

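import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Start from an empty Configuration (false = don't load the default resources)
Configuration conf = new Configuration(false);
conf.set("fs.default.name", "hdfs://localhost:8020"); // old key, now deprecated
conf.set("fs.defaultFS", "hdfs://localhost:8020");
FileSystem fs = FileSystem.get(conf);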
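For option 1, the files you copy over (core-site.xml, hdfs-site.xml and friends) are just lists of <property> entries, each with a <name> and a <value>. With Hadoop on the classpath you would load each one via conf.addResource(new Path(...)). The stdlib-only sketch below does not use Hadoop at all; it only shows what those files contain by reading one into a Map. The class and method names are hypothetical, and it ignores Hadoop features such as "final" properties and variable expansion.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads <property><name>/<value> pairs from a Hadoop *-site.xml file. */
final class SiteXmlReader {

    static Map<String, String> readProperties(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            String value = childText(property, "value");
            // Skip malformed entries without a name; a missing value becomes "".
            if (name != null) {
                props.put(name, value == null ? "" : value);
            }
        }
        return props;
    }

    /** Text of the first child element with the given tag, whitespace-trimmed (a choice of this sketch). */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

With the real Hadoop Configuration, resources are applied in the order they are added, and later ones override earlier ones unless a property is marked final.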

Most people prefer option 2, unless they need many more settings from the actual XML config files, in which case it makes sense to copy the entire directory over. Now, with HA NameNodes, which NameNode's URI do you use? The obvious answer is the active NameNode's RPC address. But then your client fails if the active NameNode becomes standby or dies.

So, here's how you deal with this, using a simple program that copies files between the local filesystem and HDFS.

Basically, you point fs.defaultFS at your nameservice and tell the client how that nameservice is configured (its backing NameNodes) and how to fail over between them.
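A minimal sketch of those client-side settings, assuming a hypothetical nameservice called mycluster backed by two NameNodes with the IDs nn1 and nn2 (the host names are placeholders too). The property keys follow the HDFS HA documentation, and ConfiguredFailoverProxyProvider is the stock failover provider that ships with HDFS. To keep the sketch runnable without the Hadoop jars, it only builds the key/value pairs as a plain Map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: builds the client-side HDFS HA properties for one nameservice. */
final class HaClientConfig {

    static Map<String, String> haClientProperties(String nameservice,
                                                  Map<String, String> nameNodeRpcAddresses) {
        Map<String, String> props = new LinkedHashMap<>();
        // Clients address the logical nameservice, never a single NameNode host.
        props.put("fs.defaultFS", "hdfs://" + nameservice);
        props.put("dfs.nameservices", nameservice);
        // Comma-separated logical IDs of the NameNodes backing this nameservice.
        props.put("dfs.ha.namenodes." + nameservice,
                  String.join(",", nameNodeRpcAddresses.keySet()));
        // One RPC address per NameNode ID.
        for (Map.Entry<String, String> e : nameNodeRpcAddresses.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + e.getKey(), e.getValue());
        }
        // Tells the client how to locate the active NameNode and fail over between them.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                  "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder hosts; LinkedHashMap keeps nn1 before nn2.
        Map<String, String> nameNodes = new LinkedHashMap<>();
        nameNodes.put("nn1", "namenode1.example.com:8020");
        nameNodes.put("nn2", "namenode2.example.com:8020");
        haClientProperties("mycluster", nameNodes)
            .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With the Hadoop client on the classpath, you would copy each entry into a Configuration (for example props.forEach(conf::set)) before calling FileSystem.get(conf). The client should then resolve hdfs://mycluster to whichever NameNode is currently active and retry against the other one after a failover, so calls like copyFromLocalFile and copyToLocalFile keep working when the active NameNode changes.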

THE END

Comments

Unknown, September 29, 2015 at 2:24 AM
this code is not working for HA.
Frode, January 15, 2016 at 3:40 PM
When using ZooKeeper for maintaining the state of the NameNodes (automatic failover), I would think that we should query the ZooKeeper nodes for the NameNode address.
kunalk, May 2, 2016 at 3:20 AM
What if a SOCKS server was being used to connect to HDFS initially? How can that be used in the case of HA?
Marcus Aidley, August 17, 2016 at 9:24 AM
Thanks very much. This code is working for me.

throw new Exception()
