One more simple thing that had relatively scarce documentation on the Internet.
As you might know, Hadoop NameNodes finally became HA (highly available) in Hadoop 2.0. The HDFS client configuration, which was already a little tedious, became even more complicated.
Traditionally, there were two ways to configure an HDFS client (let's stick to Java):
- Copy over the entire Hadoop config directory with all the XML files, place it somewhere on the classpath of your app, or construct a Hadoop Configuration object by manually adding those files (a sketch of the latter follows below).
- Simply provide the HDFS NameNode URI and let the client do the rest.
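For the first approach, loading the files manually looks roughly like this. The /etc/hadoop/conf paths are an assumption; point them at wherever your config directory actually lives:

Configuration conf = new Configuration();
conf.addResource(new Path("/etc/hadoop/conf/core-site.xml")); // assumed location
conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml")); // assumed location
FileSystem fs = FileSystem.get(conf);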
For the second approach:

Configuration conf = new Configuration(false);
conf.set("fs.default.name", "hdfs://localhost:8020"); // this is deprecated now
conf.set("fs.defaultFS", "hdfs://localhost:8020");
FileSystem fs = FileSystem.get(conf);

Most people prefer option 2, unless you need many more settings from the actual XML config files, at which point it actually makes sense to copy the entire directory over. Now, with NameNodes being HA, which NameNode's URI do you use? The answer is: the active NameNode's RPC address. But then your client can fail if the active NameNode becomes standby or dies.
So, here's how you deal with this, using a simple program that copies files between the local filesystem and HDFS.
Basically, you point fs.defaultFS at your nameservice and tell the client how it is configured (the backing NameNodes) and how to fail over between them.
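Here is a minimal sketch of such a program, assuming a nameservice called mycluster; the NameNode host names, ports, and file paths are placeholders, so substitute the values from your cluster's hdfs-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHaCopy {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(false);

        // Point the client at the logical nameservice instead of a single NameNode.
        // "mycluster" and the addresses below are placeholders.
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");

        // Tell it which NameNodes back the nameservice...
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");

        // ...and how to fail over between them, using the failover proxy
        // provider that ships with Hadoop.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        FileSystem fs = FileSystem.get(conf);

        // Copy a local file into HDFS, then copy it back out (paths are placeholders).
        fs.copyFromLocalFile(new Path("/tmp/input.txt"), new Path("/user/me/input.txt"));
        fs.copyToLocalFile(new Path("/user/me/input.txt"), new Path("/tmp/output.txt"));

        fs.close();
    }
}

With this in place, the client resolves hdfs://mycluster by trying nn1 and nn2 until it finds the active NameNode, and it transparently retries against the other one if the active NameNode becomes standby or dies, so your code never needs to know which NameNode is currently active.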
THE END