Skip to main content

HDFS Client Configs for talking to HA Hadoop NameNodes

One more simple thing, that had relatively scarce documentation out on the Internet.

As you might know, Hadoop NameNodes finally became HA in 2.0. The HDFS client configuration, which is already a little bit tedious, became more complicated.

Traditionally, there were two ways to configure a HDFS client (lets stick to Java)

  1. Copy over the entire Hadoop config directory with all the xml files, place it somewhere in the classpath of your app or construct a Hadoop Configuration object by manually adding in those files.
  2. Simply provide the HDFS NameNode URI and let the client do the rest. 

        Configuration conf = new Configuration(false);
        conf.set("", "hdfs://localhost:8020"); // this is deprecated now
        conf.set("fs.defaultFS", "hdfs://localhost:8020");
        FileSystem fs = FileSystem.get(conf);
Most people prefer 2, unless you need way more configs from the actual xml config files, at which point it actually makes sense to copy the entire directory over.  Now, with NameNodes being HA, which NameNode's URI do you use? The answer is : the active Namenode's rpc address. But then, your client can fail if the active Namenode becomes passive or dies.

So, here's how you deal with this. (a simple program that copies files between local filesystem and HDFS)

Basically, you point your fs.defaultFS at your nameservice and let the client know how its configured (the backing namenodes) and how to fail over between them.



  1. this code is not working for HA.

  2. When using zookeper for maintaning state of namenodes (automatic failover), I would think that we should query the zookeepers for namenode-address.

  3. What if a SOCKS server was being used to connect to HDFS initially? How can that be used in the case of HA

  4. Thanks very much. This code is working for me.


Post a Comment

Popular posts from this blog

Learning Spark Streaming #1

I have been doing a lot of Spark in the past few months, and of late, have taken a keen interest in Spark Streaming. In a series of posts, I intend to cover a lot of details about Spark streaming and even other stream processing systems in general, either presenting technical arguments/critiques, with any micro benchmarks as needed.

Some high level description of Spark Streaming (as of 1.4),  most of which you can find in the programming guide.  At a high level, Spark streaming is simply a spark job run on very small increments of input data (i.e micro batch), every 't' seconds, where t can be as low as 1 second.

As with any stream processing system, there are three big aspects to the framework itself.

Ingesting the data streams : This is accomplished via DStreams, which you can think of effectively as a thin wrapper around an input source such as Kafka/HDFS which knows how to read the next N entries from the input.The receiver based approach is a little complicated IMHO, and …

Thoughts On Adding Spatial Indexing to Voldemort

This weekend, I set out to explore something that has always been a daemon running at the back of my head. What would it mean to add Spatial Indexing support to Voldemort, given that Voldemort supports a pluggable storage layer.. Would it fit well with the existing Voldemort server architecture? Or would it create a frankenstein freak show where two systems essentially exist side by side under one codebase... Let's explore..

Basic Idea The 50000 ft blueprint goes like this.

Implement a new Storage Engine on top Postgres sql (Sorry innoDB, you don't have true spatial indexes yet and Postgres is kick ass)Implement a new smart partitioning layer that maps a given geolocation to a subset of servers in the cluster (There are a few ways to do this. But this needs to be done to get an efficient solution. I don't believe in naive spraying of results to all servers)Support "geolocation" as a new standard key serializer type in Voldemort. The values will still be  opaque b…

Enabling SSL in MAMP

Open /Applications/MAMP/conf/apache/httpd.conf, Add the line LoadModule ssl_module modules/         or uncomment this out if already in the conf file
Also add lines to listen on 80, if not there alreadyListen 80ServerName localhost:80 Open /Applications/MAMP/conf/apache/ssl.conf. Remove all lines as well as . Find the line defining SSLCertificateFile and SSLCertificateKeyFile, set it to SSLCertificateFile /Applications/MAMP/conf/apache/ssl/server.crt SSLCertificateKeyFile /Applications/MAMP/conf/apache/ssl/server.keyCreate a new folder /Applications/MAMP/conf/apache/ssl. Drop into the terminal and navigate to the new foldercd /Applications/MAMP/conf/apache/sslCreate a private key, giving a password openssl genrsa -des3 -out server.key 1024Remove the password cp server.key server-pw.key openssl rsa -in server-pw.key -out server.keyCreate a certificate signing request, pressing return for default values openssl req -new -key server.key -o…