
Page cache perils

I am sharing my experience benchmarking the Voldemort server, which is effectively a Java key-value storage server application on Linux (referred to as 'server' from here on). The application handles put(k,v) and get(k), like a standard HashMap; the only differences are that these are calls over the network and the entries need to be persistent. What this post talks about is generic and could apply to most Java server applications.

The goal here was to run a workload against the server in a manner that the reads come off disk, so we exercise the worst-case path. As easy as it may sound, certain things get in our way:

  • OS Page Cache - Linux caches files the server writes/reads (unless you do direct I/O and build your own cache on top of it)
  • JVM memory management - the JVM heap shares the same physical memory as the page cache and you don't have very fine control over how much memory it will take. 
  • This interplay is tricky, as we will see below, when it comes to actually controlling the page cache size (I know you are not supposed to) so that our test parameters are met.
  • In my case, there is a BDB JE cache inside the JVM, which makes the page cache somewhat pointless, since we end up caching the same data twice. But that's another, separate issue; let's leave it out for now.


Let us use some notation for the different knobs/parameters that play together here.

RAMSIZE - actual physical memory on the machine
JVMMAX - maximum size of the JVM heap, configured using -Xmx
JVMSTART - start size of the JVM heap, -Xms
JVMUSED - actual memory that the JVM uses

Let's explore our options.



Generate lots of data



This is a practical option: if you generate several times more data than RAMSIZE, a large portion of your workload will come off disk. The problem is that with modern servers having close to 100GB of RAM (and relatively slow SAS disks), this process takes a long time. For example, you need to generate 1TB of data to ensure that only about 10% of the workload is served from the cache on average.
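To put rough numbers on it (assuming keys are accessed uniformly):

fraction of requests served from page cache ≈ RAMSIZE / data size = 100 GB / 1 TB ≈ 10%

So even at 1TB, roughly one in ten requests still avoids the disk, and pushing that number lower means generating even more data.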


Use a memory hogger (see mlock()) to clamp down RAMSIZE - JVMMAX
The idea is to clamp down a portion of the RAM, virtually shrinking it, so that Linux feels the shortage of memory and does not populate the page cache as much.
But the JVM does not consume JVMMAX bytes right away, hence JVMMAX - JVMUSED is still available for Linux to use as it pleases.
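For concreteness, a hogger can be as small as the sketch below (an illustrative sketch, not the exact tool I used): it maps a given number of bytes, say RAMSIZE - JVMMAX, and mlocks them so the kernel cannot reclaim those pages for the page cache. You will likely need to run it as root or raise RLIMIT_MEMLOCK.

/* hog.c - pin down a chunk of physical memory so the page cache cannot use it.
   Usage: ./hog <bytes>   (e.g. bytes = RAMSIZE - JVMMAX) */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <bytes>\n", argv[0]);
        return 1;
    }
    size_t bytes = (size_t) strtoull(argv[1], NULL, 10);

    /* Anonymous mapping; MAP_POPULATE pre-faults all the pages. */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* mlock keeps the pages resident; the kernel cannot evict them. */
    if (mlock(p, bytes) != 0) {
        perror("mlock (run as root or raise RLIMIT_MEMLOCK)");
        return 1;
    }

    fprintf(stderr, "locked %zu bytes, sleeping...\n", bytes);
    pause(); /* hold on to the memory until killed */
    return 0;
}

Compile with gcc -O2 hog.c -o hog and leave it running for the duration of the benchmark.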

Set JVMSTART=RAMSIZE 
The idea here is to force the JVM heap to be as large as the RAM, leaving little space for the page cache.
But, as we saw before, the JVM does not actually claim this much physical memory even if you explicitly set the start size - the heap pages are not touched (and hence not backed by physical memory) until they are used.

Hog up RAMSIZE - JVMUSED
The idea is to examine the actual amount of memory the server uses for the workload, with a dry run, and then clamp down the rest using a hogger.
But the JVM thinks it's crunched for memory (let's not go deep into GC tuning here) and starts GCing a lot, potentially affecting the experiment.
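As a rough way to measure JVMUSED during the dry run, one can sample the resident set size of the server's JVM process; the helper below (purely illustrative) just prints the VmRSS line from /proc/<pid>/status.

/* rss.c - print the resident set size (VmRSS) of a process.
   Usage: ./rss <pid of the server JVM> */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }

    char path[64];
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);

    FILE *f = fopen(path, "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        /* The VmRSS line reports physical memory in use, e.g. "VmRSS:  12345678 kB" */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            fputs(line, stdout);
            break;
        }
    }
    fclose(f);
    return 0;
}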

Tune GC so you won't GC
The idea (a bit far-fetched) is to tune the server's GC settings so that it won't GC in the case above.
But it's quite hard to do in practice, and GC settings tend to be workload specific - what if you want to run different workloads?

Disable page cache altogether
The idea is to periodically do a sync; echo 1 > /proc/sys/vm/drop_caches, which flushes dirty data and then drops the page cache (writing 3 instead of 1 also drops dentries and inodes).
But this takes up some CPU, and if the server is CPU bound, you might be altering the experiment.
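If you go this route, the same effect as the shell one-liner can be had from a small helper that runs alongside the benchmark; a sketch (needs root) that syncs and drops the page cache every N seconds:

/* dropper.c - periodically flush dirty data and drop the Linux page cache.
   Equivalent to running: sync; echo 1 > /proc/sys/vm/drop_caches
   Usage: ./dropper <interval-seconds>   (must run as root) */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int interval = (argc > 1) ? atoi(argv[1]) : 60;

    for (;;) {
        sync(); /* flush dirty pages first, so they can actually be dropped */

        FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
        if (!f) {
            perror("fopen /proc/sys/vm/drop_caches (need root)");
            return 1;
        }
        fputs("1\n", f); /* 1 = drop page cache; 3 would also drop dentries and inodes */
        fclose(f);

        sleep(interval);
    }
    return 0;
}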



So, what's the verdict, you ask? :) Attempt one of the above if, in your particular scenario, the "buts" are somehow non-existent - or simply generate lots of data.









