Skip to main content

Memory allocation speed check


Traditionally, in high performance systems, repeatedly allocating and deallocating memory has been found to be costly. (i.e a malloc vs free cycle). Hence, people resorted to building their own memory pool on top of the OS, dealing with fragmentation/free list maintenance etc. One of the popular techniques to doing this being Slab allocators .

This post is about doing a reality check about the cost of explicitly doing an alloc() and free() cycle, given that most popular OS es, specifically Linux gotten better at memory allocation recently. Along the way, I will also compare the JVM memory allocations (which should be faster since we pay a premium for the freeing of memory via garbage collection).

So, all set here we go.  The following is a comparison of native c allocations, java jvm based allocation, java direct buffer allocation. For each of them we measure the following.

  • Allocation/free rate (rate/s): This gives you an upper bound on the single threaded throughput of your application. You cannot possibly do more operations than this, without building your own memory pools.
  • Cost of allocation in mircoseconds (cost-μs): This gives you a lower bound on the latency. You absolutely have to pay atleast this cost per operation without building your own memory pool.

I make a million allocations (except for 100kb case,where I do 100K allocations) varying the object size. The machine is a beefy linux server running Linux 2.6 kernel. For java, I use the following jvm options.

-server -Xms22g -Xmx22g -XX:+AlwaysPreTouch -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2

Swap is turned off and pretouch flag ensures jvm heap is warmed up.

Object size malloc()-rate/s malloc()-cost-μs free()-rate/s free()-cost-μs Java Heap alloc-rate/s Java Heap alloc-cost-μs Java Direct alloc-rate/s Java Direct alloc-cost-μs
1kb 2,451,064 0.40 8,685,833 0.11 3,992,191 0.25 268,517 3.72
10kb 616,889 1.62 2,928,069 0.34 743,676* 1.34 116,713 8.56
100kb 566,328 1.76 1,890,573 0.52 81,370* 12.28 20,201 49.5

The 10kb and 100kb runs for Java heap allocation had garbage collections, while direct allocation does not have any collections.
10kb run   : 5 ParNews, ~2ms
100kb run :  95 ParNews, 3ms avg, 30ms max
 Here are some observations.
  • In general, all allocation costs increase with object size. 
  • Small object allocations are faster in java as expected than malloc()
  • For some reason free() seems to be cheaper/faster than malloc(). Fragmentation tests will tell the real story.
  • Java heap allocations considerably slow down with larger objects, when GC kicks in. i.e under memory pressure. native c based allocation don't seem to degrade that drastically from 10kb to 100kb
  • Java heap allocations are always faster than direct allocations, which is expected.
  • Java direct allocations are supposed to be just a JNI malloc() call. But with 100kb, it seriously slows down and might become a real bottleneck. If you have smal objects mostly, you may be able to get away.

You can get the C program and java program used for this here : malloctest.c  AllocTest.java


Next, we will test access speed by basically copying memory (copy rate/s & copy cost-μs) and writing a specific index in the byte array (write rate/s & write cost-μs).
We will be using 100kb byte buffers and average across 1000 operations.

c-write rate/s 6590
c-write cost-μs 151.73
c-copy rate/s 19361
c-copy cost-μs 51.64
Java Heap-write rate/s 5932
Java Heap-write cost-μs 168.56
Java Heap-copy rate/s 30175
Java Heap-copy cost-μs 33.136
Java Direct-write rate/s 4513
Java Direct-write cost-μs 221.5
Java Direct-copy rate/s 30096
Java Direct-copy cost-μs 33.22

  • index updates are fastest with c and slowest in java direct buffer. 
  • No huge difference between accessing/copying direct buffers vs jvm heap byte[]
 Code used for these. accesstest.c AccessHeapTest.java AccessDirectTest.java

Next, we will test how the memory management schemes hold up against heavily fragmented workload. i.e rapid alloc/frees of randomly sized elements
<Coming Soon>




Comments

Popular posts from this blog

Learning Spark Streaming #1

I have been doing a lot of Spark in the past few months, and of late, have taken a keen interest in Spark Streaming . In a series of posts, I intend to cover a lot of details about Spark streaming and even other stream processing systems in general, either presenting technical arguments/critiques, with any micro benchmarks as needed. Some high level description of Spark Streaming (as of 1.4),  most of which you can find in the programming guide .  At a high level, Spark streaming is simply a spark job run on very small increments of input data (i.e micro batch), every 't' seconds, where t can be as low as 1 second. As with any stream processing system, there are three big aspects to the framework itself. Ingesting the data streams : This is accomplished via DStreams, which you can think of effectively as a thin wrapper around an input source such as Kafka/HDFS which knows how to read the next N entries from the input. The receiver based approach is a little comp...

Setting up Hadoop/YARN/Spark/Hive on Mac OSX El Capitan

If you are like me, who loves to have everything you are developing against working locally in a mini-integration environment, read on Here, we attempt to get some pretty heavy-weight stuff working locally on your mac, namely Hadoop (Hadoop2/HDFS) YARN (So you can submit MR jobs) Spark (We will illustrate with Spark Shell, but should work on YARN mode as well) Hive (So we can create some tables and play with it)  We will use the latest stable Cloudera distribution, and work off the jars. Most of the methodology is borrowed from here , we just link the four pieces together nicely in this blog.  Download Stuff First off all, make sure you have Java 7/8 installed, with JAVA_HOME variable setup to point to the correct location. You have to download the CDH tarballs for Hadoop, Zookeeper, Hive from the tarball page (CDH 5.4.x page ) and untar them under a folder (refered to as CDH_HOME going forward) as hadoop, zookeeper $ ls $HOME /bin/cdh/5.4.7 hadoop ...

Creating arbitrary grid layouts in HTML/CSS

I love building websites. But, the long arduous efforts to get page layouts done, scare me. So started experimenting with ways to create arbitrary grid layouts, so I can write a layout manager someday , to take care of this business for me. I refer the reader to this excellent post on "What does width:100% do in CSS"  to get a feel for block elements vs inline elements. To summarize, block elements generate a newline before and after them, but setting their width to something takes effect absolutely, i.e whether or not there is enough content inside the element to fill up the space. This is much like "fill_parent" in the android world. Inline elements are the opposite. They don't have the newline , but by default they expand only as much as their content, like "wrap_content" in android world. However, what we need is a container that would behave as a block element, housing child elements that will be laid out horizontally, as well as vertica...