Skip to main content

Memory allocation speed check


Traditionally, in high performance systems, repeatedly allocating and deallocating memory has been found to be costly. (i.e a malloc vs free cycle). Hence, people resorted to building their own memory pool on top of the OS, dealing with fragmentation/free list maintenance etc. One of the popular techniques to doing this being Slab allocators .

This post is about doing a reality check about the cost of explicitly doing an alloc() and free() cycle, given that most popular OS es, specifically Linux gotten better at memory allocation recently. Along the way, I will also compare the JVM memory allocations (which should be faster since we pay a premium for the freeing of memory via garbage collection).

So, all set here we go.  The following is a comparison of native c allocations, java jvm based allocation, java direct buffer allocation. For each of them we measure the following.

  • Allocation/free rate (rate/s): This gives you an upper bound on the single threaded throughput of your application. You cannot possibly do more operations than this, without building your own memory pools.
  • Cost of allocation in mircoseconds (cost-μs): This gives you a lower bound on the latency. You absolutely have to pay atleast this cost per operation without building your own memory pool.

I make a million allocations (except for 100kb case,where I do 100K allocations) varying the object size. The machine is a beefy linux server running Linux 2.6 kernel. For java, I use the following jvm options.

-server -Xms22g -Xmx22g -XX:+AlwaysPreTouch -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2

Swap is turned off and pretouch flag ensures jvm heap is warmed up.

Object size malloc()-rate/s malloc()-cost-μs free()-rate/s free()-cost-μs Java Heap alloc-rate/s Java Heap alloc-cost-μs Java Direct alloc-rate/s Java Direct alloc-cost-μs
1kb 2,451,064 0.40 8,685,833 0.11 3,992,191 0.25 268,517 3.72
10kb 616,889 1.62 2,928,069 0.34 743,676* 1.34 116,713 8.56
100kb 566,328 1.76 1,890,573 0.52 81,370* 12.28 20,201 49.5

The 10kb and 100kb runs for Java heap allocation had garbage collections, while direct allocation does not have any collections.
10kb run   : 5 ParNews, ~2ms
100kb run :  95 ParNews, 3ms avg, 30ms max
 Here are some observations.
  • In general, all allocation costs increase with object size. 
  • Small object allocations are faster in java as expected than malloc()
  • For some reason free() seems to be cheaper/faster than malloc(). Fragmentation tests will tell the real story.
  • Java heap allocations considerably slow down with larger objects, when GC kicks in. i.e under memory pressure. native c based allocation don't seem to degrade that drastically from 10kb to 100kb
  • Java heap allocations are always faster than direct allocations, which is expected.
  • Java direct allocations are supposed to be just a JNI malloc() call. But with 100kb, it seriously slows down and might become a real bottleneck. If you have smal objects mostly, you may be able to get away.

You can get the C program and java program used for this here : malloctest.c  AllocTest.java


Next, we will test access speed by basically copying memory (copy rate/s & copy cost-μs) and writing a specific index in the byte array (write rate/s & write cost-μs).
We will be using 100kb byte buffers and average across 1000 operations.

c-write rate/s 6590
c-write cost-μs 151.73
c-copy rate/s 19361
c-copy cost-μs 51.64
Java Heap-write rate/s 5932
Java Heap-write cost-μs 168.56
Java Heap-copy rate/s 30175
Java Heap-copy cost-μs 33.136
Java Direct-write rate/s 4513
Java Direct-write cost-μs 221.5
Java Direct-copy rate/s 30096
Java Direct-copy cost-μs 33.22

  • index updates are fastest with c and slowest in java direct buffer. 
  • No huge difference between accessing/copying direct buffers vs jvm heap byte[]
 Code used for these. accesstest.c AccessHeapTest.java AccessDirectTest.java

Next, we will test how the memory management schemes hold up against heavily fragmented workload. i.e rapid alloc/frees of randomly sized elements
<Coming Soon>




Comments

Popular posts from this blog

Learning Spark Streaming #1

I have been doing a lot of Spark in the past few months, and of late, have taken a keen interest in Spark Streaming. In a series of posts, I intend to cover a lot of details about Spark streaming and even other stream processing systems in general, either presenting technical arguments/critiques, with any micro benchmarks as needed.

Some high level description of Spark Streaming (as of 1.4),  most of which you can find in the programming guide.  At a high level, Spark streaming is simply a spark job run on very small increments of input data (i.e micro batch), every 't' seconds, where t can be as low as 1 second.

As with any stream processing system, there are three big aspects to the framework itself.


Ingesting the data streams : This is accomplished via DStreams, which you can think of effectively as a thin wrapper around an input source such as Kafka/HDFS which knows how to read the next N entries from the input.The receiver based approach is a little complicated IMHO, and …

Thoughts On Adding Spatial Indexing to Voldemort

This weekend, I set out to explore something that has always been a daemon running at the back of my head. What would it mean to add Spatial Indexing support to Voldemort, given that Voldemort supports a pluggable storage layer.. Would it fit well with the existing Voldemort server architecture? Or would it create a frankenstein freak show where two systems essentially exist side by side under one codebase... Let's explore..

Basic Idea The 50000 ft blueprint goes like this.

Implement a new Storage Engine on top Postgres sql (Sorry innoDB, you don't have true spatial indexes yet and Postgres is kick ass)Implement a new smart partitioning layer that maps a given geolocation to a subset of servers in the cluster (There are a few ways to do this. But this needs to be done to get an efficient solution. I don't believe in naive spraying of results to all servers)Support "geolocation" as a new standard key serializer type in Voldemort. The values will still be  opaque b…

Enabling SSL in MAMP

Open /Applications/MAMP/conf/apache/httpd.conf, Add the line LoadModule ssl_module modules/mod_ssl.so         or uncomment this out if already in the conf file
Also add lines to listen on 80, if not there alreadyListen 80ServerName localhost:80 Open /Applications/MAMP/conf/apache/ssl.conf. Remove all lines as well as . Find the line defining SSLCertificateFile and SSLCertificateKeyFile, set it to SSLCertificateFile /Applications/MAMP/conf/apache/ssl/server.crt SSLCertificateKeyFile /Applications/MAMP/conf/apache/ssl/server.keyCreate a new folder /Applications/MAMP/conf/apache/ssl. Drop into the terminal and navigate to the new foldercd /Applications/MAMP/conf/apache/sslCreate a private key, giving a password openssl genrsa -des3 -out server.key 1024Remove the password cp server.key server-pw.key openssl rsa -in server-pw.key -out server.keyCreate a certificate signing request, pressing return for default values openssl req -new -key server.key -o…