Traditionally, in high-performance systems, repeatedly allocating and deallocating memory (i.e., a malloc()/free() cycle) has been found to be costly. Hence, people resorted to building their own memory pools on top of the OS, dealing with fragmentation, free-list maintenance, etc. One of the popular techniques for doing this is the slab allocator.
This post is a reality check on the cost of an explicit alloc()/free() cycle, given that most popular OSes, specifically Linux, have gotten better at memory allocation recently. Along the way, I will also compare JVM memory allocations (which should be faster, since we pay the premium for freeing memory later, via garbage collection).
So, all set, here we go. The following is a comparison of native C allocations, Java JVM heap allocations, and Java direct buffer allocations. For each of them we measure the following.
- Allocation/free rate (rate/s): This gives you an upper bound on the single threaded throughput of your application. You cannot possibly do more operations than this, without building your own memory pools.
- Cost of allocation in microseconds (cost-μs): This gives you a lower bound on the latency. You absolutely have to pay at least this cost per operation without building your own memory pool.
I make a million allocations (except for the 100kb case, where I do 100K allocations), varying the object size. The machine is a beefy server running a Linux 2.6 kernel. For Java, I use the following JVM options.
-server -Xms22g -Xmx22g -XX:+AlwaysPreTouch -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2
Swap is turned off, and the AlwaysPreTouch flag ensures the JVM heap is warmed up.
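To make the measurement concrete, here is a minimal sketch of the kind of timing loop used for the Java side. This is an illustrative harness, not the actual AllocTest.java linked below; it keeps a small rolling window of references so the JIT cannot elide the allocations, and reports rate/s and per-op cost in μs.

```java
import java.nio.ByteBuffer;

public class AllocBench {
    // Rolling window of live references so the JIT can't optimize allocations away.
    static final Object[] keep = new Object[1024];

    // Time `iters` allocations of `size` bytes; return the average cost in microseconds.
    static double measure(String label, int size, int iters, boolean direct) {
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            keep[i & (keep.length - 1)] =
                direct ? ByteBuffer.allocateDirect(size) : new byte[size];
        }
        long elapsedNs = System.nanoTime() - start;
        double costUs = elapsedNs / 1_000.0 / iters;
        double ratePerSec = iters / (elapsedNs / 1e9);
        System.out.printf("%-6s %7dB: %12.0f allocs/s, %8.3f us/alloc%n",
                          label, size, ratePerSec, costUs);
        return costUs;
    }

    public static void main(String[] args) {
        measure("heap",   10 * 1024, 100_000, false);
        measure("direct", 10 * 1024,  10_000, true);
    }
}
```

The C side is the same idea with malloc()/free() pairs timed via clock_gettime().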
| Object size | malloc()-rate/s | malloc()-cost-μs | free()-rate/s | free()-cost-μs | Java Heap alloc-rate/s | Java Heap alloc-cost-μs | Java Direct alloc-rate/s | Java Direct alloc-cost-μs |
|---|---|---|---|---|---|---|---|---|
The 10kb and 100kb runs for Java heap allocation had garbage collections, while direct allocation does not have any collections.
10kb run: 5 ParNews, ~2ms
100kb run: 95 ParNews, 3ms avg, 30ms max

Here are some observations.
- In general, all allocation costs increase with object size.
- Small-object allocations are, as expected, faster in Java than with malloc().
- For some reason, free() seems to be cheaper/faster than malloc(). Fragmentation tests will tell the real story.
- Java heap allocations slow down considerably with larger objects, once GC kicks in, i.e., under memory pressure. Native C allocations don't seem to degrade that drastically from 10kb to 100kb.
- Java heap allocations are always faster than direct allocations, which is expected.
- Java direct allocations are supposed to be just a JNI malloc() call. But at 100kb, they slow down seriously and might become a real bottleneck. If you mostly have small objects, you may be able to get away with it.
You can get the C program and Java program used for this here: malloctest.c, AllocTest.java
Next, we will test access speed, basically by copying memory (copy rate/s & copy cost-μs) and writing a specific index in the byte array (write rate/s & write cost-μs).
We will be using 100kb byte buffers and average across 1000 operations.
| Metric | Java Heap | Java Direct |
|---|---|---|
| write rate/s | 5932 | 4513 |
| write cost-μs | 168.56 | 221.5 |
| copy rate/s | 30175 | 30096 |
| copy cost-μs | 33.136 | 33.22 |
- Index updates are fastest in C and slowest with Java direct buffers.
- There is no huge difference between accessing/copying direct buffers vs. JVM heap byte arrays.
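The access test can be sketched as follows. Again, this is illustrative rather than the code used for the numbers above; it times an index write and a full 100kb copy for both a heap byte[] and a direct ByteBuffer, averaged over the same 1000 operations.

```java
import java.nio.ByteBuffer;

public class AccessBench {
    static final int SIZE = 100 * 1024;   // 100kb buffers, as in the post
    static final int OPS = 1000;          // average across 1000 operations

    // Run `op` OPS times and return average cost in microseconds.
    static double timeUs(Runnable op) {
        long start = System.nanoTime();
        for (int i = 0; i < OPS; i++) op.run();
        return (System.nanoTime() - start) / 1000.0 / OPS;
    }

    public static void main(String[] args) {
        byte[] heapSrc = new byte[SIZE], heapDst = new byte[SIZE];
        ByteBuffer dirSrc = ByteBuffer.allocateDirect(SIZE);
        ByteBuffer dirDst = ByteBuffer.allocateDirect(SIZE);

        // Write a specific index in each buffer.
        System.out.printf("heap write:   %.3f us%n",
            timeUs(() -> heapSrc[SIZE / 2] = 1));
        System.out.printf("direct write: %.3f us%n",
            timeUs(() -> dirSrc.put(SIZE / 2, (byte) 1)));

        // Copy the whole buffer.
        System.out.printf("heap copy:    %.3f us%n",
            timeUs(() -> System.arraycopy(heapSrc, 0, heapDst, 0, SIZE)));
        System.out.printf("direct copy:  %.3f us%n",
            timeUs(() -> { dirSrc.rewind(); dirDst.clear(); dirDst.put(dirSrc); }));
    }
}
```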
Next, we will test how the memory management schemes hold up against a heavily fragmented workload, i.e., rapid allocs/frees of randomly sized elements.
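One way such a workload could be generated (a sketch, not the code for the follow-up test): keep a fixed pool of slots and repeatedly replace a random slot with a new, randomly sized allocation. On the Java side, replacing a slot drops the old reference, so live objects of mixed sizes churn constantly; the C equivalent would free() the old pointer before the new malloc().

```java
import java.util.Random;

public class FragBench {
    // Replace random slots with randomly sized arrays; the displaced array
    // becomes garbage, giving a steady mixed-size alloc/free churn.
    static void churn(byte[][] slots, int iters, Random rnd) {
        for (int i = 0; i < iters; i++) {
            int slot = rnd.nextInt(slots.length);
            int size = 16 << rnd.nextInt(13);   // random size, 16B .. 64kb
            slots[slot] = new byte[size];
        }
    }

    public static void main(String[] args) {
        byte[][] slots = new byte[10_000][];
        churn(slots, 1_000_000, new Random(42));
        System.out.println("churn complete");
    }
}
```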