Memory allocation speed check

Traditionally, in high performance systems, repeatedly allocating and deallocating memory has been found to be costly. (i.e a malloc vs free cycle). Hence, people resorted to building their own memory pool on top of the OS, dealing with fragmentation/free list maintenance etc. One of the popular techniques to doing this being Slab allocators .

This post is about doing a reality check about the cost of explicitly doing an alloc() and free() cycle, given that most popular OS es, specifically Linux gotten better at memory allocation recently. Along the way, I will also compare the JVM memory allocations (which should be faster since we pay a premium for the freeing of memory via garbage collection).

So, all set here we go. The following is a comparison of native c allocations, java jvm based allocation, java direct buffer allocation. For each of them we measure the following.

Allocation/free rate (rate/s): This gives you an upper bound on the single threaded throughput of your application. You cannot possibly do more operations than this, without building your own memory pools.
Cost of allocation in mircoseconds (cost-μs): This gives you a lower bound on the latency. You absolutely have to pay atleast this cost per operation without building your own memory pool.

I make a million allocations (except for 100kb case,where I do 100K allocations) varying the object size. The machine is a beefy linux server running Linux 2.6 kernel. For java, I use the following jvm options.

-server -Xms22g -Xmx22g -XX:+AlwaysPreTouch -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2

Swap is turned off and pretouch flag ensures jvm heap is warmed up.

Object size	malloc()-rate/s	malloc()-cost-μs	free()-rate/s	free()-cost-μs	Java Heap alloc-rate/s	Java Heap alloc-cost-μs	Java Direct alloc-rate/s	Java Direct alloc-cost-μs
1kb	2,451,064	0.40	8,685,833	0.11	3,992,191	0.25	268,517	3.72
10kb	616,889	1.62	2,928,069	0.34	743,676*	1.34	116,713	8.56
100kb	566,328	1.76	1,890,573	0.52	81,370*	12.28	20,201	49.5

The 10kb and 100kb runs for Java heap allocation had garbage collections, while direct allocation does not have any collections.

10kb run : 5 ParNews, ~2ms
100kb run : 95 ParNews, 3ms avg, 30ms max

Here are some observations.

In general, all allocation costs increase with object size.
Small object allocations are faster in java as expected than malloc()
For some reason free() seems to be cheaper/faster than malloc(). Fragmentation tests will tell the real story.
Java heap allocations considerably slow down with larger objects, when GC kicks in. i.e under memory pressure. native c based allocation don't seem to degrade that drastically from 10kb to 100kb
Java heap allocations are always faster than direct allocations, which is expected.
Java direct allocations are supposed to be just a JNI malloc() call. But with 100kb, it seriously slows down and might become a real bottleneck. If you have smal objects mostly, you may be able to get away.

You can get the C program and java program used for this here : malloctest.c AllocTest.java

Next, we will test access speed by basically copying memory (copy rate/s & copy cost-μs) and writing a specific index in the byte array (write rate/s & write cost-μs).
We will be using 100kb byte buffers and average across 1000 operations.

c-write rate/s	6590
c-write cost-μs	151.73
c-copy rate/s	19361
c-copy cost-μs	51.64
Java Heap-write rate/s	5932
Java Heap-write cost-μs	168.56
Java Heap-copy rate/s	30175
Java Heap-copy cost-μs	33.136
Java Direct-write rate/s	4513
Java Direct-write cost-μs	221.5
Java Direct-copy rate/s	30096
Java Direct-copy cost-μs	33.22

index updates are fastest with c and slowest in java direct buffer.
No huge difference between accessing/copying direct buffers vs jvm heap byte[]

Code used for these. accesstest.c AccessHeapTest.java AccessDirectTest.java

Next, we will test how the memory management schemes hold up against heavily fragmented workload. i.e rapid alloc/frees of randomly sized elements
<Coming Soon>

throw new Exception()

Search This Blog

Memory allocation speed check

Comments

Post a Comment

Popular posts from this blog

Learning Spark Streaming #1

Setting up Hadoop/YARN/Spark/Hive on Mac OSX El Capitan

HDFS Client Configs for talking to HA Hadoop NameNodes