Benchmark of Memory Allocators on a Multi-Threaded Ruby Program

Jan 27, 2019

The idea for this benchmark came from Nate Berkopec’s blog post published on December 4, 2017, which demonstrates how multi-threaded Ruby programs may consume much more memory than they need because of memory fragmentation caused by malloc.

I ran the tests using concurrent-ruby in a simple 34-line script that can be found in this repository. It is very easy to run in a working Ruby environment. All the results are committed in the logs folder as well.
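To give a feel for the kind of workload involved, here is a minimal sketch. It is not the script from the repository: it assumes plain Ruby threads from the standard library instead of concurrent-ruby, and the method name `churn` and all sizes are hypothetical.

```ruby
# Hypothetical workload, not the benchmark script: several threads allocate
# and discard strings of varying sizes while keeping a small live set.
def churn(threads: 4, iterations: 1_000, max_bytes: 4096)
  workers = Array.new(threads) do |i|
    Thread.new do
      rng = Random.new(i)               # per-thread RNG, seeded for repeatability
      kept = []
      iterations.times do
        kept << ("x" * rng.rand(1..max_bytes))
        kept.shift if kept.size > 100   # cap the live set at 100 strings
      end
      kept.size
    end
  end
  workers.map(&:value)                  # join every thread and collect its result
end
```

Each thread returns the size of its live set, so the caller can confirm that all threads ran to completion.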

Memory usage and performance were measured for each of these factors:

glibc 2.23
glibc 2.23 with MALLOC_ARENA_MAX=2
glibc 2.23 with MALLOC_ARENA_MAX=4
jemalloc 3.6
jemalloc 5.1
jemalloc 5.1 with MALLOC_CONF=narenas:2
jemalloc 5.1 with MALLOC_CONF=narenas:4
tcmalloc 4.2.6
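One way to switch between setups like these is through environment variables. `MALLOC_ARENA_MAX` and `MALLOC_CONF=narenas:N` come from the list above. Loading jemalloc or tcmalloc through `LD_PRELOAD` is an assumption about the setup, not something stated here, and the library paths, `CONFIGS`, and `run_with` are hypothetical names for illustration.

```ruby
# Placeholder paths (assumptions): point these at the libraries on your system.
# Testing two jemalloc versions would need one path per version.
JEMALLOC_PATH = "/path/to/libjemalloc.so"
TCMALLOC_PATH = "/path/to/libtcmalloc.so"

# Environment overrides per configuration; an empty hash means stock glibc malloc.
CONFIGS = {
  "glibc"              => {},
  "glibc, 2 arenas"    => { "MALLOC_ARENA_MAX" => "2" },
  "glibc, 4 arenas"    => { "MALLOC_ARENA_MAX" => "4" },
  "jemalloc"           => { "LD_PRELOAD" => JEMALLOC_PATH },
  "jemalloc, 2 arenas" => { "LD_PRELOAD" => JEMALLOC_PATH, "MALLOC_CONF" => "narenas:2" },
  "jemalloc, 4 arenas" => { "LD_PRELOAD" => JEMALLOC_PATH, "MALLOC_CONF" => "narenas:4" },
  "tcmalloc"           => { "LD_PRELOAD" => TCMALLOC_PATH }
}.freeze

# Run a command in a child process with the chosen configuration's environment.
# Returns true when the child exits successfully.
def run_with(config_name, *command)
  env = CONFIGS.fetch(config_name)
  pid = Process.spawn(env, *command)
  _, status = Process.wait2(pid)
  status.success?
end
```

A call such as `run_with("glibc, 2 arenas", "ruby", "benchmark.rb")` would then launch the script under that configuration, where `benchmark.rb` stands in for whatever script is being measured.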

I ran five repetitions for each factor, and you can see the results below.

Memory Usage

Amount of memory being used (in megabytes) during the tests for each factor.

This is a plot of a particular iteration, but all of them followed the same pattern.
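How the memory samples were collected is not described here. As a rough sketch, one common approach on Linux is to read the `VmRSS` field from `/proc/<pid>/status`; the helper names `parse_rss_mb` and `current_rss_mb` are hypothetical.

```ruby
# Extract the resident set size in megabytes from the text of a
# /proc/<pid>/status file. Returns nil when no VmRSS line is present.
def parse_rss_mb(status_text)
  kb = status_text[/^VmRSS:\s+(\d+)\s+kB/, 1]
  kb && kb.to_i / 1024.0
end

# Linux only (assumption): read the status file of the given process.
def current_rss_mb(pid = Process.pid)
  parse_rss_mb(File.read("/proc/#{pid}/status"))
end
```

Calling `current_rss_mb` periodically from a background thread, or from a separate process given the benchmark's pid, would yield a series that can be plotted like the one above.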

It is interesting to note that the mean memory usage ranged from 394.79MB to 4425.28MB, roughly an 11-fold difference, just by changing these parameters. The best four factors in terms of memory consumption were:

tcmalloc 4.2.6: ~402.8MB average
jemalloc 3.6: ~435.4MB average
glibc 2.23 with MALLOC_ARENA_MAX=2: ~976MB average
glibc 2.23 with MALLOC_ARENA_MAX=4: ~1357.3MB average

Performance

Regarding performance, I’ve got the following numbers:

glibc 2.23 with MALLOC_ARENA_MAX=2: ~64 seconds average
glibc 2.23 with MALLOC_ARENA_MAX=4: ~65.2 seconds average
jemalloc 5.1 with MALLOC_CONF=narenas:2: ~68.5 seconds average
jemalloc 5.1 with MALLOC_CONF=narenas:4: ~70.4 seconds average
glibc 2.23: ~70.5 seconds average
jemalloc 5.1: ~78.2 seconds average
jemalloc 3.6: ~87.7 seconds average
tcmalloc 4.2.6: ~96.8 seconds average
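To make the gaps easier to compare, the averages above can be expressed as a slowdown relative to the fastest factor. This is just arithmetic on the numbers already listed; the names `AVERAGE_SECONDS` and `slowdown_percent` are introduced for illustration.

```ruby
# Average run times in seconds, copied from the list above.
AVERAGE_SECONDS = {
  "glibc 2.23 with MALLOC_ARENA_MAX=2"      => 64.0,
  "glibc 2.23 with MALLOC_ARENA_MAX=4"      => 65.2,
  "jemalloc 5.1 with MALLOC_CONF=narenas:2" => 68.5,
  "jemalloc 5.1 with MALLOC_CONF=narenas:4" => 70.4,
  "glibc 2.23"                              => 70.5,
  "jemalloc 5.1"                            => 78.2,
  "jemalloc 3.6"                            => 87.7,
  "tcmalloc 4.2.6"                          => 96.8
}.freeze

# Percentage by which each entry is slower than the fastest one.
def slowdown_percent(averages)
  fastest = averages.values.min
  averages.transform_values { |secs| ((secs / fastest - 1) * 100).round(1) }
end

slowdown_percent(AVERAGE_SECONDS).each do |name, pct|
  puts format("%-42s +%.1f%%", name, pct)
end
```

By this measure, default glibc is about 10% slower than glibc with two arenas, and tcmalloc is about 51% slower.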

More repetitions would be needed to tell whether MALLOC_ARENA_MAX=2 and MALLOC_ARENA_MAX=4 differ significantly in performance. The standard deviation overall was 1–4 seconds.
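For anyone repeating the runs, the mean and sample standard deviation of a set of timings can be computed with a few lines of Ruby. This is a generic sketch with hypothetical helper names, not the analysis code used for this benchmark.

```ruby
# Arithmetic mean of a list of numbers.
def mean(values)
  values.sum.to_f / values.size
end

# Sample standard deviation (divides by n - 1), suitable for a small
# number of repetitions such as five runs per factor.
def sample_stddev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1))
end
```

If the means of two factors are closer together than their standard deviations, as with the two glibc arena settings here, the difference may well be noise.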

Conclusion

In this experiment, the best trade-off was glibc with two arenas: it gave the fastest performance with reasonable memory usage.

As mentioned in this Heroku article, setting the number of arenas is usually a trade-off between memory consumption and performance. However, that is not what we observed in this particular benchmark.

Written by André Guimarães Sakata
