Benchmark of Memory Allocators on a Multi-Threaded Ruby Program
The idea for this benchmark came from Nate Berkopec’s blog post, published on December 4, 2017, which demonstrates how multi-threaded Ruby programs may consume much more memory than they need because of memory fragmentation caused by malloc.
I ran the tests using concurrent-ruby in a simple 34-line script that can be found in this repository; it is very easy to run in a working Ruby environment. All the results are committed in the logs folder as well.
Memory usage and performance were measured for each of these configurations:
- glibc 2.23
- glibc 2.23 with MALLOC_ARENA_MAX=2
- glibc 2.23 with MALLOC_ARENA_MAX=4
- jemalloc 3.6
- jemalloc 5.1
- jemalloc 5.1 with MALLOC_CONF=narenas:2
- jemalloc 5.1 with MALLOC_CONF=narenas:4
- tcmalloc 4.2.6
I ran five repetitions of each configuration; the results are below.
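The configuration matrix above can be sketched as a set of environment variables per run. This is only an illustration, not the script from the repository: it assumes the alternative allocators are injected with LD_PRELOAD (Ruby can also be built against jemalloc directly), the library paths are placeholders, and `benchmark.rb` is a hypothetical file name.

```ruby
# Placeholder paths (assumptions): point these at the real shared libraries.
PRELOAD = {
  jemalloc_3_6: "/path/to/jemalloc-3.6/libjemalloc.so",
  jemalloc_5_1: "/path/to/jemalloc-5.1/libjemalloc.so",
  tcmalloc:     "/path/to/libtcmalloc.so"
}.freeze

# One entry per configuration: label => environment variables for the run.
CONFIGURATIONS = {
  "glibc 2.23"                     => {},
  "glibc 2.23, MALLOC_ARENA_MAX=2" => { "MALLOC_ARENA_MAX" => "2" },
  "glibc 2.23, MALLOC_ARENA_MAX=4" => { "MALLOC_ARENA_MAX" => "4" },
  "jemalloc 3.6"                   => { "LD_PRELOAD" => PRELOAD[:jemalloc_3_6] },
  "jemalloc 5.1"                   => { "LD_PRELOAD" => PRELOAD[:jemalloc_5_1] },
  "jemalloc 5.1, narenas:2"        => { "LD_PRELOAD" => PRELOAD[:jemalloc_5_1],
                                        "MALLOC_CONF" => "narenas:2" },
  "jemalloc 5.1, narenas:4"        => { "LD_PRELOAD" => PRELOAD[:jemalloc_5_1],
                                        "MALLOC_CONF" => "narenas:4" },
  "tcmalloc 4.2.6"                 => { "LD_PRELOAD" => PRELOAD[:tcmalloc] }
}.freeze

REPETITIONS = 5

# Expands the matrix into [label, env, repetition] triples, one per run.
def benchmark_runs(configurations = CONFIGURATIONS, repetitions = REPETITIONS)
  configurations.flat_map do |label, env|
    (1..repetitions).map { |repetition| [label, env, repetition] }
  end
end

# Launches one run in a child process with the given environment.
# `script` is a hypothetical name for the benchmark script.
def run_once(env, script = "benchmark.rb")
  system(env, "ruby", script)
end
```

Calling `benchmark_runs.each { |_label, env, _rep| run_once(env) }` would then execute the eight configurations five times each, for 40 runs in total.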
Memory Usage
This is a plot of a particular iteration, but all of them followed the same pattern.
Notably, the mean memory usage ranged from 394.79MB to 4425.28MB just by changing these parameters. The four best configurations in terms of memory consumption were:
- tcmalloc 4.2.6: ~402.8MB average
- jemalloc 3.6: ~435.4MB average
- glibc 2.23 with MALLOC_ARENA_MAX=2: ~976MB average
- glibc 2.23 with MALLOC_ARENA_MAX=4: ~1357.3MB average
Performance
Regarding performance, I got the following numbers, sorted from fastest to slowest:
- glibc 2.23 with MALLOC_ARENA_MAX=2: ~64 seconds average
- glibc 2.23 with MALLOC_ARENA_MAX=4: ~65.2 seconds average
- jemalloc 5.1 with MALLOC_CONF=narenas:2: ~68.5 seconds average
- jemalloc 5.1 with MALLOC_CONF=narenas:4: ~70.4 seconds average
- glibc 2.23: ~70.5 seconds average
- jemalloc 5.1: ~78.2 seconds average
- jemalloc 3.6: ~87.7 seconds average
- tcmalloc 4.2.6: ~96.8 seconds average
We would need more repetitions to determine whether MALLOC_ARENA_MAX=2 and MALLOC_ARENA_MAX=4 differ significantly in performance here. The overall standard deviation was 1–4 seconds.
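To make the need for more repetitions concrete, here is a rough sketch that expresses the gap between the two glibc averages (64 and 65.2 seconds) in standard errors. It assumes both configurations share the same standard deviation, which the post only reports as a 1–4 second range, so both ends of that range are tried; the function names are made up for this illustration.

```ruby
# Standard error of the difference between two means, assuming both
# configurations have the same standard deviation and repetition count.
def standard_error_of_difference(std_dev, repetitions)
  Math.sqrt(2.0 * std_dev**2 / repetitions)
end

# How many standard errors separate the two means.
def difference_in_standard_errors(mean_a, mean_b, std_dev, repetitions)
  (mean_a - mean_b).abs / standard_error_of_difference(std_dev, repetitions)
end

# Averages from the list above, five repetitions each.
[1.0, 4.0].each do |std_dev|
  ratio = difference_in_standard_errors(64.0, 65.2, std_dev, 5)
  puts format("std dev %.1fs: difference is %.2f standard errors", std_dev, ratio)
end
```

With a standard deviation of 1 second the gap is about 1.9 standard errors, and with 4 seconds it is about 0.5. A common rule of thumb asks for roughly 2 or more, so five repetitions are not enough to separate the two settings with confidence.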
Conclusion
In this experiment, the best cost-benefit is glibc with two arenas: it gave the fastest performance with reasonable memory usage.
As mentioned in this Heroku article, setting the number of arenas is usually a trade-off between memory consumption and performance. However, that is not what we observed in this particular benchmark.
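One way to check for a trade-off is to look for configurations that are beaten on both memory and time by another one. The sketch below applies a simple Pareto-dominance test, a technique brought in for illustration rather than something used in the original analysis. It covers only the four configurations for which both averages are reported above.

```ruby
Result = Struct.new(:name, :memory_mb, :seconds)

# Averages reported in the Memory Usage and Performance sections.
RESULTS = [
  Result.new("glibc 2.23, MALLOC_ARENA_MAX=2", 976.0, 64.0),
  Result.new("glibc 2.23, MALLOC_ARENA_MAX=4", 1357.3, 65.2),
  Result.new("jemalloc 3.6", 435.4, 87.7),
  Result.new("tcmalloc 4.2.6", 402.8, 96.8)
].freeze

# a dominates b when it is no worse on both axes and strictly better on one.
def dominates?(a, b)
  a.memory_mb <= b.memory_mb && a.seconds <= b.seconds &&
    (a.memory_mb < b.memory_mb || a.seconds < b.seconds)
end

# Keeps only the results that no other result dominates.
def pareto_front(results)
  results.reject do |candidate|
    results.any? { |other| dominates?(other, candidate) }
  end
end

pareto_front(RESULTS).each do |r|
  puts "#{r.name}: #{r.memory_mb} MB, #{r.seconds} s"
end
```

Going by the averages alone, MALLOC_ARENA_MAX=4 drops out because MALLOC_ARENA_MAX=2 used less memory and was also slightly faster. The remaining three represent a genuine trade-off: jemalloc 3.6 and tcmalloc 4.2.6 use less memory but take longer.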