Benchmark of Memory Allocators on a Multi-Threaded Ruby Program

André Guimarães Sakata
Jan 27, 2019

The idea for this benchmark came from Nate Berkopec’s blog post published on December 4, 2017, which demonstrates how multi-threaded Ruby programs may consume much more memory than they need because of memory fragmentation caused by malloc.

I ran the tests using concurrent-ruby in a simple 34-line script that can be found in this repository. It is very easy to run in a working Ruby environment, and all the results are committed in the logs folder as well.
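To give a feel for the kind of workload involved, here is a minimal sketch that uses only Ruby's standard library threads rather than concurrent-ruby. It is not the script from the repository: the thread count, iteration count, string sizes, and the method names `churn` and `run_workload` are all illustrative assumptions.

```ruby
# Hypothetical workload, NOT the benchmark script from the repository.
# Several threads repeatedly allocate strings of varying sizes and drop
# the oldest ones, which produces many short-lived malloc calls per thread.
THREADS = 4          # assumption: chosen only for illustration
ITERATIONS = 1_000   # assumption: chosen only for illustration

def churn(iterations, seed)
  rng = Random.new(seed)
  retained = []
  iterations.times do
    retained << ("x" * rng.rand(1..4096)) # allocate a string of random size
    retained.shift if retained.size > 100 # keep at most 100 strings alive
  end
  retained.size
end

def run_workload(threads: THREADS, iterations: ITERATIONS)
  workers = Array.new(threads) { |i| Thread.new { churn(iterations, i) } }
  workers.map(&:value) # wait for every thread and collect its result
end

results = run_workload
puts "threads finished: #{results.size}"
```

Running a script like this under each allocator configuration is what lets the memory and timing numbers be compared.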

Memory usage and performance were measured for each of these factors:

  • glibc 2.23
  • glibc 2.23 with MALLOC_ARENA_MAX=2
  • glibc 2.23 with MALLOC_ARENA_MAX=4
  • jemalloc 3.6
  • jemalloc 5.1
  • jemalloc 5.1 with MALLOC_CONF=narenas:2
  • jemalloc 5.1 with MALLOC_CONF=narenas:4
  • tcmalloc 4.2.6
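As a rough sketch, the eight factors can be expressed as sets of environment variables. `MALLOC_ARENA_MAX` and `MALLOC_CONF` come from the list above; using `LD_PRELOAD` to swap in jemalloc or tcmalloc is an assumption about the setup (Ruby can also be compiled against jemalloc), and the library paths and the script name `benchmark.rb` are placeholders.

```ruby
# Placeholder paths: adjust to wherever the libraries live on your system.
JEMALLOC_36 = "/path/to/jemalloc-3.6/libjemalloc.so"
JEMALLOC_51 = "/path/to/jemalloc-5.1/libjemalloc.so"
TCMALLOC    = "/path/to/libtcmalloc.so"

# Each factor maps to the environment variables that select it.
# Assumption: alternative allocators are injected with LD_PRELOAD.
FACTORS = {
  "glibc 2.23" => {},
  "glibc 2.23 with MALLOC_ARENA_MAX=2" => { "MALLOC_ARENA_MAX" => "2" },
  "glibc 2.23 with MALLOC_ARENA_MAX=4" => { "MALLOC_ARENA_MAX" => "4" },
  "jemalloc 3.6" => { "LD_PRELOAD" => JEMALLOC_36 },
  "jemalloc 5.1" => { "LD_PRELOAD" => JEMALLOC_51 },
  "jemalloc 5.1 with MALLOC_CONF=narenas:2" =>
    { "LD_PRELOAD" => JEMALLOC_51, "MALLOC_CONF" => "narenas:2" },
  "jemalloc 5.1 with MALLOC_CONF=narenas:4" =>
    { "LD_PRELOAD" => JEMALLOC_51, "MALLOC_CONF" => "narenas:4" },
  "tcmalloc 4.2.6" => { "LD_PRELOAD" => TCMALLOC }
}.freeze

# Builds the shell command line that would run a script under one factor.
def command_line(env, script)
  (env.map { |key, value| "#{key}=#{value}" } + ["ruby", script]).join(" ")
end

FACTORS.each do |name, env|
  puts "#{name}: #{command_line(env, 'benchmark.rb')}"
end
```

The same hashes can be passed directly as the first argument to `system` or `spawn`, which set those variables for the child process only.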

I ran five repetitions of each factor, and you can see the results below.

Memory Usage

Amount of memory being used (in megabytes) during the tests for each factor.

This is a plot of a particular iteration, but all of them followed the same pattern.
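This post does not describe how the memory samples were collected, so the following is only one possible approach: reading the resident set size (RSS) of the process from `/proc` on Linux, with `ps` as a fallback. The helper names `rss_mb` and `sample_rss` and the sampling interval are assumptions.

```ruby
# Returns the resident set size of a process in megabytes.
# Assumption: RSS is the memory figure of interest here.
def rss_mb(pid = Process.pid)
  status = "/proc/#{pid}/status"
  kilobytes =
    if File.readable?(status)
      # Linux: the VmRSS line looks like "VmRSS:     12345 kB"
      File.read(status)[/^VmRSS:\s+(\d+)\s+kB/, 1].to_i
    else
      # Fallback for systems without /proc: ps reports RSS in kilobytes
      `ps -o rss= -p #{pid}`.to_i
    end
  kilobytes / 1024.0
end

# Takes `count` samples, sleeping `interval` seconds before each one.
def sample_rss(count:, interval:)
  Array.new(count) do
    sleep(interval)
    rss_mb
  end
end

samples = sample_rss(count: 3, interval: 0.1)
puts format("mean RSS: %.2f MB", samples.sum / samples.size)
```

Sampling from a background thread while the workload runs would give a series like the one plotted, and averaging it gives a mean per run.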

It is interesting to notice that the mean memory usage ranged from 394.79MB to 4425.28MB just by changing these parameters. The best four factors in terms of memory consumption were:

  1. tcmalloc 4.2.6: ~402.8MB average
  2. jemalloc 3.6: ~435.4MB average
  3. glibc 2.23 with MALLOC_ARENA_MAX=2: ~976MB average
  4. glibc 2.23 with MALLOC_ARENA_MAX=4: ~1357.3MB average

Performance

Regarding performance, I got the following numbers, fastest first:

  1. glibc 2.23 with MALLOC_ARENA_MAX=2: ~64 seconds average
  2. glibc 2.23 with MALLOC_ARENA_MAX=4: ~65.2 seconds average
  3. jemalloc 5.1 with MALLOC_CONF=narenas:2: ~68.5 seconds average
  4. jemalloc 5.1 with MALLOC_CONF=narenas:4: ~70.4 seconds average
  5. glibc 2.23: ~70.5 seconds average
  6. jemalloc 5.1: ~78.2 seconds average
  7. jemalloc 3.6: ~87.7 seconds average
  8. tcmalloc 4.2.6: ~96.8 seconds average

We would need more repetitions to tell whether MALLOC_ARENA_MAX=2 and MALLOC_ARENA_MAX=4 differ significantly in performance here. The standard deviation overall was 1–4 seconds.
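A back-of-the-envelope check shows why five repetitions are not enough to separate the two glibc settings. The sketch below compares the 1.2 second gap between their averages with the standard error of that gap, under the simplifying assumptions that both factors share the same standard deviation and that runs are independent. The method name is hypothetical.

```ruby
MEAN_ARENA_2 = 64.0 # seconds, average reported above
MEAN_ARENA_4 = 65.2 # seconds, average reported above
REPETITIONS  = 5    # runs per factor

# Standard error of the difference between two means.
# Assumption: equal standard deviation in both groups, independent runs.
def standard_error_of_difference(std_dev, repetitions)
  Math.sqrt(2.0 * std_dev**2 / repetitions)
end

gap = MEAN_ARENA_4 - MEAN_ARENA_2

# Evaluate both ends of the reported 1-4 second standard deviation range.
[1.0, 4.0].each do |std_dev|
  error = standard_error_of_difference(std_dev, REPETITIONS)
  puts format("std dev %.1f s: gap is %.2f standard errors", std_dev, gap / error)
end
```

The gap comes out at roughly 1.9 standard errors at the low end of the range and about 0.5 at the high end, so it could plausibly be noise.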

Conclusion

In this experiment, the best cost-benefit came from glibc with two arenas: it gave the fastest performance with reasonable memory usage.
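To make the trade-off concrete, the sketch below expresses memory and run time relative to glibc with two arenas. It covers only the four factors for which this post reports both averages; the helper name `relative_to` is hypothetical.

```ruby
# Averages reported in this post (memory in MB, run time in seconds).
RESULTS = {
  "tcmalloc 4.2.6" => { memory_mb: 402.8, seconds: 96.8 },
  "jemalloc 3.6" => { memory_mb: 435.4, seconds: 87.7 },
  "glibc 2.23 with MALLOC_ARENA_MAX=2" => { memory_mb: 976.0, seconds: 64.0 },
  "glibc 2.23 with MALLOC_ARENA_MAX=4" => { memory_mb: 1357.3, seconds: 65.2 }
}.freeze

# Expresses every factor as a multiple of the chosen baseline factor.
def relative_to(results, baseline_name)
  baseline = results.fetch(baseline_name)
  results.transform_values do |result|
    {
      memory_ratio: (result[:memory_mb] / baseline[:memory_mb]).round(2),
      time_ratio: (result[:seconds] / baseline[:seconds]).round(2)
    }
  end
end

relative_to(RESULTS, "glibc 2.23 with MALLOC_ARENA_MAX=2").each do |name, ratio|
  puts format("%-36s memory x%.2f  time x%.2f",
              name, ratio[:memory_ratio], ratio[:time_ratio])
end
```

By these numbers, tcmalloc uses about 41% of the memory of glibc with two arenas but takes about 51% longer, so which factor counts as best depends on whether memory or run time is the tighter constraint.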

As mentioned in this Heroku article, setting the number of arenas is usually a trade-off between memory consumption and performance. However, that is not what we observed in this particular benchmark: with glibc, fewer arenas gave both lower memory usage and faster run times.
