Memory Usage Research


Overview

Directory Server calls malloc and free in many places, which can cause memory fragmentation and eventually prevent further memory allocation. This memo describes the experiments I ran to investigate ways to reduce the malloc and free calls.

Testcase

Test Environment:
  OS: Fedora 9
  HW: dual core, 3.9GB memory, 64-bit
  Fedora Directory Server 1.1.1
  nsslapd-cachememsize: 1GB
  nsslapd-dbcachesize: 100MB

Test:
  ldclt with 3 "jpeg" files to randomize the size (1.5MB, 3.5KB, 28B)
  ldclt operations: {add, delete, modify, search on uid, search on seealso} x 2 threads
  search all (above nsslapd-idlistscanlimit) to evict entries from the entry cache

Experiments: modification made on the server

  1. A shared memory pool among threads, protected by a mutex (an illustrative sketch of such a pool appears after item 1-3 below).

1-1. Store the system structures pblock, backentry, Slapi_Entry, and the internal Slapi_Operation in the memory pool and reuse them. This had almost no positive effect; the growth curve was similar to the original server's.

1-2. Store the system structures pblock, backentry, Slapi_Entry, and the internal Slapi_Operation, plus attribute values between 512KB and 64MB, in the memory pool and reuse them. This implementation keeps the attribute values referenced from an entry retrieved from BDB in the memory pool. The server size increase was about half that of the original server.

1-3. Store the system structures pblock, backentry, Slapi_Entry, and the internal Slapi_Operation, plus attribute values between 2KB and 64MB, in the memory pool and reuse them; for requested sizes larger than 128KB, call mmap directly instead of malloc. The process size curve stayed nearly flat, but performance dropped about 20% compared with the original server.
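To make the pool idea concrete, here is a minimal sketch of a shared, mutex-protected pool of the kind tried in experiments 1-1 through 1-3. It is not the actual server code: the mempool_* names are illustrative, and the pooled size range is simplified so that everything above the 128KB threshold goes straight to mmap (experiment 1-3 pooled sizes up to 64MB).

    /* Illustrative sketch only -- not the actual server code.
     * Freed buffers are kept on per-size free lists (power-of-two
     * buckets) shared by all threads and protected by one mutex.
     * Requests above MMAP_LIMIT bypass malloc and use mmap directly,
     * so that munmap can hand the memory back to the system.
     */
    #include <pthread.h>
    #include <stdlib.h>
    #include <sys/mman.h>

    #define MIN_SHIFT   11              /* smallest pooled bucket: 2KB   */
    #define MAX_SHIFT   17              /* largest pooled bucket:  128KB */
    #define NBUCKETS    (MAX_SHIFT - MIN_SHIFT + 1)
    #define MMAP_LIMIT  ((size_t)1 << MAX_SHIFT)

    typedef struct chunk { struct chunk *next; } chunk;

    static chunk *buckets[NBUCKETS];
    static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Map a requested size to the smallest bucket that can hold it. */
    static int
    bucket_index(size_t size)
    {
        int i;
        for (i = MIN_SHIFT; i <= MAX_SHIFT; i++) {
            if (((size_t)1 << i) >= size) {
                return i - MIN_SHIFT;
            }
        }
        return -1;                      /* larger than the pooled range */
    }

    void *
    mempool_alloc(size_t size)
    {
        int idx = bucket_index(size);
        chunk *c = NULL;

        if (idx < 0) {
            /* Large request: mmap it so it can really be returned later. */
            void *m = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            return (m == MAP_FAILED) ? NULL : m;
        }

        pthread_mutex_lock(&pool_lock);
        if ((c = buckets[idx]) != NULL) {
            buckets[idx] = c->next;     /* reuse a previously freed buffer */
        }
        pthread_mutex_unlock(&pool_lock);

        if (c == NULL) {
            c = malloc((size_t)1 << (idx + MIN_SHIFT));
        }
        return c;
    }

    void
    mempool_free(void *p, size_t size)
    {
        int idx = bucket_index(size);

        if (idx < 0) {
            munmap(p, size);            /* give large blocks back to the OS */
            return;
        }
        pthread_mutex_lock(&pool_lock);
        ((chunk *)p)->next = buckets[idx];
        buckets[idx] = (chunk *)p;      /* keep small blocks for reuse */
        pthread_mutex_unlock(&pool_lock);
    }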

Note on M_TRIM_THRESHOLD (glibc): By default, M_TRIM_THRESHOLD is 128KB. A request larger than the threshold tends to be satisfied with mmap, and when munmap is called that memory is returned to the system. For performance reasons, however, even a request larger than 128KB is served from the heap if a block of that size is available there, which contributes to process size growth and memory fragmentation. To see whether this affected the server size increase, I ran the test with M_TRIM_THRESHOLD = 32KB as well as M_TRIM_THRESHOLD = 2GB, but there was almost no difference between the two server size growth charts.
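For reference, the trim threshold can be changed at process startup with mallopt(3), or without any code change by exporting the MALLOC_TRIM_THRESHOLD_ environment variable before starting the server; the sketch below simply shows the 32KB setting used in one of the runs above.

    /* Sketch: lowering the glibc trim threshold at startup.
     * The same value can be set without recompiling by exporting
     * MALLOC_TRIM_THRESHOLD_ in the server's environment.
     */
    #include <malloc.h>

    int
    set_trim_threshold(void)
    {
        /* 32KB was one of the values tried above; mallopt() returns
         * a nonzero value on success. */
        return mallopt(M_TRIM_THRESHOLD, 32 * 1024);
    }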

  2. A per-thread memory pool combined with the slapd malloc functions (the current code under ifdef MEMOPOOL_EXPERIMENTAL). The limited set of system structures and the attribute values referenced from an entry are only a small part of all malloc/free calls in the server. In particular, many allocations happen inside BDB and in the back-end layer that calls BDB. To let the memory pool cover a wider range of mallocs, I extended the internal slapd malloc functions to use the memory pool and registered them as the allocators for the linked libraries such as the LDAP C SDK, SASL, and BDB. This approach showed a lower server size growth rate as well as better performance compared with the shared memory pool that handled only some system structures and entry attribute values.
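As an illustration of the library side of this change (not the implementation memo referenced below), the sketch shows how pool-backed allocators might be registered with BDB and SASL. The slapi_pool_* names are hypothetical stand-ins for the slapd malloc wrappers, and the exact callback prototypes should be checked against the library headers in use.

    /* Sketch: handing the server's pool-backed allocators to linked
     * libraries.  The slapi_pool_* functions are hypothetical names for
     * the slapd malloc wrappers; verify the callback prototypes against
     * the db.h and sasl.h versions actually linked.
     */
    #include <stdlib.h>
    #include <db.h>         /* Berkeley DB */
    #include <sasl/sasl.h>  /* Cyrus SASL  */

    extern void *slapi_pool_malloc(size_t size);
    extern void *slapi_pool_calloc(size_t nelem, size_t size);
    extern void *slapi_pool_realloc(void *p, size_t size);
    extern void  slapi_pool_free(void *p);

    void
    register_pool_allocators(void)
    {
        /* Berkeley DB: global allocator hooks; must be set before any
         * DB_ENV is created. */
        db_env_set_func_malloc(slapi_pool_malloc);
        db_env_set_func_realloc(slapi_pool_realloc);
        db_env_set_func_free(slapi_pool_free);

        /* Cyrus SASL: must be called before sasl_server_init().  The
         * casts paper over minor differences in the callback typedefs
         * between SASL versions. */
        sasl_set_alloc((sasl_malloc_t *)slapi_pool_malloc,
                       (sasl_calloc_t *)slapi_pool_calloc,
                       (sasl_realloc_t *)slapi_pool_realloc,
                       (sasl_free_t *)slapi_pool_free);

        /* The LDAP C SDK exposes a similar hook through ldap_set_option
         * with a struct of memory-allocation function pointers; it is
         * omitted here because the struct layout varies between SDKs. */
    }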

Here’s the implementation memo.

Issues/Leftovers

Appendix: Non-glibc approach

There is an OpenLDAP report that alternative memory allocator libraries show better performance and less process size growth (http://highlandsun.com/hyc/malloc/). I also tested the DS linked with tcmalloc, without the mempool code (http://goog-perftools.sourceforge.net/doc/tcmalloc.html). The test configuration: single supplier - single replica; a 30MB attribute is updated to (30MB + n * 4KB) every 15 minutes under constant ldclt add and search stress. Both glibc and tcmalloc increase the process size, but the growth rate with tcmalloc is about one fifth of that with glibc.
