Database Backend Redesign - Phase 3


Date: 26 November 2020

Status: Mostly implemented



See backend redesign (initial phases)

High level summary

This document continues Ludwig's work on removing the Berkeley Database (see backend redesign (initial phases)).

It focuses on:

Current state

Specific issues (solved in this phase)


The dbimpl API is part of libback-ldbm: code that uses the dbimpl API needs to include dbimpl.h and link with libback-ldbm.

When the dblayer API is initialized (or when private access to a file is requested), the value of the nsslapd-backend-implement configuration parameter is used to call the <value>_init function (within libback-ldbm), which fills a set of callbacks in li->priv.
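The callback selection can be pictured as follows. This is a minimal sketch, assuming a static registry that maps the configuration value to an init function; `dblayer_private_t`, `dblayer_callbacks_t`, `sketch_bdb_init` and `dblayer_setup_impl` are hypothetical names, and the real callback table is much larger. The document only states that the configured value selects the `<value>_init` function, not how the lookup is performed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down callback table; the real one holds every
 * dblayer_*_fn pointer listed in the callback table of this document. */
typedef struct {
    const char *impl_name;
    int (*txn_begin)(void);
} dblayer_callbacks_t;

/* Hypothetical stand-in for the private area reached through li->priv. */
typedef struct {
    dblayer_callbacks_t cb;
} dblayer_private_t;

typedef int (*impl_init_fn)(dblayer_private_t *priv);

static int sketch_bdb_txn_begin(void)
{
    return 0; /* a real implementation would call libdb here */
}

/* Plays the role of <value>_init when nsslapd-backend-implement is "bdb". */
static int sketch_bdb_init(dblayer_private_t *priv)
{
    priv->cb.impl_name = "bdb";
    priv->cb.txn_begin = sketch_bdb_txn_begin;
    return 0;
}

/* Registry mapping configuration values to init functions. */
static const struct {
    const char *name;
    impl_init_fn init;
} impl_registry[] = {
    {"bdb", sketch_bdb_init},
};

/* Calls the init function matching the configured implementation so that it
 * fills the callbacks; returns -1 if no implementation has that name. */
int dblayer_setup_impl(const char *backend_implement, dblayer_private_t *priv)
{
    for (size_t i = 0; i < sizeof(impl_registry) / sizeof(impl_registry[0]); i++) {
        if (strcmp(impl_registry[i].name, backend_implement) == 0) {
            return impl_registry[i].init(priv);
        }
    }
    return -1;
}
```

Once the callbacks are filled, generic code only calls through `priv->cb` and never names the implementation again.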


Include file: dbimpl.h

Struct typedefs

| Name | Role | Opaque | Old bdb name |
|---|---|---|---|
| dbi_env_t | The global environment | PseudoOpaque (1) | DB_ENV |
| dbi_db_t | A database instance | PseudoOpaque (1) | DB |
| dbi_txn_t | A transaction | Yes (2) | DB_TXN |
| dbi_cursor_t | A cursor (i.e. an iterator on DB data) | PseudoOpaque (1) | DBC |
| dbi_data_t | A key or a value | No | DBT |
| dbi_cb_t | Contains all DB implementation callbacks | No | N/A |

(1) DB_ENV is used as an opaque struct, except for dbenv->get_open_flags, which is used in db_uses_feature; that function should be moved into the bdb plugin anyway.

(2) Already used as an opaque struct.

PseudoOpaque types have the following layout:

typedef struct {
    DBI_CB *cb;        /* The callbacks */
    void *<name>;      /* The implementation opaque struct (name is env, db or cursor) */
    void *plg_ctx;     /* A context that the implementation plugin is free to use (may not be needed) */
} PseudoOpaque;

They are used because some functions only have access to the underlying element and not to the upper-layer context (e.g. a cursor without its backend or li_instance).

typedef struct {
    DBI_CB *cb;
    DBI_MEM_OPTION flags;
    void *data;
    size_t size;
    void *ctx;                  /* Context handled by the db implementation plugin */
} dbi_data_t;

typedef struct {
    struct DBI_CB *cb;
    void *cursor;
} dbi_cursor_t;
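The benefit of the pseudo-opaque layout is that generic code holding only a cursor can still reach the implementation. A minimal sketch, assuming a one-entry callback table; `dbi_cursor_sketch_t`, `struct sk_dbi_cb`, its `cursor_next` member and the array-backed toy implementation are hypothetical illustrations, not the dbimpl.h definitions.

```c
#include <stddef.h>

struct sk_dbi_cb;

/* Pseudo-opaque cursor: the callbacks plus the implementation's own cursor,
 * which generic code never looks into. */
typedef struct {
    struct sk_dbi_cb *cb;
    void *cursor;
} dbi_cursor_sketch_t;

/* Hypothetical one-entry callback table. */
struct sk_dbi_cb {
    int (*cursor_next)(void *impl_cursor, int *out_value);
};

/* Toy implementation: a cursor over a fixed array of ints. */
typedef struct {
    const int *values;
    size_t count;
    size_t pos;
} toy_cursor_t;

static int toy_cursor_next(void *impl_cursor, int *out_value)
{
    toy_cursor_t *c = impl_cursor;
    if (c->pos >= c->count) {
        return -1; /* stands in for DBI_RC_NOTFOUND */
    }
    *out_value = c->values[c->pos++];
    return 0;
}

static struct sk_dbi_cb toy_callbacks = {toy_cursor_next};

/* Generic code that only has the cursor (no backend, no li_instance) can
 * still dispatch to the right implementation through cursor->cb. */
int generic_cursor_next(dbi_cursor_sketch_t *cursor, int *out_value)
{
    return cursor->cb->cursor_next(cursor->cursor, out_value);
}
```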

Enum values

DBI_OP /* Represents a database or cursor operation */

| Name | Role | Old bdb function | Old bdb value |
|---|---|---|---|
| DBI_OP_MOVE_TO_KEY | Move the cursor to the first record having the key and get its value. | c_get | DB_SET |
| DBI_OP_MOVE_NEAR_KEY | Move the cursor to the record having the smallest key greater than or equal to the specified one, then get the record. | c_get | DB_SET_RANGE |
| DBI_OP_MOVE_TO_DATA | Move the cursor to the key+value record. | c_get | DB_GET_BOTH |
| DBI_OP_MOVE_NEAR_DATA | Move the cursor to the record having the specified key and the smallest data greater than or equal to the specified data, and get the value. | c_get | DB_GET_BOTH_RANGE |
| DBI_OP_MOVE_TO_RECNO | Move the cursor to the specified record number, then get the record. | c_get | DB_SET_RECNO |
| DBI_OP_MOVE_TO_FIRST | Move the cursor to the first record, then get it. | c_get | DB_FIRST |
| DBI_OP_MOVE_TO_LAST | Move the cursor to the last record, then get it. | c_get | DB_LAST |
| DBI_OP_GET | Get the record from its key. | get | 0 |
| DBI_OP_GET_RECNO | Get the current record number. | c_get | DB_GET_RECNO |
| DBI_OP_NEXT | Move the cursor to the next record, then get it. | c_get | DB_NEXT |
| DBI_OP_NEXT_DATA | Move the cursor to the next record having the same key, then get the value. | c_get | DB_NEXT_DUP |
| DBI_OP_NEXT_KEY | Move the cursor to the next record having a different key, then get the record. | c_get | DB_NEXT_NODUP |
| DBI_OP_PREV | Move the cursor to the previous record, then get it. | c_get | DB_PREV |
| DBI_OP_PUT | Insert a new key/data pair. | put | 0 |
| DBI_OP_REPLACE | Overwrite the value at the current position. | c_put | DB_CURRENT |
| DBI_OP_ADD | Insert a new key/data pair if it does not already exist. | put | DB_NODUPDATA |
| DBI_OP_ADD | Insert a new key/data pair if it does not already exist. | c_put | DB_NODUPDATA |
| DBI_OP_DEL | Delete the key/data record. | del | 0 |
| DBI_OP_DEL | Delete the record at the cursor position. | c_del | 0 |
| DBI_OP_CLOSE | Close the cursor. | c_close | N/A |
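The bdb plugin has to translate each generic DBI_OP into the matching libdb call and flag. A sketch for the c_get subset of the table above, assuming an arbitrary ordering of the enum values; it returns the flag *names* so that it compiles without `<db.h>`, whereas the plugin itself would return the DB_* constants. `bdb_cget_flag_name` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum: the names come from the table, the values are arbitrary. */
typedef enum {
    DBI_OP_MOVE_TO_KEY,
    DBI_OP_MOVE_NEAR_KEY,
    DBI_OP_MOVE_TO_DATA,
    DBI_OP_MOVE_NEAR_DATA,
    DBI_OP_MOVE_TO_RECNO,
    DBI_OP_MOVE_TO_FIRST,
    DBI_OP_MOVE_TO_LAST,
    DBI_OP_GET_RECNO,
    DBI_OP_NEXT,
    DBI_OP_NEXT_DATA,
    DBI_OP_NEXT_KEY,
    DBI_OP_PREV,
    DBI_OP_PUT,
    DBI_OP_CLOSE,
} DBI_OP;

/* Returns the Berkeley DB c_get flag name for a "move and get" cursor
 * operation, or NULL when the operation is not a c_get one. */
const char *bdb_cget_flag_name(DBI_OP op)
{
    switch (op) {
    case DBI_OP_MOVE_TO_KEY:    return "DB_SET";
    case DBI_OP_MOVE_NEAR_KEY:  return "DB_SET_RANGE";
    case DBI_OP_MOVE_TO_DATA:   return "DB_GET_BOTH";
    case DBI_OP_MOVE_NEAR_DATA: return "DB_GET_BOTH_RANGE";
    case DBI_OP_MOVE_TO_RECNO:  return "DB_SET_RECNO";
    case DBI_OP_MOVE_TO_FIRST:  return "DB_FIRST";
    case DBI_OP_MOVE_TO_LAST:   return "DB_LAST";
    case DBI_OP_GET_RECNO:      return "DB_GET_RECNO";
    case DBI_OP_NEXT:           return "DB_NEXT";
    case DBI_OP_NEXT_DATA:      return "DB_NEXT_DUP";
    case DBI_OP_NEXT_KEY:       return "DB_NEXT_NODUP";
    case DBI_OP_PREV:           return "DB_PREV";
    default:                    return NULL; /* put/close use other libdb calls */
    }
}
```

A switch keeps the mapping in one place, so supporting a new operation means adding one case in the plugin without touching back-ldbm.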

dbi_val_t flags

| Name | Role | Berkeley DB flags |
|---|---|---|
| 0 | data should be allocated or reallocated | DB_DBT_MALLOC (if data is NULL) or DB_DBT_REALLOC |
| DBI_VF_PROTECTED | data should not be freed | |
| DBI_VF_DONTGROW | data should not be reallocated | N/A |
| DBI_VF_READONLY | data should not be modified | DB_DBT_READONLY |

dbi_val_t flags to DBT flags mapping

| dbi_val_t | DBT |
|---|---|
| DBI_VF_PROTECTED | data should not be freed |
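A sketch of how the bdb plugin might derive DBT flags from dbi_val_t flags, following the flags table above. The numeric flag values and the `SK_DB_DBT_*` stand-ins for the Berkeley DB constants are assumptions (the real values come from dbimpl.h and `<db.h>`), and `dbi_flags_to_dbt_flags` is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical dbi_val_t flag values (names taken from the table). */
#define DBI_VF_PROTECTED 0x1u
#define DBI_VF_DONTGROW  0x2u
#define DBI_VF_READONLY  0x4u

/* Stand-ins for the Berkeley DB DBT flags; the real values come from <db.h>. */
#define SK_DB_DBT_MALLOC   0x10u
#define SK_DB_DBT_REALLOC  0x20u
#define SK_DB_DBT_READONLY 0x40u

/* Computes the DBT flags for a value, following the flags table:
 * - no flag: the library allocates (data == NULL) or reallocates the buffer;
 * - DBI_VF_READONLY: the library must not modify the data;
 * - DBI_VF_PROTECTED / DBI_VF_DONTGROW: no allocation flag, since the buffer
 *   must be neither freed nor grown. */
unsigned int dbi_flags_to_dbt_flags(unsigned int dbi_flags, const void *data)
{
    unsigned int dbt_flags = 0;
    if (dbi_flags & DBI_VF_READONLY) {
        dbt_flags |= SK_DB_DBT_READONLY;
    }
    if ((dbi_flags & (DBI_VF_PROTECTED | DBI_VF_DONTGROW | DBI_VF_READONLY)) == 0) {
        dbt_flags |= (data == NULL) ? SK_DB_DBT_MALLOC : SK_DB_DBT_REALLOC;
    }
    return dbt_flags;
}
```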

dbi_bulk_t flags

| Name | Role |
|---|---|
| DBI_VF_BULK_DATA | Bulk operation on data only |
| DBI_VF_BULK_RECORD | Bulk operation on key+data |

Error codes

| Name | Role | Old bdb value |
|---|---|---|
| DBI_RC_NOMEM | Memory allocation error (usually does not happen because slapi_ch_malloc cannot return NULL) | |
| DBI_RC_KEYEXIST | Key exists and duplicate keys are not allowed. | DB_KEYEXIST |
| DBI_RC_RETRY | Transient error: the operation should be retried. | DB_LOCK_DEADLOCK |
| DBI_RC_NOTFOUND | Record not found: the key does not exist. | DB_NOTFOUND |
| DBI_RC_RUNRECOVERY | Recovery must be performed. | DB_RUNRECOVERY |
| DBI_RC_OTHER | Other database errors | N/A |

Note: when the implementation plugin gets an error that cannot be mapped, it should log an error with the error code and the error text (to ease diagnosis of unexpected errors).
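A minimal sketch of the remapping described in the note: a switch on the native code that logs anything it cannot map. The numeric values of the DBI_RC_* codes and the `SK_DB_*` stand-ins for the native Berkeley DB codes are assumptions; `bdb_map_error` is hypothetical, and a real plugin would log through the server's logging API, including the text from `db_strerror()`, rather than with `fprintf`.

```c
#include <stdio.h>

/* Hypothetical values for the generic codes (names from the table). */
enum {
    DBI_RC_NOMEM = -1,
    DBI_RC_KEYEXIST = -2,
    DBI_RC_RETRY = -3,
    DBI_RC_NOTFOUND = -4,
    DBI_RC_RUNRECOVERY = -5,
    DBI_RC_OTHER = -6,
};

/* Stand-ins for DB_KEYEXIST, DB_LOCK_DEADLOCK, DB_NOTFOUND and DB_RUNRECOVERY,
 * which the bdb plugin would take from <db.h>. */
enum {
    SK_DB_KEYEXIST = -101,
    SK_DB_LOCK_DEADLOCK = -102,
    SK_DB_NOTFOUND = -103,
    SK_DB_RUNRECOVERY = -104,
};

/* Maps a native return code to a generic one; unmapped codes are logged with
 * the calling function name so that unexpected errors remain diagnosable. */
int bdb_map_error(const char *funcname, int native_rc)
{
    switch (native_rc) {
    case 0:                   return 0;
    case SK_DB_KEYEXIST:      return DBI_RC_KEYEXIST;
    case SK_DB_LOCK_DEADLOCK: return DBI_RC_RETRY;
    case SK_DB_NOTFOUND:      return DBI_RC_NOTFOUND;
    case SK_DB_RUNRECOVERY:   return DBI_RC_RUNRECOVERY;
    default:
        fprintf(stderr, "%s: unexpected database error %d\n", funcname, native_rc);
        return DBI_RC_OTHER;
    }
}
```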


(TODO: get the callback names and prototypes from dblayer.h and put them in this document to have the full API.)

| Name | Role | Old bdb value |
|---|---|---|
| dblayer_start_fn_t *dblayer_start_fn | | |
| dblayer_close_fn_t *dblayer_close_fn | | |
| dblayer_instance_start_fn_t *dblayer_instance_start_fn | | |
| dblayer_backup_fn_t *dblayer_backup_fn | | |
| dblayer_verify_fn_t *dblayer_verify_fn | | |
| dblayer_db_size_fn_t *dblayer_db_size_fn | | |
| dblayer_ldif2db_fn_t *dblayer_ldif2db_fn | | |
| dblayer_db2ldif_fn_t *dblayer_db2ldif_fn | | |
| dblayer_db2index_fn_t *dblayer_db2index_fn | | |
| dblayer_cleanup_fn_t *dblayer_cleanup_fn | | |
| dblayer_upgradedn_fn_t *dblayer_upgradedn_fn | | |
| dblayer_upgradedb_fn_t *dblayer_upgradedb_fn | | |
| dblayer_restore_fn_t *dblayer_restore_fn | | |
| dblayer_txn_begin_fn_t *dblayer_txn_begin_fn | | |
| dblayer_txn_commit_fn_t *dblayer_txn_commit_fn | | |
| dblayer_txn_abort_fn_t *dblayer_txn_abort_fn | | |
| dblayer_get_info_fn_t *dblayer_get_info_fn | | |
| dblayer_set_info_fn_t *dblayer_set_info_fn | | |
| dblayer_back_ctrl_fn_t *dblayer_back_ctrl_fn | | |
| dblayer_get_db_fn_t *dblayer_get_db_fn | | |
| dblayer_delete_db_fn_t *dblayer_delete_db_fn | | |
| dblayer_rm_db_file_fn_t *dblayer_rm_db_file_fn | | |
| dblayer_import_fn_t *dblayer_import_fn | | |
| dblayer_load_dse_fn_t *dblayer_load_dse_fn | | |
| dblayer_config_get_fn_t *dblayer_config_get_fn | | |
| dblayer_config_set_fn_t *dblayer_config_set_fn | | |
| instance_config_set_fn_t *instance_config_set_fn | | |
| instance_config_entry_callback_fn_t *instance_add_config_fn | | |
| instance_config_entry_callback_fn_t *instance_postadd_config_fn | | |
| instance_config_entry_callback_fn_t *instance_del_config_fn | | |
| instance_config_entry_callback_fn_t *instance_postdel_config_fn | | |
| instance_cleanup_fn_t *instance_cleanup_fn | | |
| instance_create_fn_t *instance_create_fn | | |
| instance_create_fn_t *instance_register_monitor_fn | | |
| instance_search_callback_fn_t *instance_search_callback_fn | | |
| dblayer_auto_tune_fn_t *dblayer_auto_tune_fn | | |
| dblayer_cursor_op(DBI_CUR *cur, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Move the cursor and get the record | cursor->c_get |
| dblayer_cursor_op(DBI_CUR *cur, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Add/replace a record | cursor->c_put |
| dblayer_cursor_op(DBI_CUR *cur, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Remove a record | cursor->c_del |
| dblayer_cursor_op(DBI_CUR *cur, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Close the cursor | cursor->c_close |
| dblayer_new_cursor(be, db, txn, cursor) | Create a cursor; should store the backend in cldb_Handle so that it can be retrieved. | db->cursor(db, db_txn, &cursor, 0); |
| dblayer_db_op(DBI_DB *db, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Get a record | db->get |
| dblayer_db_op(be, DBI_DB *db, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Add/replace a record | db->put |
| dblayer_db_op(be, DBI_DB *db, DBI_OP op, DBI_DATA *key, DBI_DATA *data) | Delete a record | db->del |
| dblayer_get_db_id | | db->fname |
| dblayer_init_bulk_op(DBI_DATA *bulk) | Initialize the iterator for a bulk operation | DB_MULTIPLE_INIT |
| dblayer_next_bulk_op(DBI_DATA *bulk, DBI_DATA *key, DBI_DATA *data) | Get the next item from a bulk operation | DB_MULTIPLE_NEXT |
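A sketch of the bulk iterator pair (one init call, then repeated next calls until exhaustion). The buffer layout used here (length-prefixed key and data fields) is an assumption made to keep the example self-contained; Berkeley DB's DB_MULTIPLE buffers have their own layout, which the bdb plugin would read with the DB_MULTIPLE_* macros. All `sk_*` names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Minimal value type for the sketch: a pointer into the bulk buffer. */
typedef struct {
    const void *data;
    size_t size;
} sk_val_t;

/* Hypothetical bulk buffer: records stored as
 * [size_t key_size][key bytes][size_t data_size][data bytes]. */
typedef struct {
    const unsigned char *buf;
    size_t len;
    size_t pos; /* iterator position, reset by sk_init_bulk_op */
} sk_bulk_t;

/* Appends one key/data record at offset off in buf (capacity cap).
 * Returns the new offset, or 0 if the record does not fit. */
size_t sk_bulk_append(unsigned char *buf, size_t cap, size_t off,
                      const void *key, size_t ksz, const void *data, size_t dsz)
{
    size_t need = 2 * sizeof(size_t) + ksz + dsz;
    if (cap < off || cap - off < need) {
        return 0;
    }
    memcpy(buf + off, &ksz, sizeof ksz); off += sizeof ksz;
    memcpy(buf + off, key, ksz);         off += ksz;
    memcpy(buf + off, &dsz, sizeof dsz); off += sizeof dsz;
    memcpy(buf + off, data, dsz);        off += dsz;
    return off;
}

void sk_init_bulk_op(sk_bulk_t *bulk)
{
    bulk->pos = 0;
}

/* Reads one length-prefixed field; returns 0 on success, -1 if truncated. */
static int read_field(sk_bulk_t *bulk, sk_val_t *out)
{
    size_t sz;
    if (bulk->len - bulk->pos < sizeof sz) {
        return -1;
    }
    memcpy(&sz, bulk->buf + bulk->pos, sizeof sz);
    bulk->pos += sizeof sz;
    if (bulk->len - bulk->pos < sz) {
        return -1;
    }
    out->data = bulk->buf + bulk->pos;
    out->size = sz;
    bulk->pos += sz;
    return 0;
}

/* Points key and data into the buffer (no copy) and returns 0,
 * or returns -1 once every record has been consumed. */
int sk_next_bulk_op(sk_bulk_t *bulk, sk_val_t *key, sk_val_t *data)
{
    if (bulk->pos >= bulk->len) {
        return -1;
    }
    if (read_field(bulk, key) != 0 || read_field(bulk, data) != 0) {
        return -1;
    }
    return 0;
}
```

Returning pointers into the buffer rather than copies mirrors how DB_MULTIPLE_NEXT hands out pointers into the bulk buffer, so the values are only valid while that buffer is.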

db-bdb plugin

This is the plugin that implements the dbimpl API callbacks and calls the libdb functions. The important points are:

Note: in both cases, isresponse is set to PR_FALSE before the operation and to PR_TRUE after it. If a key or data gets allocated or reallocated, the original key/data is freed (if the value flags allow it).
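A sketch of the value-replacement rule in the note above: `isresponse` is cleared before the operation and set after it, and the original buffer is freed only when its flags allow it. It assumes the plugin copies the library result into a freshly allocated buffer (with DB_DBT_REALLOC, realloc itself already releases the old block, which must then not be freed again). It uses C `bool` instead of NSPR's PR_FALSE/PR_TRUE; `sk_val_t`, the flag value and `sk_val_store_result` are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define DBI_VF_PROTECTED 0x1u /* hypothetical value */

typedef struct {
    unsigned int flags;
    void *data;
    size_t size;
    bool isresponse; /* assumption: the marker is kept in the value itself */
} sk_val_t;

/* Stores a library-owned result into val through a fresh copy.
 * The original buffer is freed unless it is protected. Returns 0 or -1. */
int sk_val_store_result(sk_val_t *val, const void *result, size_t len)
{
    val->isresponse = false;                /* before the operation */
    void *copy = malloc(len ? len : 1);
    if (copy == NULL) {
        return -1;                          /* original value left untouched */
    }
    memcpy(copy, result, len);
    if (!(val->flags & DBI_VF_PROTECTED)) {
        free(val->data);                    /* free(NULL) is a no-op */
    }
    val->data = copy;
    val->size = len;
    val->isresponse = true;                 /* after the operation */
    return 0;
}
```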


Proposed solution

* Solution 1
    * Remap the errors to generic values.
    * Add a function in the bdb plugin that remaps the value (a simple switch). If the value cannot be mapped, we could:
        * store a string in thread-local storage and return DBI_RC_OTHER. The string should contain the original return code and its associated message (i.e. "bdb error code: %d : %s", native_rc, db_strerror(native_rc));
        * modify dblayer_strerror to print a message for generic errors and, for DBI_RC_OTHER, to build the message from the thread-local string.
    * Advantages:
        * It does not impact the back-ldbm/changelog code (except for dblayer_strerror).
        * It is efficient in the usual case, as it only evaluates a switch with few values.
        * It keeps the ability to diagnose errors in the unexpected case.
    * Drawbacks:
        * The message can be wrong if creative error handling is performed, e.g.:
          rc1 = dblayer_xxx(li, ...)
          rc2 = dblayer_xxx(li, ...)
          log(dblayer_strerror(rc1))    /* prints the rc2 message if both values are DBI_RC_OTHER */
          We should double-check that, when hitting unexpected errors, we just log an error message and abort the operation (it is possible that we abort the txn before logging the error).
        * Error handling must be done in the same thread as the operation (this is, IMHO, the case).
* Solution 2: I thought about keeping the db code as is, but that implies a lot of changes, as we would need to access the db plugin to determine which action to take or to log the error (and the dblayer instance context is not always easily available when the message is logged).
* Solution 3: Same as solution 1 but without storing data in thread-local storage. The problem is that we are left clueless in case of an unexpected database error, unless the plugin logs an error message. (Note: this is the solution that was finally implemented.)
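Solution 1 can be sketched with a C11 `_Thread_local` buffer. This is a minimal sketch under stated assumptions: the DBI_RC_* values, the single mapped native code `SK_NATIVE_NOTFOUND`, and the `native_msg` parameter (standing in for `db_strerror(native_rc)`) are hypothetical, as are `sk_map_rc` and `sk_dblayer_strerror`.

```c
#include <stdio.h>
#include <string.h>

#define DBI_RC_NOTFOUND    (-4)   /* hypothetical values */
#define DBI_RC_OTHER       (-6)
#define SK_NATIVE_NOTFOUND (-103) /* stand-in for DB_NOTFOUND */

/* Per-thread text of the last unmapped native error. */
static _Thread_local char last_native_error[128];

/* Maps a native code; unmapped codes are recorded in thread-local storage. */
int sk_map_rc(int native_rc, const char *native_msg)
{
    switch (native_rc) {
    case 0:
        return 0;
    case SK_NATIVE_NOTFOUND:
        return DBI_RC_NOTFOUND;
    default:
        snprintf(last_native_error, sizeof(last_native_error),
                 "bdb error code: %d : %s", native_rc, native_msg);
        return DBI_RC_OTHER;
    }
}

/* Fixed messages for generic codes; DBI_RC_OTHER reports the stored text. */
const char *sk_dblayer_strerror(int rc)
{
    switch (rc) {
    case 0:               return "success";
    case DBI_RC_NOTFOUND: return "record not found";
    case DBI_RC_OTHER:    return last_native_error;
    default:              return "unknown error";
    }
}
```

The sketch also reproduces the first drawback listed for this solution: after `rc1 = sk_map_rc(-7, "first failure")` and `rc2 = sk_map_rc(-8, "second failure")`, `sk_dblayer_strerror(rc1)` returns the text of the second error, because both calls share the same per-thread string.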

Open Questions

These questions will need to be solved in phase 4.


Phase 3 is about being able to remove the bdb dependencies (i.e. being able to build ns-slapd, libback-ldbm and replication without the bdb include files and library). Due to the size of these changes (FYI: phase 3a already impacts 53 files), it seems better to split the phase into sub-phases:

Last modified on 7 April 2021