Replication Agreement Status


Overview

For each replication agreement, a thread is created that contacts the consumer, detects whether updates have to be sent, locates these updates in its own changelog and sends them to the consumer. The status of the replication agreement is maintained in an attribute of the agreement, *nsds5ReplicaLastUpdateStatus*, and can be queried by client searches.

This document lists and explains the potential update states that can be seen by a client.
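A client that reads *nsds5ReplicaLastUpdateStatus* will usually want to separate the numeric code from the text. The following Python sketch assumes the value has the form "Error (<code>)" followed by an optional colon and the message, as in the statuses listed in this document; the names `UpdateStatus` and `parse_update_status` are hypothetical and not part of any server API.

```python
import re
from typing import NamedTuple, Optional


class UpdateStatus(NamedTuple):
    code: int     # number shown in parentheses, e.g. 0, 3, -1
    message: str  # remaining text with the optional leading ":" removed


# "Error (<n>)", optional whitespace, optional ":", optional whitespace, message.
_STATUS_RE = re.compile(r"^Error \((-?\d+)\)\s*:?\s*(.*)$", re.DOTALL)


def parse_update_status(value: str) -> Optional[UpdateStatus]:
    """Split a status value into code and message (hypothetical helper)."""
    match = _STATUS_RE.match(value.strip())
    if match is None:
        return None  # not in the "Error (<code>) ..." form assumed here
    return UpdateStatus(int(match.group(1)), match.group(2).strip())
```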

Disabled agreements

If a replication agreement is disabled, its update status is no longer updated. The message that is seen has two variants:

If the replication agreement was disabled when the server started

"Error (0) No replication sessions started since server startup"

If the agreement was disabled while the server was running

"Error (0) Replica acquired successfully: agreement disabled"

General agreement status

The status of the agreement is updated when the agreement is stopped and when an incremental update starts or completes:

"Error (0) Replica acquired successfully: Protocol stopped"


"Error (0) Replica acquired successfully: Incremental update started"


"Error (0) Replica acquired successfully: Incremental update succeeded"


"Error (0) Replica acquired successfully: Incremental update succeeded and yielded"

The last status indicates that replication was proceeding successfully, but the consumer ended the session so that it could be acquired by another supplier.
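Because all of the disabled and general statuses above report code 0, a client cannot distinguish them by the code alone and has to compare the text. A minimal sketch, assuming the exact message texts listed above; the labels and the function name `describe_ok_status` are hypothetical.

```python
from typing import Optional

# Hypothetical labels for the code-0 status texts listed in this document.
_KNOWN_CODE0_STATUSES = {
    "No replication sessions started since server startup": "disabled at startup",
    "Replica acquired successfully: agreement disabled": "disabled while running",
    "Replica acquired successfully: Protocol stopped": "stopped",
    "Replica acquired successfully: Incremental update started": "update started",
    "Replica acquired successfully: Incremental update succeeded": "update succeeded",
    "Replica acquired successfully: Incremental update succeeded and yielded":
        "update succeeded, session yielded",
}


def describe_ok_status(value: str) -> Optional[str]:
    """Return a label for a known code-0 status, or None if not recognized."""
    prefix = "Error (0) "
    if not value.startswith(prefix):
        return None
    return _KNOWN_CODE0_STATUSES.get(value[len(prefix):].strip())
```

Code 0 does not by itself mean success: the "Unable to obtain current CSN" status listed under the error messages also reports code 0, which is why the sketch returns None for any text it does not recognize.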

Error messages

State: ACQUIRING_REPLICA.

In the first step of a replication session, the supplier tries to acquire the consumer: it has to establish the connection, bind to the consumer, verify that the consumer is not already being updated by another supplier, and perform some additional checks.

"Error (<ldap_rc>) Problem connecting to replica - LDAP error: <ldap error message> "

"Error (<ldap_rc>) Problem connecting to replica (SSL not enabled) - LDAP error: <ldap error message> "

These are failures in establishing a connection to the consumer; the LDAP return code and the related message provide more information.

"Error (8) :Failed to acquire replica: Internal error occurred on the remote replica"

An internal error occurred on the consumer side. It is generated by a failure related to the CSN generator on the consumer; check the consumer logs.

"Error (3) :Unable to acquire replica: permission denied. The bind dn does not have permission to supply replication updates to the replica. Will retry later."

The identity used to authenticate to the consumer is not recognized as a valid replication bind DN or as a member of a bind DN group.

"Error (6) :Unable to acquire replica: there is no replicated area on the consumer server. Replication is aborting."

Misconfiguration on the consumer side: no valid replica is defined for the suffix.

"Error (4) :Unable to acquire replica: the consumer was unable to decode the startReplicationRequest extended operation sent by the supplier. Replication is aborting."

A decoding error occurred on the replication control sent to the consumer.

"Error (1) :Unable to acquire replica: the replica is currently being updated by another supplier."

The replica is busy: there is already an active replication session from another supplier.

"Error (10) :Unable to acquire replica: the replica is supplied by a legacy supplier.  Replication is aborting."

No longer in use.

"Error (11) :Unable to aquire replica: the replica has the same Replica ID as this one. Replication is aborting."

Misconfiguration: the supplier and consumer replicas have the same replica ID.

"Error (14) :Unable to acquire replica: the replica instructed us to go into backoff mode. Will retry later."

Only possible when a custom replication hook is implemented.

"Error (extop_result) :Unable to acquire replica"

"Error (4) Unable to parse the response to the startReplication extended operation. Replication is aborting."

"Error (16) Unable to receive the response for a startReplication extended operation to consumer. Will retry later."

"Error (0) Unable to obtain current CSN. Replication is aborting."

A decoding error occurred on the replication control received from the consumer.
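Several of the acquire messages end either with "Will retry later." or with "Replication is aborting.", which tells a monitoring client whether the supplier will try again by itself. A sketch that classifies a status by that suffix; messages without either suffix (for example the busy-replica status) are reported as unknown. `AcquireOutcome` and `acquire_outcome` are hypothetical names.

```python
from enum import Enum


class AcquireOutcome(Enum):
    RETRY = "retry"      # text ends with "Will retry later."
    ABORT = "abort"      # text ends with "Replication is aborting."
    UNKNOWN = "unknown"  # neither suffix is present


def acquire_outcome(status: str) -> AcquireOutcome:
    """Classify an acquire status by its closing sentence (hypothetical helper)."""
    text = status.rstrip()
    if text.endswith("Will retry later."):
        return AcquireOutcome.RETRY
    if text.endswith("Replication is aborting."):
        return AcquireOutcome.ABORT
    return AcquireOutcome.UNKNOWN
```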

State: SENDING_UPDATES

If the replica was successfully acquired, the session moves to the next state: sending updates.

There are several steps in this state, each with different error messages.

step 1: examine RUV

First, the RUV (replica update vector) of the consumer is examined. Potential errors are:

"Error (19) : Replica is not initialized"

This means that the consumer replica has no update vector; replication was probably not enabled on the consumer.

"Error (19) : Replica has different database generation ID, remote replica may need to be initialized"

The consumer has not been initialized with a database that has the same database generation ID. Either the supplier or the consumer needs to be initialized.

"Error (19) : Replica needs to be reinitialized"

The consumer replica is too old. This status is no longer in use.
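The first two checks in this step can be pictured as a comparison between the supplier's and the consumer's RUV. This is a simplified model, not the server's implementation: the RUV is reduced to its database generation ID (a real RUV carries more information), a consumer without an update vector is represented by `None`, and `Ruv` and `check_consumer_ruv` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ruv:
    generation_id: str  # database generation ID; the only RUV field modeled here


def check_consumer_ruv(supplier: Ruv, consumer: Optional[Ruv]) -> Optional[str]:
    """Return the matching status text, or None if the RUV check passes."""
    if consumer is None:
        # No update vector: replication was probably never enabled on the consumer.
        return "Replica is not initialized"
    if consumer.generation_id != supplier.generation_id:
        return ("Replica has different database generation ID, "
                "remote replica may need to be initialized")
    return None
```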

step 2: updating csn generator

The CSNs (change sequence numbers) on each server are generated by a CSN generator, which takes into account the local time and the time differences detected with remote servers. During a replication session the CSN generator's data is updated; if this fails, one of these statuses is set:

"Error (2) : fatal error - too much time skew between replicas"

"Error (2) : fatal internal error updating the CSN generator"
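The skew check can be illustrated with a toy clock that keeps an offset learned from remote servers and refuses to adjust beyond a limit. This is only a model of the idea described above: the class name `CsnClock`, the whole-second granularity, the forward-only adjustment and the `max_skew_seconds` limit are assumptions, not the server's actual CSN generator.

```python
import time


class CsnClock:
    """Toy model: local time plus an offset learned from remote servers."""

    def __init__(self, max_skew_seconds: int) -> None:
        self.max_skew_seconds = max_skew_seconds  # hypothetical skew limit
        self.remote_offset = 0                    # seconds added to local time

    def now(self) -> int:
        return int(time.time()) + self.remote_offset

    def adjust_for_remote(self, remote_time: int) -> None:
        """Move this clock forward if a remote server is ahead of it."""
        skew = remote_time - self.now()
        if skew > self.max_skew_seconds:
            # Corresponds to "fatal error - too much time skew between replicas".
            raise RuntimeError("fatal error - too much time skew between replicas")
        if skew > 0:
            # Never generate CSNs that are older than the remote server's time.
            self.remote_offset += skew
```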

step 3: initial changelog positioning

"Error (15) : Invalid parameter passed to cl5CreateReplayIterator"

General error if the changelog cannot be processed, e.g. if the path does not exist.

"Error (15) : Unexpected format encountered in changelog database"

Failure in parsing an entry from the changelog

"Error (15) : Changelog database was in an incorrect state"

"Error (15) : Incorrect dbversion found in changelog database"

"Error (15) : Changelog database error was encountered"

These errors all relate to the database layer of the changelog; the error log should provide more information.

"Error (15) : changelog memory allocation error occurred"

Failure to allocate memory, e.g. for the changelog buffer or the changelog iterator.

"Error (15) : Data required to update replica has been purged from the changelog. The replica must be reinitialized."

"Error (15) : Changelog data is missing"

The two errors above indicate that the supplier is ahead of the consumer and wants to send updates, but cannot find the starting point in the changelog. In the current version this is treated as a FATAL error, but it could be resolved if the consumer receives updates from other suppliers; it will become TRANSIENT in a future version.
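The positioning problem behind these two messages can be sketched with a sorted list of CSNs. Assumptions of this sketch: the changelog is a list sorted oldest first, CSNs are fixed-width hexadecimal strings so that string comparison matches CSN order, and the consumer's position is a single maximum CSN (a real changelog iterator works from the consumer's RUV). `ChangelogPositionError` and `replay_start` are hypothetical names.

```python
from bisect import bisect_right
from typing import List


class ChangelogPositionError(Exception):
    """Raised when the replay starting point cannot be found."""


def replay_start(changelog_csns: List[str], consumer_max_csn: str) -> int:
    """Return the index of the first changelog entry the consumer still needs.

    changelog_csns must be sorted oldest first; CSNs are assumed to be
    fixed-width hex strings, so plain string comparison orders them.
    """
    if not changelog_csns or consumer_max_csn >= changelog_csns[-1]:
        return len(changelog_csns)  # nothing newer to send
    if consumer_max_csn < changelog_csns[0]:
        # The consumer's last change is older than anything still retained.
        raise ChangelogPositionError(
            "Data required to update replica has been purged from the changelog.")
    pos = bisect_right(changelog_csns, consumer_max_csn)
    if changelog_csns[pos - 1] != consumer_max_csn:
        # Inside the retained range, but the anchor entry itself is absent.
        raise ChangelogPositionError("Changelog data is missing")
    return pos
```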

step 4: sending next update

"Error (rc) : Failed to create result thread"

Updates are sent asynchronously to the consumer, and a result thread is created to collect the response messages. The (rc) gives an indication of why the thread could not be created.
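The split between sending updates and collecting their results can be sketched with two threads and a queue. This is a model only: the queue stands in for the connection to the consumer, the "ack" strings stand in for the consumer's response messages, and `send_updates` is a hypothetical name. In CPython, `Thread.start()` raises `RuntimeError` when a new thread cannot be started, which is the point where a "Failed to create result thread" status would correspond.

```python
import queue
import threading
from typing import List


def send_updates(updates: List[str]) -> List[str]:
    """Send updates without waiting per operation; a second thread collects results."""
    pending: "queue.Queue[object]" = queue.Queue()
    results: List[str] = []
    done = object()  # sentinel telling the result thread to stop

    def result_reader() -> None:
        while True:
            item = pending.get()
            if item is done:
                return
            # Stands in for reading the consumer's response to this operation.
            results.append(f"ack {item}")

    reader = threading.Thread(target=result_reader)
    try:
        reader.start()
    except RuntimeError as exc:
        raise RuntimeError("Failed to create result thread") from exc

    for op in updates:
        pending.put(op)  # stands in for sending the operation to the consumer
    pending.put(done)
    reader.join()  # wait until every response has been collected
    return results
```

The sender never blocks on an individual response, so several operations can be in flight at once while the result thread drains the replies.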

"Error (15) : Invalid parameter passed to cl5GetNextOperationToReplay"

General error if the changelog cannot be processed, e.g. if the path does not exist.

"Error (15) : Database error occurred while getting the next operation to replay"

"Error (15) : Memory allocation error occurred (cl5GetNextOperationToReplay)"

step 6: subentry update

"Error (-1) :  Agreement is corrupted: missing suffix"

On each server a “replica keep alive” entry is created to improve performance for fractional replication; if it cannot be created, this error is set.

general send_updates result:

The following messages indicate that an error occurred, but the error is considered temporary. They look very similar; the difference is where the error originated.

"Error (18) : Incremental update transient error.  Backing off, will retry update later."

A non-fatal error occurred on the local server during changelog processing. The error log will contain more details.

"Error (16) : Incremental update connection error.  Backing off, will retry update later."

A replication connection was established but was then disconnected.

"Error (17) : Incremental update timeout error.  Backing off, will retry update later."

On an established replication connection a timeout was hit, replication will try to resume later.
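A client or test harness that drives replication attempts might treat these three codes as "wait and try again". The document does not say how long the server backs off, so the doubling delay, the cap, the attempt limit and all names here are assumptions. The `session` callable stands for one replication attempt that returns the status code, and `sleep` is injected so the loop can be tested without actually waiting.

```python
from typing import Callable

# Codes listed above as temporary: 16 connection, 17 timeout, 18 transient.
TRANSIENT_CODES = {16, 17, 18}


def run_with_backoff(session: Callable[[], int], initial_delay: float,
                     max_delay: float, max_attempts: int,
                     sleep: Callable[[float], None]) -> int:
    """Repeat session() while it returns a transient code; return the last code."""
    delay = initial_delay
    rc = session()
    attempts = 1
    while rc in TRANSIENT_CODES and attempts < max_attempts:
        sleep(delay)                        # back off before retrying
        delay = min(delay * 2, max_delay)   # hypothetical doubling schedule
        rc = session()
        attempts += 1
    return rc
```

In real use `sleep` would simply be `time.sleep`; any non-transient code (including 0) ends the loop immediately.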

Last modified on 2 April 2024