Fix and Reset Time Skew


Thanks to JR Aquino for these instructions.

WARNING: Following these steps will require performing a replica initialization of all of your servers!

If you are seeing clock skew errors in

/var/log/dirsrv/slapd-EXAMPLE-COM/errors

that look like the following, then you will need to verify the time/date of the server to make sure NTP is working correctly (some quick checks are shown after the log excerpt below). If the system date is correct, it is possible that the change sequence number (CSN) generator has skewed.

[01/Feb/2014:14:42:06 -0800] NSMMReplicationPlugin - conn=12949 op=7 repl="dc=example,dc=com": Excessive clock skew from supplier RUV    
[01/Feb/2014:14:42:06 -0800] - csngen_adjust_time: adjustment limit exceeded; value - 1448518, limit - 86400    
[01/Feb/2014:14:42:06 -0800] - CSN generator's state:    
[01/Feb/2014:14:42:06 -0800] -  replica id: 115    
[01/Feb/2014:14:42:06 -0800] -  sampled time: 1391294526    
[01/Feb/2014:14:42:06 -0800] -  local offset: 0    
[01/Feb/2014:14:42:06 -0800] -  remote offset: 0    
[01/Feb/2014:14:42:06 -0800] -  sequence number: 55067    
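
Before resetting anything, it is worth confirming that the system clock itself is correct. As a quick sketch (this assumes chrony is in use; with classic ntpd, ntpstat or ntpq -p gives the equivalent information):

# date
# timedatectl
# chronyc tracking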

The readNsState.py script should be used to determine whether the CSN generator has jumped significantly away from the real time/date: https://github.com/richm/scripts/blob/master/readNsState.py

Usage of the script looks like this (run as root):

# ./readNsState.py /etc/dirsrv/slapd-EXAMPLE-COM/dse.ldif    
nsState is cwAAAAAAAABGPfBSAAAAAAAAAAAAAAAAAQAAAAAAAAACAAAAAAAAAA==    
Little Endian    
For replica cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping tree,cn=config    
  fmtstr=[H6x3QH6x]    
  size=40    
  len of nsstate is 40    
  CSN generator state:    
    Replica ID    : 115    
    Sampled Time  : 1391476038    
    Gen as csn    : 52f03d46000201150000    
    Time as str   : Mon Feb  3 17:07:18 2014    
    Local Offset  : 0    
    Remote Offset : 1    
    Seq. num      : 2    
    System time   : Mon Feb  3 17:09:11 2014    
    Diff in sec.  : 113    
    Day:sec diff  : 0:113    

If the output from the above command shows that the CSN generator is a day or more out of sync, it has become grossly skewed, and it will be necessary to perform the following steps to recover. If the report contains more than one replica, you will need to clean all of the replicas reported (see below).

How to resolve this issue
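
The first step is to export each affected backend without its replication state. As an illustrative sketch (the instance name EXAMPLE-COM, the userRoot backend, and the output path are assumptions; depending on your version, db2ldif may need to be run with the instance stopped):

# db2ldif -Z EXAMPLE-COM -n userRoot -a /var/lib/dirsrv/slapd-EXAMPLE-COM/ldif/userRoot-nostate.ldif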

Note that without the -r option the export deliberately omits the tainted replication data, which contains the bad CSNs.

If you have more than one suffix/db, you will have to do this for each one that the readNsState.py script reports.

WARNING: Once you do the following step, replication will be broken until you complete all steps!!!!

WARNING: There is no going back now!!!!

BEGIN - REPEAT THESE STEPS ON EVERY REPLICA
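
What follows is an illustrative sketch of one per-replica pass; the instance name EXAMPLE-COM, the userRoot backend, the changelog path, and the LDIF path are assumptions, so substitute the values for your own deployment.

Stop the instance:

# systemctl stop dirsrv@EXAMPLE-COM

Remove the replication changelog (if your version keeps it in a separate directory):

# rm -f /var/lib/dirsrv/slapd-EXAMPLE-COM/changelogdb/*

Edit /etc/dirsrv/slapd-EXAMPLE-COM/dse.ldif and delete the nsState attribute from every cn=replica entry that readNsState.py reported.

Import the LDIF that was exported without replication state:

# ldif2db -Z EXAMPLE-COM -n userRoot -i /var/lib/dirsrv/slapd-EXAMPLE-COM/ldif/userRoot-nostate.ldif

Start the instance again:

# systemctl start dirsrv@EXAMPLE-COM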

END - REPEAT THESE STEPS ON EVERY REPLICA
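
Once every replica has been cleaned and restarted, perform the replica initialization of all of your servers mentioned in the warning at the top of this page: the LDIF was exported without replication state, so a full re-initialization is what re-establishes replication between the servers.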

Why is this necessary?

Further reading for those interested in the particulars of CSN tracking or the multi-supplier replication (MMR) algorithm:

It all starts with the Leslie Lamport paper: http://www.stanford.edu/class/cs240/readings/lamport.pdf “Time, Clocks, and the Ordering of Events in a Distributed System”

The next big impact on MMR protocols was the work done at Xerox PARC on the Bayou project.

These and other sources formed the basis of the IETF LDUP working group. Much of the MMR protocol is based on the LDUP work.

The tl;dr version is this:

The MMR protocol is based on ordering operations by time so that when you have two updates to the same attribute, the “last one wins”.

So how do you guarantee some sort of consistent ordering throughout many systems that do not have clocks in sync down to the millisecond? If you say “ntp” then you lose…

The protocol itself has to have some notion of time differences among servers.

The ordering is done by CSN (Change Sequence Number).

The first part of the CSN is the timestamp of the operation in unix time_t (number of seconds since the epoch).

In order to guarantee ordering, the MMR protocol has a major constraint - you must never, never, issue a CSN that is the same or less than another CSN.

In order to guarantee that, the MMR protocol keeps track of the time differences among _all_ of the servers that it knows about.

When it generates CSNs, it uses the largest time difference among all servers that it knows about.
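
To make the ordering concrete, here is a small Python sketch. It assumes the conventional 389-ds CSN layout of four hexadecimal fields (an 8-digit timestamp, a 4-digit sequence number, a 4-digit replica ID, and a 4-digit sub-sequence number); the two CSN values in it are made up for illustration.

#!/usr/bin/env python3
# Sketch of CSN ordering, assuming the conventional 389-ds CSN layout:
# 8 hex digits of timestamp (unix time_t), 4 hex digits of sequence number,
# 4 hex digits of replica ID, and 4 hex digits of sub-sequence number.
import time

def parse_csn(csn):
    """Split a 20-character CSN string into its numeric fields."""
    return {
        "timestamp": int(csn[0:8], 16),
        "seqnum": int(csn[8:12], 16),
        "rid": int(csn[12:16], 16),
        "subseq": int(csn[16:20], 16),
    }

def csn_key(csn):
    # Ordering is by timestamp first, then sequence number, then replica ID,
    # then sub-sequence number - this is what makes "last one wins" work.
    f = parse_csn(csn)
    return (f["timestamp"], f["seqnum"], f["rid"], f["subseq"])

a = "52f03d46000200730000"  # hypothetical update from replica id 0x0073
b = "52f03d50000100740000"  # hypothetical update made 10 seconds later on replica 0x0074
winner = max(a, b, key=csn_key)
print("last writer:", winner, "at", time.ctime(parse_csn(winner)["timestamp"]))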

So how does the time skew grow at all?

Due to timing differences, network latency, etc., the directory server cannot always generate CSNs at the exact system time. There will always be one- or two-second differences in some replication sessions, and because the generator can never move backwards, these small differences accumulate over time.

However, there are things which can introduce really large differences: for example, a system clock that jumps far into the future and is later corrected forces the CSN generator to stay ahead of the real time from then on.

How can you monitor for this in the future?

The readNsState.py script mentioned above can be used to report the effective skew of the system date versus the CSN generator. You can set up a cron job to run this script and monitor its output to catch any future severe drift.
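
As a sketch (where the script is installed and where the output goes are assumptions), an hourly crontab entry could look like this:

0 * * * * /usr/local/sbin/readNsState.py /etc/dirsrv/slapd-EXAMPLE-COM/dse.ldif >> /var/log/nsstate-check.log 2>&1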

Ticket information for some of the fixes that have been implemented because of this work so far: https://github.com/389ds/389-ds-base/issues/853
