cancel
Showing results for 
Search instead for 
Did you mean: 

xDB MMR replication lag query fails

Adventurer

xDB MMR replication lag query fails

Hi,

 

We have one internal test MMR setup where I am performing table insert and then stopping the xDB service to monitor the lag. I used the query mentioned below. The values we are receiving are in negative (eg. -00:29:57.166467)!

 

On MDN node, when there is a long running data insertion, the lag is visible in negative (like above value) only on MDN node, while on non-MDN node we see no value (i.e. the value is NULL). When I do long running data insert on non-MDN node then on lag on non-MDN node is again negative (value around -00:29:57.166467).

 

If I stop the xdb service, the value increases (i.e. -00:29 to -00:27 to -00:24 to -00:22 and so on...). This happens on both the nodes. When i resume the xDB service, the value decreases (i.e. -00:22 to -00:24 to -00:27 until it reaches -00:29:59). That is it. The value does not go below this!

 

WITH
src_db (current_xid_commit_timestamp) AS
(SELECT timestamp FROM pg_last_committed_xact()),
-- last replicated timestamp for target databases
target_db (target_db_id, last_repl_xid_timestamp) AS
(SELECT db_id AS target_db_id, MAX(last_repl_xid_timestamp) AS last_repl_xid_timestamp
FROM _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C'
GROUP BY db_id),
-- identity of target databases
target_db_identity (target_db_id, target_db_identity) AS
(SELECT pub_db_id, db_host || ':' || db_port || ':' || db_name
FROM _edb_replicator_pub.xdb_pub_database)
-- replication lag for each of the target databases
SELECT c.target_db_id, target_db_identity, AGE(current_xid_commit_timestamp, last_repl_xid_timestamp) repl_time_lag
FROM src_db a, target_db b, target_db_identity c WHERE b.target_db_id = c.target_db_id
ORDER BY target_db_id;

 

Can someone help me how to interpret this negative output and what is the way to get the lag in time and in number of rows or in MBs?

10 REPLIES 10
EDB Team Member

Re: xDB MMR replication lag query fails

Hi pratmeht,

 

To investigate further on this, could you please share the below query output from all the nodes which are part of MMR replication.

 

selec timestamp FROM pg_last_committed_xact();

select max (last_repl_xid_timestamp) from _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C';

Adventurer

Re: xDB MMR replication lag query fails

Hi Kapil,

 

Note that at data insertion is no longer running. So number of tables and records are same on both the sites.

 

MDN node:

rwdb=# select timestamp FROM pg_last_committed_xact();
timestamp
----------------------------------
10-JUL-19 13:42:24.730344 +05:30
(1 row)

rwdb=# select max (last_repl_xid_timestamp) from _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C';
max
-----

(1 row)

rwdb=# \! hostname -i
10.10.17.47

 

Non-MDN node:

 

rwdb=# select timestamp FROM pg_last_committed_xact();
timestamp
----------------------------------
10-JUL-19 00:00:51.416383 +05:30
(1 row)

rwdb=# select max (last_repl_xid_timestamp) from _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C';
max
-----

(1 row)

rwdb=# \! hostname -i
10.10.17.51

Adventurer

Re: xDB MMR replication lag query fails

Let me know if you want me to reinitiate the data insertion (on one of the sites) and then run your queries.

Level 3 Adventurer

Re: xDB MMR replication lag query fails


Hi pratmeht,

We are checking you issue with the development team and will update you once we receive any feedback from them.

Adventurer

Re: xDB MMR replication lag query fails

Hi SwagataBanik,

 

Thanks. Waiting for your reply.

 

Regards,

Pratik Mehta

+91 9664022206

Highlighted
Adventurer

Re: xDB MMR replication lag query fails

Hi SwagataBanik,

 

Any luck ?

Level 3 Adventurer

Re: xDB MMR replication lag query fails

Hi pratmeht,

 

We are yet to recieve feedback from the development team for the query which can show the exact lag.

Thank you for your patience and co-operation.

Level 3 Adventurer

Re: xDB MMR replication lag query fails

Hi pratmeht,

 

Hope you are doing well.

 

Our development team has analyzed different possibilities but there is no accurate way to reflect the zero-lag as the Published data set changes are fully consumed. The key issue is that as the changes occur in other databases that are hosted on the same DB server that is hosting Publication database, replication stream channel is unaware of such changes and xDB cannot record last consumed change from those non-publication databases. Hence the best available option is to rely on the count of pending tx set and generate an alert in case the tx set count is increasing. 

 
We recommend to make use of the below query to monitor the replication lag based on pending_tx_set count threshold:
 
WITH 
src_db (current_xid_commit_timestamp) AS
(SELECT timestamp FROM pg_last_committed_xact()),
-- last replicated timestamp for target databases
target_db (target_db_id, last_repl_xid_timestamp) AS
(SELECT db_id AS target_db_id, MAX(last_repl_xid_timestamp) AS last_repl_xid_timestamp
 FROM _edb_replicator_pub.rrep_mmr_txset WHERE status = 'C'
 GROUP BY db_id),
-- pending / in-progress tx set for target databases
target_db_p_tx_set (pending_tx_set) AS
(SELECT COUNT(set_id) AS pending_tx_set
 FROM _edb_replicator_pub.rrep_mmr_txset WHERE  status IN ('P','R')
),
-- identity of target databases
target_db_identity (target_db_id, target_db_identity) AS
(SELECT pub_db_id, db_host || ':' || db_port || ':' || db_name
 FROM _edb_replicator_pub.xdb_pub_database)
-- replication lag for each of the target databases 
SELECT c.target_db_id, target_db_identity, AGE(current_xid_commit_timestamp, last_repl_xid_timestamp) repl_time_lag,d.pending_tx_set
FROM src_db a, target_db b, target_db_identity c,target_db_p_tx_set d WHERE b.target_db_id = c.target_db_id 
ORDER BY target_db_id;

 

Adventurer

Re: xDB MMR replication lag query fails

Hi SwagataBanik,

 

I am unable to understand on how to interpret lag (between 2 MMR sites) via the values derived from this query.

Can you suggest some systematic approach to test this query to find out the lag in a MMR setup ?

 

Regards,

Pratik Mehta