EDB Replication Server empowers DBAs with high-performance Multi-Master Replication (MMR) with the help of Postgres’ Logical Decoding framework. While performance is a signifiance improvement from the xDB 5.x trigger-based replication architecture, DBAs and sysadmins still seek to monitor and track replication lag. There are a number of ways to do this, and we’ll go over them.
Row Replication Lag
Much like usingpg_stat_replicationwhen employing Postgres’ built-in Streaming Replication, the most accurate form of replication lag monitoring is to see how many bytes (in this case, rows) a standby/target databse is behind its provider. The way to measure this is by executing the following query on the control databse (in most cases, the MDN):
Note that this is the same query that Postgres Enterprise Manager’s (PEM) XDB Replication Probe uses. If you’re already using PEM, you can just enable the XDB Replication Probe, and you’ll be able to monitor the lag from the PEM client.
Time Replication Lag
For most people, they’re not interested in the number of bytes or rows a replication cluster is lagging behind. Instead, they’re more interested in how long it takes for ALL the unreplicated data to get to the other side. For this, there’s no real way to get a measurement without having a third-party monitoring to see if a change in one server has appeared in another server. Moreover, in an MMR situation, there are complexities in defining replication lag in terms of wallclock time, since all nodes are Master Databases.
One way to measure time lag in an EDB Replication Server cluster involves a few tricks, and the setup looks like this:
A table dedicated to testing replication performance (we’ll call itxdb_lag_test)
A table tabulating the time lag history (we’ll call itxdb_lag_history)
A function returning a trigger (namedupdate_xdb_lag_history)
A trigger on thexdb_lag_testtable
First, we create thexdb_lag_testtable:
Then, we add it to the publication (in this case, I’m creating a new publication, since it’s a new cluster in my environment):
After we verify that replication is working in both directions, create the xdb_lag_history table, trigger and function on the MDN:
It isvery importantto note theALTER TABLEstatement that setsENABLE ALWAYS TRIGGER– this prevents the DML coming from the non-MDN from being ignored by the trigger. Without this trigger enabled, you’ll never get any data inserted intoxdb_lag_history.
Now,INSERTa row intoxdb_lag_teston any of your non-MDNs:
Wait a few seconds for the data to replicate into the MDN, and then go check:
In this system, there was a 4.36sec lag in propagating theINSERTstatement. Now, we can write to all the non-MDNs and see how much time it takes for data to replicate into the MDN. From here, other instrumentation can be to track time lag in EDB Replication Server.