
EFM Status on Master and Slave

Level 3 Adventurer

EFM Status on Master and Slave

Hi Guys,

I have configured EFM for 3 servers (master, slave, witness). The EFM status on the witness shows both the master and the slave in sync, but the EFM status on the master and on the slave does not show the same. Can anyone help me understand this?

 

Witness:

[root@ip-172-31-11-7 ~]# efm cluster-status efm-edb96
Cluster Status: efm-edb96

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Master      172.31.0.127         UP     UP
        Witness     172.31.11.7          UP     N/A
        Standby     172.31.4.185         UP     UP

Allowed node host list:
        172.31.11.7 172.31.4.185 172.31.0.127

Membership coordinator: 172.31.11.7

Standby priority host list:
        172.31.4.185

Promote Status:

        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      172.31.0.127         0/2B0001E8
        Standby     172.31.4.185         0/2B0001E8

        Standby database(s) in sync with master. It is safe to promote.

 

Slave :

[root@ip-172-31-4-185 edb-postgress]# efm cluster-status efm-edb96
Cluster Status: efm-edb96

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Witness     172.31.11.7          UP     N/A
        Standby     172.31.4.185         UP     UP

Warning: Could not get status from the following node(s):
        172.31.0.127
It could be that this node(s) has failed but has not reached the failure timeout period yet.

Allowed node host list:
        172.31.11.7 172.31.4.185 172.31.0.127

Membership coordinator: 172.31.11.7

Standby priority host list:
        172.31.4.185

Promote Status:

        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Standby     172.31.4.185         0/2B000220

        No master database was found.

Master:

[root@ip-172-31-0-127 ~]# efm cluster-status efm-edb96
Cluster Status: efm-edb96

        Agent Type  Address              Agent  DB       VIP
        -----------------------------------------------------------------------
        Master      172.31.0.127         UP     UP
        Witness     172.31.11.7          UP     N/A

Warning: Could not get status from the following node(s):
        172.31.4.185
It could be that this node(s) has failed but has not reached the failure timeout period yet.

Allowed node host list:
        172.31.11.7 172.31.4.185 172.31.0.127

Membership coordinator: 172.31.11.7

Standby priority host list:
        172.31.4.185

Promote Status:

        DB Type     Address              XLog Loc         Info
        --------------------------------------------------------------
        Master      172.31.0.127         0/2B000220

        No standby databases were found.

6 REPLIES
EDB Team Member

Re: EFM Status on Master and Slave


@Mahisha wrote:

Hi Guys,

I have configured EFM for 3 servers (master, slave, witness). The EFM status on the witness shows both the master and the slave in sync, but the EFM status on the master and on the slave does not show the same. Can anyone help me understand this?

[full `efm cluster-status` output from the witness, slave, and master quoted above]

The output above highlights a few things about the setup. The efm cluster-status output on the slave says the master node's ("172.31.0.127") status is missing, and on the master the efm cluster-status says the slave's (172.31.4.185) details are missing.

 

Can you try a few things to see where the problem is?

 

  • Run "select * from pg_stat_replication" on the master to see whether there is an entry for the slave node in the replication view.
  • Try connecting from the slave to the master and vice versa using "psql -h <masterHostip> -U username -p portnumber -d database" to confirm there is a handshake between the two nodes.

If neither of the two checks turns up a problem, restart the EFM agents and re-check the EFM cluster status.
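A minimal sketch of both checks. The port (5442), user (enterprisedb), and database (edb) are taken from later posts in this thread; adjust them to your setup:

```shell
# Check 1 (run against the master): is the standby attached as a
# streaming replication client?
psql -h 172.31.0.127 -p 5442 -U enterprisedb -d edb \
     -c "select client_addr, state, sync_state from pg_stat_replication;"

# Check 2: basic handshake in both directions.
# From the slave to the master:
psql -h 172.31.0.127 -p 5442 -U enterprisedb -d edb -c "select 1;"
# From the master to the slave (a hot standby accepts read-only sessions):
psql -h 172.31.4.185 -p 5442 -U enterprisedb -d edb -c "select 1;"
```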

 

--Raghav

Level 3 Adventurer

Re: EFM Status on Master and Slave

Hi Raghav,

 

Please find the output of pg_stat_replication.

edb=# select * from pg_stat_replication;
-[ RECORD 1 ]----+---------------------------------
pid              | 1908
usesysid         | 16395
usename          | edbrepuser
application_name | walreceiver
client_addr      | 172.31.4.185
client_hostname  |
client_port      | 36596
backend_start    | 10-OCT-18 08:26:24.170248 -04:00
backend_xmin     |
state            | streaming
sent_location    | 0/2800BB38
write_location   | 0/2800BB38
flush_location   | 0/2800BB38
replay_location  | 0/2800BA90
sync_priority    | 0
sync_state       | async
(1 row)

 

When I try to connect to the database in both directions, the connection succeeds.

Slave to Master:

-bash-4.2$ ./psql -d edb -U enterprisedb -h 172.31.2.48 -p5442
Password for user enterprisedb:
psql.bin (9.6.10.17)
Type "help" for help.

edb=#

Master to Slave:

-bash-4.2$ ./psql -h 172.31.4.185 -U enterprisedb -p5442 -d edb
Password for user enterprisedb:
psql.bin (9.6.10.17)
Type "help" for help.

edb=#

 

Please suggest if anything else can be tried to resolve the same.

 

Regards,

Manisha

EDB Team Member

Re: EFM Status on Master and Slave

Hi Manisha,

 

As pointed out by Raghav, there seems to be some issue with the EFM agents connecting from the master to the standby and vice versa.

 

Could you please check that your firewall is not blocking the EFM port and the database port on these servers?

 

Also, could you please change efm.loglevel to FINER, restart the EFM agents, and share the EFM logs with us.
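A sketch of both steps. The properties file path and service name below are assumptions for an EFM 3.x install (they vary by version), and only the efm.loglevel property name comes from this thread:

```shell
# See which ports the firewall currently allows; the EFM agent port and
# the database port (5442 in this thread) must be open between all nodes.
firewall-cmd --list-ports

# Raise the agent log level, then restart the agent. The file path and
# service name are hypothetical examples for EFM 3.2; adjust to your install.
sudo sed -i 's/^efm.loglevel=.*/efm.loglevel=FINER/' \
    /etc/edb/efm-3.2/efm-edb96.properties
sudo systemctl restart efm-3.2
```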

 

Regards,

Sudhir

Level 3 Adventurer

Re: EFM Status on Master and Slave

Hi Sudhir,

 

This issue got resolved after I disallowed all nodes and allowed them again, with a restart of the EFM agents. But as per my understanding this cannot be the reason for the EFM status change. Is there anything else that was the cause and got resolved by the restart? I had tried a normal restart previously, but it hadn't worked.
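For reference, the disallow/allow cycle described above can be scripted with the efm CLI (subcommand syntax as in EFM 3.x; cluster name and addresses from this thread, service name assumed):

```shell
# Drop each data node from the allowed list and re-admit it.
efm disallow-node efm-edb96 172.31.0.127
efm allow-node    efm-edb96 172.31.0.127
efm disallow-node efm-edb96 172.31.4.185
efm allow-node    efm-edb96 172.31.4.185

# Restart the agent on every host (service name varies by EFM version),
# then confirm all nodes report the same view.
sudo systemctl restart efm-3.2
efm cluster-status efm-edb96
```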

 

 

EDB Team Member

Re: EFM Status on Master and Slave

Hi Manisha,

 

Yes, that's correct. Maybe you allowed/disallowed nodes to the EFM cluster from multiple nodes, and that might have caused the differing EFM status.

 

Regards,

Sudhir

EDB Team Member

Re: EFM Status on Master and Slave


@Mahisha wrote:

Hi Sudhir,

 

This issue got resolved after I disallowed all nodes and allowed them again, with a restart of the EFM agents. But as per my understanding this cannot be the reason for the EFM status change. Is there anything else that was the cause and got resolved by the restart? I had tried a normal restart previously, but it hadn't worked.


You're right: this can't be related to the allowed node list. Otherwise the three nodes would never have been in the same cluster in the first place.

 

When you run the 'efm cluster-status <cluster>' command on node X, the agent on that node sends a request out to every node in the cluster, including itself, asking for agent-specific info. It then puts that info together and sends it back to the 'efm' command process for output. In this case, the master and slave both knew the other existed, but the request to get status info did not return within the 'remote.timeout' time period. Since both nodes were clearly still up, and connected to the witness, there must have been some network issue that prevented communication between the master/slave nodes for at least that time period.
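Since each node can report a different set of unreachable peers, it can help to script the comparison. Below is a hypothetical helper (not part of EFM) that pulls the addresses out of the "Warning: Could not get status" block of `efm cluster-status` output; the demo feeds it the output the standby produced in this thread:

```shell
# Extract the node addresses listed under the cluster-status warning block.
efm_unreachable() {
    awk '
        /^Warning: Could not get status/ { grab = 1; next }   # warning header
        grab && /^[[:space:]]+[0-9]+\./  {                    # indented IP line
            gsub(/^[[:space:]]+/, ""); print; next
        }
        grab { grab = 0 }                                     # block ended
    '
}

# Demo: in practice you would pipe `efm cluster-status efm-edb96` into it.
efm_unreachable <<'EOF'
Warning: Could not get status from the following node(s):
        172.31.0.127
It could be that this node(s) has failed but has not reached the failure timeout period yet.
EOF
# prints: 172.31.0.127
```

An empty result means the node can reach every peer within remote.timeout.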

 

There should probably be a warning in the logs on those two nodes at that time saying they couldn't reach the other, but whatever caused it didn't last long enough to cause a failure. There could also be weird edge cases with IP addresses (*), but if things are working now the above is probably the reason.

 

Cheers,

Bobby

 

(*) E.g. if you start agent A with the wrong address for B, then start B with the correct address for A, they will eventually find each other. How/what/when that happens isn't defined, especially if node C then appears with the address A was using incorrectly before. There are lots of ways besides the prescribed ones to do things. :)