Hi team,
I have configured EDB with EFM on the following servers.
Master-172.31.0.99
slave-172.31.5.37
witness-172.31.8.153
I am getting an error when checking the EFM cluster status on the master and standby nodes, as follows.
Master:
[root@ip-172-31-0-99 ~]# efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Master 172.31.0.99 UP UP
Witness 172.31.8.153 UP N/A
Warning: Could not get status from the following node(s):
172.31.5.37
It could be that this node(s) has failed but has not reached the failure timeout period yet.
Allowed node host list:
172.31.0.99 172.31.5.37 172.31.8.153
Membership coordinator: 172.31.8.153
Standby priority host list:
172.31.5.37
Promote Status:
DB Type Address XLog Loc Info
--------------------------------------------------------------
Master 172.31.0.99 0/2D000680
No standby databases were found.
Slave:
[root@ip-172-31-5-37 ~]# efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Standby 172.31.5.37 UP UP
Witness 172.31.8.153 UP N/A
Warning: Could not get status from the following node(s):
172.31.0.99
It could be that this node(s) has failed but has not reached the failure timeout period yet.
Allowed node host list:
172.31.0.99 172.31.5.37 172.31.8.153
Membership coordinator: 172.31.8.153
Standby priority host list:
172.31.5.37
Promote Status:
DB Type Address XLog Loc Info
--------------------------------------------------------------
Standby 172.31.5.37 0/2D000760
No master database was found.
Witness:
[root@ip-172-31-8-153 ~]# efm cluster-status efm
Cluster Status: efm
Agent Type Address Agent DB VIP
-----------------------------------------------------------------------
Master 172.31.0.99 UP UP
Standby 172.31.5.37 UP UP
Witness 172.31.8.153 UP N/A
Allowed node host list:
172.31.0.99 172.31.5.37 172.31.8.153
Membership coordinator: 172.31.8.153
Standby priority host list:
172.31.5.37
Promote Status:
DB Type Address XLog Loc Info
--------------------------------------------------------------
Master 172.31.0.99 0/2D000760
Standby 172.31.5.37 0/2D000760
Standby database(s) in sync with master. It is safe to promote.
Here is the content of the recovery.conf file:
[root@ip-172-31-5-37 data]# more recovery.conf
standby_mode = 'on'
primary_conninfo = 'user=edbrepuser password=Edb@12345 host=172.31.0.99 port=5444 sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any'
trigger_file='/opt/edb/as9.6/data/failover_trigger'
restore_command='scp enterprisedb@172.31.0.99:/opt/edb/as9.6/data/archive/%f'
recovery_target_timeline = 'latest'
EFM error log of standby:
1/23/19 6:58:20 AM com.enterprisedb.efm.utils.ClusterUtils sendStatusRequestToNodes WARNING: Could not reach node(s): 172.31.0.99
1/23/19 6:58:25 AM org.jgroups.protocols.TP sendToMembers SEVERE: JGRP000034: ip-172-31-5-37-1846(172.31.5.37): failure sending message to ip-172-31-0-99-38176(172.31.0.99): java.net.SocketTimeoutException: connect timed out
1/23/19 6:58:33 AM org.jgroups.protocols.TP sendToMembers SEVERE: JGRP000034: ip-172-31-5-37-1846(172.31.5.37): failure sending message to ip-172-31-0-99-38176(172.31.0.99): java.net.SocketTimeoutException: connect timed out
1/23/19 6:58:41 AM org.jgroups.protocols.TP sendToMembers SEVERE: JGRP000034: ip-172-31-5-37-1846(172.31.5.37): failure sending message to ip-172-31-0-99-38176(172.31.0.99): java.net.SocketTimeoutException: connect timed out
1/23/19 6:58:49 AM org.jgroups.protocols.TP sendToMembers SEVERE: JGRP000034: ip-172-31-5-37-1846(172.31.5.37): failure sending message to ip-172-31-0-99-38176(172.31.0.99): java.net.SocketTimeoutException: connect timed out
Please help me out.
Hi Sushmitha,
It seems to be a network issue between the Master (172.31.0.99) and Slave (172.31.5.37) servers. Could you please confirm the following:
efm.properties file:
1. bind.address=<IP>:<PORT>
2. admin.port=<PORT>
Database:
1. Database port
Please check the above three ports with the 'telnet' command to see whether they are open from Master to Slave and vice versa.
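If telnet is not installed on the nodes, the same reachability check can be sketched in Python. This is only a minimal sketch: the helper names port_is_open and check_ports are made up for illustration, and the example ports in the comments (7800 for the EFM bind port, 5444 for the database) are assumptions taken from values that appear elsewhere in this thread, so substitute whatever your efm.properties and database actually use.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports, timeout=3.0):
    """Return a {port: reachable} mapping for the given host."""
    return {port: port_is_open(host, port, timeout) for port in ports}


# Example (assumed values, run on the Master to test the Slave):
#   check_ports("172.31.5.37", [7800, 5444])
# Then run it on the Slave against 172.31.0.99 to test the other direction.
```

A "connect timed out" result, as in the EFM log above, usually points to a firewall or security group silently dropping the traffic, whereas an immediate refusal usually means nothing is listening on that port.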
Hi,
It worked after allowing port 7800 in the security group. However, 7809 is still not reachable even after allowing it.
Thanks for the solution :)
Hi Sushmita,
Thanks for the update. Regarding the bind port and the admin port: the bind port (7800) is used by the cluster, so the agents on the other nodes communicate with each other over it.
The admin port (7809) is used to communicate with the local agent for commands like cluster-status, so there is no need to open it to other nodes.
Please let us know in case of any further queries/issues.
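To make the distinction concrete, here is a rough sketch that reads the two settings from efm.properties and sorts the ports by who needs to reach them. The function names and the SAMPLE excerpt are hypothetical (the values simply mirror the ones discussed in this thread), and the parsing only handles plain key=value lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def port_roles(props):
    """Split the EFM ports into those other nodes must reach and local-only ones."""
    # bind.address has the form <IP>:<PORT>; keep only the port part.
    bind_port = int(props["bind.address"].rsplit(":", 1)[1])
    admin_port = int(props["admin.port"])
    return {"open_between_nodes": [bind_port], "local_only": [admin_port]}


# Hypothetical excerpt of efm.properties, using the values from this thread.
SAMPLE = """\
# cluster communication
bind.address=172.31.0.99:7800
# local admin commands such as cluster-status
admin.port=7809
"""

roles = port_roles(parse_properties(SAMPLE))
print(roles)
```

Only the ports listed under open_between_nodes need a security group rule between the EFM nodes. The database port must also be reachable between the nodes, but it is configured outside the two settings read here.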
Regards,
Sudhir