Auto-Failover Not Happening (EFM-3.4)

Adventurer

Good day, guys,

I'd like to ask for your help as I've been working at this for hours to no avail.

I have a Master, a Standby, and a Witness node, and the cluster status, run on every node, indicates that they are syncing fine.

clusterstatussync.png


But when I stop the database cluster on the Master (using pg_ctl stop), auto-failover does not happen. Nothing happens either when I try a manual switchover (efm promote efm -switchover).

I've attached the efm.properties file and the efm.log log file on each node for your reference.

Regards,
Warren

PS.
The underlying DB engine is Postgres CE 11

db1 (Master) files: https://drive.google.com/open?id=1WPNncnC38J0Niy0uFJlcb4muQ7D3-UKn
db2 (Standby) files: https://drive.google.com/open?id=1hQrMYPFOHYDy2opxS69XToXRyGmGYcYL
witness files: https://drive.google.com/open?id=1AEy59Y3ZglQ2ZoCXJM5w94jj6X8mylq9

 

15 REPLIES
EDB Team Member

Re: Auto-Failover Not Happening (EFM-3.4)

Hi @wcruz,

 

We are not able to download the files from the links you provided. If the files are not large, could you please attach them directly to this thread?

 

Regards,

Sudhir

Adventurer

Re: Auto-Failover Not Happening (EFM-3.4)

Hi, Sudhir,

Is there an attach file option in the forum? All I can see is for photos.

Thanks.

 

Regards,

Warren

Adventurer

Re: Auto-Failover Not Happening (EFM-3.4)

Also, how do we determine if it's a JGroups issue?

I tested auto-failover in the DR cluster, and it worked there. I then copied the config files from the DR nodes to PROD and changed the relevant IPs for PROD, but auto-failover still does not happen in PROD.
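
Since the PROD files were copied from DR and then edited by hand, one way to rule out a leftover DR value is to compare the efm.properties files key by key. The following is only a rough sketch, assuming the files use plain key=value lines with # comments; the function names are made up for illustration and are not part of EFM.

```python
from pathlib import Path


def load_properties(path):
    """Parse simple key=value lines, skipping blank lines and # comments.

    Assumption: no line continuations or escapes are used in the file.
    """
    props = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


def diff_properties(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key missing from one side is reported with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Calling diff_properties(load_properties("db1.properties"), load_properties("db2.properties")) (file names hypothetical) lists only the keys that differ. Per-node values such as addresses are expected to differ; anything else that differs, or an address that still points at DR, is worth a second look.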

Adventurer

Re: Auto-Failover Not Happening (EFM-3.4)

I'm also encountering another issue on the PROD Master: when I start the EFM service (systemctl start efm-3.4.service), the start-up fails, but the VIP is still added to the network interface. The admin process on port 7809 is also spawned, and a cluster-status on the Witness node shows the Master as successfully joined.

I then have to kill the 7809 process and the other EFM processes and start the service again. This time it starts successfully.

If I remove the VIP by restarting the network service (systemctl restart network), the EFM service fails to start again on the next attempt.
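
To tell whether a leftover process from the failed start is still holding the admin port before starting the service again, a quick bind test can help. This is just a sketch: the helper name is made up, and it assumes the admin port is 7809 as seen above and that the check runs on the Master itself.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails because the address is taken.

    The probe socket is closed right away, so the check itself does not
    keep the port occupied.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permissions) is not "in use"
    return False


# Example (assumed admin port from this thread):
# port_in_use(7809)
```

If this returns True while the service is reported as failed, something from the earlier start attempt is probably still running and needs to be stopped first.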


EDB Team Member

Re: Auto-Failover Not Happening (EFM-3.4)

Hi @wcruz,

 

Could you please answer the questions below?

 

1. What error do you see when you run the "efm promote efm -switchover" command?

2. Could you please share efm.properties file from all three nodes. 

 

Regards

Siva.

 

Adventurer

Re: Auto-Failover Not Happening (EFM-3.4)

Hi, Siva,

That's the weird thing: I don't see any errors when I run the command, nor in the log file:

switchovercommand.png

These are the efm.properties files:
db1: https://ctrlv.it/txt/216628/3397291979
db2: https://ctrlv.it/txt/216629/1812239174
witness: https://ctrlv.it/txt/216630/1082399809

Thanks!

Regards,
Warren

Adventurer

Re: Auto-Failover Not Happening (EFM-3.4)

I restarted the 3 nodes in the EFM cluster and the problem went away! HAHAHAHA!

When all else fails.....REBOOT!

EDB Team Member

Re: Auto-Failover Not Happening (EFM-3.4)

Hi @wcruz,

 

Agreed, and glad to hear that the issue is solved. From the logs, it looks like the nodes were not communicating with each other, which the reboot may have resolved.
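
For anyone hitting the same symptom, a basic TCP reachability check between the nodes can be run before resorting to a reboot. The sketch below is only illustrative: the function names are made up, the host names are the ones used in this thread, and the port is an assumption that should be replaced with whatever port your bind.address setting uses. A successful connect only rules out basic network or firewall problems; it does not prove that cluster messaging itself is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(peers, timeout=3.0):
    """Map each 'host:port' to whether it was reachable from this node."""
    return {f"{h}:{p}": can_connect(h, p, timeout) for h, p in peers}


# Example, run on each node in turn (hosts from this thread, port assumed):
# check_peers([("db1", 7800), ("db2", 7800), ("witness", 7800)])
```

Running it from every node matters, because a firewall rule can block traffic in one direction only.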

 

Regards,

Sudhir