
EDB Failover manager switchover behaviour

SOLVED
Level 3 Adventurer


Hi All,

I have a question regarding the switchover feature in EDB FM.

I have four standbys and one master. I want to use the switchover feature to promote one of the standbys to be the new master and make the old master a standby. I used the command below:

bash$ efm promote efm -switchover -sourcenode ipaddr

                    - where ipaddr is the address of the standby that I want to promote as the new master.

I tried this, but during the switchover EFM does not promote the standby whose ipaddr is given above; it promotes the standby that is first in the EFM priority list.

Is this correct behaviour? I think it should promote the standby whose ipaddr is in the -sourcenode option. What do you guys think?

Thank you!

Regards,

Nikhil

Accepted Solution
EDB Team Member

Re: EDB Failover manager switchover behaviour

Hi Nikhil,

Thank you for raising this query.

This is the expected behaviour of EFM. The node priority list determines which standby becomes the new master.

The -sourcenode option has nothing to do with which node is promoted. It specifies the node from which the recovery.conf file is copied to the old master.

This was reported as a documentation bug in EFM 3.1 (the help text did not fully describe -sourcenode), and it will be fixed in version 3.2.
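Because the priority list, not -sourcenode, decides which node is promoted, one way to switch over to a particular standby is to move it to the top of that list first. The bash sketch below assumes that your EFM version provides efm set-priority <cluster> <address> <priority> with 1 as the highest priority (check efm --help), that it runs as a user allowed to run efm commands, and that efm cluster-status prints the "Standby priority host list" as shown later in this thread. The names switchover_to and EFM_CMD are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: switch over to a chosen standby by first moving it to the top
# of the standby priority list.
# ASSUMES `efm set-priority <cluster> <address> <priority>` exists in your EFM
# version (check `efm --help`) and that the caller may run efm commands.
# EFM_CMD is a hypothetical override for the efm executable.
switchover_to() {
  local cluster="$1" target="$2"
  local efm_cmd="${EFM_CMD:-efm}"
  local first

  # Assumed semantics: priority 1 = first candidate for promotion.
  "$efm_cmd" set-priority "$cluster" "$target" 1 || return 1

  # Read the first address on the line after "Standby priority host list:"
  # in the plain-text cluster-status output.
  first=$("$efm_cmd" cluster-status "$cluster" |
    awk '/Standby priority host list:/ { getline; print $1; exit }')

  # Refuse to switch over if the target is not next in line.
  if [ "$first" != "$target" ]; then
    echo "next candidate is '$first', not '$target'; not switching over" >&2
    return 1
  fi

  # The priority list, not -sourcenode, decides which standby is promoted.
  "$efm_cmd" promote "$cluster" -switchover
}
```

For example, switchover_to efm 139.59.73.25 would aim to make Slave3 the new master in the cluster Nikhil describes further down. The function stops before promoting if the priority list does not show the target first, on the reasoning that an unintended promotion is harder to undo than a script that exits early.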

EDB Team Member

Re: EDB Failover manager switchover behaviour

Hi Nikhil,

Please share the EFM configuration file and the logs from when you tried the switchover. Also, can you confirm the replication status of the slave at the time you tried to promote it as the new master?

Level 3 Adventurer

Re: EDB Failover manager switchover behaviour

Hi,

Master - 206.189.92.54

Slave1 - 159.65.135.89

Slave2 - 159.65.4.28

Slave3 - 139.59.73.25

I used the command below to promote Slave3 during switchover, but Slave1 was promoted instead:

efm promote efm -switchover -sourcenode 139.59.73.25

1) Below are the changes I made to the efm.properties file:

db.user=enterprisedb
db.password.encrypted='enterprisedb_encrypted_pwd'

db.port=5324
db.database=edb
db.service.owner=enterprisedb
db.bin=/opt/edb/as10/bin
db.recovery.conf.dir=/opt/edb/as10/data
script.notification=/etc/edb/efm-3.1/message.sh
bind.address=nodeip:7800
admin.port=7809
auto.reconfigure=true
auto.failover=true
promotable=true

2) EFM log file during the switchover:

6/13/18 8:49:00 AM com.enterprisedb.efm.admin.AdminServerThread processRequest INFO: admin service responding: Standby%%139.59.73.25%%true%%true%% %%false||Standby%%139.59.73.52%%true%%true%% %%false||Standby%%159.65.135.89%%true%%true%% %%false||Standby%%159.65.4.28%%true%%true%% %%false||Master%%206.189.92.54%%true%%true%% %%false||206.189.92.54 159.65.135.89 159.65.4.28 139.59.73.25 139.59.73.52%%206.189.92.54%%159.65.135.89 139.59.73.25 159.65.4.28%%(List is empty.)
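The admin service response above packs the whole cluster state into one string. Comparing it with the cluster-status output later in this post suggests that entries are separated by || and fields by %%, and that the last entry holds the allowed node list, the membership coordinator and the standby priority list, in that order. That layout is inferred from this thread, not documented, and decode_admin_status is a hypothetical helper that decodes it:

```bash
#!/usr/bin/env bash
# Hypothetical decoder for the "admin service responding:" payload.
# Layout ASSUMED from comparing the log line with `efm cluster-status`:
#   <Type>%%<address>%%...||...||<allowed>%%<coordinator>%%<priority list>%%...
decode_admin_status() {
  printf '%s\n' "$1" | awk '{
    n = split($0, entries, /[|][|]/)     # one entry per node, plus a summary
    for (i = 1; i <= n; i++) {
      split(entries[i], f, /%%/)         # fields inside one entry
      if (f[1] == "Master" || f[1] == "Standby" || f[1] == "Witness") {
        print f[1], f[2]                 # agent type and address
      } else {
        print "coordinator:", f[2]
        print "priority:", f[3]
      }
    }
  }'
}
```

Applied to the log line above, the final output line is "priority: 159.65.135.89 139.59.73.25 159.65.4.28", which lines up with 159.65.135.89 (Slave1) being the node that was promoted.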

6/13/18 8:49:02 AM com.enterprisedb.efm.admin.AdminServerThread processRequest INFO: Admin server received switchover request.

6/13/18 8:49:02 AM com.enterprisedb.efm.admin.AdminServerThread processRequest INFO: Received param: 139.59.73.25|false

6/13/18 8:49:02 AM com.enterprisedb.efm.nodes.EfmNode doSwitchover INFO: Switchover command passed source node "139.59.73.25" and quiet mode "false"

6/13/18 8:49:02 AM com.enterprisedb.efm.nodes.EfmAgent handleStatusCall INFO: responding to db status request with status: Standby%%139.59.73.25%%true%%true%% %%false

6/13/18 8:49:02 AM com.enterprisedb.efm.nodes.EfmNode doSwitchover INFO: Using local recovery.conf file to send to master.

6/13/18 8:49:02 AM com.enterprisedb.efm.nodes.EfmNode doManualPromotion INFO: Starting manual promotion.

6/13/18 8:49:02 AM com.enterprisedb.efm.nodes.EfmNode doManualPromotion WARNING: Telling master node ho-5568 at 206.189.92.54 to stop monitoring database.

6/13/18 8:49:03 AM com.enterprisedb.efm.nodes.EfmNode$8 run INFO: Reloading shared state per coordinator command

6/13/18 8:49:03 AM com.enterprisedb.efm.nodes.EfmNode setState INFO: setState called: [ master: null promoting: null standbys (4): slave-01-14611 cascading-DR-16613 slave-03-DR-28044 master-61908 witnesses (0): idle nodes (1): ho-5568 ]

6/13/18 8:49:06 AM com.enterprisedb.efm.nodes.EfmAgent handleStatusCall INFO: responding to db status request with status: Standby%%139.59.73.25%%true%%true%% %%false

6/13/18 8:49:06 AM com.enterprisedb.efm.nodes.EfmAgent handleStatusCall INFO: responding to db status request with status: Standby%%139.59.73.25%%true%%true%% %%false

6/13/18 8:49:06 AM com.enterprisedb.efm.nodes.EfmNode setLastPromoted INFO: Last promoted address set to 159.65.135.89

6/13/18 8:49:06 AM com.enterprisedb.efm.nodes.EfmNode handle INFO: Last promoted address set to 159.65.135.89

6/13/18 8:49:06 AM com.enterprisedb.efm.nodes.EfmNode$8 run INFO: Reloading shared state per coordinator command

6/13/18 8:49:06 AM com.enterprisedb.efm.nodes.EfmNode setState INFO: setState called: [ master: null promoting: slave-01-14611 standbys (3): cascading-DR-16613 slave-03-DR-28044 master-61908 witnesses (0): idle nodes (1): ho-5568 ]

6/13/18 8:49:08 AM com.enterprisedb.efm.nodes.EfmAgent handleStatusCall INFO: responding to db status request with status: Standby%%139.59.73.25%%true%%true%% %%false

6/13/18 8:49:10 AM com.enterprisedb.efm.utils.RecoveryConfUtils reconfigureRecoveryConf INFO: Changing recovery.conf file to use new host 159.65.135.89.

6/13/18 8:49:10 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: [sudo -u enterprisedb /usr/edb/efm-3.1/bin/efm_db_functions changemasterhost efm 159.65.135.89]

6/13/18 8:49:10 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut=''}

6/13/18 8:49:10 AM com.enterprisedb.efm.nodes.EfmAgent switchToNewMaster INFO: Restarting database. Will stop monitoring during restart.

6/13/18 8:49:10 AM com.enterprisedb.efm.nodes.EfmNode setNodeState INFO: New internal state: RECONFIGURING

6/13/18 8:49:10 AM com.enterprisedb.efm.DBMonitor shutdown INFO: Shutdown called on DB monitor.

6/13/18 8:49:10 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: [sudo -u enterprisedb /usr/edb/efm-3.1/bin/efm_db_functions restartdb efm]

6/13/18 8:49:10 AM com.enterprisedb.efm.DBMonitor watchDB INFO: *DB monitor interrupted*

6/13/18 8:49:10 AM com.enterprisedb.efm.nodes.EfmAgent startMonitoring INFO: Db monitor has stopped. Current internal state: RECONFIGURING

6/13/18 8:49:10 AM com.enterprisedb.efm.nodes.EfmAgent startMonitoring INFO: Agent has stopped watching the local database while it is being reconfigured.

6/13/18 8:49:10 AM com.enterprisedb.efm.nodes.EfmAgent run INFO: Running as the db owner: false

6/13/18 8:49:11 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut='waiting for server to shut down.... done
server stopped
waiting for server to start....2018-06-13 08:49:11 UTC LOG: listening on IPv4 address "0.0.0.0", port 5324
2018-06-13 08:49:11 UTC LOG: listening on IPv6 address "::", port 5324
2018-06-13 08:49:11 UTC LOG: listening on Unix socket "/tmp/.s.PGSQL.5324"
2018-06-13 08:49:11 UTC LOG: redirecting log output to logging collector process
2018-06-13 08:49:11 UTC HINT: Future log output will appear in directory "log".
done
server started'}

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmAgent switchToNewMaster INFO: Resuming monitoring.

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmAgent resumeMonitoring INFO: Connecting to database and determining recovery status

6/13/18 8:49:11 AM com.enterprisedb.efm.DBMonitor isInRecovery INFO: Performing pg_is_in_recovery query on jdbc:postgresql://139.59.73.25:5324/edb?ApplicationName='efm-3.1'

6/13/18 8:49:11 AM com.enterprisedb.efm.DBMonitor isInRecovery INFO: Query result: true

6/13/18 8:49:11 AM com.enterprisedb.efm.utils.RecoveryConfUtils validateTriggerFileLocation INFO: Validating trigger file location from recovery.conf file.

6/13/18 8:49:11 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: [sudo -u enterprisedb /usr/edb/efm-3.1/bin/efm_db_functions validatetriggerlocation efm]

6/13/18 8:49:11 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut='/opt/binary/slave2'}

6/13/18 8:49:11 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut='/opt/binary/slave2'}

6/13/18 8:49:11 AM com.enterprisedb.efm.utils.RecoveryConfUtils validateTriggerFileLocation INFO: Current trigger file location for promotion: /opt/binary/slave2

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmAgent validateRecoveryTargetTimeline INFO: Reading recovery_target_timeline value from recovery.conf file.

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmAgent resumeMonitoring INFO: Setting state back to running from 'reconfiguring' state.

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmNode setNodeState INFO: New internal state: RUNNING

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmAgent$2 run INFO: Resuming monitoring.

6/13/18 8:49:11 AM com.enterprisedb.efm.nodes.EfmAgent validateRecoveryTargetTimeline INFO: Reading recovery_target_timeline value from recovery.conf file.

6/13/18 8:49:11 AM com.enterprisedb.efm.DBMonitor watchDB INFO: DBMonitor connecting to jdbc:postgresql://139.59.73.25:5324/edb?ApplicationName='efm-3.1'

6/13/18 8:49:11 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: [/etc/edb/efm-3.1/message.sh, [INFO] EFM Standby agent running on 139.59.73.25 for cluster efm, EFM node: 139.59.73.25
Cluster name: efm
Database name: edb
VIP:

Standby agent is running and database health is being monitored.]

6/13/18 8:49:11 AM com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut=''}

6/13/18 8:49:11 AM com.enterprisedb.efm.utils.CustomMonitor start INFO: Custom monitoring is not being used.

6/13/18 8:49:13 AM com.enterprisedb.efm.nodes.EfmAgent handleStatusCall INFO: responding to db status request with status: Standby%%139.59.73.25%%true%%true%% %%false

6/13/18 8:49:14 AM com.enterprisedb.efm.nodes.EfmNode$8 run INFO: Reloading shared state per coordinator command
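To pick the decision points out of a long EFM log like this one, grepping for the method names that appear in it is usually enough. A small sketch; the helper names are hypothetical, and the patterns only match the EFM 3.1 log lines posted here:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for reading an EFM agent log like the one above.
# Patterns match the EFM 3.1 lines in this post; other versions may differ.

# Print the lines that record how the switchover was carried out.
switchover_events() {
  grep -E 'doSwitchover|doManualPromotion|setLastPromoted|reconfigureRecoveryConf' "$1"
}

# Print the address from the most recent "Last promoted address set to" line.
last_promoted_address() {
  sed -n 's/.*Last promoted address set to \([0-9.]*\).*/\1/p' "$1" | tail -n 1
}
```

Run against a file containing the log above, last_promoted_address prints 159.65.135.89, while the doSwitchover line reported by switchover_events shows source node "139.59.73.25": the source node and the promoted node are chosen independently.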


3) All slaves were in sync with the master when I tried to promote.

Cluster Status: efm

Agent Type  Address          Agent  DB  VIP
-----------------------------------------------------------------------
Standby     139.59.73.25     UP     UP
Standby     139.59.73.52     UP     UP
Standby     159.65.135.89    UP     UP
Standby     159.65.4.28      UP     UP
Master      206.189.92.54    UP     UP

Allowed node host list:
206.189.92.54 159.65.135.89 159.65.4.28 139.59.73.25 139.59.73.52

Membership coordinator: 206.189.92.54

Standby priority host list:
159.65.135.89 139.59.73.25 159.65.4.28

Promote Status:

DB Type     Address          XLog Loc         Info
--------------------------------------------------------------
Master      206.189.92.54    1/F0047B0
Standby     139.59.73.25     1/F0047B0
Standby     159.65.4.28      1/F0047B0
Standby     139.59.73.52     1/F0047B0
Standby     159.65.135.89    1/F0047B0

Standby database(s) in sync with master. It is safe to promote.
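The Promote Status section is what shows whether a switchover is safe: each standby's XLog location should match the master's. A rough sketch that checks this from the plain-text efm cluster-status output, assuming the column layout shown above (DB type, address, XLog location); standbys_in_sync is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: read `efm cluster-status <cluster>` text on stdin and check that
# every standby in the "Promote Status" section reports the same XLog
# location as the master. Layout ASSUMED from the output posted above.
# Exit status 0 = all standbys match; lagging standbys are printed.
standbys_in_sync() {
  awk '
    /Promote Status:/ { in_promote = 1; next }
    # Only rows whose third column looks like an XLog location, e.g. 1/F0047B0
    in_promote && $3 ~ /^[0-9A-Fa-f]+\/[0-9A-Fa-f]+$/ {
      if ($1 == "Master") master_loc = $3
      else if ($1 == "Standby") standby_loc[$2] = $3
    }
    END {
      if (master_loc == "") exit 1        # no master row found
      for (addr in standby_loc)
        if (standby_loc[addr] != master_loc) {
          print "lagging: " addr
          bad = 1
        }
      exit bad
    }'
}
```

Usage would be along the lines of efm cluster-status efm | standbys_in_sync && echo "safe to switch over". The check is plain string equality on the XLog column, so it only says "identical or not" and does not measure how far behind a standby is.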


Regards,

Nikhil

EDB Team Member

Re: EDB Failover manager switchover behaviour


@Nikhil wrote:

Is this correct behaviour? I think it should promote the standby whose ipaddr is in the -sourcenode option. What do you guys think?

Hi,

The efm --help message has been fixed (the fix will be in 3.2). One line describing the -sourcenode option had been cut out:

promote
Perform manual failover of master to standby (overrides 'auto.failover' property).
Use the -switchover switch to promote and reconfigure the master as a new standby.
Use the optional -sourcenode switch during switchover to choose the node from
which the recovery.conf file will be copied to the master.
Use the optional -quiet switch during switchover to turn off normal notifications
during the switchover. Full example:
efm promote <cluster_name> -switchover [-sourcenode <address>] [-quiet]

The user's guide will be fixed soon.

Cheers,

Bobby

Level 3 Adventurer

Re: EDB Failover manager switchover behaviour

Hi @chaitalirs, @bbissett,

Thank you for your input, guys.

Really helpful info.

Regards,

Nikhil