
EFM Dedicated Witness Node

SOLVED
Level 3 Adventurer


Hello Experts,

 

We plan to deploy a 2-node streaming replication setup (hot/warm standby) with a dedicated witness box on a 3rd node. What happens if the 3rd node, i.e. the witness node, is down and then the master crashes? Will the standby still get promoted to master?

 

Thanks

Accepted Solutions
EDB Team Member

Re: EFM Dedicated Witness Node


@pcpg wrote:

Does it mean that there is no way we can handle the scenario that I have described in my reply to Deepanshu above?


The short answer is no. That's not a limitation of EFM, but just the way the math works. The longer answer is below:

 

Starting with a simple master/standby/witness cluster, I think the question is "why can't I have master node failover if the witness fails?" An important thing to remember in the following is a design decision of EFM that it is better to have no masters than to have more than one master. Another is that network isolation looks the same as failure between nodes: if node A can no longer talk to B, there is no way for A to know if B failed or is just disconnected (and could be reconnected at any time).

 

Case 1: Witness fails, and some time later the master node fails.

 

In this case, we're starting with one master and one standby node. The master fails, and the standby sees it disappear. To the standby, there is no way to know whether the master node actually failed or simply was disconnected from the standby. The standby simply sees itself as part of a new, smaller, cluster. The new cluster size it sees is not a majority of the previous cluster (1 is not more than half of 2), so it cannot promote in case the master is still running somewhere. To put it another way, there is no way to support this case that would not create multiple masters in the case of simple network isolation.

 

For comparison, if the master is isolated while the witness is still alive, the standby would see it's in a 2-node cluster instead of 3 nodes, which is a majority, and so would promote. The master agent would see that it's in a minority of the cluster (1 compared to 3) and fence itself off. Thus, this case is safe whether the master node actually failed or was isolated from the network.
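
The majority arithmetic in Case 1 and in the comparison above can be written out as a small sketch. This is only an illustration of the rule as described in this post, not EFM's actual code; the function name `is_majority` and its parameters are hypothetical.

```python
def is_majority(visible_nodes: int, previous_cluster_size: int) -> bool:
    """Return True if the nodes a given agent can still see (itself included)
    form a strict majority of the cluster as it was before the failure."""
    return visible_nodes > previous_cluster_size / 2


# Case 1: the witness is already gone, so the cluster is master + standby (size 2).
# The master disappears and the standby sees only itself.
standby_alone = is_majority(visible_nodes=1, previous_cluster_size=2)

# Comparison: the witness is alive (size 3) and the master is isolated.
standby_plus_witness = is_majority(visible_nodes=2, previous_cluster_size=3)
isolated_master = is_majority(visible_nodes=1, previous_cluster_size=3)

print(standby_alone)         # False: 1 is not more than half of 2, no promotion
print(standby_plus_witness)  # True: 2 of 3 is a majority, standby may promote
print(isolated_master)       # False: the master is in the minority and fences itself
```

The strict "more than half" comparison is what makes a 1-of-2 view unsafe: both halves of a split two-node cluster would see exactly the same thing.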

 

(Note: You suggested adding a witness agent to the existing standby node. As pointed out, this is not possible because both agents would try to use the same resources on that node at the same time. Even if you could do it, I'm not sure it helps with this use case. I guess it would mean that, if the two nodes are separated, the standby/witness node is "more important" and the master should isolate itself. I am not sure whether that would be a good idea, since you would still have a working master that applications may be able to reach.)

 

Case 2: Witness and master fail at same time.

 

By "same time" here, we really mean close in time to each other, due to the various ways networks can fail. I am not sure this is what you're asking about, but I wanted to include it. In this case, the standby will see its cluster drop from 3 nodes to 1. As above, it has no way to know whether a) 2 nodes died or b) it was isolated from those two nodes. In the latter case, the master and witness would still be working just fine, so it would be very bad for the standby to promote itself. Since it can't tell which case is happening, it has to take the safe route and consider itself isolated from the majority of the cluster.

 

Either way, case 1 or 2, there is no safe way for the standby to know that it should promote.

 

I hope the above helps. The main idea is that, in your use case, if we promote the standby this will produce 2 masters in the network isolation case, and EFM does whatever it can to avoid these split-brain results. If you really need to run with just two nodes, you should at least get notifications about what is happening, and use the 'efm promote' command if needed after the master (half of the cluster) fails.

 

Cheers,

Bobby

 


13 REPLIES
Moderator

Re: EFM Dedicated Witness Node

Answering your questions in reverse:

 

2. Will the standby still get promoted as the master?

No, the promotion will not take place.

 

1. What happens if the 3rd node i.e. the witness node is down and then the master crashes? 

When the witness node goes down, you will get a notification, and no further action (promotion) will take place. The reason for requiring a minimum of three nodes is to guard against the case where the slave node has issues, cannot reach the master, and promotes itself. The witness node ensures that you have two confirmations before the slave gets promoted.

 

Also, since you are using a third dedicated box as the witness, you could use it as a slave instead of just a witness. If you have 2 or more slaves, you do not need a witness box in your deployment.

 

Level 3 Adventurer

Re: EFM Dedicated Witness Node

Thanks Amit.

 

We understand that a 2nd slave would do away with the need for a dedicated witness box, but in our case the 3rd node, i.e. the dedicated witness box, is going to be a VM with minimal resources for a small-footprint EDB installation. A 2nd slave would need disk space, CPU, memory, etc., basically the same requirements as an entire DB server, which is what we are trying to avoid since it would hardly ever be used.

 

Regards

Level 3 Adventurer

Re: EFM Dedicated Witness Node

We tested on a setup with 3 nodes (master, slave & witness). During testing, we observed the following:

The witness node was shut down, and then the master DB was stopped. At that point, the slave got promoted to master without any manual intervention. That is surprising!

 

This is not in line with what you mentioned. Can you please help us understand why that happened? Are we testing incorrectly, or is this an enhancement in the latest version?

 

Also, the Note on page https://www.enterprisedb.com/docs/en/2.1.1/edbfm/EDB_Failover_Manager_Guide.1.31.html# 

says "If there is only one Master and one Standby remaining, there is no failover protection in the case of a Master node failure. In the case of a Master database failure, the Master and Standby agents can agree that the database failed and proceed with failover." 

Does this mean that if the witness node is not available and the master "node" fails, then (auto) promotion will not take place, but if the master "database" fails, then the promotion will still take place?

 

Thanks

Moderator

Re: EFM Dedicated Witness Node

Yes your understanding is correct. If the master database fails, the failover will occur, just like you witnessed in your test. But if the node goes down, no failover will happen. 
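
The difference between a master database failure and a master node failure can be sketched as a toy decision function. This is a simplified model of the behaviour described in this thread, not EFM's implementation; the function name and all parameters are hypothetical.

```python
def standby_should_promote(cluster_size: int,
                           reachable_agents: int,
                           master_agent_reachable: bool,
                           master_db_running: bool) -> bool:
    """Toy model of the standby agent's decision.

    cluster_size      -- agents in the cluster before the failure
    reachable_agents  -- agents the standby can still reach, itself included
    """
    if master_agent_reachable:
        # The master agent is still talking to us, so the two agents can
        # agree on whether the database itself has failed.
        return not master_db_running
    # The master agent has vanished. A crash and a network split look the
    # same from here, so promotion is only safe with a strict majority.
    return reachable_agents > cluster_size / 2


# Witness down, master database stopped, master node still up: failover.
db_failure = standby_should_promote(2, 2, True, False)

# Witness down, master node goes away entirely: no failover.
node_failure = standby_should_promote(2, 1, False, False)

print(db_failure, node_failure)  # True False
```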

Level 3 Adventurer

Re: EFM Dedicated Witness Node

That helps, Amit. We will have this tested and report back.

Also, we believe that if we want auto-failover (promotion) even when the Master node fails, we could do another installation of EFM on the Slave node and, once the Witness node is down, start that second EFM on the Slave as a Witness. We would then again have 3 agents (1: Master agent on the Master node, 2: Slave agent on the Slave node, 3: Witness agent on the Slave node), and if the Master node goes down, auto-failover would take place. Is that a good idea?

Community Manager

Re: EFM Dedicated Witness Node

Hi,

 

 

Hope I am understanding the question correctly.

If I am, let me just point to a few things:

1) If you have a single Master with a single Slave, then you are required to have a witness node on a separate machine; you can't have a witness on the machine running the Slave in this case. So starting a second EFM with its witness on the Slave will not work.

2) If you have one Master with 2 Slaves, then you don't need a witness node, as one of the Slaves can act as the witness.

 

 

Moderator

Re: EFM Dedicated Witness Node

I am not sure if that is possible. You will not be able to have the Witness and Slave on the same node due to IP address binding.

Level 3 Adventurer

Re: EFM Dedicated Witness Node

Hi Deepanshu, Thanks. In our case it's option 1, i.e. a single master and a single slave with a dedicated witness node (on a 3rd machine).

We tested and found the following:

- If the witness node has failed and then the Master *DB* goes down (but the master *node* is up), the Slave gets promoted. So this is fine.

- If the witness has failed and then the Master node goes down, the Slave does NOT get promoted. This is the scenario we are trying to fix (if possible in some way).

 

Regards

Level 3 Adventurer

Re: EFM Dedicated Witness Node

Hi Amit,

Thanks for your reply.

Does it mean that there is no way we can handle the scenario that I have described in my reply to Deepanshu above?

 

Regards