EFM startup error

SOLVED
Level 2 Adventurer

Hi,

I have a cluster with a master, a standby, and a witness.

After a failover, I can't start EFM on the standby or the master.

I get the same error, "authentication failed", on both nodes.

What is it trying to authenticate against?

 

2019-09-12 17:45:16 com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut='PING  xxxxxxx.1 (xxxxxx.1) 56(84) bytes of data.

--- xxxxxxx.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.411/0.618/1.002/0.272 ms'}
2019-09-12 17:45:16 com.enterprisedb.efm.admin.AdminServer run INFO: AdminServer starting on port: 7809
2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmNode doStartup INFO: Starting
2019-09-12 17:45:16 com.enterprisedb.efm.Environment parseHostList INFO: Host list: [xxxxxxxxx:9998,xxxxxxxx.72:9998, xxxxxx:9998]

2019-09-12 17:45:16 com.enterprisedb.efm.exec.ExecUtil performExec INFO: ProcessResult{exitValue=0, errorOut='', stdOut='PING xxxxxx.1 (xxxxxx.1) 56(84) bytes of data.

--- 172.23.238.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.411/0.618/1.002/0.272 ms'}
2019-09-12 17:45:16 com.enterprisedb.efm.admin.AdminServer run INFO: AdminServer starting on port: 7809
2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmNode doStartup INFO: Starting
2019-09-12 17:45:16 com.enterprisedb.efm.Environment parseHostList INFO: Host list: [xxxxxx.67:9998, xxxxx.72:9998, xxxxxx.66:9998]
2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmAgent run ERROR: Exception starting service: java.lang.SecurityException: authentication failed
2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmNode commonShutdown INFO: Starting shutdown.
2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmNode setNodeState INFO: New internal state: SHUTDOWN
2019-09-12 17:45:16 com.enterprisedb.efm.admin.AdminServer shutdown INFO: Stopping AdminServer...
2019-09-12 17:45:16 org.jgroups.protocols.pbcast.GMS warn WARN: ocppsqldb02-9251(xxxxxxx.66): leave() should not be invoked on an instance of org.jgroups.protocols.pbcast.ClientGmsImpl

2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmNode commonShutdown INFO: Starting shutdown.
2019-09-12 17:45:16 com.enterprisedb.efm.nodes.EfmNode setNodeState INFO: New internal state: SHUTDOWN
2019-09-12 17:45:16 com.enterprisedb.efm.admin.AdminServer shutdown INFO: Stopping AdminServer...
2019-09-12 17:45:16 org.jgroups.protocols.pbcast.GMS warn WARN: ocppsqldb02-9251(172.23.238.66): leave() should not be invoked on an instance of org.jgroups.protocols.pbcast.ClientGmsImpl

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
EDB Team Member

Re: EFM startup error

If you have startup issues, check the startup log. I'm guessing you have something like this in there:

 

9/12/19 6:11:04 PM There was an error starting service: authentication failed

9/12/19 6:11:04 PM If other nodes are already running in the cluster, please verify that the address for this node is on the allowed node host list.

 

If there's a node already running in the cluster, other nodes can't join unless they're on the allowed list. You can add them with 'efm allow-node' or, more simply, have them in the cluster.nodes file and set this property to true (it defaults to false, as shown):

 

# Have the first node started automatically add the addresses from
# its .nodes file to the allowed host list. This will make it
# faster to start the cluster when the initial set of hosts
# is already known.
auto.allow.hosts=false

 

More info from the logs would help, but I think you're running into something like this.
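
To make the membership check concrete, here is a minimal Python sketch that compares the entries in a .nodes file against the addresses you expect in the cluster. It assumes the file holds whitespace-separated address:port entries with '#' comments; the function names and the sample addresses are hypothetical and not part of EFM.

```python
def parse_nodes(text):
    """Return the set of address:port entries found in .nodes-style text.

    Assumes whitespace-separated entries and '#' comments. This is an
    assumption about the file layout, not a documented parser.
    """
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # drop anything after a comment marker
        entries.update(line.split())
    return entries


def missing_members(text, expected):
    """Return the expected entries that are absent from the text, sorted."""
    return sorted(set(expected) - parse_nodes(text))


if __name__ == "__main__":
    # Hypothetical addresses for a master, standby, and witness.
    expected = ["10.0.0.66:9998", "10.0.0.67:9998", "10.0.0.72:9998"]
    sample = "# members\n10.0.0.66:9998\n"
    for entry in missing_members(sample, expected):
        print("not listed:", entry)
```

Running a check like this against the .nodes file on each node would show whether one of them lists fewer members than expected.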

 

Cheers,

Bobby

 

Level 2 Adventurer

Re: EFM startup error

Yes, you were right that another node was up and running.

I hadn't checked the efm.nodes file on the witness node.

It contained only one IP.

Why does EFM sometimes delete the contents of efm.nodes?

EDB Team Member

Re: EFM startup error


@EduardR wrote:

Why does EFM sometimes delete the contents of efm.nodes?


 

It's not "sometimes" though it may seem that way. By default, an efm agent keeps the current cluster members in the .nodes file. If you start agent A with no other agents running, A will empty out the file and then refill it as other nodes join. It's done this way to handle more dynamic situations (e.g. cloud). The only point of that file is to direct nodes starting to existing nodes in the cluster -- if there's just 1 or 2 (and there used to be more), then ideally the file has just those 1 or 2. But really those changes are just to try to help.

 

There's another prop that might help you:

 

# When set to true, EFM will not rewrite the .nodes file whenever new nodes
# join or leave the cluster. This can help starting a cluster in the cases
# where it is expected for member addresses to be mostly static, and combined
# with 'auto.allow.hosts' makes startup easier when learning Failover Manager.
stable.nodes.file=false
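
As a rough illustration of the two properties working together, the following Python sketch reads properties-style text and reports which of auto.allow.hosts and stable.nodes.file are not set to true. The helper names are hypothetical, the parser handles only simple key=value lines, and treating a missing key as false is an assumption based on the defaults shown above.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def static_cluster_warnings(props):
    """List the properties discussed in this thread that are not 'true'.

    A missing key is treated as false (an assumption, see the lead-in).
    """
    warnings = []
    for key in ("auto.allow.hosts", "stable.nodes.file"):
        if props.get(key, "false").lower() != "true":
            warnings.append(f"{key} is not set to true")
    return warnings


if __name__ == "__main__":
    sample = "# example\nauto.allow.hosts=true\nstable.nodes.file=false\n"
    for warning in static_cluster_warnings(read_properties(sample)):
        print(warning)
```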

 

These videos are long, but they answer the question of how and why EFM handles the .nodes file (if I had more time, I would get you the exact timestamps, sorry).

 

Bobby

 

EDB Team Member

Re: EFM startup error

 

These are long, but will answer the question about how/why efm handles the .nodes files (if I had more time would get you the exact timestamps, sorry).


 My links didn't go through before. Trying again:

 

https://www.youtube.com/watch?v=cjFWMqQmNKA

https://www.youtube.com/watch?v=Hujc9OQfeLE

 

This video demonstrates how to start an EDB Postgres Failover Manager 3.0 cluster.
This video demonstrates how to install and set up EDB Postgres Failover Manager 3.0. It includes a walk through of all of the EDB Postgres Failover Manager properties that can be set, including how to upgrade from a previous installation.