Unable to add Slave to EFM cluster

SOLVED
Level 3 Adventurer

Unable to add Slave to EFM cluster

Hello,

 

I have a scenario where I've configured the Witness and Master on an EFM cluster. Now I'm trying to add a slave to the cluster, but it throws the below error. Please help.

 

E571588C.PNG

 

 


7 REPLIES
EDB Team Member

Re: Unable to add Slave to EFM cluster

Hi Mahisha,

 

From the below error message in the EFM startup logs, it seems that the EFM agent is trying to start as a master agent, and for that reason it reports that there is already a master node running in this cluster.

We suspect there is some misconfiguration in the efm.nodes/efm.properties file that is causing this issue. To troubleshoot, could you please share the below information with us?

1. Output of 'select * from pg_stat_replication;' from the master server.
2. The efm.properties and efm.nodes files from the slave node.
3. Output of the EFM cluster status from the master node.
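For reference, the requested information can be gathered roughly as follows. This is a sketch: the config file locations and the cluster name "efm" are assumptions (they vary by EFM version and installation), while the port, database, and user come from the config posted later in this thread.

```shell
# 1. Replication status, run on the master (port/db/user from this thread's config):
psql -p 5010 -U enterprisedb -d edb -c "select * from pg_stat_replication;"

# 2. EFM config files from the slave node (directory varies by EFM version - assumed here):
cat /etc/edb/efm-3.1/efm.properties /etc/edb/efm-3.1/efm.nodes

# 3. Cluster status from the master node ("efm" is an assumed cluster name):
/usr/edb/efm-3.1/bin/efm cluster-status efm
```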

Level 3 Adventurer

Re: Could not able to add Slave to EFM cluster

Hi Kapil,

Thanks for your response. Please find below the screenshots and the contents of the efm.properties and efm.nodes files:

Pic1.PNG (screenshot: select * from pg_stat_replication;)

Pic2.PNG (screenshot: EFM status from the master node)

Content of efm.properties file

 

# Copyright EnterpriseDB Corporation, 2013-2018. All Rights Reserved.

# Do not use quotes around property values in this file.
# All user scripts specified below are run as the failover manager user.

# The value for the password property should be the output from
# 'efm encrypt' -- do not include a cleartext password here. To
# prevent accidental sharing of passwords among clusters, the cluster
# name is incorporated into the encrypted password. If you change
# the cluster name (the name of this file), you must encrypt the
# password again with the new name.
# The db.port property must be the same for all nodes.
db.user=enterprisedb
db.password.encrypted=***********************
db.port=5010
db.database=edb

# This property tells EFM which OS user owns the $PGDATA dir for the
# 'db.database'. By default, the owner is either 'postgres' for PostgreSQL
# or 'enterprisedb' for EDB Postgres Advanced Server. However, if you
# have configured your db to run as a different user, you will need to
# copy the /etc/sudoers.d/efm-XX conf file to grant the
# necessary permissions to your db owner.
#
# This username must have write permission to the 'db.recovery.conf.dir'
# specified below.
db.service.owner=enterprisedb

# Specify the proper service name in order to use service commands rather
# than pg_ctl to start/stop/restart a database. For example, if this property
# is set, then 'service <name> restart' or 'systemctl restart <name>'
# (depending on OS version) will be used to restart the database rather than pg_ctl.
# This property is required unless db.bin is set.
db.service.name=edb-as-9.6

# Specify the directory containing the pg_ctl command, for example:
# /usr/edb/as9.6/bin. Unless the db.service.name property is used, the pg_ctl
# command is used to start/stop/restart databases as needed after a
# failover or switchover. This property is required unless db.service.name
# is set.
db.bin=/finbdodbprod/edb/bin

# Specify the location of the db recovery.conf file on the node. On
# a standby node, the trigger file location is read from the file in
# this directory. After a failover, the recovery.conf files on
# remaining standbys are changed to point to the new master db (a copy
# of the original is made first). On a master node, a recovery.conf file will
# be written during failover and promotion to ensure that the master node can
# not be restarted as the master database.
db.recovery.conf.dir=/finbdodbprod/edb/data

# Use the jdbc.sslmode property to enable ssl for EFM connections. Setting
# this property to anything but 'disable' will force the agents to use
# 'ssl=true' for all JDBC database connections (to both local and remote
# databases). Valid values are:
#
# disable - Do not use ssl for connections.
# verify-ca - EFM will perform CA verification before allowing the
# certificate.
# require - Verification will not be performed on the server certificate.
jdbc.sslmode=disable

# Email address(es) for notifications. The value of this property must
# be the same across all agents. Multiple email addresses must
# be separated by space. If using a notification script instead,
# this property can be left blank.
user.email=manisha.naik@trianz.com

# Minimum severity level of notifications that will be sent by the agent.
# The minimum level also applies to the notification script (below).
# Valid values are INFO, WARNING, and SEVERE. A list of notifications is
# grouped by severity in the user's guide.
notification.level=INFO

# Absolute path to script run for user notifications.
#
# This is an optional user-supplied script that can be used for
# notifications instead of email. This is required if not using
# email notifications. Either/both can be used. The script will
# be passed two parameters: the message subject and the message body.
script.notification=

# This property specifies the ip address and port that jgroups
# will bind to on this node. The value is of the form <ip>:<port>.
# Note that the port specified here is used for communicating with
# other nodes, and is not the same as the admin.port below, used
# only to communicate with the local agent to send control signals.
# For example, <provide_your_ip_address_here>:7800
bind.address=192.168.216.108:7800

# This property controls the port binding of the administration server
# which is used for some commands (ie cluster-status). The default
# is 7809; you can modify this value if the port is already in use.
admin.port=7809

# Specifies whether or not this is a witness node. Witness nodes
# do not have local databases running.
is.witness=false

# These properties apply to the connection(s) EFM uses to monitor
# the local database. Every 'local.period' seconds, a database check
# is made in a background thread. If the main monitoring thread does
# not see that any checks were successful in 'local.timeout' seconds,
# then the main thread makes a final check with a timeout value
# specified by the 'local.timeout.final' value. All values are in seconds.
# Whether EFM uses single or multiple connections for database checks
# is controlled by the 'db.reuse.connection.count' property.
local.period=10
local.timeout=60
local.timeout.final=10

# Timeout for a call to check if a remote database is responsive.
# For example, this is how long a standby would wait for a
# DB ping request from itself and the witness to the master DB
# before performing failover.
remote.timeout=10

# The total amount of time in seconds to wait before determining that
# a node has failed or been disconnected from this node.
#
# The value of this property must be the same across all agents.
node.timeout=50

# Shut down the database after a master agent detects that it has
# been isolated from the majority of the efm cluster. If set to
# true, efm will stop the database before running the
# 'script.master.isolated' script, if a script is specified.
stop.isolated.master=true

# Attempt to shut down a failed master database after EFM can no longer
# connect to it. This can be used for added safety in the case a
# failover is caused by a failure of the network on the master node.
# If specified, a 'script.db.failure' script is run after this attempt.
stop.failed.master=true

# This is the address of a well-known server that EFM can ping in an
# effort to determine network reachability issues. It might be the IP
# address of a nameserver within your corporate firewall or another server
# that *should* always be reachable via a 'ping' command from each of the
# EFM nodes.
#
# There are many reasons why this node might not be considered reachable:
# firewalls might be blocking the request, ICMP might be filtered out,
# etc.
#
# Do not use the IP address of any node in the EFM cluster (master, standby,
# or witness) because this ping server is meant to provide an additional
# layer of information should the EFM nodes lose sight of each other.
#
# The installation default is Google's DNS server.
pingServerIp=192.168.216.108

# This command will be used to test the reachability of certain nodes.
#
# Do not include an IP address or hostname on the end of this command - it
# will be added dynamically at runtime with the values contained in 'virtualIp'
# and 'pingServer'.
#
# Make sure this command returns reasonably quickly - test it from a shell
# command line first to make sure it works properly.
pingServerCommand=/bin/ping -q -c3 -w5

# Have the first node started automatically add the addresses from
# its .nodes file to the allowed host list. This will make it
# faster to start the cluster when the initial set of hosts
# is already known.
auto.allow.hosts=true

# When set to true, EFM will not rewrite the .nodes file whenever new nodes
# join or leave the cluster. This can help starting a cluster in the cases
# where it is expected for member addresses to be mostly static, and combined
# with 'auto.allow.hosts' makes startup easier when learning failover manager.
stable.nodes.file=true

# This property controls how many times a database connection is reused
# before creating a new one. If set to zero, a new connection will be
# created every time an agent pings its local database.
db.reuse.connection.count=5

# Whether or not failover will happen automatically when the master
# fails. Set to false if you want to receive the failover notifications
# but not have EFM actually perform the failover steps.
# The value of this property must be the same across all agents.
auto.failover=true

# After a standby is promoted, failover manager will attempt to
# update the remaining standbys to use the new master. Failover
# manager will back up recovery.conf, change the host parameter
# of the primary_conninfo entry, and restart the database. The
# restart command is contained in either the efm_db_functions or
# efm_root_functions file; default when not running db as an os
# service is: "pg_ctl restart -m fast -w -t <timeout> -D <directory>"
# where the timeout is the local.timeout property value and the
# directory is specified by db.recovery.conf.dir. To turn off
# automatic reconfiguration, set this property to false.
auto.reconfigure=true

# A standby with this set to false will not be added to the failover
# priority list, and so will not be available for promotion. The
# property will be used whenever an agent starts as a standby or
# resumes as a standby after being idle. After startup/resume, the
# node can still be added or removed from the priority list with the
# 'efm set-priority' command. This property is required for all
# non-witness nodes.
promotable=true

# Instead of setting specific standbys as being unavailable for
# promotion, this property can be used to set a minimum number
# of standbys that will not be promoted. Set to one, for example,
# promotion will not happen if it will drop the number of standbys
# below this value. This property must be the same on each node.
minimum.standbys=0

# Time in seconds between checks to see if a promoting database
# is out of recovery.
recovery.check.period=2

# Period in seconds for IDLE agents to try to resume monitoring after
# a database failure. Set to 0 for agents to not try to resume
# (in which case the 'efm resume <cluster>' command is used after
# bringing a database back up).
auto.resume.period=60

# These properties specify the IP and prefix length that will be remapped during
# failover. If you do not use a VIP as part of your failover solution, leave
# the virtualIp property blank to disable Failover Manager support for VIP processing
# (assigning, releasing, testing reachability, etc).
#
# If you specify a VIP, the interface and prefix are required.
#
# By default, the virtualIp and virtualIp.prefix values must be the same
# across all agents. If you set virtualIp.single to false, you can specify unique values
# for virtualIp and virtualIp.prefix on each node.
#
# If you are using an IPv4 address, the virtualIp.interface value
# should not contain a secondary virtual ip id (do not include ":1", etc).
virtualIp=
virtualIp.interface=
virtualIp.prefix=
virtualIp.single=true

# Absolute path to load balancer scripts
#
# The attach script is called when a node should be attached to the load
# balancer, for example after a promotion. The detach script is called
# when a node should be removed, for example when a database has failed
# or is about to be stopped. Use %h to represent the IP/hostname
# of the node that is being attached/detached.
#
# Example:
# script.load.balancer.attach=/somepath/attachscript %h
script.load.balancer.attach=
script.load.balancer.detach=

# Absolute path to fencing script run during promotion
#
# This is an optional user-supplied script that will be run during
# failover on the standby database node. If left blank, no action will
# be taken. If specified, EFM will execute this script before promoting
# the standby.
#
# Parameters can be passed into this script for the failed master and
# new primary node addresses. Use %p for new primary and %f for failed
# master. On a node that has just been promoted, %p should be the same
# as the node's efm binding address.
#
# Example:
# script.fence=/somepath/myscript %p %f
#
# NOTE: FAILOVER WILL NOT OCCUR IF THIS SCRIPT RETURNS A NON-ZERO EXIT CODE.
script.fence=

# Absolute path to fencing script run after promotion
#
# This is an optional user-supplied script that will be run after
# failover on the standby node after it has been promoted and
# is no longer in recovery. The exit code from this script has
# no effect on failover manager, but will be included in a notification
# sent after the script executes.
#
# Parameters can be passed into this script for the failed master and
# new primary node addresses. Use %p for new primary and %f for failed
# master. On a node that has just been promoted, %p should be the same
# as the node's efm binding address.
#
# Example:
# script.post.promotion=/somepath/myscript %f %p
script.post.promotion=

# Absolute path to resume script
#
# This script is run before an IDLE agent resumes
# monitoring its local database.
script.resumed=

# Absolute path to script run after database failure
#
# This is an optional user-supplied script that will be run after
# an agent detects that its local database has failed.
script.db.failure=

# Absolute path to script run on isolated master
#
# This is an optional user-supplied script that will be run after
# a master agent detects that it has been isolated from the majority
# of the efm cluster.
script.master.isolated=

# Absolute path to script invoked on non-promoting agent nodes
# before a promotion.
#
# This optional user-supplied script will be invoked on other agents
# when a node is about to promote its database. The exit code from this script has
# no effect on Failover Manager, but will be included in a notification
# sent after the script executes.
#
# Pass a parameter (%p) with the script to identify the new primary node address.
#
# Example:
# script.remote.pre.promotion=/path_name/script_name %p
script.remote.pre.promotion=

# Absolute path to script invoked on non-master agent nodes
# after a promotion.
#
# This optional user-supplied script will be invoked on nodes
# (except the new master) after a promotion occurs. The exit code from this script has
# no effect on Failover Manager, but will be included in a notification
# sent after the script executes.
#
# Pass a parameter (%p) with the script to identify the new primary node address.
#
# Example:
# script.remote.post.promotion=/path_name/script_name %p
script.remote.post.promotion=

# Absolute path to a custom monitoring script.
#
# Use script.custom.monitor to specify the location and name of an optional
# user-supplied script that will be invoked periodically to perform custom
# monitoring tasks. A non-zero exit value means that a check has failed;
# this will be treated as a database failure. On a master node, script
# failure will cause a promotion. On a standby node script failure will
# generate a notification and the agent will become IDLE.
#
# The custom.monitor.* properties are required if a custom monitoring script
# is specified:
#
# custom.monitor.interval is the time in seconds between executions of the script.
#
# custom.monitor.timeout is a timeout value in seconds for how long the script
# will be allowed to run. If script execution exceeds the specified time, the
# task will be stopped and a notification sent. Subsequent runs will continue.
#
# If custom.monitor.safe.mode is set to true, non-zero exit codes from the
# script will be reported but will not cause a promotion or be treated as a
# database failure. This allows testing of the script without affecting EFM.
#
script.custom.monitor=
custom.monitor.interval=
custom.monitor.timeout=
custom.monitor.safe.mode=

# Command to use in place of 'sudo' if desired when efm runs
# the efm_db_functions, efm_root_functions, or efm_address scripts.
# Sudo is used in the following ways by efm:
#
# sudo /usr/edb/efm-<version>/bin/efm_address <arguments>
# sudo /usr/edb/efm-<version>/bin/efm_root_functions <arguments>
# sudo -u <db service owner> /usr/edb/efm-<version>/bin/efm_db_functions <arguments>
#
# 'sudo' in the first two examples will be replaced by the value of the sudo.command
# property. 'sudo -u <db service owner>' will be replaced by the value of the
# sudo.user.command property. The '%u' field will be replaced with the db owner.
sudo.command=sudo
sudo.user.command=sudo -u %u

# Specify the directory of lock file on the node. Failover Manager creates a
# file named <cluster>.lock at this location to avoid starting multiple agents
# for same cluster. If the path does not exist, Failover Manager will attempt
# to create it. If not specified defaults to '/var/lock/efm-<version>'
lock.dir=

# Specify the directory of agent logs on the node. If the path does not exist,
# Failover Manager will attempt to create it. If not specified defaults to
# '/var/log/efm-<version>'. (To store Failover Manager
# startup logs in a custom location, modify the path in the service script
# to point to an existing, writable directory.)
# If using a custom log directory, you must configure logrotate separately.
# Use 'man logrotate' for more information.
log.dir=

# Logging levels for JGroups and EFM.
# Valid values are: FINEST, FINER, FINE, CONFIG, INFO, WARNING, SEVERE
# Default value: INFO
# It is not necessary to increase these values unless debugging a specific
# issue. If nodes are not discovering each other at startup, increasing
# the jgroups level to FINER will show information about the TCP connection
# attempts that may help diagnose the connection failures.
jgroups.loglevel=INFO
efm.loglevel=INFO

# Extra information that will be passed to the JVM when starting the agent.
jvm.options=-Xmx32m

 

 

Content of efm.nodes file:

 

# List of node address:port combinations separated by whitespace.
# The list should include at least the membership coordinator's address.
192.168.216.107:7800

 

 

EDB Team Member

Re: Unable to add Slave to EFM cluster

Hi Mahisha,

 

Thank you for providing the requested information.

 

Everything looks fine in the efm.nodes and efm.properties files, and replication is working as expected. Could you please check whether there are any duplicate recovery files (such as recovery.done or a backup copy of recovery.conf) in the data directory of the standby server that need to be moved or renamed? The directory should contain a single file named recovery.conf. If there are no duplicate files, could you please run the below command as root and then start EFM.

 

From root:  updatedb 
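To spot such leftovers, a small sketch (assumes a POSIX shell with GNU find; the data directory path is passed as an argument, and the example path below is the one from this thread):

```shell
# List recovery-related files at the top level of a data directory so that
# leftovers such as recovery.done or recovery.conf.bak can be spotted.
list_recovery_files() {
  dir="$1"
  find "$dir" -maxdepth 1 -name 'recovery.*' -type f | sort
}

# Example usage (path from this thread):
# list_recovery_files /finbdodbprod/edb/data
```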

Level 3 Adventurer

Re: Unable to add Slave to EFM cluster

Hi Kapil,

Thank you so much for the quick response. I've reconfigured the slave node and restarted replication, and now I see only recovery.conf. I've tried to start EFM and am facing the below error; your thoughts/suggestions are much appreciated.

 

Pic4.PNG

Pic5.PNG

EDB Team Member

Re: Unable to add Slave to EFM cluster

Hi Mahisha,

 

From the below error we have noticed that EFM is not able to validate the trigger file location "/finbdodbprod/edb/data." from recovery.conf, because that location is the data directory, not a file, and I suspect that "/finbdodbprod/edb/" does not have the proper permission to touch the file. I would recommend changing the trigger file location from "/finbdodbprod/edb/data." to "/finbdodbprod/edb/data/trigger" in the recovery.conf file, then restarting the slave server and starting the EFM agent on the slave server.
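For reference, the corrected standby recovery.conf would contain something like the following. The trigger_file path is the one recommended in this thread; the primary_conninfo values are placeholders (the port and user come from the efm.properties posted above, the host must be your master's address):

```
# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=<master_address> port=5010 user=enterprisedb'
trigger_file = '/finbdodbprod/edb/data/trigger'
```

The key point is that trigger_file must name a file that does not yet exist, in a directory the database owner can write to; promotion is triggered when that file is created.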

Level 3 Adventurer

Re: Unable to add Slave to EFM cluster

Hi Kapil,

 

Thank you very much for the solution. Now I'm able to add slave nodes to the EFM cluster.

EDB Team Member

Re: Unable to add Slave to EFM cluster

Hi Mahisha,

 

Thank you for the update.

 

Glad to hear that it worked for you and that you're now able to add standby nodes to the EFM cluster. Please let us know if you need any other information or assistance.