In this article we are going to see how database servers work together to allow second server to take over quickly if the primary server fails(high availability), and to allow multiple servers to serve the same data for SELECTs(horizontal read scalability).
Since both functionalities are not available in one product, we need to rely on external tools to provide this functionality and integrate them. EDB Postgres Failover Manager (EFM) is high-availability tool to monitor the health of Postgres Streaming Replication clusters to verify database failures quickly and automatically promote a standby node as primary without any data loss and with minimal service interruption(connection rollback). PgPool-II(pgpool) provides connection pooling and load balancing for horizontal scalability of SELECT queries on multiple standbys.
We need EDB Failover Manager (EFM) and PgPool-II functionalities to achieve high-availability and horizontal read scalability.
IMO, it would have been difficult to integrate EFM & PgPool, if the latest version of EFM 3.2 has not introduced Load Balancer Hooks(Thanks to EFM team). About EFM 3.2, it supports multiple ways to connect applications with the new master database after a failover or promotion. With latest version, two new script hooks introduced to execute before and after a database event of switchover or failover or standby node failure. These EFM new hooks can be used to update the PgPool and other load balancer configuration. All we need, write a simple script with set of commands to manipulate load balancer configuration on the database status and set the script in EFM load balancer hooks(script.load.balancer.attach / script.load.balancer.detach) in cluster properties file. When EFM takes action on any database event, these scripts hooks will be executed automatically and run load balancer configuration commands set in script to update the database node status. By this way, integration is transparent, well automated and with NO manual intervention in case of any database events.
Watch a short video demonstrating the integration with few testing scenarios. (https://youtu.be/pgZJiXa3SoY)
How EFM & PgPool Integration works ?
EFM Load Balancer script hooks(script.load.balancer.attach / script.load.balancer.detach) are called by “EFM” user for every database event (down/up/promotion) before or after the event accordingly. A simple script should be created with PgPool PCP unix commands in it to update the PgPool configuration. When EFM performs Switchover/Failover, a node role will be changed and “pcp_promote_node” command can be used to update PgPool-II cluster for new master similarly for any node down or up “pcp_attach_node” & “pcp_detach_node” command can be used to add/remove node from PgPool Cluster. Remember, since EFM script hooks call PCP unix commands we MUST configure password-less authentication between database and PgPool Node to avoid interruption of Switchover/Failover.
Note: When pgpool pcp_promote_node command exeucted to update a New Master information in an event of Switchover/Failover performed by EFM, the connections which are established to old Master or Standby nodes will be disconnected and reconnected to new master after promotion.
Below are the list of scenarios tested in 1 Master and 2 Standby nodes architecture.
Note: Above scenarios are tested with EFM 3.2/PgPool-II 3.7/EPAS 10.x versions, behavior may vary if implemented with other versions. This article do not encourage to implement the architecture in production without thorough testing.
Let's proceed with setup.
In this section, we will cover the installation of all components required on each server as per the architecture. We are going to use RPM method of installation for all the components. Refer to the documentation link tagged with each component to complete the installation.
On PgPool Node:
On 3 Database Nodes:
In this section, we are going to cover ONLY very important configuration required for EFM & PgPool integration. Streaming Replication & EFM configuration are not covered in details for the clarity reason of the article.
db.user=enterprisedb db.password.encrypted=<encrypted password> db.port=5444 db.database=edb db.service.owner=enterprisedb db.service.name=edb-as-10.service db.recovery.conf.dir=/var/lib/edb/as10/data user.email=<email address> bind.address=MASTER_IP:7800 //Masked the IP address admin.port=7809 is.witness=false auto.allow.hosts=true stable.nodes.file=true auto.failover=true auto.reconfigure=true promotable=true script.load.balancer.attach=/path/to/script/location/efm_loadbalancer_attach.sh %h script.load.balancer.detach=/path/to/script/location/efm_loadbalancer_detach.sh %h efm.loglevel=FINE
Note: Change Node IP address as per Master/Standby 1/Standby 2 and rest all remain same on all the nodes.
backend_hostname0 = 'MASTER_IP' backend_port0 = 5444 backend_weight0 = 1 backend_data_directory0 = '/var/lib/edb/as10/data' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_hostname1 = 'STANDBY_IP_1' backend_port1 = 5444 backend_weight1 = 1 backend_data_directory1 = '/var/lib/edb/as10/data' backend_flag1 = 'ALLOW_TO_FAILOVER' backend_hostname2 = 'STANDBY_IP_2' backend_port2 = 5444 backend_weight2 = 1 backend_data_directory2 = '/var/lib/edb/as10/data' backend_flag2 = 'ALLOW_TO_FAILOVER' enable_pool_hba = on load_balance_mode = on sr_check_user = 'enterprisedb' sr_check_password = '<enterprisedb user password>' sr_check_database = 'edb' health_check_user = 'enterprisedb' health_check_password = '<enterprisedb user password>' health_check_database = 'edb' delay_threshold = <adjust as per the Master/Standby delay> fail_over_on_backend_error = off search_primary_node_timeout = 3Note: Above parameters are mostly relevant to EFM & PgPool integration. It may vary if you want to add PgPOOL HA(Watchdog).
#su - efm -bash-4.2$ pwd /var/efm/
echo "PGPOOL_HOSTIP:PCPPORT:PCPUSER:PCPPASSWORD" > ~/pcppass Eg:- -bash-4.2$ more pcppass 172.16.23.23:9898:enterprisedb:edb
-bash-4.2$ chmod 0600 ~/pcppass
$ export PCPPASSFILE=/var/efm/pcppass $ /usr/edb/pgpool3.6/bin/pcp_node_count -h <pgpool_node_ip> -U enterprisedb -p 9898 -wNote: Above PCP configuration steps should be followed on each database node. All nodes should pass the password-less authentication for a successful PgPool integration.
#su - efm -bash-4.2$ pwd /var/efm/
-bash-4.2$ mkdir efm-scripts -bash-4.2$ cd efm-scripts
Start the Services
After installation/configuration, start the services of each components involved in architecture.
Two EFM Load Balancer Scripts:
Two user-friendly scripts "efm_loadbalancer_attach.sh” and “efm_loadbalancer_detach.sh” written in BASH and they are wrapped with PCP unix commands which are executed by EFM Load balancer attach/detach hooks as “efm” user. Refer to GitHub Repository for sample configuration files and scripts. Please feel free to clone & use the scripts, if there's a scope of improvement or correction then don't hestitate to raise issue in the repository.
Note: Including script content in this article may risk of missing the real context of the article by readers, hence those details are moved to GitHub Repository Page.
After completing all the above steps, let's run few command and see the outputs from PgPool Load balancer for an EFM database event.
Consistent State Before Testing:
After configuring Asynchronous Streaming Replication, EDB Failover Manager Cluster & PgPool-II load balancing, below output shows the nodes status in PgPool.
[root@frontend-server ~]# /bin/psql -p 9999 -U enterprisedb edb -c 'show pool_nodes;' node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+---------------+------+--------+-----------+---------+------------+-------------------+------------------- 0 |MASTER_IP | 5444 | up | 0.333333 |primary | 2 | true | 0 1 | STANDBY1_IP | 5444 | up | 0.333333 | standby | 10 | false | 0 2 | STANDBY2_IP | 5444 | up | 0.333333 | standby | 6 | false | 0 (3 rows)
PgPool Nodes Status after Switchover
In EFM, switchover can be performed using “efm promote <cluster name> -quiet -switchover” command. One of the priority standby will take the role of primary when performed switchover and old primary will be attached as standby to new primary. During this EFM operation, load balancer hooks will connect to pgpool cluster via pcp command to manipulate the nodes roles.
[root@frontend-server ~]# /bin/psql -p 9999 -U enterprisedb edb -c 'show pool_nodes;' node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+---------------+------+--------+-----------+---------+------------+-------------------+------------------- 0 | STANDBY1_IP| 5444 | up | 0.333333 |standby | 2 | false | 0 1 |MASTER_IP | 5444 | up | 0.333333 | primary | 10 | true | 0 2 | STANDBY2_IP |5444 | up | 0.333333 | standby | 6 | false | 0 (3 rows)
Note: Above result after the Switchover from the consistent state.
PgPool Nodes Status after Failover
In EFM, manual failover can be performed using “efm promote <cluster name>” command. One of the priority standby will take the role of primary when performed failover and old primary will be detached. During this EFM operation, load balancer hooks will connect to pgpool cluster via pcp command to manipulate the nodes roles.
[root@frontend-server ~]# /bin/psql -p 9999 -U enterprisedb edb -c 'show pool_nodes;' node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+---------------+------+--------+-----------+---------+------------+-------------------+------------------- 0 |MASTER_IP | 5444 | down | 0.333333 |standby | 2 | false | 0 1 | STANDBY1_IP | 5444 | up | 0.333333 | primary | 10 | true | 0 2 | STANDBY2_IP | 5444 | up | 0.333333 | standby | 6 | false | 0 (3 rows)
Note: Above result after the manual EFM Failover from the consistent state.
That's all. Hope it helps.