Setting the master server (server1) to
backend_weight0 = 0
does not work b/c it sends all traffic to the standby server (server2). Again, the problem is that blacklisted functions, "nextval" for example, are being sent to the standby under load. I even tried adding an additional blacklist pattern
black_query_pattern_list = 'SELECT\s.*_seq\..*'
in an attempt to force these queries to the master.
Thanks for the update provided on the issue.
We need to know the background of how 'SELECT nextval()' is being called from the application end.
We were able to reproduce this issue when 'nextval()' is wrapped in another function, using the blacklist function list you provided. When 'nextval()' is called as a standalone function, it went to the primary only.
So an analysis of how the 'nextval()' function is called would be useful here.
Kindly let us know the further information as requested above.
We will have our developers look at the code surrounding "nextval". In the meantime, can you provide an example of this? Most of our calls to nextval are done as insert triggers to a table for example,
create function table_trigger() returns trigger security definer language edbspl as $$ BEGIN IF :NEW.ID IS NULL THEN SELECT TABLE_SEQ.NEXTVAL INTO :NEW.ID FROM DUAL; END IF; END $$;
But these would be insert queries, so I do not think it would apply.
So we are not wrapping the nextval function but we are calling it from our application using the following pattern
SELECT UNSUCCESSFULLOGINATTEMPTS_SEQ.nextval As entityId FROM Dual
SELECT recurringrules_seq.nextval INTO series FROM Dual;
The above was taken directly from the edb server logs.
That said, I added the additional black_list_pattern which should have worked?
I think I may have stumbled upon what is going on, and it has something to do with connection pooling at the .NET client, but I need further testing to confirm. What I believe is happening is that connections in the client pool are being reused while they still have established (idle) sessions connected to the standby DB. My theory is that a connection/session is load balanced to the standby and then returned to the client connection pool. When another process reuses that connection and issues a nextval statement, it is not reevaluated by pgpool and is sent directly to the standby. This fits with the fact that we only get these errors when we increase the load on the application, or when the connections in the pool are reused faster, without time to close the connection. We also do not get errors when we set
disable_load_balance_on_write = always
In our current configuration we have pooling enabled at the client and at pgpool. To test this theory, I am going to
1. Disable connection pooling at pgpool and enable at client
2. Disable connection pooling at client and enable at pgpool
3. Drop idle sessions immediately
What are your thoughts on this theory?
We have gone through the details of the nextval() function calls. Please find our suggestions below.
create function table_trigger() returns trigger security definer language edbspl as $$ BEGIN IF :NEW.ID IS NULL THEN SELECT TABLE_SEQ.NEXTVAL INTO :NEW.ID FROM DUAL; END IF; END $$;
Then that function itself should be blacklisted.
SELECT UNSUCCESSFULLOGINATTEMPTS_SEQ.nextval As entityId FROM Dual and SELECT recurringrules_seq.nextval INTO series FROM Dual;
black_query_pattern_list = 'SELECT\s.*_seq\..*'
We suspect that this pattern is failing to match the SQL templates provided above. The regex could be something like 'SELECT *_seq\.*'
Let us know if this is helpful for you.
Sure, let's run some tests for the theory you described.
I would like to add one more point: disable connection pooling at both the application and the pgpool level, and try to load balance the SQL statements that cause the issue when connection pooling is enabled. That will give us a fair idea of whether connection pooling comes into the picture and causes the issue while load balancing.
Another thing: you said that if disable_load_balance_on_write is set to 'always' then everything goes well, which is expected. With this parameter set to 'always', once pgpool sees a write query, subsequent read queries are not load balanced until the session ends, regardless of whether they are in explicit transactions or not.
Also, please share your thoughts on the suggestions made in the previous post. Let us know in case of any issues.
I have tested all scenarios and unfortunately in all cases some of the nextval queries are being sent to the standby. For example, on app login we perform a seq.nextval to insert into the transactions table (SELECT transactions_seq.nextval as newItemId FROM Dual), as seen in the edb log file. If that call fails, you will be denied access. When you attempt to log in, it works randomly, and when it fails we see the above query in the standby db server logs.
I have also verified the regex pattern using https://regex101.com/. Moreover, I used a broader pattern of "SELECT.*", which would match everything, thus nothing is sent to the standby and it fails. I have learned that the exclusion pattern has to match the entire query, not just a piece of it.
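The point about matching the entire query can be checked offline with a small sketch. This is only an approximation: it uses Python's `re` module rather than the regex engine inside pgpool, and it assumes the pattern is applied case-insensitively to the whole query text. The helper names are hypothetical, and the pattern `_seq\.nextval` is added purely for illustration. The queries and the other patterns are the ones quoted in this thread.

```python
import re

# Queries copied from the server logs quoted in this thread.
QUERIES = [
    "SELECT UNSUCCESSFULLOGINATTEMPTS_SEQ.nextval As entityId FROM Dual",
    "SELECT recurringrules_seq.nextval INTO series FROM Dual;",
    "SELECT transactions_seq.nextval as newItemId FROM Dual",
]

# Patterns discussed in this thread, plus one illustrative partial pattern.
PATTERNS = [
    r"SELECT\s.*_seq\..*",   # pattern tried in black_query_pattern_list
    r"SELECT *_seq\.*",      # pattern suggested later in the thread
    r"SELECT.*",             # broad pattern used as a sanity check
    r"_seq\.nextval",        # hypothetical: matches only a piece of the query
]


def matches_whole_query(pattern: str, query: str) -> bool:
    """Assumed semantics: the pattern must cover the entire query."""
    return re.fullmatch(pattern, query, flags=re.IGNORECASE) is not None


def matches_anywhere(pattern: str, query: str) -> bool:
    """Unanchored search, which is what many online regex testers show."""
    return re.search(pattern, query, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    for pattern in PATTERNS:
        for query in QUERIES:
            print(
                f"{pattern!r:24} whole={matches_whole_query(pattern, query)!s:5} "
                f"anywhere={matches_anywhere(pattern, query)!s:5} {query}"
            )
```

Under these assumptions the first and third patterns cover all three queries, the partial pattern is found inside each query but does not cover it, and 'SELECT *_seq\.*' matches none of them, because it reads as "SELECT, optional spaces, _seq, optional dots" rather than as a wildcard.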
This feature feels like it's just not ready for primetime. I am not sure how others are using it in production successfully under load?
Below is the test case I performed.
Step 1: Initially all calls to currval() were load balanced (sent to the standby)
Step 2: Added the below regex to the blacklist function list
black_function_list = '[A-Za-z]*currval'
Step 3: Reload/Restarted pgpool to reflect changes.
Step 4: Executed SQL 'select currval('testing_id_seq') from testing;' via pgpool. This is now executing on primary, which was executing on slave before adding to blacklist.
Let us know if this is helpful.
I will test this, but for your step 1, did you have
black_function_list = 'currval,nextval,lastval,setval'
or was it empty?
I ask b/c regEx wise '[A-Za-z]*currval' and 'currval' will both match
select table.table_seq.currval as if from dual;
I will test and post back. Again, we are finding that load balancing works, but under load we are seeing nextval being sent to the standby.
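The comparison of '[A-Za-z]*currval' and 'currval' above can be illustrated with a short sketch. It uses Python's `re` module, not pgpool's own matcher, and whether pgpool compares the pattern against the statement text or against an extracted function name is treated here as an open assumption. The helper names and the function name "mycurrval" are hypothetical.

```python
import re

# Patterns and statement taken from this thread.
PATTERNS = ["[A-Za-z]*currval", "currval"]
STATEMENT = "select table.table_seq.currval as if from dual;"

# Bare names to compare against; "mycurrval" is a hypothetical example.
NAMES = ["currval", "mycurrval"]


def found_in(pattern: str, text: str) -> bool:
    """Unanchored search: the pattern may match any piece of the text."""
    return re.search(pattern, text) is not None


def is_exactly(pattern: str, name: str) -> bool:
    """Anchored match: the pattern must cover the whole name."""
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    for pattern in PATTERNS:
        print(pattern, "found in statement:", found_in(pattern, STATEMENT))
        for name in NAMES:
            print(pattern, "matches name", name, ":", is_exactly(pattern, name))
```

With an unanchored search the two patterns behave the same on the statement, as noted above. They only differ when the match is anchored to a whole name, where the longer pattern also accepts letter-prefixed names such as the hypothetical "mycurrval".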