Error while starting standby - PANIC: WAL contains references to invalid pages

SOLVED
Level 2 Adventurer

Hi guys,

We have a client whose standby fails during startup.

The standby was stopped so that the server could be rebooted. When the cluster service was restarted, WAL files began to be restored from the archive, a consistent recovery state was reached, and WAL restore continued until the error "PANIC: WAL contains references to invalid pages" occurred.

Below is an excerpt from the log:

2018-11-27 16:58:59 -02 [122610]: [49-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=00000 LOG:  restored log file "0000000200004C270000000D" from archive
2018-11-27 16:59:00 -02 [123043]: [1-1] user=postgres@10.0.100.21(33464),db=postgres remoto=10.0.100.21 Sessao=5bfd93f4.1e0a3 T_ID=0 VT_ID= tag= sqlstate=57P03 FATAL:  the database system is starting up
2018-11-27 16:59:00 -02 [123044]: [1-1] user=postgres@[local],db=template1 remoto=[local] Sessao=5bfd93f4.1e0a4 T_ID=0 VT_ID= tag= sqlstate=57P03 FATAL:  the database system is starting up
receiving incremental file list
0000000200004C270000000E

sent 30 bytes  received 16,779,379 bytes  11,186,272.67 bytes/sec
total size is 16,777,216  speedup is 1.00
2018-11-27 16:59:01 -02 [122610]: [50-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=00000 LOG:  restored log file "0000000200004C270000000E" from archive
2018-11-27 16:59:01 -02 [123051]: [1-1] user=postgres@[local],db=template1 remoto=[local] Sessao=5bfd93f5.1e0ab T_ID=0 VT_ID= tag= sqlstate=57P03 FATAL:  the database system is starting up
receiving incremental file list
0000000200004C270000000F

sent 30 bytes  received 16,779,379 bytes  33,558,818.00 bytes/sec
total size is 16,777,216  speedup is 1.00
2018-11-27 16:59:01 -02 [122610]: [51-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=00000 LOG:  restored log file "0000000200004C270000000F" from archive
2018-11-27 16:59:01 -02 [122610]: [52-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=00000 LOG:  consistent recovery state reached at 4C27/F2159B8
2018-11-27 16:59:01 -02 [122600]: [4-1] user=@,db= remoto= Sessao=5bfd93cd.1dee8 T_ID=0 VT_ID= tag= sqlstate=00000 LOG:  database system is ready to accept read only connections
receiving incremental file list
0000000200004C2700000010

sent 30 bytes  received 16,779,379 bytes  33,558,818.00 bytes/sec
total size is 16,777,216  speedup is 1.00
2018-11-27 16:59:02 -02 [122610]: [53-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=00000 LOG:  restored log file "0000000200004C2700000010" from archive
2018-11-27 16:59:02 -02 [122610]: [54-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=01000 WARNING:  page 1277624320 of relation base/16400/741450880 does not exist
2018-11-27 16:59:02 -02 [122610]: [55-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=01000 CONTEXT:  xlog redo Btree/DELETE: 49 items
2018-11-27 16:59:02 -02 [122610]: [56-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=XX000 PANIC:  WAL contains references to invalid pages
2018-11-27 16:59:02 -02 [122610]: [57-1] user=@,db= remoto= Sessao=5bfd93ce.1def2 T_ID=0 VT_ID=1/0 tag= sqlstate=XX000 CONTEXT:  xlog redo Btree/DELETE: 49 items
2018-11-27 16:59:03 -02 [122600]: [5-1] user=@,db= remoto= Sessao=5bfd93cd.1dee8 T_ID=0 VT_ID= tag= sqlstate=00000 LOG:  startup process (PID 122610) was terminated by signal 6: Aborted
2018-11-27 16:59:03 -02 [122600]: [6-1] user=@,db= remoto= Sessao=5bfd93cd.1dee8 T_ID=0 VT_ID= tag= sqlstate=00000 LOG:  terminating any other active server processes
2018-11-27 16:59:03 -02 [123058]: [1-1] user=postgres@10.0.100.21(33470),db=postgres remoto=10.0.100.21 Sessao=5bfd93f5.1e0b2 T_ID=0 VT_ID=2/0 tag=idle sqlstate=57P02 WARNING:  terminating connection because of crash of another server process
2018-11-27 16:59:03 -02 [123058]: [2-1] user=postgres@10.0.100.21(33470),db=postgres remoto=10.0.100.21 Sessao=5bfd93f5.1e0b2 T_ID=0 VT_ID=2/0 tag=idle sqlstate=57P02 DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-11-27 16:59:03 -02 [123058]: [3-1] user=postgres@10.0.100.21(33470),db=postgres remoto=10.0.100.21 Sessao=5bfd93f5.1e0b2 T_ID=0 VT_ID=2/0 tag=idle sqlstate=57P02 HINT:  In a moment you should be able to reconnect to the database and repeat your command.
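
The block number in the WARNING above (1277624320) looks implausible. The sketch below, assuming a default build (8 KiB blocks, 1 GiB relation segment files), shows where such a block would have to live: about 9.5 TiB into the relation, in segment file .9747. It also prints the number in hex, 0x4C270000, whose upper half happens to equal the WAL log id 4C27 seen in the segment names; that may well be a coincidence, and this sketch does not establish any cause. The function name describe_block is only illustrative.

```python
# Assumptions: default build settings (BLCKSZ = 8 KiB, 1 GiB relation segment files).
BLCKSZ = 8192
RELSEG_SIZE = 131072  # blocks per 1 GiB segment file


def describe_block(relpath: str, blkno: int) -> dict:
    """Where a block number would live on disk, as a sanity check."""
    seg, block_in_file = divmod(blkno, RELSEG_SIZE)
    return {
        "hex": f"0x{blkno:08X}",
        # Segment 0 has no suffix; later segment files are named <relpath>.<n>
        "file": relpath if seg == 0 else f"{relpath}.{seg}",
        "block_in_file": block_in_file,
        # Byte offset of the block from the start of the relation, in TiB
        "offset_tib": blkno * BLCKSZ / 2**40,
    }


if __name__ == "__main__":
    info = describe_block("base/16400/741450880", 1277624320)
    for key, value in info.items():
        print(f"{key}: {value}")
```

Comparing the "file" value with what actually exists under base/16400/ on the master is a quick way to see whether such a block could be real.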

This is the second time this error has occurred in less than 15 days. Both times, the service had been stopped and was restarted some time later.

The standby service was restarted a second time after the error, and the error occurred again, but at a later WAL segment: the first failure happened while applying segment 0000000200004C2700000010, and the second at 0000000200004C2700000014.
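
To relate the segment names to the LSNs in the log, here is a small sketch assuming the default 16 MiB segment size (which matches the rsync "total size is 16,777,216" lines). A WAL file name is 24 hex digits: timeline, log id (high 32 bits of the LSN), and segment number within that log id. The helper names are hypothetical. It shows that the consistent recovery point 4C27/F2159B8 lies in segment ...0F, and that the two failures started 4 segments (64 MiB of WAL) apart.

```python
# Assumption: default 16 MiB WAL segments, matching the rsync
# "total size is 16,777,216" lines in the log.
WAL_SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOG_ID = 0x100000000 // WAL_SEG_SIZE  # 256 segments per 4 GiB log id


def parse_segment_name(name: str) -> tuple[int, int, int]:
    """Split a 24-hex-digit WAL file name into (timeline, log id, segment)."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def parse_lsn(text: str) -> int:
    """Turn an LSN such as '4C27/F2159B8' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def segment_start_lsn(name: str) -> int:
    """First LSN stored in the given segment file."""
    _timeline, log_id, seg = parse_segment_name(name)
    return (log_id << 32) | (seg * WAL_SEG_SIZE)


def segment_for_lsn(lsn: int, timeline: int) -> str:
    """Name of the segment file that contains the given LSN."""
    segno = lsn // WAL_SEG_SIZE
    return f"{timeline:08X}{segno // SEGS_PER_LOG_ID:08X}{segno % SEGS_PER_LOG_ID:08X}"


if __name__ == "__main__":
    # Consistent recovery point from the log: expected to fall in segment ...0F.
    print(segment_for_lsn(parse_lsn("4C27/F2159B8"), timeline=2))
    # Segments where the first and second failures happened.
    for seg in ("0000000200004C2700000010", "0000000200004C2700000014"):
        print(seg, "starts at", format_lsn(segment_start_lsn(seg)))
```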

The client is running PPAS version 9.5.14.20.

I found this thread on the community mailing list. Could it be related?

https://www.postgresql.org/message-id/flat/CAFh8B%3DkvY2BYgikf%3DtpjUGD2h0rsm8GosOeU4TeYR%2BQ6YK-mjw...

The shared_preload_libraries parameter is configured on the standby as follows:

shared_preload_libraries = '$libdir/dbms_pipe,$libdir/edb_gen,$libdir/sql-profiler,$libdir/plugin_debugger,$libdir/pg_stat_statements,$libdir/pg_buffercache'

Maybe this offers a clue to the problem.

I'd appreciate any help; please let me know if you need any more information.

Thanks,

Clailson de Almeida.

7 REPLIES
EDB Team Member

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi Clailson,

From the error logs you shared, it looks like the archived WAL file is getting corrupted while being transferred from the master to the standby.
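
One way to test the in-transit corruption hypothesis is to compare the master's archived copy of a segment with the copy the standby restored. A minimal sketch, assuming segments are archived uncompressed (if archive_command compresses them, compare after decompressing); the function names are hypothetical. Note that a single clean comparison would not rule out transient corruption during an earlier transfer.

```python
import hashlib
from pathlib import Path

# 16,777,216 bytes, matching the rsync "total size" lines in the log.
# Assumption: segments are archived uncompressed.
EXPECTED_SEGMENT_BYTES = 16 * 1024 * 1024


def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_segment(master_copy: Path, standby_copy: Path) -> list[str]:
    """Return the problems found; an empty list means the two copies match."""
    problems = []
    for p in (master_copy, standby_copy):
        size = p.stat().st_size
        if size != EXPECTED_SEGMENT_BYTES:
            problems.append(f"{p}: unexpected size {size}")
    if sha256_of(master_copy) != sha256_of(standby_copy):
        problems.append("checksums differ")
    return problems
```

For example, `compare_segment(Path(master_archive_dir) / name, Path(standby_copy_dir) / name)` for name = "0000000200004C2700000010", where both directory variables are placeholders for wherever each server keeps its copy.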

For further analysis, could you please share the standby server's recovery.conf and postgresql.conf files and the master's postgresql.conf file?

For now, please rebuild the standby server to resolve this issue.

Please run VACUUM ANALYZE and pg_dump on the master server to verify that there is no corruption on it.

Also, please confirm whether the standby database cluster was shut down cleanly before the server was rebooted.

Please let us know if you have any issues or questions.

Regards,

Sudhir

EDB Team Member

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi @Clailson,

Adding to my previous email: the failing WAL record refers to the relation (table or index) whose data file on disk is ‘base/16400/741450880’.

You can identify the relation with the following queries:

select datname from pg_database where oid = 16400;

This returns the database name. Then connect to that database and run the following query to get the relation name:

SELECT pg_filenode_relation(0, 741450880);
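
The path in the WARNING follows the layout base/<database OID>/<relfilenode>, which is exactly what the two queries above take apart. A small sketch that derives both queries from such a path, assuming the relation sits in the database's default tablespace (the 0 passed to pg_filenode_relation) and that the path names the main fork's first segment. The helper name lookup_queries is hypothetical; pg_filenode_relation returns NULL if no relation currently uses that filenode.

```python
import re

# Matches paths like base/16400/741450880 (main fork, first segment only).
_REL_PATH = re.compile(r"base/(\d+)/(\d+)")


def lookup_queries(relpath: str) -> tuple[str, str]:
    """Build the two lookup queries from a base/<db oid>/<filenode> path."""
    m = _REL_PATH.fullmatch(relpath)
    if m is None:
        raise ValueError(f"expected base/<db oid>/<filenode>, got {relpath!r}")
    db_oid, filenode = int(m.group(1)), int(m.group(2))
    find_db = f"SELECT datname FROM pg_database WHERE oid = {db_oid};"
    # Run this one while connected to the database found above.
    # Tablespace 0 means "the database's default tablespace".
    find_rel = f"SELECT pg_filenode_relation(0, {filenode});"
    return find_db, find_rel


if __name__ == "__main__":
    for query in lookup_queries("base/16400/741450880"):
        print(query)
```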

Once you know which relation it is, run VACUUM ANALYZE and pg_dump for that relation on the master server.

Also, please check the OS logs on both the master and the standby for anything that points to corruption.

Regards,

Sudhir

Level 2 Adventurer

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi Sudhir,

I do not think the problem is in the WAL file itself: when the standby was restarted again, the same segment (0000000200004C2700000010) was restored successfully, and recovery stopped at a later segment (0000000200004C2700000014) instead.

Where can I attach files (recovery.conf, etc.) to this post?

The server was rebooted with the cluster service stopped.

The standby was rebuilt and the error has not happened again. I should also mention that the restart was done with shared_preload_libraries commented out. I'm waiting for the client to run some tests to see if the error recurs.

Additionally, I asked the client to run VACUUM FULL on the 2 tables to check for problems.

Regards,

Clailson de Almeida

EDB Team Member

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi @Clailson

This thread was escalated to our Engineering team. They responded that 9.5.14 has certain bugs related to WAL and replay, which were fixed in 9.5.15.

Please perform a minor upgrade to version 9.5.15.
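
Since the advice is to run at least 9.5.15 on both the master and the standby, a trivial sketch for checking the version string each server reports (for example via SHOW server_version). It assumes the string looks like the "9.5.14.20" quoted above, where the fourth PPAS component is taken to be a build number and is ignored; the function names are hypothetical, and the check is only meaningful within the 9.5 series.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Numeric components of a version string such as '9.5.14.20'."""
    # Keep only the leading dotted part; some builds append text after a space.
    core = version.split()[0]
    return tuple(int(part) for part in core.split("."))


def meets_minimum(version: str, minimum: tuple[int, int, int] = (9, 5, 15)) -> bool:
    # Compare only major.minor.patch; the fourth PPAS component is ignored.
    return parse_version(version)[:3] >= minimum


if __name__ == "__main__":
    for reported in ("9.5.14.20", "9.5.15.21"):
        print(reported, meets_minimum(reported))
```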

Thanks,

Ninad

Level 2 Adventurer

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi Ninad. Thanks for the answer. I asked the client to upgrade the master and standby servers and to recreate the standby. I'll get back to you as soon as he finishes the tests.
EDB Team Member

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi Clailson,

Thank you for the update. We will wait to hear back from you.

Regards,

Dhananjay

Level 2 Adventurer

Re: Error while starting standby - PANIC: WAL contains references to invalid pages

Hi,

The client upgraded the OS along with the Postgres version and reports no problems so far.

Thank you for your help.