Incremental backup restore failure in BART 2.3

SOLVED
Level 2 Adventurer

Hi,
We are testing BART 2.3 and got the following error when restoring an incremental backup larger than 1.73 GB:

 

INFO: base backup restored
ERROR: worker-1 (44430) failed: failed to overlay modified blocks; failing command was: bart --config-path /usr/edb/bart/etc/bart-10.cfg STREAM --source /backup/bart/server01/1550696096559/base/worker-1.cbm --source_dir /backup/bart/server01/1550696096559/base | ssh -o BatchMode=yes -o PasswordAuthentication=no postgres@192.168.16.22 'bart APPLY-INCREMENTAL --source /pgsql/app_PRD/worker-1.cbm --source_dir /pgsql/app_PRD --backupid 1550696096559 --tsmap ""'
ERROR: sendfile(/backup/bart/server01/1550696096559/base/base/14240/16501.36.blk) failed: Arquivo muito grande
ERROR: unable to read block in stdin: Erro desconhecido 32749: read 70 bytes at off 813285376
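
A note on the messages above: they are in Portuguese. `Arquivo muito grande` appears to be the localized form of the standard `File too large` error (errno `EFBIG`), and `Erro desconhecido` means `Unknown error`. To check whether the failure lines up with file size, a minimal sketch like the one below could list the larger `.blk` files in a backup directory. The function name `find_large_files`, the `.blk` suffix filter, and the 1 GiB default threshold are illustrative assumptions, not something BART provides or documents.

```python
import os


def find_large_files(root, suffix=".blk", min_bytes=1 << 30):
    """Return (path, size) pairs for files under root that end with suffix
    and are at least min_bytes long, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if size >= min_bytes:
                hits.append((path, size))
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits
```

For the backup in the error above, it would be called as `find_large_files("/backup/bart/server01/1550696096559/base")`.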

 

The full test log follows.

 

Could you please help me?

2 ACCEPTED SOLUTIONS

Level 2 Adventurer

Re: Incremental backup restore failure in BART 2.3

Hi Oiramantt,

This issue has been escalated to the development team and reported as a bug.

It will be fixed in the next point release.

Please click the hyperlink to subscribe to BART technical updates so that you are notified of new releases.

I hope this information helps.

Level 2 Adventurer

Re: Incremental backup restore failure in BART 2.3

Hi, good afternoon!

Thank you!

You guys are awesome!

 

5 REPLIES
Level 2 Adventurer

Re: Incremental backup restore failure in BART 2.3

The full test log follows.

____________

-bash-4.2$ locale
LANG=pt_BR.iso88591
LC_CTYPE="pt_BR.iso88591"
LC_NUMERIC="pt_BR.iso88591"
LC_TIME="pt_BR.iso88591"
LC_COLLATE="pt_BR.iso88591"
LC_MONETARY="pt_BR.iso88591"
LC_MESSAGES="pt_BR.iso88591"
LC_PAPER="pt_BR.iso88591"
LC_NAME="pt_BR.iso88591"
LC_ADDRESS="pt_BR.iso88591"
LC_TELEPHONE="pt_BR.iso88591"
LC_MEASUREMENT="pt_BR.iso88591"
LC_IDENTIFICATION="pt_BR.iso88591"
LC_ALL=
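
Because the session locale is `pt_BR`, system error strings in the logs come out in Portuguese. When collecting logs for a bug report, one option is to rerun the failing command with `LC_ALL=C` so that C library messages are printed in English. The helper below is a minimal sketch; `run_with_c_locale` is a made-up name, and whether BART's own messages are affected by the locale is not confirmed here.

```python
import os
import subprocess


def run_with_c_locale(cmd):
    """Run cmd (a list of arguments) with LC_ALL=C so that libc error
    messages come out in English; returns (exit code, stdout, stderr)."""
    env = dict(os.environ)
    env["LC_ALL"] = "C"
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

A call would pass the command as a list, for example `run_with_c_locale(["bart", "SHOW-SERVERS"])`.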

____________

[BART]
bart_host = postgres@10.0.100.37
backup_path = /backup/bart
pg_basebackup_path = /opt/edb/as10/bin/pg_basebackup
logfile = /tmp/bart-2.1-as10.log
scanner_logfile = /tmp/bart_scanner-2.1-as10.log
xlog_method=stream
retention_policy = 1 BACKUPS

[bdpostprd10]
host = bdpostprd10
port = 5432
user = postgres
cluster_owner = postgres
description = "Teste Codad"
#remote_host = postgres@10.0.100.53
allow_incremental_backups = enabled
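
The configuration above looks like a plain INI file, so a quick sanity check of which servers have incremental backups switched on can be sketched with Python's `configparser`. Treating `bart.cfg` as INI is an assumption based only on its appearance here, and the `disabled` fallback for a missing key is likewise an assumption; `servers_with_incremental` is a hypothetical helper name.

```python
import configparser


def servers_with_incremental(cfg_text):
    """Return the names of server sections whose allow_incremental_backups
    value is 'enabled'. The global [BART] section is skipped."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    names = []
    for section in parser.sections():
        if section.upper() == "BART":
            continue
        # Assumed default when the key is absent: disabled.
        value = parser.get(section, "allow_incremental_backups", fallback="disabled")
        if value.strip().lower() == "enabled":
            names.append(section)
    return names
```

Fed the configuration text shown above, this would report `bdpostprd10` as the only server with incremental backups enabled.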

_________________

 

-bash-4.2$ bart SHOW-SERVERS
SERVER NAME : bdpostprd10
HOST NAME : bdpostprd10
USER NAME : postgres
PORT : 5432
REMOTE HOST :
RETENTION POLICY : 1 Backups
DISK UTILIZATION : 0.00 bytes
NUMBER OF ARCHIVES : 0
ARCHIVE PATH : /backup/bart/bdpostprd10/archived_wals
ARCHIVE COMMAND : cp %p /backup/bart/bdpostprd10/archived_wals/%f
XLOG METHOD : stream
WAL COMPRESSION : disabled
TABLESPACE PATH(s) :
INCREMENTAL BACKUP : ENABLED
DESCRIPTION : "Teste Codad"

-bash-4.2$
-bash-4.2$
-bash-4.2$
-bash-4.2$ bart CHECK-CONFIG
INFO: Verifying that pg_basebackup is executable
INFO: success - pg_basebackup(/opt/edb/as10/bin/pg_basebackup) returns version 10,000000

-bash-4.2$ ll /backup/bart/bdpostprd10/archived_wals
total 39552
-rw-------. 1 postgres 1001 16777216 Mar 1 12:07 000000020000000B000000AB
-rw-------. 1 postgres 1001 16777216 Mar 1 12:14 000000020000000B000000AC
-rw-------. 1 postgres 1001 305 Mar 1 12:14 000000020000000B000000AC.00000028.backup
-rw-rw-r--. 1 postgres 1001 164 Mar 1 12:16 000000020000000BAB0000280000000BAC000000.mbm
-rw-rw-r--. 1 postgres 1001 164 Mar 1 12:16 000000020000000BAC0000280000000BAD000000.mbm
-bash-4.2$
-bash-4.2$
-bash-4.2$
-bash-4.2$
-bash-4.2$
-bash-4.2$ bart BACKUP -s bdpostprd10 -F p --parent full10 --check
INFO: DebugTarget - getVar(checkDiskSpace.bytesAvailable)
INFO: checking /backup/bart/bdpostprd10/archived_wals for MBM files from B/AC000028 to B/AD000000
INFO: all prerequisite of incremental backup verified
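
The `.mbm` file names in the listing above seem to encode the WAL range they cover: 8 hex digits of timeline, then 16 hex digits each for the start and end positions. Under that reading, `000000020000000BAC0000280000000BAD000000.mbm` covers `B/AC000028` to `B/AD000000`, which is exactly the range named in the check message. This layout is inferred from the names and the log in this thread, not taken from BART documentation, and `parse_mbm_name` is a hypothetical helper.

```python
import re

# Assumed layout: 8 hex digits timeline + 16 hex start LSN + 16 hex end LSN.
_MBM_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{16})([0-9A-F]{16})\.mbm$")


def _lsn(hex16):
    """Format 16 hex digits as an LSN in the X/XXXXXXXX style used in the log."""
    high = int(hex16[:8], 16)
    low = int(hex16[8:], 16)
    return f"{high:X}/{low:08X}"


def parse_mbm_name(name):
    """Split an MBM file name into (timeline, start LSN, end LSN).
    Returns None when the name does not match the assumed layout."""
    match = _MBM_NAME.match(name)
    if match is None:
        return None
    timeline, start, end = match.groups()
    return int(timeline, 16), _lsn(start), _lsn(end)
```

Listing the parsed ranges for every file in `archived_wals` makes it easy to spot a gap between the parent backup's start position and the current one.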


-bash-4.2$ bart BACKUP -s bdpostprd10 -F p --parent full10 --backup-name incr1001
INFO: DebugTarget - getVar(checkDiskSpace.bytesAvailable)
INFO: checking /backup/bart/bdpostprd10/archived_wals for MBM files from B/AC000028 to B/AF000000
INFO: new backup identifier generated 1551453550076
INFO: creating 1 harvester threads
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
INFO: backup completed successfully
INFO:
BART VERSION: 2.3.0
BACKUP DETAILS:
BACKUP STATUS: active
BACKUP IDENTIFIER: 1551453550076
BACKUP NAME: incr1001
BACKUP PARENT: 1551452865195
BACKUP LOCATION: /backup/bart/bdpostprd10/1551453550076
BACKUP SIZE: 47.23 MB
BACKUP FORMAT: plain
BACKUP TIMEZONE: Brazil/East
XLOG METHOD: stream
BACKUP CHECKSUM(s): 0
TABLESPACE(s): 0
START WAL LOCATION: 000000020000000B000000AF
STOP WAL LOCATION: 000000020000000B000000B0
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2019-03-01 12:19:10 -03
STOP TIME: 2019-03-01 12:19:16 -03
TOTAL DURATION: 6 sec(s)


-bash-4.2$ bart SHOW-BACKUPS
SERVER NAME   BACKUP ID       BACKUP NAME   BACKUP PARENT   BACKUP TIME               BACKUP SIZE   WAL(s) SIZE   WAL FILES   STATUS

bdpostprd10   1551453550076   incr1001      full10          2019-03-01 12:19:16 -03   47.23 MB                                active
bdpostprd10   1551452865195   full10        none            2019-03-01 12:14:29 -03   33.83 GB      80.00 MB      5           active

-bash-4.2$ bart --debug restore -s bdpostprd10 -i incr1001 -p /pgsql/bdpostprd_TESTE
DEBUG: Server: Global, No of Retained Backups 1
INFO: restoring incremental backup 'incr1001' of server 'bdpostprd10'
DEBUG: restoring backup: 1551452865195 (full10)
DEBUG: restoring backup to /pgsql/bdpostprd_TESTE
DEBUG: restore command: tar -cf - -C /backup/bart/bdpostprd10/1551452865195/base . | tar -C /pgsql/bdpostprd_TESTE -xf - --exclude=pg_tblspc --exclude=__payloadChecksum
INFO: base backup restored
DEBUG: backup '1551452865195' restored to '/pgsql/bdpostprd_TESTE'
DEBUG: restoring backup: 1551453550076 (incr1001)
DEBUG: CBM produced: /backup/bart/bdpostprd10/1551453550076/base/worker-1.cbm
DEBUG: worker-1 - 0 blocks
DEBUG: CBM produced: /backup/bart/bdpostprd10/1551453550076/base/worker-0.cbm
DEBUG: 0 blocks to write
DEBUG: executing file movement command: tar -cf - -C /backup/bart/bdpostprd10/1551453550076/base . --exclude *.blk --exclude pg_tblspc | tar -C /pgsql/bdpostprd_TESTE -xf -
DEBUG: executing file movement command: cd . && cp -r /backup/bart/bdpostprd10/1551453550076/base/../1551453550076.cbm /pgsql/bdpostprd_TESTE | true
ERROR: worker-1 (35505) failed: failed to overlay modified blocks; failing command was: bart --debug --config-path ./bart.cfg STREAM --source /backup/bart/bdpostprd10/1551453550076/base/worker-1.cbm --source_dir /backup/bart/bdpostprd10/1551453550076/base | bart --debug APPLY-INCREMENTAL --source /pgsql/bdpostprd_TESTE/worker-1.cbm --source_dir /pgsql/bdpostprd_TESTE --backupid 1551453550076 --tsmap ""
DEBUG: collecting incremental 1551453550076 (pid 35508)
DEBUG: Server: Global, No of Retained Backups 1
bart: /usr/include/boost/token_iterator.hpp:56: const Type& boost::token_iterator<TokenizerFunc, Iterator, Type>::dereference() const [with TokenizerFunc = boost::char_separator<char>; Iterator = __gnu_cxx::__normal_iterator<const char*, std::basic_string<char> >; Type = std::basic_string<char>]: Assertion `valid_' failed.
DEBUG: exit status for process 35507: 0 (happy)
bash: line 1: 35507 Concluído bart --debug --config-path ./bart.cfg STREAM --source /backup/bart/bdpostprd10/1551453550076/base/worker-1.cbm --source_dir /backup/bart/bdpostprd10/1551453550076/base
35508 Abortado (imagem do núcleo gravada)| bart --debug APPLY-INCREMENTAL --source /pgsql/bdpostprd_TESTE/worker-1.cbm --source_dir /pgsql/bdpostprd_TESTE --backupid 1551453550076 --tsmap ""

ERROR: process 35505 exited with status 1
-bash-4.2$
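
In the run above, the failing command is a two-stage pipeline: the `STREAM` stage (pid 35507) finished with status 0 (`Concluído`, "Done"), while the `APPLY-INCREMENTAL` stage (pid 35508) aborted with a core dump (`Abortado (imagem do núcleo gravada)`) on the Boost tokenizer assertion. To pull the worker, reason, and individual stages out of such an `ERROR: worker-N ... failing command was:` line, for example to rerun one stage by hand, a small parser could look like this. The regular expression is written against the log lines in this thread only, and `parse_worker_error` is a hypothetical name.

```python
import re

_WORKER_ERROR = re.compile(
    r"^ERROR: (?P<worker>worker-\d+) \((?P<pid>\d+)\) failed: "
    r"(?P<reason>.*?); failing command was: (?P<command>.*)$"
)


def parse_worker_error(line):
    """Return a dict with worker, pid, reason and the pipeline stages of the
    failing command, or None if the line is not a worker failure."""
    match = _WORKER_ERROR.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["pid"] = int(info["pid"])
    # Naive split on " | "; fine for this log, not a general shell parser.
    info["stages"] = [stage.strip() for stage in info.pop("command").split(" | ")]
    return info
```

Applied to the error line above, the result has two entries in `stages`, the first beginning with `bart --debug --config-path ./bart.cfg STREAM` and the second with `bart --debug APPLY-INCREMENTAL`.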


-bash-4.2$ bart CHECK-CONFIG
INFO: Verifying that pg_basebackup is executable
INFO: success - pg_basebackup(/opt/edb/as10/bin/pg_basebackup) returns version 10,000000
-bash-4.2$
-bash-4.2$
-bash-4.2$ bart CHECK-CONFIG -s bdpostprd10
INFO: Checking server bdpostprd10
INFO: Verifying cluster_owner and ssh/scp connectivity
INFO: success
INFO: Verifying user, host, and replication connectivity
INFO: success
INFO: Verifying that user is a database superuser
INFO: success
INFO: Verifying that cluster_owner can read cluster data files
INFO: success
INFO: Verifying that you have permission to write to vault
INFO: success
INFO: /backup/bart/bdpostprd10
INFO: Verifying database server configuration
INFO: success
INFO: Verifying that WAL archiving is working
INFO: waiting 30 seconds for /backup/bart/bdpostprd10/archived_wals/000000020000000B000000B1
INFO: success
INFO: Verifying that bart-scanner is configured and running
INFO: success
-bash-4.2$

 

Level 3 Adventurer

Re: Incremental backup restore failure in BART 2.3

Hi oiramantt,

 

Were any bulk updates running on the database during the incremental backup?

We have observed this issue (a failed restore of an incremental backup) when the incremental backup was taken during bulk DB updates and we then tried to restore it.

Level 2 Adventurer

Re: Incremental backup restore failure in BART 2.3

Hi Ranjan,
Thank you for your help.
There are no bulk updates running on the database server or in the database.
The only activity on the database is BART 2.3 itself (full backup restore, then incremental backup).
Could you help me with any more suggestions or tips?
