In this blog post, I will share about few gotchas to look out for when backing up tablespaces in a DB2 DPF database. What is DPF? I wrote a detailed blog post on db2commerce.com.
Why Tablespace backups?
Why should we look into backing up tablespaces instead of the entire database? Below are couple of instances when we want to backup tablespaces:
- You just created a new tablespace; Before taking an incremental/delta backup, the newly created tablespace needs to be backed-up.
- Tablespace is in BACKUP PENDING state. This can happen when we perform a load operation for a recoverable database and specify the COPY NO parameter. We can remove the tablespace from the BACKUP PENDING state by backing up the tablespace.
- Is it practical to do database backups? A database backup for a multi-terabyte database (think Data Warehouse) is not practical all the time. It consumes too many resources (CPU, storage, lock contention etc.). If the recovery effort is well thought out, we can get away with performing tablespace level backups.
Now that we know why we need to look at tablespace level backups, let us look at a problem I recently encountered.
Problem:
SQL2066N encountered when backing up a tablespace in a DB2 DPF database. The backup was being run from a catalog node.
$ db2 "backup database TESTDB tablespace (TBSP1) online to /backups"
SQL2066N The command failed because the specified table space either does not
exist or is not supported with the specified command. Specified table space
name: "TBSP1".
Troubleshooting SQL2066N:
One of the quickest ways to decipher what a SQL error code means is to prefix it with a db2 ?
as shown below.
$ db2 "? SQL2066N"
SQL2066N The command failed because the specified table space either
does not exist or is not supported with the specified command.
Specified table space name: "".
Explanation:
The table space name specified is syntactically correct, but it does not
exist in the database or cannot be used in the specified utility
operation.
A table space might be disallowed for many reasons, including the
following examples:
* If the utility in use is a backup operation, the table space might be
disallowed because it is a system or user temporary table space or is
in an inconsistent state.
* If the utility in use is a restore operation, the table space might
be disallowed because another restore operation or a rollforward
operation of the table space is already in progress.
* If the utility is a rollforward operation, the table space might be
disallowed because a restore of the table space is already in
progress.
User response:
Resubmit the utility command, specifying a valid table space.
Initial findings:
From the above, it looks like we might be passing an incorrect tablespace name. Messages in db2diag.log also lead us to believe that incorrect tablespace name could be the cause of the problem.
2015-09-09-06.52.51.712287-240 E6241A1073 LEVEL: Severe
PID : 15991648 TID : 48053 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : TESTDB
APPHDL : 0-2914 APPID: *N0.db2inst1.150909105252
AUTHID : db2inst1 HOSTNAME: dvserver1
EDUID : 48053 EDUNAME: db2agent (TESTDB) 0
FUNCTION: DB2 UDB, database utilities, sqlubBuildAppTbsp, probe:1815
MESSAGE : SQL2066N The command failed because the specified table space either
does not exist or is not supported with the specified command.
Specified table space name: "".
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -2066 sqlerrml: 10
sqlerrmc: TBSP1
sqlerrp : sqlubBui
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
At this point, I wanted to find out if the tablespace actually exists in the database and if it does, which partitions is it created on? Here is the SQL to find out.
SELECT CHAR(TBSP_NAME, 20) TABLESPACE,
DBPARTITIONNUM
FROM SYSIBMADM.TBSP_UTILIZATION
WHERE TBSP_NAME LIKE '%TBSP1%'
ORDER BY TBSP_NAME,
DBPARTITIONNUM
WITH UR;
TABLESPACE DBPARTITIONNUM
-------------------- --------------
TBSP1 1
TBSP1 2
TBSP1 3
TBSP1 4
TBSP1 5
5 record(s) selected.
From the above output, tablespace TBSP1 does exist in the database and is created on partitions 1 through 5. Note that it is NOT created on partition 0 (which is the catalog node). However, as stated at the beginning of the blog post, we are running the backup command from partition 0. Can this be the reason? Are we running into problems because we are running backup of a tablespace (that is NOT created on catalog partition) from the catalog partition? It turns out that this is the problem. How do we handle this?
Solution:
When we look at backup
command’s syntax, we find the following from IBM’s Knowledge Center – DB2 Backup Command.
Scope
In a partitioned database environment, if no database partitions are specified, this command affects only the database partition on which it is executed.
What this means is that we cannot take a backup of tablespace TBSP1 from catalog node (partition 0) as the tablespace does not exist on this partition, at least without making changes to the backup command.
How to get around this? It appears like we have couple of solutions.. but, when we dig further, we’ll find out that we only have one solution.
Explicitly connect to one of the partitions:
We realized that we were on partition 0 when we were backing up a tablespace that does not exist on partition 0. What if we explicitly connect to a partition (example partition #1) on which the tablespace exists? Would that work? Let us see.
$export DB2NODE=1
$ db2 terminate
DB20000I The TERMINATE command completed successfully.
$ echo $DB2NODE
1
$ db2 "backup database TESTDB tablespace (TBSP1) online to /backups"
Backup successful. The timestamp for this backup image is : 20150909070438
How do we know if we backed up tablespace on all partitions or on a single partition? Well, for starters, we only see one backup image when we list the backup images. If we find only one backup image, it means that the backup was run only on a single partition. When a (tablespace/database) backup is run for multiple partitions, multiple images are generated.. one per database partition.
$ ls -ltr TESTDB* -rw------- 1 db2inst1 db2iadm1 4044328960 Sep 09 07:12 TESTDB.3.db2inst1.DBPART001.20150909071246.001
Also, messages in db2diag.log point to successful completion of backup only on partition #1.
2015-09-09-07.04.46.351196-240 E27460A508 LEVEL: Info PID : 5505716 TID : 14968 PROC : db2sysc 1 INSTANCE: db2inst1 NODE : 001 DB : TESTDB APPHDL : 1-30285 APPID: *N1.db2inst1.150909110438 AUTHID : db2inst1 HOSTNAME: dvserver1 EDUID : 14968 EDUNAME: db2agent (TESTDB) 1 FUNCTION: DB2 UDB, database utilities, sqlubSetupJobControl, probe:1897 MESSAGE : Starting an online tablespace TBSP1... backup. 2015-09-09-07.04.56.977959-240 E32815A461 LEVEL: Info PID : 5505716 TID : 14968 PROC : db2sysc 1 INSTANCE: db2inst1 NODE : 001 DB : TESTDB APPHDL : 1-30285 APPID: *N1.db2inst1.150909110438 AUTHID : db2inst1 HOSTNAME: dvserver1 EDUID : 14968 EDUNAME: db2agent (TESTDB) 1 FUNCTION: DB2 UDB, database utilities, sqlubcka, probe:1070 MESSAGE : Backup complete.
Why is this so? This is because when a backup is run, it only affects the database partition on which the backup is run. What if we have multiple tablespaces that we want to backup them up? What if each of them is on a subset of database partitions? Is there an easy way to handle such complex tablespace and partition mapping? Yes there is..
ALL DBPARTITIONNUMS clause to rescue:
The backup command has a clause ‘ALL DBPARTITIONNUMS’ that would allow the backup to be run on all the database partitions. What if a tablespace does not exist on all the database partitions? Well, there is no problem because the backup utility simply prints a warning message indicating that the tablespace does not exist on a specific partition. See below for an example:
$ db2 "backup database TESTDB ON ALL DBPARTITIONNUMS tablespace (TBSP1) online to /backups"
Part Result
---- ------------------------------------------------------------------------
0000 SQL2430W The database backup succeeded, but the following table spaces do not exist on this database partition: "TBSP1".
0001 DB20000I The BACKUP DATABASE command completed successfully.
0002 DB20000I The BACKUP DATABASE command completed successfully.
0003 DB20000I The BACKUP DATABASE command completed successfully.
0004 DB20000I The BACKUP DATABASE command completed successfully.
0005 DB20000I The BACKUP DATABASE command completed successfully.
Backup successful. The timestamp for this backup image is : 20150906182257
In the above output, we see a warning (SQL2430W) message about the tablespace TBSP1 not existing on the catalog partition. Look at the prefix ‘000’ before the warning message for partition number. This message could be safely ignored and does not impact backup of the tablespace on other partitions.
Conclusion:
- When backing up a tablespace, we need to pay attention to how the tablespace is spread across multiple database partitions.
- Alternatively, include the
ALL DBPARTITIONNUMS
clause in the backup command as this would take care backups on all database partitions on which the tablespace exists. Also, this is the best practice to ensure recoverability of database objects on all the partitions.
What other gotchas have you encountered when dealing with backups in a single or multi-partitioned DB2 LUW database? Please share in comments. Thank you for reading.
Pingback: Db2 Tablespace - How Do I Perform A Db2 Tablespace Level Backup, Restore And ...