Applies To:: RMAN Troubleshooting Catalog Performance Issues (Doc ID 748257.1)

Copyright (c) 2022, Oracle. All rights reserved. Oracle Confidential.
RMAN Troubleshooting Catalog Performance Issues (Doc ID 748257.1)
In this Document
Purpose
Troubleshooting Steps
References
APPLIES TO:
Oracle Database - Enterprise Edition - Version 9.0.1.0 and later

Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and

later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.
PURPOSE
This note discusses what to look for when queries against the RMAN repository (whether in the controlfile or a catalog
database) are performing badly and what diagnostics needs to be gathered when raising a Service Request with Oracle
Support Services.
This note is intended for use by Database Administrators and Support personnel investigating RMAN performance
problems relating specifically to catalog resyncs or queries against the controlfile or catalog database whilst running RMAN.
TROUBLESHOOTING STEPS
Identifying a problem with the RMAN repository

The RMAN repository is updated whenever new backups are taken or housekeeping is done to maintain the backup
metadata; it is queried when RMAN reports are run and during restore to check backup history and determine the best and
most efficient way to carry out the restore. If a catalog is used, a number of resyncs are done first to update the catalog
with any new information in the controlfile but once the resyncs are complete, ALL subsequent queries of RMAN metadata
(during crosscheck, report, restore etc) are run against the catalog database only.

You know there is a problem if:
An explicit catalog resync takes a long time to complete

Statspack or AWR reports for a nocatalog run shows top wait events in the target are for CPU or control file IO:
"control file parallel write"
"control file sequential read"
Statspack or AWR reports for catalog run shows top wait events in the catalog are for CPU or the catalog datafile
IO:
"db file scattered read"
"db file sequential read"
Statspack or AWR reports shows most expensive queries in terms of CPU or elapsed time are those issued against
the controlfile or rman catalog tables
You have worked through Note 360443.1: RMAN Backup Performance, and identified the bottleneck to be in the
Oracle Layer before the backup actually starts
You have worked through Note 740911.1: RMAN Restore Performance, and identified the bottleneck to be in the
Oracle Layer before the restore actually starts
Control_file_record_keep_time and recovery window

Control_file_record_keep_time specifies the minimum amount of RMAN metadata guarranteed to be retained in the
controlfile; the default is 7 days. In practice the rate at which records age out of the controlfile depends on how long it
takes for a recyclable section to fill up which may well be more than 7 days. It is important to note that
control_file_record_keep_time and recovery window are NOT the same; the former governs when records can be
overwritten in the controlfile, primarily to prevent unlimited controlfile growth whilst the latter is concerned with retention
of RMAN metadata according to business requirements (see Bug 6448254: RAISE WARNING IF RECOVERY WINDOW IS >
CONTROL_FILE_RECORD_KEEP_TIME IN NOCAT MODE). If a catalog is used and RMAN is run daily or at a frequency
LESS than control_file_record_keep_time days, RMAN metadata will never be lost as implicit catalog resyncs ensure data is
regularly propogated into the catalog.
Integrity of RMAN metadata
Constraints in the catalog ensure the integrity of the rman metadata. For example, a backup_set must have a
corresponding backup_piece which must have corresponding backup_X (datafile, redolog ) entries. There are no such
constraints in a controlfile; records in different sections can age out at different rates. For example, the one:many
relationship between backup_set and backup_redolog entries means the backup_redologs section will fill up faster than
the backup_set section and over time, a backup_redolog entry may age out of the controlfile BEFORE its corresponding
backup_set entry, leaving a backup_set entry with missing backup_redolog entries (mis-matched). Running housekeeping
operations (report/delete obsolete, crosscheck etc) against such a CONTROLFILE with mismatched entries will produce
unpredictable results (see Bug 7357779: RMAN REPORT/DELETE OBSOLETE RETURNS ARCHIVELOG BACKUPS INSIDE
RETENTION POLICY).
To avoid any issues with mis-matched entries, set control_file_record_keep_time equal to the recovery window +1 which
ensures that RMAN metadata INSIDE the retention policy will never be affected. In practice this may not be possible if a
very large recovery window is needed so typically a catalog is used, a retention policy configured and
control_file_record_keep_time left to its default of 7 days. But there may still be times when the catalog database is
unavailable (it is being upgraded perhaps or is under investigation) and an RMAN backup must still proceed in which case:
set control_file_record_keep_time days to the recovery window plus 1 to ensure no loss of metadata whilst the
catalog is unavailable
DO NOT RUN ANY MAINTENANCE OPERATIONS until the catalog is again available expecially if your recovery
window is greater than the original setting of control_file_record_keep_time days
Tackling a performance issue
AWR snapshot reports from the target (nocatalog) or catalog database should identify the specific SQL that is causing a
problem in terms of CPU, high IO or elapsed times.
Refer to the following notes to see if the problematic SQL is a known issue:
Note 247611.1:Known RMAN Performance Problems

Note 463227.1: Oracle10g RMAN Recovery Catalog Known Performance Issues
If none of the issues in the above notes are relevant, then further investigation is needed and the following diagnostics will
be useful and should be uploaded if a Service Request is raised with Oracle Support Services. There are three processes to
consider:
The RMAN client – this should always be traced

The 1st default channel (for nocatalog operations only, this process runs all the rman queries against the controlfile)
The RMAN session in the catalog instance
All the traces will generate a lot of information so consider FIRST how to minimise the output by stripping the job down to
a SINGLE rman command. Execute each command in a script in isolation to see which command produces the problem
and trace just that one command. It is always worth running an explicit catalog resync first to see if the real problem lies
in the implicit resync done for virtually all commands.
Tracing the RMAN client
Ideally, set debug trace on the RMAN command line and execute a simplied script to illustrate the problem eg
%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc debug
run {
allocate channel t1 type sbt parms=(…………);
backup database plus archivelog all;
Otherwise trace only the command that you are interested in. The following example traces only the crosscheck command:
%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc

run {
allocate channel t1 type sbt parms=(….);

allocate channel t2 type sbt parms=(….);
backup database;
backup archivelog all until time ' sysdate –7' delete all input;
debug on;
crosscheck backup;
debug off;
}
Tracing queries against the RMAN repository

Make sure before you invoke rman that the following environmental variables are set:
NLS_LANG='american_america.<charset>'
NLS_DATE_FORMAT='dd-mon-rr hh24:mi:ss'
And for the target database (nocatalog operations) or catalog database (catalog operations):
TIME_STATISTICS=TRUE

MAX_DUMP_FILE_SIZE=UNLIMITED
If a recovery catalog is normally used, try again WITHOUT the catalog; if the problem persists then for diagnostic
purposes, get the trace against the CONTROLFILE only. Tracing the RMAN catalog session should only be done if the
problem only occurs when a catalog is used.

The Controlfile (nocatalog)
%rman target / log rman.log trace rman.trc debug
run {
sql "alter session set events ''10046 trace name context forever, level 12''";
set command id to 'rman';
allocate channel for maintenance type sbt…………..;
backup database;
Note: Two single quotes before : 10046

: level 12
Two single quotes and a double quote after

This traces the first default channel; trace file will be generated in the target udump directory using the format
<SID>_ora_<spid>.trc. To find the <spid> of the 1st default channel:
col program format a20
col action format a20
select s.sid, p.spid, s.program, s.client_info,s.action, seq#, event, wait_time,
seconds_in_wait AS sec_wait
from v$session s, v$process p

where s.paddr = p.addr and s.program like '%rman%';
Typical output:
SID SPID PROGRAM CLIENT_INFO

---------------------------------------------------------

ACTION SEQ# EVENT WAIT_TIME SEC_WAIT

-----------------------------------------------------------------------------
139 24822 rman@celcsol4 (TNS V1-V3)
0000012 FINISHED129 853 SQL*Net
message from client 0 1794

147 4574 rman@celcsol4 (TNS V1-V3)
115 SQL*Net message from client 0 1797

135 5056 rman@celcsol4 (TNS V1-V3) rman channel=ORA_DISK_1
0000046 FINISHED129 2219 SQL*Net message from client 0 1794
Note:

is the polling channel; it never has a value in ACTION
- sid 147
- sid 139 is the 1st default Channel (CLIENT_INFO is null)
- sid 135 is the allocated channel ORA_DISK_1 (per CLIENT_INFO)

The Catalog
Allow the rman job to make the catalog connection first – it may help to use the host command to allow time to find the
catalogn session:
%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc debug
run {
set command id to 'rman';

allocate channel for maintenance type sbt…………..;
host;
backup database;
}
The host command will halt execution and return control to the OS prompt.
To identify the RMAN session in the catalog database:
SELECT s.sid, s.serial#, p.spid, s.program, s.action

FROM v$session s, v$process p
WHERE s.paddr = p.addr
AND s.program like '%rman%';
Note the spid value eg 3164 then do:
oradebug setospid 3164
oradebug unlimit
oradebug event 10046 trace name context forever, level 12
exit
Once this is done, to return to the RMAN session and to continue execution of the script type: exit
A trace file will be generated in the CATALOG udump directory using the format <SID>_ora_<spid>.trc.
Upload Diagnostics

1. RMAN client trace file

2.
1st default channel 10046 trace file (nocatalog only)

3. RMAN catalog 10046 trace file (catalog issues only)
4. Related AWR/Statspack reports
REFERENCES
BUG:6448254 - RAISE WARNING IF RECOVERY WINDOW IS > CONTROL_FILE_RECORD_KEEP_TIME IN NOCAT MOD

BUG:7357779 - RMAN REPORT/DELETE OBSOLETE RETURNS ARCHIVELOG BACKUPS INSIDE RETENTION POLICY
NOTE:247611.1 - Known RMAN Performance Problems
NOTE:360443.1 - RMAN Backup Performance
NOTE:463227.1 - Oracle10g RMAN Recovery Catalog Known Performance Issues

NOTE:740911.1 - RMAN Restore Performance
Didn't find what you are looking for?

Applies To:: RMAN Troubleshooting Catalog Performance Issues (Doc ID 748257.1)

Uploaded by

Copyright:

Available Formats

Applies To:: RMAN Troubleshooting Catalog Performance Issues (Doc ID 748257.1)

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Applies To:: RMAN Troubleshooting Catalog Performance Issues (Doc ID 748257.1)

Uploaded by

Copyright:

Available Formats

Copyright (c) 2022, Oracle. All rights reserved. Oracle Confidential.

RMAN Troubleshooting Catalog Performance Issues (Doc ID 748257.1)

Oracle Database - Enterprise Edition - Version 9.0.1.0 and later

Oracle Database Exadata Cloud Machine - Version N/A and later

Oracle Cloud Infrastructure - Database Service - Version N/A and

Information in this document applies to any platform.

Identifying a problem with the RMAN repository

You know there is a problem if:

An explicit catalog resync takes a long time to complete

Control_file_record_keep_time and recovery window

Integrity of RMAN metadata

Tackling a performance issue

Note 247611.1:Known RMAN Performance Problems

The RMAN client – this should always be traced

Tracing the RMAN client

%rman target / catalog usr/pw@<alias> log rman.log trace rman.trc

allocate channel t1 type sbt parms=(….);

Tracing queries against the RMAN repository

The Controlfile (nocatalog)

Note: Two single quotes before : 10046

SID SPID PROGRAM CLIENT_INFO

ACTION SEQ# EVENT WAIT_TIME SEC_WAIT

0000012 FINISHED129 853 SQL*Net

message from client 0 1794

115 SQL*Net message from client 0 1797

0000046 FINISHED129 2219 SQL*Net message from client 0 1794

- sid 135 is the allocated channel ORA_DISK_1 (per CLIENT_INFO)

SELECT s.sid, s.serial#, p.spid, s.program, s.action

WHERE s.paddr = p.addr

AND s.program like '%rman%';

Note the spid value eg 3164 then do:

1. RMAN client trace file

1st default channel 10046 trace file (nocatalog only)

4. Related AWR/Statspack reports

BUG:6448254 - RAISE WARNING IF RECOVERY WINDOW IS > CONTROL_FILE_RECORD_KEEP_TIME IN NOCAT MOD

NOTE:247611.1 - Known RMAN Performance Problems

NOTE:360443.1 - RMAN Backup Performance

NOTE:463227.1 - Oracle10g RMAN Recovery Catalog Known Performance Issues

Didn't find what you are looking for?

You might also like