Страницы

суббота, 13 марта 2021 г.

Strict Redaction Semantics

A DBMS_REDACT issue has recently popped up on SQL.RU. I constructed a simple test case to demonstrate the problem in 19.9:
set echo on lin 120

conn tc/tc@localhost/pdb

grant create table to scott identified by tiger;
alter user scott quota unlimited on users;

drop table scott.emp;
create table scott.emp(client_name varchar2(30));

insert into scott.emp values ('Larry Ellison');
insert into scott.emp values ('Jeff Bezos');
insert into scott.emp values ('Elon Musk');
insert into scott.emp values ('Vladimir Putin');
insert into scott.emp values (null);

exec dbms_redact.add_policy( -
       object_schema         => 'SCOTT', -
       object_name           => 'emp', -
       column_name           => 'CLIENT_NAME', -
       policy_name           => 'client_fio', -
       function_type         => dbms_redact.regexp, -
       regexp_pattern        => '(\S{3})(\S+)', -
       regexp_replace_string => '\1***', -
       expression            => '1 = 1' -
)

col client_name for a16
col c1 for a16
col c2 like c1
col c3 like c1
create or replace view scott.vw as
select client_name
     , nvl(client_name, 'zyzy') c1
     , case when client_name='Vla*** Put***' then 'y' else client_name end c2
     ,'Hello '||client_name c3
  from scott.emp;

grant alter session, create session to tc2 identified by tc2;

grant select on scott.vw to tc2;

conn tc2/tc2@localhost/pdb

select * from scott.vw;
In a nutshell, there is a table that has a data redaction policy. There is a view built on top of it. However, when a user tries to select from the view, it returns an error:
SQL> select * from scott.vw;
select * from scott.vw
                    *
ERROR at line 1:
ORA-28094: SQL construct not supported by data redaction
This restriction is mentioned in the Advanced Security Guide:
To avoid the ORA-28094 error, ensure that the query has the following properties:
In a CREATE VIEW definition or an inline view, there cannot be any SQL expression that involves a redacted column.
When I come across such restrictions, it is always interesting to explore if there is a way of getting around it. Let us enable Data Redaction tracing and rerun the query:
SQL> alter session set events 'trace[radm]';

Session altered.

SQL> select * from scott.vw;
select * from scott.vw
                    *
ERROR at line 1:
ORA-28094: SQL construct not supported by data redaction
Here is the trace file:
kzdmpci: Table obj# is: 23656
kzdmpci: Policy Name is: client_fio
kzdmpci: Policy Expression ID radm_mc$.pe# = 5001
kzdmpci: intcol# is: 1
kzdmpci: Masking Function is: 5
kzdmpci: RegExp pattern: (\S{3})(\S+) (len=12)
kzdmpci: RegExp replace: \1*** (len=5)
kzdmpci: RegExp match  :  (len=0), mpl=17
kzdmpci: no RADM policies exist, starting new RADM Policy Chain.
kzdmpci: new RADM Policy Chain ci->kkscdmpc=0x12da37ec started with  obj#=23656
kzdmpci:2: radm_pe$ lookup: for Policy Expression ID pe# 5001, pe_pexpr is [1 = 1], pe_name = [], pe_obj# = 23656
kzdmpci: added column 1 for object 23656
kzradm_vc_replace_expr_allowed: policy on table EMP

kzdmrfs: entered with opntyp of OPNTCOL
kzdmrfs: found redacted col CLIENT_NAME at 7
kzradm_vc_replace_expr_allowed: 0

kzdmchkerpna: kzpcxp(KZSERP) gave FALSE, no bypass.

kzdmchkerpna: kzpcxp(KZSERP) gave FALSE, no bypass.

kzdmchkerpna: kzpcxp(KZSERP) gave FALSE, no bypass.

kzdmchkerpna: kzpcxp(KZSERP) gave FALSE, no bypass.

kzdmchkerp: kzpcsp(KZSERP) gave FALSE, no bypass.
kzdmprqb: entered
kzdminsc: entered with opntyp=OPNTCOL

kzdminsc: view column CLIENT_NAME with KCCF2MASKDEP detected, calling kzdmpvc
kzdmpvc: level 0, column 'CLIENT_NAME' has KCCF2MASKDEP flag
kzdmpvc: level 0, calling kzdmcrmo to insert MASK operator around view column with name 'CLIENT_NAME'

kzdmcrmo: the pe# is 5001
kzdmcrmo: entered, creating MASK operator for column CLIENT_NAME of table EMP
kzdmcrmo: policy chain = 0x6319aa30
kzdmcrmo: obj# = 23656, intcol# = 1
kzdmcrmo: New ctxmask chain pgactx->ctxmask=0x62485db0 started with obj#=23656
kzdmcrmo: found object 23656
kzdmcrmo: found column 1, metadata=0x6319a988
kzdmcrmo: New ctxmask_re chain 0x62485d80 started with col=(23656,1)
kzdmcrmo: finished creating MASK operator
kzdmpvc: level 0, kzdmcrmo inserted MASK operator.
kzdminsc: entered with opntyp=OPNTCOL

kzdminsc: view column C1 with KCCF2MASKDEP detected, calling kzdmpvc
kzdmpvc: level 0, column 'C1' has KCCF2MASKDEP flag
kzdmpvc: level 0, processing SQL expression...
kzdmpvc: strict semantics disallow expression in VIEW


kzdmchkerpna: kzpcxp(KZSERP) gave FALSE, no bypass.
This strict semantics disallow expression in VIEW bit sounded intriguing, so that I decided to research what it is and how to change it:
SQL> select ksppinm, ksppdesc from x$ksppi where ksppinm like '%redaction%';

KSPPINM                        KSPPDESC
------------------------------ ----------------------------------------
_strict_redaction_semantics    Strict Data Redaction semantics
I set the parameter to FALSE and bounced the database:
SQL> alter system set "_strict_redaction_semantics"=false scope=spfile;

System altered.

$ srvctl stop database -db orcl ; srvctl start database -db orcl
Sure enough, that is what I got after selecting the data from the view:
SQL> select * from scott.vw;

CLIENT_NAME      C1                  C2               C3
---------------- ------------------- ---------------- -------------------
Lar*** Ell***    Lar*** Ell***       Lar*** Ell***    Hello Lar*** Ell***
Jef*** Bez***    Jef*** Bez***       Jef*** Bez***    Hello Jef*** Bez***
Elo*** Mus***    Elo*** Mus***       Elo*** Mus***    Hello Elo*** Mus***
Vla*** Put***    Vla*** Put***       y                Hello Vla*** Put***
                 zyzy                                 Hello

Conclusion

Oracle obviously makes such restrictions on purpose. There is no much information about this parameter, but there should be some corner cases due to which Oracle decided to keep the redaction semantics as strict by default. It might be the case that Oracle will make this functionality officially available in a next release once it is ready for production use.

вторник, 9 марта 2021 г.

Oracle to PostgreSQL Replication: Numbers Truncated to 6 Decimal Places

I configured an initial load from Oracle to PostgreSQL, and found that some numbers were truncated after the load, e.g. Oracle number 123.0123456789 happened to be 123.012346 in PostgreSQL.

Setup

Source: Oracle Goldengate 19.1.0.0.4

GGSCI (ogg-hub as c##ggadmin@orcl/CDB$ROOT) 9> versions

Operating System:
Linux
Version #2 SMP Fri Mar 29 17:05:02 PDT 2019, Release 4.1.12-124.26.7.el7uek.x86_64
Node: ogg-hub
Machine: Linux

Database:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Target: Oracle GoldenGate 19.1.0.0.200714 for PostgreSQL

GGSCI (ogg-hub as ggadmin@testdb) 8> versions

Operating System:
Linux
Version #2 SMP Fri Mar 29 17:05:02 PDT 2019, Release 4.1.12-124.26.7.el7uek.x86_64
Node: ogg-hub
Machine: Linux

Database:
PostgreSQL
Version 12.06.0000 PostgreSQL 12.6
ODBC Version 03.52.0000
Driver Information:
GGpsql25.so
Version 07.16.0353 (B0519, U0369)
ODBC Version 03.52

odbc.ini

[ODBC Data Sources]
testdb=PostgreSQL on testdb

[ODBC]
IANAAppCodePage=106
InstallDir=/u01/app/oracle/product/19/ogg_pg_home1

[testdb]
Driver=/u01/app/oracle/product/19/ogg_pg_home1/lib/GGpsql25.so
Description=PostgreSQL 12.6
Database=testdb
HostName=postgres
PortNumber=5432
TransactionErrorBehavior=2

Initial Data

There is a table in the Oracle database:
SQL> create table tc.test(
  2    id int primary key,
  3    n1 number,
  4    n2 number);

Table created.

SQL>
SQL> insert into tc.test values (0, 123.0123456789, 123.0123456789);

1 row created.

SQL> commit;

Commit complete.

SQL>
SQL> select * from tc.test;

             ID              N1              N2
--------------- --------------- ---------------
              0  123.0123456789  123.0123456789
The corresponding PostgreSQL table is this:
testdb=# create table tc.test(
testdb(#   id bigint primary key,
testdb(#   n1 numeric,
testdb(#   n2 numeric(38,10));
CREATE TABLE
testdb=# \d tc.test
                     Table "tc.test"
 Column |      Type      | Collation | Nullable | Default
--------+----------------+-----------+----------+---------
 id     | bigint         |           | not null |
 n1     | numeric        |           |          |
 n2     | numeric(38,10) |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

GoldenGate Initial Load Processes

Extract

GGSCI (ogg-hub) 1> view params einit

EXTRACT einit
USERIDALIAS oggadmin
RMTHOST ogg-hub, MGRPORT 7810
RMTTASK REPLICAT, GROUP rinit

TABLE pdb.tc.test;

Replicat

GGSCI (ogg-hub) 1> view params rinit

REPLICAT rinit
SETENV (PGCLIENTENCODING="UTF8")
SETENV (ODBCINI="/home/ogg/pg/odbc.ini" )
TARGETDB testdb, USERIDALIAS ggadmin
DISCARDFILE ./dirrpt/rinit.dsc, PURGE

MAP pdb.*.*, TARGET *.*;

Performing Initial Load

Extract

GGSCI (ogg-hub) 1> start extract einit

Sending START request to MANAGER ...
EXTRACT EINIT starting

Replicat Report File

***********************************************************************
               Oracle GoldenGate Delivery for PostgreSQL
 Version 19.1.0.0.200714 OGGCORE_19.1.0.0.0OGGBP_PLATFORMS_200628.2141
   Linux, x64, 64bit (optimized), PostgreSQL on Jun 29 2020 04:06:46

Copyright (C) 1995, 2019, Oracle and/or its affiliates. All rights reserved.

                    Starting at 2021-03-08 18:05:04
***********************************************************************

Operating System Version:
Linux
Version #2 SMP Fri Mar 29 17:05:02 PDT 2019, Release 4.1.12-124.26.7.el7uek.x86_64
Node: ogg-hub
Machine: x86_64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 15745

Description:

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************

2021-03-08 18:05:09  INFO    OGG-03059  Operating system character set identified as UTF-8.

2021-03-08 18:05:09  INFO    OGG-02695  ANSI SQL parameter syntax is used for parameter parsing.

2021-03-08 18:05:09  INFO    OGG-02095  Successfully set environment variable PGCLIENTENCODING=UTF8.

2021-03-08 18:05:09  INFO    OGG-02095  Successfully set environment variable ODBCINI=/home/ogg/pg/odbc.ini.

2021-03-08 18:05:09  INFO    OGG-01360  REPLICAT is running in Remote Task mode.
REPLICAT rinit
SETENV (PGCLIENTENCODING="UTF8")
SETENV (ODBCINI="/home/ogg/pg/odbc.ini" )
TARGETDB testdb, USERIDALIAS ggadmin

2021-03-08 18:05:09  INFO    OGG-03036  Database character set identified as UTF-8. Locale: en_US.UTF-8.

2021-03-08 18:05:09  INFO    OGG-03037  Session character set identified as UTF-8.
DISCARDFILE ./dirrpt/rinit.dsc, purge
MAP pdb.*.*, TARGET *.*;

2021-03-08 18:05:09  INFO    OGG-01815  Virtual Memory Facilities for: COM
    anon alloc: mmap(MAP_ANON)  anon free: munmap
    file alloc: mmap(MAP_SHARED)  file free: munmap
    target directories:
    /u01/app/oracle/product/19/ogg_pg_home1/dirtmp.

2021-03-08 18:05:09  INFO    OGG-25340
Database Version:
PostgreSQL
Version 12.06.0000 PostgreSQL 12.6
ODBC Version 03.52.0000

Driver Information:
GGpsql25.so
Version 07.16.0353 (B0519, U0369)
ODBC Version 03.52.

2021-03-08 18:05:09  INFO    OGG-25341
Database Language and Character Set:
SERVER_ENCODING = "UTF8"
LC_CTYPE        = "en_US.UTF-8"
TIMEZONE        = "Europe/Dublin".

***********************************************************************
**                     Run Time Messages                             **
***********************************************************************


2021-03-08 18:05:10  INFO    OGG-06506  Wildcard MAP resolved (entry pdb.*.*): MAP "PDB"."TC"."TEST", TARGET "TC"."TEST".

2021-03-08 18:05:10  INFO    OGG-02756  The definition for table PDB.TC.TEST is obtained from the trail file.

2021-03-08 18:05:10  INFO    OGG-06511  Using following columns in default map by name: id, n1, n2.

2021-03-08 18:05:10  INFO    OGG-06510  Using the following key columns for target table tc.test: id.

PostgreSQL

testdb=# select * from tc.test;
 id |     n1     |       n2
----+------------+----------------
  0 | 123.012346 | 123.0123456789
(1 row)

testdb=# \d tc.test
                     Table "tc.test"
 Column |      Type      | Collation | Nullable | Default
--------+----------------+-----------+----------+---------
 id     | bigint         |           | not null |
 n1     | numeric        |           |          |
 n2     | numeric(38,10) |           |          |
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)
It can be seen that the column n1 has value 123.012346 rather than the expected value 123.0123456789.
It is also worth noting that the column n2 defined as numeric(38,10) has the proper value as it is in the source Oracle database.

Thoughts and Fix

The issue itself is that the number was truncated to 6 decimal places. It is quite easy to find the same issue was already reported in 2018, but it is still not resolved: Golden Gate truncating decimal places for number data type post migration.

Firstly, when I encountered that issue, I tried to identify what component is causing this: GoldenGate for Oracle, GoldenGate for PostgreSQL, or may be PostgreSQL itself. GoldenGate for PostgreSQL uses Progress DataDirect driver that has an interesting setting: Unbounded Numeric Scale that has a default value of 6. Well it looked like something. I added that parameter to odbc.ini:
[testdb]
Driver=/u01/app/oracle/product/19/ogg_pg_home1/lib/GGpsql25.so
Description=PostgreSQL 12.6
Database=testdb
HostName=ora
PortNumber=5432
TransactionErrorBehavior=2
UnboundedNumericScale=10
Having rerun the load from scratch, I finally got the correct number in PostgreSQL:
testdb=# select * from tc.test;
 id |       n1       |       n2
----+----------------+----------------
  0 | 123.0123456789 | 123.0123456789
(1 row)

Conclusion

The ODBC driver used by GoldenGate truncates unbounded numeric columns to 6 decimal places by default. Thankfully, there is an option to alter it: Unbounded Numeric Scale. I am disposed to think that it is usually more reliable to specify a certain scale in the target database than relying on such settings.

суббота, 27 февраля 2021 г.

Create Standby Skip Pluggable Database

Goal

Let us assume there is a need to create a standby database that excludes some pluggable databases of the primary, e.g. the PDBs must not be restored during standby database creation, and the standby will ignore any redo related to them.
It is known that the DUPLICATE command does not allow that: Backup and Recovery Reference
Note: RMAN does not support partial PDB duplication. Therefore, you cannot use the SKIP TABLESPACE, TABLESPACE, SKIP PLUGGABLE DATABASE, and PLUGGABLE DATABASE options when creating a standby database.
This blog post describes how the desired can be accomplished.

Idea

  • Skip the tablespaces of the pluggable databases that should not be protected by Data Guard: SKIP [FOREVER] TABLESPACE clause of the RESTORE command can be used
  • Drop the excluded data files on the standby
  • Set ENABLED_PDBS_ON_STANDBY to apply redo only to certain PDBs

Setup

Primary and Standby

  • db_name: orcl
  • Oracle version: 19.10

Primary

  • Hostname: servera
  • db_unique_name: orcla
  • instance_name: orcla1
  • Pluggable Databases: PDB1-3

Standby

  • Hostname: serverb
  • db_unique_name: orclb
  • instance_name: orclb1
  • Pluggable Database: PDB1

Creating Primary

The primary database was created by the following DBCA command:
dbca -silent -createDatabase \
  -responseFile NO_VALUE \
  -gdbName orcl \
  -sid orcla1 \
  -templateName New_Database.dbt \
  -createAsContainerDatabase true \
    -numberOfPDBs 3 \
    -pdbName pdb \
    -pdbAdminPassword Oracle123 \
  -sysPassword Oracle123 \
  -systemPassword Oracle123 \
  -recoveryAreaDestination +FRA \
    -recoveryAreaSize 10000 \
  -storageType ASM \
    -datafileDestination +DATA \
  -enableArchive true \
  -memoryMgmtType AUTO_SGA \
  -dbOptions "APEX:false,DV:false,CWMLITE:false,IMEDIA:false,JSERVER:false,OMS:false,ORACLE_TEXT:false,SAMPLE_SCHEMA:false,SPATIAL:false" \
  -initParams "db_unique_name=orcla,dg_broker_start=true"
The actual output from the terminal:
[oracle@servera ~]$ dbca -silent -createDatabase \
>   -responseFile NO_VALUE \
>   -gdbName orcl \
>   -sid orcla1 \
>   -templateName New_Database.dbt \
>   -createAsContainerDatabase true \
>     -numberOfPDBs 3 \
>     -pdbName pdb \
>     -pdbAdminPassword Oracle123 \
>   -sysPassword Oracle123 \
>   -systemPassword Oracle123 \
>   -recoveryAreaDestination +FRA \
>     -recoveryAreaSize 10000 \
>   -storageType ASM \
>     -datafileDestination +DATA \
>   -enableArchive true \
>   -memoryMgmtType AUTO_SGA \
>   -dbOptions "APEX:false,DV:false,CWMLITE:false,IMEDIA:false,JSERVER:false,OMS:false,ORACLE_TEXT:false,SAMPLE_SCHEM
A:false,SPATIAL:false" \
>   -initParams "db_unique_name=orcla,dg_broker_start=true"
[WARNING] [DBT-06801] Specified Fast Recovery Area size (10,000 MB) is less than the recommended value.
   CAUSE: Fast Recovery Area size should at least be three times the database size (8,897 MB).
   ACTION: Specify Fast Recovery Area Size to be at least three times the database size.
[WARNING] [DBT-06801] Specified Fast Recovery Area size (10,000 MB) is less than the recommended value.
   CAUSE: Fast Recovery Area size should at least be three times the database size (4,502 MB).
   ACTION: Specify Fast Recovery Area Size to be at least three times the database size.
Prepare for db operation
6% complete
Registering database with Oracle Restart
10% complete
Creating and starting Oracle instance
12% complete
16% complete
Creating database files
17% complete
23% complete
Creating data dictionary views
25% complete
29% complete
32% complete
34% complete
36% complete
38% complete
42% complete
Creating cluster database views
43% complete
55% complete
Completing Database Creation
59% complete
61% complete
Creating Pluggable Databases
65% complete
69% complete
73% complete
81% complete
Executing Post Configuration Actions
100% complete
Database creation complete. For details check the logfiles at:
 /u01/app/oracle/cfgtoollogs/dbca/orcla.
Database Information:
Global Database Name:orcla
System Identifier(SID):orcla1
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/orcla/orcla.log" for further details.
[oracle@servera ~]$
There are three pluggable databases created: PDB1-3. PDB1 should be replicated as usual. Both PDB2 and PDB3 should not be available on standby.

Creating Standby

Adding Static Service to Listener

I edited /u01/app/19.3.0/grid/network/admin/listener.ora and added a static registration for my standby instance:
SID_LIST_LISTENER=
  (SID_LIST=
      (SID_DESC=
         (GLOBAL_DBNAME=orclb)
         (SID_NAME=orclb1)
         (ORACLE_HOME=/u01/app/oracle/product/19.3.0/dbhome_1)
       )
   )

Copying password file

[oracle@serverb ~]$ scp servera:$ORACLE_HOME/dbs/orapworcla1 $ORACLE_HOME/dbs/orapworclb1
Warning: Permanently added 'servera' (ECDSA) to the list of known hosts.
orapworcla1                                                                        100% 2048   860.9KB/s   00:00

Bootstrap init.ora

db_name=orcl
db_unique_name=orclb
instance_name=orclb1
enable_pluggable_database=true
db_recovery_file_dest=+FRA
db_recovery_file_dest_size=10G
dg_broker_start=true
enabled_pdbs_on_standby=pdb1
Only PDB1 will be enabled on the standby database.

Starting Standby Instance

[oracle@serverb ~]$ sqlplus / as sysdba

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Feb 26 15:00:55 2021
Version 19.10.0.0.0

Copyright (c) 1982, 2020, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> startup nomount pfile=/tmp/init.ora
ORACLE instance started.

Total System Global Area  457176832 bytes
Fixed Size                  9135872 bytes
Variable Size             306184192 bytes
Database Buffers          134217728 bytes
Redo Buffers                7639040 bytes
SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.10.0.0.0

Restoring Standby Control File and Mounting Standby Database

[oracle@serverb ~]$ rman

Recovery Manager: Release 19.0.0.0.0 - Production on Fri Feb 26 15:01:04 2021
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

RMAN> connect target sys/Oracle123@servera/orcla

connected to target database: ORCL (DBID=1593309578)

RMAN> connect auxiliary sys/Oracle123@serverb/orclb

connected to auxiliary database: ORCL (not mounted)

RMAN> run
{
   sql clone 'create spfile from memory';
   shutdown clone immediate;
   startup clone nomount;
   restore clone from service  'servera/orcla' standby controlfile;
}

2> 3> 4> 5> 6> 7>
using target database control file instead of recovery catalog
sql statement: create spfile from memory

Oracle instance shut down

connected to auxiliary database (not started)
Oracle instance started

Total System Global Area     457176832 bytes

Fixed Size                     9135872 bytes
Variable Size                306184192 bytes
Database Buffers             134217728 bytes
Redo Buffers                   7639040 bytes

Starting restore at 26.02.2021 15:02:27
allocated channel: ORA_AUX_DISK_1
channel ORA_AUX_DISK_1: SID=76 device type=DISK

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: restoring control file
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:02
output file name=+FRA/ORCLB/CONTROLFILE/current.256.1065538951
Finished restore at 26.02.2021 15:02:31

RMAN>
RMAN> run
{
   sql clone 'alter database mount standby database';
}

2> 3> 4>
sql statement: alter database mount standby database

RMAN>

Restoring Standby Database

Here are the primary data files:
alter session set "_exclude_seed_cdb_view"=false;

select p.pdb_name pdb_name, f.file_id, f.file_name
  from cdb_data_files f, dba_pdbs p
 where p.pdb_id(+) = f.con_id
 order by f.file_id, p.pdb_id;


PDB_NAME      FILE_ID FILE_NAME
---------- ---------- --------------------------------------------------------------------------------
                    1 +DATA/ORCLA/DATAFILE/system.260.1065528853
PDB$SEED            2 +DATA/ORCLA/BC3D34CE047E62F0E0530600000AF770/DATAFILE/system.261.1065528859
                    3 +DATA/ORCLA/DATAFILE/sysaux.262.1065528863
PDB$SEED            4 +DATA/ORCLA/BC3D34CE047E62F0E0530600000AF770/DATAFILE/sysaux.263.1065528865
                    5 +DATA/ORCLA/DATAFILE/undotbs1.264.1065528867
PDB$SEED            6 +DATA/ORCLA/BC3D34CE047E62F0E0530600000AF770/DATAFILE/undotbs1.265.1065528869
                    7 +DATA/ORCLA/DATAFILE/users.268.1065528883
PDB1                8 +DATA/ORCLA/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/system.272.1065531169
PDB1                9 +DATA/ORCLA/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/sysaux.270.1065531169
PDB1               10 +DATA/ORCLA/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/undotbs1.271.1065531169
PDB1               11 +DATA/ORCLA/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/users.274.1065531177
PDB2               12 +DATA/ORCLA/BC3DBF5BB0310A08E0530600000A91B8/DATAFILE/system.277.1065531179
PDB2               13 +DATA/ORCLA/BC3DBF5BB0310A08E0530600000A91B8/DATAFILE/sysaux.275.1065531179
PDB2               14 +DATA/ORCLA/BC3DBF5BB0310A08E0530600000A91B8/DATAFILE/undotbs1.276.1065531179
PDB2               15 +DATA/ORCLA/BC3DBF5BB0310A08E0530600000A91B8/DATAFILE/users.279.1065531185
PDB3               16 +DATA/ORCLA/BC3DBFD5112A0AB0E0530600000AE27E/DATAFILE/system.282.1065531187
PDB3               17 +DATA/ORCLA/BC3DBFD5112A0AB0E0530600000AE27E/DATAFILE/sysaux.280.1065531187
PDB3               18 +DATA/ORCLA/BC3DBFD5112A0AB0E0530600000AE27E/DATAFILE/undotbs1.281.1065531187
PDB3               19 +DATA/ORCLA/BC3DBFD5112A0AB0E0530600000AE27E/DATAFILE/users.284.1065531193
All data files of pluggable databases PDB2 and PDB3 should not be restored on the standby site, so that I would like to skip data files 12-19.
The command to restore the standby database and its output are as follows:
run
{
   set newname for tempfile 1 to '+DATA';
   set newname for tempfile 2 to '+DATA';
   set newname for tempfile 3 to '+DATA';
   set newname for tempfile 4 to '+DATA';
   set newname for tempfile 5 to '+DATA';
   switch clone tempfile all;
   set newname for database to '+DATA';
   restore
   from service 'servera/orcla' clone database
   skip forever tablespace
     pdb2:sysaux, pdb2:system, pdb2:undotbs1, pdb2:users,
     pdb3:sysaux, pdb3:system, pdb3:undotbs1, pdb3:users
   ;
}

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

executing command: SET NEWNAME

renamed tempfile 1 to +DATA in control file
renamed tempfile 2 to +DATA in control file
renamed tempfile 3 to +DATA in control file
renamed tempfile 4 to +DATA in control file
renamed tempfile 5 to +DATA in control file

executing command: SET NEWNAME

Starting restore at 26.02.2021 15:03:50
using channel ORA_AUX_DISK_1

channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00001 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00002 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00003 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:07
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00004 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00005 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00006 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:04
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00007 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00008 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00009 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00010 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:03
channel ORA_AUX_DISK_1: starting datafile backup set restore
channel ORA_AUX_DISK_1: using network backup set from service servera/orcla
channel ORA_AUX_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_AUX_DISK_1: restoring datafile 00011 to +DATA
channel ORA_AUX_DISK_1: restore complete, elapsed time: 00:00:01
Finished restore at 26.02.2021 15:04:33

RMAN>
It can be seen that data files 12-19 were not restored.

Updating Standby Control File

I reconnect to the standby database and run SWITCH TO COPY commands:
RMAN>

Recovery Manager complete.
[oracle@serverb ~]$ rman target sys/Oracle123@serverb/orclb

Recovery Manager: Release 19.0.0.0.0 - Production on Fri Feb 26 15:06:55 2021
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

connected to target database: ORCL (DBID=1593309578, not open)

RMAN> switch database root to copy;

Starting implicit crosscheck backup at 26.02.2021 15:06:59
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=76 device type=DISK
Crosschecked 1 objects
Finished implicit crosscheck backup at 26.02.2021 15:06:59

Starting implicit crosscheck copy at 26.02.2021 15:06:59
using channel ORA_DISK_1
Crosschecked 11 objects
Finished implicit crosscheck copy at 26.02.2021 15:07:00

searching for all files in the recovery area
cataloging files...
no files cataloged

datafile 1 switched to datafile copy "+DATA/ORCLB/DATAFILE/system.257.1065539031"
datafile 3 switched to datafile copy "+DATA/ORCLB/DATAFILE/sysaux.258.1065539045"
datafile 5 switched to datafile copy "+DATA/ORCLB/DATAFILE/undotbs1.259.1065539055"
datafile 7 switched to datafile copy "+DATA/ORCLB/DATAFILE/users.260.1065539063"

RMAN> switch pluggable database "PDB$SEED", pdb1 to copy;

datafile 2 switched to datafile copy "+DATA/ORCLB/BC3D34CE047E62F0E0530600000AF770/DATAFILE/system.261.1065539039"
datafile 4 switched to datafile copy "+DATA/ORCLB/BC3D34CE047E62F0E0530600000AF770/DATAFILE/sysaux.262.1065539053"
datafile 6 switched to datafile copy "+DATA/ORCLB/BC3D34CE047E62F0E0530600000AF770/DATAFILE/undotbs1.256.1065539059"
datafile 8 switched to datafile copy "+DATA/ORCLB/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/system.263.1065539063"
datafile 9 switched to datafile copy "+DATA/ORCLB/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/sysaux.264.1065539067"
datafile 10 switched to datafile copy "+DATA/ORCLB/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/undotbs1.265.1065539069"
datafile 11 switched to datafile copy "+DATA/ORCLB/BC3DBECD0A3C07D3E0530600000A8D82/DATAFILE/users.266.1065539073"

Dropping Excluded Data Files

I can drop all data files of pluggable databases PDB2 and PDB3 on standby:
SQL> alter session set container=pdb2;

Session altered.

SQL> alter database datafile 12,13,14,15 offline drop;

Database altered.

SQL> alter session set container=pdb3;

Session altered.

SQL> alter database datafile 16,17,18,19 offline drop;

Creating Data Guard Configuration

[oracle@serverb ~]$ dgmgrl
DGMGRL for Linux: Release 19.0.0.0.0 - Production on Fri Feb 26 15:13:12 2021
Version 19.10.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Welcome to DGMGRL, type "help" for information.
DGMGRL> connect sys/Oracle123@servera/orcla
Connected to "orcla"
Connected as SYSDBA.
DGMGRL>
DGMGRL> create configuration orcl as primary database is orcla connect identifier is '//servera/orcla';
Configuration "orcl" created with primary database "orcla"
DGMGRL> add database orclb as connect identifier is '//serverb/orclb';
Database "orclb" added
DGMGRL> enable configuration;
Enabled.
DGMGRL> show configuration;

Configuration - orcl

  Protection Mode: MaxPerformance
  Members:
  orcla - Primary database
    Warning: ORA-16789: standby redo logs configured incorrectly

    orclb - Physical standby database
      Warning: ORA-16809: multiple warnings detected for the member

Fast-Start Failover:  Disabled

Configuration Status:
WARNING   (status updated 50 seconds ago)

DGMGRL> show database verbose orclb;

Database - orclb

  Role:               PHYSICAL STANDBY
  Intended State:     APPLY-ON
  Transport Lag:      14 seconds (computed 5 seconds ago)
  Apply Lag:          14 seconds (computed 5 seconds ago)
  Average Apply Rate: 3.65 MByte/s
  Active Apply Rate:  0 Byte/s
  Maximum Apply Rate: 0 Byte/s
  Real Time Query:    OFF
  Instance(s):
    orclb1

  Database Warning(s):
    ORA-16789: standby redo logs configured incorrectly

  Properties:
    DGConnectIdentifier             = '//serverb/orclb'
    ObserverConnectIdentifier       = ''
    FastStartFailoverTarget         = ''
    PreferredObserverHosts          = ''
    LogShipping                     = 'ON'
    RedoRoutes                      = ''
    LogXptMode                      = 'ASYNC'
    DelayMins                       = '0'
    Binding                         = 'optional'
    MaxFailure                      = '0'
    ReopenSecs                      = '300'
    NetTimeout                      = '30'
    RedoCompression                 = 'DISABLE'
    PreferredApplyInstance          = ''
    ApplyInstanceTimeout            = '0'
    ApplyLagThreshold               = '30'
    TransportLagThreshold           = '30'
    TransportDisconnectedThreshold  = '30'
    ApplyParallel                   = 'AUTO'
    ApplyInstances                  = '0'
    StandbyFileManagement           = ''
    ArchiveLagTarget                = '0'
    LogArchiveMaxProcesses          = '0'
    LogArchiveMinSucceedDest        = '0'
    DataGuardSyncLatency            = '0'
    LogArchiveTrace                 = '0'
    LogArchiveFormat                = ''
    DbFileNameConvert               = ''
    LogFileNameConvert              = ''
    ArchiveLocation                 = ''
    AlternateLocation               = ''
    StandbyArchiveLocation          = ''
    StandbyAlternateLocation        = ''
    InconsistentProperties          = '(monitor)'
    InconsistentLogXptProps         = '(monitor)'
    LogXptStatus                    = '(monitor)'
    SendQEntries                    = '(monitor)'
    RecvQEntries                    = '(monitor)'
    HostName                        = 'serverb.example.com'
    StaticConnectIdentifier         = '(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=serverb.example.com)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=orclb_DGMGRL)(INSTANCE_NAME=orclb1)(SERVER=DEDICATED)))'
    TopWaitEvents                   = '(monitor)'
    SidName                         = '(monitor)'

  Log file locations:
    Alert log               : /u01/app/oracle/diag/rdbms/orclb/orclb1/trace/alert_orclb1.log
    Data Guard Broker log   : /u01/app/oracle/diag/rdbms/orclb/orclb1/trace/drcorclb1.log

Database Status:
WARNING
I can ignore standby log file warnings. They are not important for this exercise.

Running Tests

Opening Standby Database Read-Only

Let us open the standby database:
DGMGRL> edit database orclb set state='apply-off';
Succeeded.

SQL> alter database open read only;

Database altered.

SQL> select name from v$tempfile;

NAME
--------------------------------------------------------------------------------
+DATA/ORCLB/TEMPFILE/temp.267.1065539757
+DATA/ORCLB/BC3D34CE047E62F0E0530600000AF770/TEMPFILE/temp.268.1065539759
+DATA
+DATA
+DATA

SQL> sho pdbs

    CON_ID CON_NAME                       OPEN MODE  RESTRICTED
---------- ------------------------------ ---------- ----------
         2 PDB$SEED                       READ ONLY  NO
         3 PDB1                           MOUNTED
         4 PDB2                           MOUNTED
         5 PDB3                           MOUNTED
SQL> alter pluggable database pdb1 open read only;

Pluggable database altered.

SQL> alter pluggable database pdb2 open read only;
alter pluggable database pdb2 open read only
*
ERROR at line 1:
ORA-01147: SYSTEM tablespace file 12 is offline
While PDB1 can be opened just fine, the pluggable database PDB2 cannot be opened as expected. Neither can PDB3.

Enabling standby recovery:
DGMGRL> edit database orclb set state='apply-on';
Succeeded.

Testing Data Guard Replication

Finally, I can check that the data guard replication works as expected:
STANDBY> alter session set container=pdb1;

Session altered.

STANDBY> select count(*) from new_table;
select count(*) from new_table
                     *
ERROR at line 1:
ORA-00942: table or view does not exist

PRIMARY> alter session set container=pdb1;

Session altered.

PRIMARY> create table new_table as select * from dba_objects;

Table created.

PRIMARY> alter session set container=cdb$root;

Session altered.

PRIMARY> alter system archive log current;

System altered.

STANDBY> select count(*) from new_table;

  COUNT(*)
----------
     23043

Adding new Pluggable Database

A newly added pluggable database will not be replicated due to the current ENABLED_PDBS_ON_STANDBY setting. Depending on the parameter, some PDBs can be excluded and others can be replicated by default.
SQL> create pluggable database pdb4 admin user pdb_admin identified by pdb_admin;

Pluggable database created.

SQL> alter pluggable database pdb4 open;

Pluggable database altered.

SQL> alter system archive log current;

System altered.
The standby alert log shows that the PDB data files were not created:
2021-02-26T15:19:29.147366+00:00
PR00 (PID:4163): Media Recovery Log +FRA/ORCLB/ARCHIVELOG/2021_02_26/thread_1_seq_16.263.1065539909
Recovery created pluggable database PDB4
PDB4(6):File copy for ts-SYSTEM skipped for excluded/offline tablespace
PDB4(6):File #20 added to control file as 'UNNAMED00020'. Originally created as:
PDB4(6):'+DATA/ORCLA/BC3FCA17DFEB23DEE0530600000AF95F/DATAFILE/system.287.1065539949'
PDB4(6):because the pluggable database was created with nostandby
PDB4(6):or the tablespace belonging to the pluggable database is
PDB4(6):offline.
PDB4(6):File copy for ts-SYSAUX skipped for excluded/offline tablespace
PDB4(6):File #21 added to control file as 'UNNAMED00021'. Originally created as:
PDB4(6):'+DATA/ORCLA/BC3FCA17DFEB23DEE0530600000AF95F/DATAFILE/sysaux.285.1065539949'
PDB4(6):because the pluggable database was created with nostandby
PDB4(6):or the tablespace belonging to the pluggable database is
PDB4(6):offline.
PDB4(6):File copy for ts-UNDOTBS1 skipped for excluded/offline tablespace
PDB4(6):File #22 added to control file as 'UNNAMED00022'. Originally created as:
PDB4(6):'+DATA/ORCLA/BC3FCA17DFEB23DEE0530600000AF95F/DATAFILE/undotbs1.286.1065539949'
PDB4(6):because the pluggable database was created with nostandby
PDB4(6):or the tablespace belonging to the pluggable database is
PDB4(6):offline.
PDB4(6):File copy for ts-TEMP skipped for excluded/offline tablespace
PR00 (PID:4163): Media Recovery Waiting for T-1.S-17 (in transit)

Conclusion

Oracle already has enough features to create a standby database that includes subset PDBs of the primary. Although it is possible to create a clone (non-standby) that excludes certain PDBs using the DUPLICATE command, the SKIP PLUGGABLE DATABASE clause cannot be used while duplicating for standby. Instead, this blog post provides an alternative solution. Another option is to use Refreshable Clone PDBs - they can be considered as a standby replacement with certain restrictions.

The situation itself reminds me the Snapshot Standby Feature. I used it even before Oracle introduced the feature with a much of fuss. Depending on the customers' demand, Oracle can extend DUPLICATE DATABASE FOR STANDBY to start supporting the SKIP PLUGGABLE DATABASE clause.

среда, 24 февраля 2021 г.

CHA Alert CHA_DBWR_CKPT_IO_COMPLETE in OEM

A customer was getting the following alerts from Oracle Enterprise Manager (OEM):
Host=rac1.example.com
Target type=Cluster
Target name=myrac
Message=DB Checkpoint Performance on Host rac1 Database/Cluster racdba Instance racdba1.
The Cluster Health Advisor (CHA) detected that Database Writer (DBW) checkpoints were slow because the database writes took longer than expected to complete.
Increase the number of DBWR processes. Add additional disks to the disk group for the database.
Relocate the database files to faster disks or to Solid State Devices.
If the storage subsystem supports a storage write back cache, check that the storage cache is functioning properly.
Severity=Warning
Event reported time=Feb 23, 2021 12:04:53 PM GMT
Operating System=Linux
Platform=x86_64
Event Type=Metric Alert
Event name=cha_alerts:cha_alert_level
Metric Group=CHA Alerts
Metric=Alert Level
Metric value=Warning
Key Value=CHA_INCIDENT_STATE_CHANGE_DATABASE_racdba_racdba1_CHA_DBWR_CKPT_IO_COMPLETE
Key Column 1=Key
Rule Name=SYSMAN Monitor Targets,rule 61
Rule Owner=SYSMAN
Update Details:
DB Checkpoint Performance on Host rac1 Database/Cluster racdba Instance racdba1.
The Cluster Health Advisor (CHA) detected that Database Writer (DBW) checkpoints were slow because the database writes took longer than expected to complete.
Increase the number of DBWR processes. Add additional disks to the disk group for the database.
Relocate the database files to faster disks or to Solid State Devices.
If the storage subsystem supports a storage write back cache, check that the storage cache is functioning properly.
I observed some infamous checkpoint not complete messages in the alert log, so I suspected that they might be the real cause.
Thus, I created quite small log files 20MB in size:
SQL> select group#, thread#, bytes from v$log;

    GROUP#    THREAD#      BYTES
---------- ---------- ----------
         5          1   20971520
         6          1   20971520
         7          1   20971520
         8          2   20971520
         9          2   20971520
        10          2   20971520

6 rows selected.
Then, I wrote a script that generates a lot of redo:
create table t
as
select o.*
  from dba_objects o,
       xmltable('1 to 50');

update t
   set object_name=object_name;
I ran the script and here is an excerpt from the alert log showing checkpoint not complete messages:
2021-02-23T12:04:47.234452+00:00
Thread 1 advanced to log sequence 114 (LGWR switch),  current SCN: 7677512
  Current log# 6 seq# 114 mem# 0: +DATA/RACDBA/ONLINELOG/group_6.290.1065267481
2021-02-23T12:04:47.506174+00:00
PDB(3):Resize operation completed for file# 12, old size 645120K, new size 650240K
2021-02-23T12:04:47.848002+00:00
Thread 1 cannot allocate new log, sequence 115
Checkpoint not complete
  Current log# 6 seq# 114 mem# 0: +DATA/RACDBA/ONLINELOG/group_6.290.1065267481
2021-02-23T12:04:50.392351+00:00
Thread 1 advanced to log sequence 115 (LGWR switch),  current SCN: 7677540
  Current log# 7 seq# 115 mem# 0: +DATA/RACDBA/ONLINELOG/group_7.291.1065267483
2021-02-23T12:04:50.661220+00:00
PDB(3):Resize operation completed for file# 12, old size 650240K, new size 655360K
2021-02-23T12:04:50.993950+00:00
Thread 1 cannot allocate new log, sequence 116
Checkpoint not complete
  Current log# 7 seq# 115 mem# 0: +DATA/RACDBA/ONLINELOG/group_7.291.1065267483
2021-02-23T12:04:53.576282+00:00
Thread 1 advanced to log sequence 116 (LGWR switch),  current SCN: 7677574
  Current log# 5 seq# 116 mem# 0: +DATA/RACDBA/ONLINELOG/group_5.289.1065267477
2021-02-23T12:04:53.845381+00:00
PDB(3):Resize operation completed for file# 12, old size 655360K, new size 660480K
2021-02-23T12:04:54.183174+00:00
Thread 1 cannot allocate new log, sequence 117
Checkpoint not complete
  Current log# 5 seq# 116 mem# 0: +DATA/RACDBA/ONLINELOG/group_5.289.1065267477
2021-02-23T12:04:56.721595+00:00
Thread 1 advanced to log sequence 117 (LGWR switch),  current SCN: 7677600
  Current log# 6 seq# 117 mem# 0: +DATA/RACDBA/ONLINELOG/group_6.290.1065267481
2021-02-23T12:04:56.995439+00:00
PDB(3):Resize operation completed for file# 12, old size 660480K, new size 665600K
2021-02-23T12:04:57.332102+00:00
Thread 1 cannot allocate new log, sequence 118
Checkpoint not complete
  Current log# 6 seq# 117 mem# 0: +DATA/RACDBA/ONLINELOG/group_6.290.1065267481
Shortly afterwards, I got a new CHA alert in the OEM:

To understand what is happening, firstly, I started looking for some clues in the CHA log: /u01/app/grid/crsdata/rac1/trace/chad/ochad.trc.0, in which /u01/app/grid is ORACLE_BASE of the grid user.
[Thread-22] [ 2021-02-23 12:04:53.390 UTC ] [BNet.validate:422]  INFO: Tue Feb 23 12:04:50 UTC 2021 Decided CHA_DBWR_CKPT_IO_COMPLETE NORMAL (Belief= 94.900925  %) --> ABNORMAL (Belief= 81.315659  %)
[Thread-22] [ 2021-02-23 12:04:53.390 UTC ] [BNet.validate:429]  INFO: Tue Feb 23 12:04:50 UTC 2021      Children
[Thread-22] [ 2021-02-23 12:04:53.391 UTC ] [BNet.validate:446]  INFO: Tue Feb 23 12:04:50 UTC 2021           DBWR IO Completion= NORMAL
[Thread-22] [ 2021-02-23 12:04:53.391 UTC ] [BNet.validate:446]  INFO: Tue Feb 23 12:04:50 UTC 2021           log_file_switch_checkpoint_incompleteRANGE3= HIGH
[Thread-22] [ 2021-02-23 12:04:53.391 UTC ] [EventPublisher$EventRequest.dump:193]  INFO: type:DATABASE, target:racdba, host:rac1, instance:racdba1, 
problem:CHA_DBWR_CKPT_IO_COMPLETE, state:ABNORMAL, time:2021-02-23 12:04:50,
evidence:[{"signal":"log_file_switch_checkpoint_incomplete", "fd":"log_file_switch_checkpoint_incompleteRANGE3", "state":"HIGH", "value":1.09745E06, "pred":null, "since":"2021-02-23 12:04:50"}]
Based on the evidence above, I can conclude that CHA is checking the log file switch (checkpoint incomplete) wait event:
SQL> select event, total_waits, total_timeouts, time_waited_micro
  2    from v$system_event
  3   where event='log file switch (checkpoint incomplete)'
  4  /

EVENT                                    TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED_MICRO
---------------------------------------- ----------- -------------- -----------------
log file switch (checkpoint incomplete)          166              0         196362926
The following process accesses this file:
[grid@rac1 grid]$ fuser /u01/app/grid/crsdata/rac1/trace/chad/ochad.trc.0
/u01/app/grid/crsdata/rac1/trace/chad/ochad.trc.0: 13858
[grid@rac1 grid]$ ps fww -p 13858
  PID TTY      STAT   TIME COMMAND
13858 ?        Sl     0:37 /u01/app/19.3.0/grid/jdk/bin/java -server -Xms30M -Xmx512M 
-Djava.awt.headless=true -Ddisable.checkForUpdate=true -DTRACING.ENABLED=false 
-XX:ParallelGCThreads=1 -cp /u01/app/19.3.0/grid/jlib/cha.jar:/u01/app/19.3.0/grid/jlib/chaconfig.jar:/u01/app/19.3.0/grid/jlib/cha-diag-msg.jar:/u01/app/19.3.0/grid/jlib/clsce.jar:/u01/app/19.3.0/grid/jlib/srvm.jar:/u01/app/19.3.0/grid/jlib/srvmhas.jar:/u01/app/19.3.0/grid/jlib/srvmasm.jar:/u01/app/19.3.0/grid/jlib/netcfg.jar:/u01/app/19.3.0/grid/jdbc/lib/ojdbc8.jar:/u01/app/19.3.0/grid/ucp/lib/ucp.jar:/u01/app/19.3.0/grid/jlib/fte.jar:/u01/app/19.3.0/grid/jlib/jai_core.jar:/u01/app/19.3.0/grid/jlib/mlibwrapper_jai.jar:/u01/app/19.3.0/grid/jlib/vecmath.jar:/u01/app/19.3.0/grid/jlib/jai_codec.jar:/u01/app/19.3.0/grid/jlib/jh.jar
oracle.cha.server.CHADDriver
The corresponding CRS resource is ora.chad:
[grid@rac1 trace]$ crsctl stat res ora.chad -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.chad
               ONLINE  ONLINE       rac1                     STABLE
               ONLINE  ONLINE       rac2                     STABLE
--------------------------------------------------------------------------------
It can be managed by srvctl commands: srvctl stop/status/start cha.

CHA is tightly integrated with the Grid Infrastructure Management Repository (GIMR). In fact, most chactl commands cannot work without it. However, I do not use GIMR, and it appears that some CHA functionality is still available, at least part of it that can publish database related events to OEM. It is explained in the following whitepaper: Oracle Database 12c Rel. 2 Cluster Health Advisor - Deep Dive How it Works and How to Use It

Conclusion

I do not think that the message produced by the CHA_DBWR_CKPT_IO_COMPLETE alert has appropriate solutions:
DB Checkpoint Performance on Host rac1 Database/Cluster racdba Instance racdba1. The Cluster Health Advisor (CHA) detected that Database Writer (DBW) checkpoints were slow because the database writes took longer than expected to complete. Increase the number of DBWR processes. Add additional disks to the disk group for the database. Relocate the database files to faster disks or to Solid State Devices. If the storage subsystem supports a storage write back cache, check that the storage cache is functioning properly.
It is usually much easier to increase the total redo size, either by adding additional log groups, changing the size of the existing ones, or both. All of these actions give more chances to DBWR to catch up.

When there are CHA alerts in OEM, it makes sense to check the CHA log: /u01/app/grid/crsdata/rac1/trace/chad/ochad.trc.0. It has some invaluable information that could reveal exactly the cause.

вторник, 23 февраля 2021 г.

SU bypassed: More than 1 column in connect condition

A recent thread on oracle-l discussed a CBO regression that was encountered following 19.8->19.9 Release Update application.
A simplified example that demonstrates the observed behavior:
create table departments
as
select rownum department_id,
       10 manager_count,
       100 employee_count
  from dual
  connect by level<=10;

create table employees
as
select rownum employee_id,
       department_id,
       lpad('x', 100, 'x') name
  from departments,
       lateral(
         select level 
           from dual
           connect by level<=employee_count
       )(+);

alter session set events 'trace[sql_optimizer.*]';
explain plan for
select *
  from departments d
 where manager_count > employee_count
                       + (select --+ unnest
                                 count(*)
                            from employees e
                           where e.department_id = d.department_id);
The last query has the following execution plans in 19.8 and 19.9 correspondingly:
-- ----------------------------------------------------------------------------------
-- 19.8
-- ----------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 1545815388

------------------------------------------------------------------------------------
| Id  | Operation            | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |     1 |    35 |    11  (10)| 00:00:01 |
|*  1 |  HASH JOIN           |             |     1 |    35 |    11  (10)| 00:00:01 |
|   2 |   VIEW               | VW_SQ_1     |    10 |   260 |     8  (13)| 00:00:01 |
|   3 |    HASH GROUP BY     |             |    10 |    30 |     8  (13)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| EMPLOYEES   |  1000 |  3000 |     7   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | DEPARTMENTS |    10 |    90 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_0"="D"."DEPARTMENT_ID")
       filter("MANAGER_COUNT">"EMPLOYEE_COUNT"+"COUNT(*)")

18 rows selected.

-- ----------------------------------------------------------------------------------
-- 19.9
-- ----------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 1015098386

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |     1 |     9 |    38   (0)| 00:00:01 |
|*  1 |  FILTER             |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL | DEPARTMENTS |    10 |    90 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |             |     1 |     3 |            |          |
|*  4 |    TABLE ACCESS FULL| EMPLOYEES   |   100 |   300 |     7   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MANAGER_COUNT">"EMPLOYEE_COUNT"+ (SELECT /*+ UNNEST */
              COUNT(*) FROM "EMPLOYEES" "E" WHERE "E"."DEPARTMENT_ID"=:B1))
   4 - filter("E"."DEPARTMENT_ID"=:B1)

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

   3 -  SEL$2
         U -  unnest / Failed basic validity checks

25 rows selected.
It can be seen that subquery unnesting does not happen in 19.9 anymore. Even worse than that, the outline/SQL Plan Baseline from 19.8 fails to reproduce in 19.9. That change by itself can have an adverse effect on the existing queries, e.g.:
SQL> select --+ gather_plan_statistics
  2         *
  3    from departments d
  4   where manager_count > employee_count
  5                         + (select --+ unnest
  6                                   count(*)
  7                              from employees e
  8                             where e.department_id = d.department_id);

no rows selected

SQL> select * from dbms_xplan.display_cursor(format=> 'allstats last');

-- ----------------------------------------------------------------------------------
-- 19.8
-- ----------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8gcr6b9xkkqv0, child number 0
-------------------------------------

Plan hash value: 1545815388

----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |      1 |        |      0 |00:00:00.01 |      20 |     16 |       |       |          |
|*  1 |  HASH JOIN           |             |      1 |      1 |      0 |00:00:00.01 |      20 |     16 |  2078K|  2078K|  551K (0)|
|   2 |   VIEW               | VW_SQ_1     |      1 |     10 |     10 |00:00:00.01 |      18 |     16 |       |       |          |
|   3 |    HASH GROUP BY     |             |      1 |     10 |     10 |00:00:00.01 |      18 |     16 |  1558K|  1558K|  666K (0)|
|   4 |     TABLE ACCESS FULL| EMPLOYEES   |      1 |   1000 |   1000 |00:00:00.01 |      18 |     16 |       |       |          |
|   5 |   TABLE ACCESS FULL  | DEPARTMENTS |      1 |     10 |     10 |00:00:00.01 |       2 |      0 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_0"="D"."DEPARTMENT_ID")
       filter("MANAGER_COUNT">"EMPLOYEE_COUNT"+"COUNT(*)")


27 rows selected.

-- ----------------------------------------------------------------------------------
-- 19.9
-- ----------------------------------------------------------------------------------

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8gcr6b9xkkqv0, child number 0
-------------------------------------

Plan hash value: 1015098386

------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |      1 |        |      0 |00:00:00.01 |     182 |     16 |
|*  1 |  FILTER             |             |      1 |        |      0 |00:00:00.01 |     182 |     16 |
|   2 |   TABLE ACCESS FULL | DEPARTMENTS |      1 |     10 |     10 |00:00:00.01 |       2 |      0 |
|   3 |   SORT AGGREGATE    |             |     10 |      1 |     10 |00:00:00.01 |     180 |     16 |
|*  4 |    TABLE ACCESS FULL| EMPLOYEES   |     10 |    100 |   1000 |00:00:00.01 |     180 |     16 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MANAGER_COUNT">"EMPLOYEE_COUNT"+)
   4 - filter("E"."DEPARTMENT_ID"=:B1)


26 rows selected.
The buffer gets goes from 20 in 19.8 to 182 in 19.9 for this simple query.
The 10053 trace file provides more details regarding that SU bypass:
SQE: Query after SQ elimination:******* UNPARSED QUERY IS *******
SELECT "D"."DEPARTMENT_ID" "DEPARTMENT_ID",
       "D"."MANAGER_COUNT" "MANAGER_COUNT",
       "D"."EMPLOYEE_COUNT" "EMPLOYEE_COUNT"
  FROM "TC"."DEPARTMENTS" "D"
 WHERE "D"."MANAGER_COUNT">"D"."EMPLOYEE_COUNT"
                          + (SELECT /*+ UNNEST */ 
                                    COUNT(*) "COUNT(*)"
                               FROM "TC"."EMPLOYEES" "E"
                              WHERE "E"."DEPARTMENT_ID"="D"."DEPARTMENT_ID")
SU: Considering subquery unnesting in query block SEL$1 (#0)
********************
Subquery Unnest (SU)
********************
SU:   Checking validity of unnesting subquery SEL$2 (#0)
SU:     SU bypassed: More than 1 column in connect condition.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.
I believe the SU bypass was introduced by the fix for bug 30593046 - A query having a scalar subquery returned a wrong result (Doc ID 30593046.8).
To demonstrate that, let us add a new department to the DEPARTMENTS table that has no corresponding row in EMPLOYEES:
SQL> insert into departments values (100, 1, 0);

1 row created.

SQL>
SQL> select --+ gather_plan_statistics
  2         *
  3    from departments d
  4   where manager_count > employee_count
  5                         + (select --+ unnest
  6                                   count(*)
  7                              from employees e
  8                             where e.department_id = d.department_id);
  
-- ----------------------------------------------------------------------------------
-- 19.8 - PRODUCES WRONG RESULTS! no rows selected 
-- ----------------------------------------------------------------------------------

no rows selected

SQL> select * from dbms_xplan.display_cursor(format=> 'allstats last');

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8gcr6b9xkkqv0, child number 0
-------------------------------------

Plan hash value: 1545815388

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |      1 |        |      0 |00:00:00.01 |      24 |       |       |          |
|*  1 |  HASH JOIN           |             |      1 |      1 |      0 |00:00:00.01 |      24 |  2078K|  2078K|  545K (0)|
|   2 |   VIEW               | VW_SQ_1     |      1 |     10 |     10 |00:00:00.01 |      18 |       |       |          |
|   3 |    HASH GROUP BY     |             |      1 |     10 |     10 |00:00:00.01 |      18 |  1558K|  1558K|  659K (0)|
|   4 |     TABLE ACCESS FULL| EMPLOYEES   |      1 |   1000 |   1000 |00:00:00.01 |      18 |       |       |          |
|   5 |   TABLE ACCESS FULL  | DEPARTMENTS |      1 |     10 |     11 |00:00:00.01 |       6 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_0"="D"."DEPARTMENT_ID")
       filter("MANAGER_COUNT">"EMPLOYEE_COUNT"+"COUNT(*)")


27 rows selected.


-- ----------------------------------------------------------------------------------
-- 19.9 - CORRECTLY RETURNS ONE ROW
-- ----------------------------------------------------------------------------------

DEPARTMENT_ID MANAGER_COUNT EMPLOYEE_COUNT
------------- ------------- --------------
          100             1              0

SQL> select * from dbms_xplan.display_cursor(format=> 'allstats last');

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
SQL_ID  8gcr6b9xkkqv0, child number 0
-------------------------------------

Plan hash value: 1015098386

---------------------------------------------------------------------------------------------
| Id  | Operation           | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |      1 |        |      1 |00:00:00.01 |     204 |
|*  1 |  FILTER             |             |      1 |        |      1 |00:00:00.01 |     204 |
|   2 |   TABLE ACCESS FULL | DEPARTMENTS |      1 |     10 |     11 |00:00:00.01 |       6 |
|   3 |   SORT AGGREGATE    |             |     11 |      1 |     11 |00:00:00.01 |     198 |
|*  4 |    TABLE ACCESS FULL| EMPLOYEES   |     11 |    100 |   1000 |00:00:00.01 |     198 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("MANAGER_COUNT">"EMPLOYEE_COUNT"+)
   4 - filter("E"."DEPARTMENT_ID"=:B1)


26 rows selected.
It is worth noting that a similar query that uses one column from DEPARTMENTS has a proper outer join in both 19.8 and 19.9:
SQL> explain plan for
  2  select *
  3    from departments d
  4   where employee_count > (select --+ unnest
  5                                  count(*)
  6                             from employees e
  7                            where e.department_id = d.department_id);

Explained.

SQL> select * from dbms_xplan.display();

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 4272783132

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |    10 |   350 |    11  (10)| 00:00:01 |
|*  1 |  FILTER               |             |       |       |            |          |
|*  2 |   HASH JOIN OUTER     |             |    10 |   350 |    11  (10)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | DEPARTMENTS |    10 |    90 |     3   (0)| 00:00:01 |
|   4 |    VIEW               | VW_SQ_1     |    10 |   260 |     8  (13)| 00:00:01 |
|   5 |     HASH GROUP BY     |             |    10 |    30 |     8  (13)| 00:00:01 |
|   6 |      TABLE ACCESS FULL| EMPLOYEES   |  1000 |  3000 |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COUNT(*)" IS NULL AND "EMPLOYEE_COUNT">0 OR
              "EMPLOYEE_COUNT">"COUNT(*)")
   2 - access("ITEM_1"(+)="D"."DEPARTMENT_ID")

20 rows selected.
It made me think that it should be possible to obtain the desired transformation in 19.9 by rewriting the query. Sure enough, I verified that both non-mergeable view or temp table transformation can be used for that:
SQL> explain plan for
  2  with d as (
  3    select --+ no_merge
  4           d.*, manager_count - employee_count employee_diff
  5      from departments d)
  6  select *
  7    from d
  8   where employee_diff > (select --+ unnest
  9                                 count(*)
 10                            from employees e
 11                           where e.department_id = d.department_id);

Explained.

SQL> select * from dbms_xplan.display();

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
Plan hash value: 1773562247

-------------------------------------------------------------------------------------
| Id  | Operation             | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |             |    10 |   780 |    11  (10)| 00:00:01 |
|*  1 |  FILTER               |             |       |       |            |          |
|*  2 |   HASH JOIN OUTER     |             |    10 |   780 |    11  (10)| 00:00:01 |
|   3 |    VIEW               |             |    10 |   520 |     3   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL | DEPARTMENTS |    10 |    90 |     3   (0)| 00:00:01 |
|   5 |    VIEW               | VW_SQ_1     |    10 |   260 |     8  (13)| 00:00:01 |
|   6 |     HASH GROUP BY     |             |    10 |    30 |     8  (13)| 00:00:01 |
|   7 |      TABLE ACCESS FULL| EMPLOYEES   |  1000 |  3000 |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("COUNT(*)" IS NULL AND "EMPLOYEE_DIFF">0 OR
              "EMPLOYEE_DIFF">"COUNT(*)")
   2 - access("ITEM_1"(+)="D"."DEPARTMENT_ID")

21 rows selected.

Conclusion

As Jonathan Lewis remarked there are usually certain reasons for such changes of the existing optimizer behavior:
Various possibilties.
a) Oracle Corp. has realised that there's a boundary condition with this pattern that could produce wrong results and blocks the transformation (e.g. are either of lastmoddate and lastmodtime declared not null - if not the unnest ought to be invalid)
b) Oracle Corp. has modified the optimizer code to produced more efficient plans in almost all cases, but the change introduces certain restrictiions that your SQL now meets (e.g. even if both lastmoddate and lastmodetime the code may now assume that to_char() or to_date() could produce a null from a non-null.
c) Some relatively simple code change has introduced a bug
Although it is quite unfortunate there is no fix control or any other backout switching bringing back the old behavior (as far as I am aware), Oracle actually introduced a fix for the wrong results issue. It might be the lesser of two evils.
The script to this post is available in my GitHub repo.

пятница, 12 февраля 2021 г.

Tracing Library Cache Locks Using _kgl_debug

A recent blog post from Nenad Noveljic prompted me to review the library cache tracing facilities available in Oracle database.

After a bit of tinkering I discovered the actual library cache lock tracing is governed by the _kgl_debug parameter.

Here is a little sample code to demonstrate that - I borrowed part of it from the excellent Nenad's blog post:
SQL> create table t (n1 integer,n2 integer);

Table created.

SQL> create index ix_t on t(n1,n2);

Index created.

SQL> exec dbms_stats.gather_table_stats ('', 'T', cascade => true)

PL/SQL procedure successfully completed.

SQL>
SQL> alter system set "_kgl_debug"=32 scope=memory;

System altered.

SQL>
SQL> alter index ix_t invisible ;

Index altered.
In a nutshell, _kgl_debug=32 appears to result in writing information about library cache locks in the trace file. Here is how it looks:
SQL> col trace_file old_v tf for a72
SQL> col dirname old_v dn for a50
SQL> col basename old_v bn for a21
SQL>
SQL> select value trace_file,
  2         substr(value, 1, instr(value, '/', -1)-1) dirname,
  3         substr(value, instr(value, '/', -1)+1) basename
  4    from v$diag_info
  5   where name='Default Trace File';

TRACE_FILE                                                               DIRNAME                                            BASENAME
------------------------------------------------------------------------ -------------------------------------------------- ---------------------
/u01/app/oracle/diag/rdbms/racdba/racdba1/trace/racdba1_ora_16440.trc    /u01/app/oracle/diag/rdbms/racdba/racdba1/trace    racdba1_ora_16440.trc

SQL>
SQL> ho tail -64 &tf.
    <Mode>N</Mode>
  </LibraryObjectLock>
</KGLTRACE>
<KGLTRACE>
  <Timestamp>2021-02-12 15:41:23.656</Timestamp>
  <SID>136</SID>
  <Function>kglLock</Function>
  <Reason>TRACELOCK</Reason>
  <Param1>0x706c20e0</Param1>
  <Param2>1</Param2>
  <LibraryHandle>
    <Address>0x67f78888</Address>
    <Hash>94e2179b</Hash>
    <LockMode>N</LockMode>
    <PinMode>0</PinMode>
    <LoadLockMode>0</LoadLockMode>
    <Status>VALD</Status>
    <ObjectName>
      <Name>select value trace_file,
       substr(value, 1, instr(value, '/', -1)-1) dirname,
       substr(value, instr(value, '/', -1)+1) basename
  from v$diag_info
 where name='Default Trace File'</Name>
      <FullHashValue>8c302f585b9b9a83239f686f94e2179b</FullHashValue>
      <Namespace>SQL AREA(00)</Namespace>
      <Type>CURSOR(00)</Type>
      <ContainerId>1</ContainerId>
      <ContainerUid>1</ContainerUid>
      <Identifier>2497845147</Identifier>
      <OwnerIdn>89</OwnerIdn>
    </ObjectName>
  </LibraryHandle>
  <LibraryObjectLock>
    <Address>0x706c20e0</Address>
    <Handle>0x67f78888</Handle>
    <Mode>N</Mode>
  </LibraryObjectLock>
</KGLTRACE>
<KGLTRACE>
  <Timestamp>2021-02-12 15:41:23.657</Timestamp>
  <SID>136</SID>
  <Function>kgllkal</Function>
  <Reason>TRACELOCK</Reason>
  <Param1>0x706c1ff8</Param1>
  <Param2>0</Param2>
  <LibraryHandle>
    <Address>0x67f78328</Address>
    <Hash>0</Hash>
    <LockMode>N</LockMode>
    <PinMode>0</PinMode>
    <LoadLockMode>0</LoadLockMode>
    <Status>VALD</Status>
    <Name>
      <Namespace>SQL AREA(00)</Namespace>
      <Type>CURSOR(00)</Type>
      <ContainerId>3</ContainerId>
    </Name>
  </LibraryHandle>
  <LibraryObjectLock>
    <Address>0x706c1ff8</Address>
    <Handle>0x67f78328</Handle>
    <Mode>N</Mode>
  </LibraryObjectLock>
</KGLTRACE>
It is quite convenient that the trace data is provided in XML - it can be parsed easily:
SQL> create or replace directory trace_dir as '&dn.';
old   1: create or replace directory trace_dir as '&dn.'
new   1: create or replace directory trace_dir as '/u01/app/oracle/diag/rdbms/racdba/racdba1/trace'

Directory created.

SQL>
SQL> create table trace_ext (
  2    trace_data clob
  3  )
  4  organization external (
  5    type oracle_loader
  6    default directory trace_dir
  7    access parameters (
  8      records
  9      xmltag ("KGLTRACE")
 10      fields ldrtrim
 11      missing field values are null (
 12        trace_data char(1000000)
 13      )
 14    )
 15    location ('&bn.')
 16  )
 17  reject limit unlimited;
old  15:   location ('&bn.')
new  15:   location ('racdba1_ora_16440.trc')

Table created.

SQL>
SQL> select count(*) from trace_ext;

  COUNT(*)
----------
       275

SQL> ho grep KGLTRACE &tf. | wc -l
550
Looks good. 275 rows in the external table for 550 KGLTRACE tags - these are both opening and closing tags, so that the number of rows matches precisely the number of XML elements in the trace file.

Finally, we can retrieve the information about kgllkal calls for interesting objects:
SQL> select xt."Timestamp",
  2         xt."Function",
  3         xt."Reason",
  4         xt."Param1",
  5         lh."LockMode",
  6         lh."PinMode",
  7         obj."Name",
  8         obj."Namespace",
  9         obj."Type",
 10         lol."Address" lol_address,
 11         lol."Mode" lol_mode
 12    from trace_ext,
 13         xmltable('/KGLTRACE' passing xmltype(trace_data)
 14           columns "Timestamp" varchar2(24),
 15                   sid number,
 16                   "Function" varchar2(20),
 17                   "Reason" varchar2(10),
 18                   "Param1" varchar2(14),
 19                   "Param2" number,
 20                   "LibraryHandle" xmltype,
 21                   "LibraryObjectLock" xmltype
 22         )(+) xt,
 23         xmltable('/LibraryHandle' passing xt."LibraryHandle"
 24           columns "Address" varchar2(10),
 25                   "Hash" varchar2(10),
 26                   "LockMode" varchar2(8),
 27                   "PinMode" varchar2(8),
 28                   "LoadLockMode" varchar2(8),
 29                   "Status" varchar2(10),
 30                   "ObjectName" xmltype
 31         )(+) lh,
 32         xmltable('/ObjectName' passing lh."ObjectName"
 33           columns "Name" varchar2(64),
 34                   "FullHashValue" varchar2(32),
 35                   "Namespace" varchar2(32),
 36                   "Type" varchar2(32),
 37                   "ContainerId" number,
 38                   "ContainerUid" number,
 39                   "Identifier" number,
 40                   "OwnerIdn" number
 41         )(+) obj,
 42         xmltable('/LibraryObjectLock' passing xt."LibraryObjectLock"
 43           columns "Address" varchar2(10),
 44                   "Handle" varchar2(10),
 45                   "Mode"   varchar2(4)
 46         )(+) lol
 47   where 1=1
 48     and obj."Name" like '%PDB.TC.%'
 49     and xt."Function"='kgllkal';

Timestamp                Function Reason     Param1         LockMode PinMode  Name            Namespace            Type       LOL_ADDRES LOL_MODE
------------------------ -------- ---------- -------------- -------- -------- --------------- -------------------- ---------- ---------- --------
2021-02-12 15:41:23.595  kgllkal  TRACELOCK  0x62da2ef0     S        0        PDB.TC.IX_T     INDEX(04)            INDEX(01)  0x62da2ef0 S
2021-02-12 15:41:23.598  kgllkal  TRACELOCK  0x62da2ef0     S        0        PDB.TC.T        TABLE/PROCEDURE(01)  TABLE(02)  0x62da2ef0 S
2021-02-12 15:41:23.599  kgllkal  TRACELOCK  0x62c9f8d0     S        0        PDB.TC.IX_T     INDEX(04)            INDEX(01)  0x62c9f8d0 S
2021-02-12 15:41:23.601  kgllkal  TRACELOCK  0x62c9f8d0     S        S        PDB.TC.T        TABLE/PROCEDURE(01)  TABLE(02)  0x62c9f8d0 S
2021-02-12 15:41:23.613  kgllkal  TRACELOCK  0x65db4480     S        0        PDB.TC.IX_T     INDEX(04)            INDEX(01)  0x65db4480 S
2021-02-12 15:41:23.617  kgllkal  TRACELOCK  0x65db4480     S        0        PDB.TC.T        TABLE/PROCEDURE(01)  TABLE(02)  0x65db4480 S
2021-02-12 15:41:23.618  kgllkal  TRACELOCK  0x62db5708     S        0        PDB.TC.IX_T     INDEX(04)            INDEX(01)  0x62db5708 S
2021-02-12 15:41:23.626  kgllkal  TRACELOCK  0x65db4480     S        0        PDB.TC.IX_T     INDEX(04)            INDEX(01)  0x65db4480 S
2021-02-12 15:41:23.629  kgllkal  TRACELOCK  0x65db4480     X        0        PDB.TC.T        TABLE/PROCEDURE(01)  TABLE(02)  0x65db4480 X
2021-02-12 15:41:23.632  kgllkal  TRACELOCK  0x62da2ef0     X        0        PDB.TC.IX_T     INDEX(04)            INDEX(01)  0x62da2ef0 X

10 rows selected.

Conclusion

We can trace library cache locks, or more specifically certain kgllkal calls. The resulting trace data is written to the trace file in the XML format. It can be loaded into the database for further analysis.

Usual disclaimer

This blog post is a pure speculation. Although the results might be be reasonable and suggestive, I have no idea whether or not _kgl_debug=32 covers all or most library cache locks.

Diagnosing Cluster Verification Utility (CVU) Errors

A cluster verification utility (CVU) command failed with the following error:
[grid@rac1 ~]$ cluvfy comp vdisk -n rac1 -verbose

Verifying Voting Disk ...FAILED (PRVF-5157, PRVF-5431)

Verification of Voting Disk was unsuccessful on all the specified nodes.


Failures were encountered during execution of CVU verification request "Voting Disk".

Verifying Voting Disk ...FAILED
PRVF-5157 : could not verify ASM disk group "GRID" for voting disk location
"/dev/flashgrid/racq.lun1" is available on node "rac1"
PRVF-5157 : could not verify ASM disk group "GRID" for voting disk location
"/dev/flashgrid/rac1.lun1" is available on node "rac1"
PRVF-5157 : could not verify ASM disk group "GRID" for voting disk location
"/dev/flashgrid/rac2.lun1" is available on node "rac1"
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed


CVU operation performed:      Voting Disk
Date:                         Feb 11, 2021 6:18:51 PM
CVU home:                     /u01/app/19.3.0/grid/
User:                         grid
It can be seen that the command already had the verbose option which did not give any hint about the cause of the error.
The documentation says that CVU generates trace data by default unless it has been disabled. The default trace file location is $ORACLE_BASE/crsdata/host_name/cvu. A non-default location can be specified using the CV_TRACELOC environment variable.

That is what I got in the aforementioned directory:
[grid@rac1 ~]$ cd $ORACLE_BASE/crsdata/$(hostname -s)/cvu
[grid@rac1 cvu]$ ll
total 4
drwxrwxrwt 2 grid   oinstall 4096 Feb 11 14:01 cvulog
drwxrwxrwt 2 grid   oinstall  144 Feb 11 14:03 cvutrc
drwxr-xr-x 2 grid   oinstall   78 Feb 11 18:18 grid
drwxr-xr-x 2 grid   oinstall    6 Feb 11 14:03 init
drwxr-xr-x 2 oracle oinstall   28 Feb  2 20:48 oracle
[grid@rac1 cvu]$ cd grid
[grid@rac1 grid]$ ll
total 11272
-rw-r----- 1 grid oinstall   641638 Feb 11 18:18 cvuhelper.log.0
-rw-r--r-- 1 grid oinstall        0 Jan  4 12:12 cvuhelper.log.0.lck
-rw-r--r-- 1 grid oinstall 10899074 Feb 11 18:18 cvutrace.log.0
The cvutrace.log.0 contained actual trace data in which the following lines drew my eye:
[main] [ 2021-02-11 18:18:57.681 UTC ] [ClusterConfig.block:624]  block acquired semnum=0
[main] [ 2021-02-11 18:18:57.681 UTC ] [ClusterConfig.submit:573]  Out of block
[main] [ 2021-02-11 18:18:57.681 UTC ] [ClusterConfig.submit:590]  status=true
[main] [ 2021-02-11 18:18:57.681 UTC ] [ClusterConfig.destroy:468]  destroying resources for client thread Thread[main,5,main]
[main] [ 2021-02-11 18:18:57.681 UTC ] [ResultSet.traceResultTable:1065]

ResultTable Info ===>

        contents of resultTable

Dumping Result data.
  Status     : SUCCESSFUL
  Name       : rac1
  Type       : Node
  Has Results: No
  Exp. value : null
  Act. value : null

  Errors  :
    PRVG-2043 : Command "/u01/app/19.3.0/grid/bin/kfod op=GROUPS nohdr=true " failed on node "rac1" and produced the following output:
KFOD-00101: LRM error [110] while parsing command line arguments
KFOD-00103: LRM message: LRM-00118: syntax error at '=' at the end of input
As the above output demonstrates, CVU called kfod which was failing. Thus, I decided to run it myself and got the same LRM errors:
[grid@rac1 ~]$ /u01/app/19.3.0/grid/bin/kfod op=GROUPS nohdr=true
KFOD-00101: LRM error [110] while parsing command line arguments
KFOD-00103: LRM message: LRM-00118: syntax error at '=' at the end of input
[grid@rac1 ~]$ /u01/app/19.3.0/grid/bin/kfod op=GROUPS
KFOD-00101: LRM error [110] while parsing command line arguments
KFOD-00103: LRM message: LRM-00118: syntax error at '=' at the end of input
LRM seems to be the parameter manager, but it did not ring a bell yet:
[grid@rac1 ~]$ head -9 $ORACLE_HOME/oracore/mesg/lrmus.msg
/
/ $Header: lrmus.msg 19-jun-2003.17:48:08 Exp $
/
/ Copyright (c) 1996, 2003, Oracle.  All rights reserved.
/   NAME
/     lrmus.msg - CORE error message file for the parameter manager (LRM)
/   DESCRIPTION
/     Listing of all CORE error messages for LRM
/
kfod is just a shell script that invokes its binary counterpart kfod.bin similar to what many other Grid Infrastructure (GI) utilities do.
They obviously interact with the operating system, so that we can trace system calls being invoked. That is what I did:
[grid@rac1 ~]$ strace -f -e trace=open,write -s 256 /u01/app/19.3.0/grid/bin/kfod op=GROUPS
..
<skip>
..
open("/u01/app/19.3.0/grid/nls/data/lx40011.nlb", O_RDONLY) = 7
open("/u01/app/19.3.0/grid/dbs/init+ASM1.ora", O_RDONLY) = 7
open("/u01/app/19.3.0/grid/rdbms/mesg/kfodus.msb", O_RDONLY) = 7
write(1, "KFOD-00101: LRM error [110] while parsing command line arguments\n", 65KFOD-00101: LRM error [110] while parsing command line arguments
) = 65
..
In a nutshell, the process opened /u01/app/19.3.0/grid/dbs/init+ASM1.ora. It then went to /u01/app/19.3.0/grid/rdbms/mesg/kfodus.msb, which is a binary message file. After that it put a KFOD error to stdout.
Therefore, it made sense to check that parameter file:
[grid@rac1 ~]$ cat /u01/app/19.3.0/grid/dbs/init+ASM1.ora
SPFILE=
It became obvious that the parameter file is the main culprit of these LRM errors:
KFOD-00101: LRM error [110] while parsing command line arguments
KFOD-00103: LRM message: LRM-00118: syntax error at '=' at the end of input
To confirm that, I renamed the file and reran both kfod and cluvfy commands successfully:
[grid@rac1 ~]$ mv /u01/app/19.3.0/grid/dbs/init+ASM1.ora{,.bak}
[grid@rac1 ~]$ /u01/app/19.3.0/grid/bin/kfod op=GROUPS
--------------------------------------------------------------------------------
Group          Size          Free Redundancy Name
================================================================================
   1:      10240 MB       3712 MB     NORMAL GRID
   3:      65536 MB       3712 MB     NORMAL DATA
[grid@rac1 ~]$ cluvfy comp vdisk -n rac1 -verbose

Verifying Voting Disk ...PASSED

Verification of Voting Disk was successful.

CVU operation performed:      Voting Disk
Date:                         Feb 11, 2021 6:50:38 PM
CVU home:                     /u01/app/19.3.0/grid/
User:                         grid
I still do not know why that parameter file was created, but it is a neat example of how Oracle tools can be debugged.