Thursday, June 13, 2019

srvctl config database OSDBA and OSOPER groups not defined

I recently investigated why some databases in my environment are shown with empty OSDBA or OSOPER groups:
$ srvctl config database -d orcl
Database unique name: orcl
Database name: orcl
Oracle home: /u01/app/oracle/product/db_19
Oracle user: oracle
Spfile: +DATA/ORCL/PARAMETERFILE/spfile.270.1010270597
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA,FRA
Services:
OSDBA group:
OSOPER group:
Database instance: orcl
srvctl is an ordinary shell script that ends up calling the following:
# JRE Executable and Class File Variables
JRE=${JREDIR}/bin/java
..skip..
# Run srvctl
${JRE} ${JRE_OPTIONS} -DORACLE_HOME=${ORACLE_HOME} -classpath ${CLASSPATH} ${SRVM_PROPERTY_DEFS} oracle.ops.opsctl.OPSCTLDriver "$@"
That's just a Java class call. We call oracle.ops.opsctl.OPSCTLDriver passing command-line arguments. CLASSPATH is defined as follows:
CLASSPATH=${NETCFGJAR}:${LDAPJAR}:${JREJAR}:${SRVMJAR}:${SRVMHASJAR}:${SRVMASMJAR}:\
${EONSJAR}:${SRVCTLJAR}:${GNSJAR}:${ANTLRJAR}:${CLSCEJAR}:${CHACONFIGJAR}:${JDBCJAR}:\
${MAILJAR}:${ACTIVATIONJAR}:${JWCCREDJAR}
Those jar variables are set in the script, so it is trivial to find all the classes that are used there.
I used to decompile them with JAD, but it seems to have fallen out of favor and is no longer developed.
Thankfully, there are a number of free sites that can be used as a replacement. I personally have used this one.
It helps to identify the entry jar first, i.e. to look into the jar files and figure out where exactly OPSCTLDriver comes from.
Not surprisingly, it is coming from ${SRVCTLJAR} which is set to ${ORACLE_HOME}/srvm/jlib/srvctl.jar.
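The jar lookup described above can be scripted. The following is a minimal sketch, assuming the jars are ordinary zip archives readable by the current user; the function name find_class_in_jars is made up for illustration and is not part of any Oracle tooling.

```python
import zipfile


def find_class_in_jars(jar_paths, class_name):
    """Return the jar paths whose archive contains the given class."""
    # A class such as a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in jar_paths:
        try:
            with zipfile.ZipFile(jar) as archive:
                if entry in archive.namelist():
                    matches.append(str(jar))
        except (zipfile.BadZipFile, OSError):
            # Skip anything that is missing or is not a readable jar.
            continue
    return matches
```

Passing the jar files referenced in the srvctl CLASSPATH together with "oracle.ops.opsctl.OPSCTLDriver" should point at the jar that holds the entry class.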
OPSCTLDriver calls oracle.ops.opsctl.ConfigAction that does the following:
for (Database db : dblist) {
..skip..
  if ((isUnixSystem) && (!isMgmtDB)) {
    groups = db.getGroups();
    dbaGrp = groups.get("OSDBA") == null ? "" : (String)groups.get("OSDBA");
    operGrp = groups.get("OSOPER") == null ? "" : (String)groups.get("OSOPER");
  }
Hence, the groups I am interested in come from the Database type, which is declared in the import section: import oracle.cluster.database.Database;
That's just an interface from srvm.jar:
public abstract interface Database
  extends SoftwareModule
{
The actual implementation is this: oracle.cluster.impl.database.DatabaseImpl.
Here is how those groups are determined:
String oracleBin = getOracleHome() + File.separator + "bin";
      Trace.out("Creating OSDBAGRPUtil with path: " + oracleBin);
      OSDBAGRPUtil grpUtil = new OSDBAGRPUtil(oracleBin);
      Map<String, String> groups = grpUtil.getAdminGroups(version());
      

      ResourcePermissionsImpl perm = (ResourcePermissionsImpl)m_crsResource.getPermissions();
      String acl = perm.getAclString();
      Map<String, List<String>> aclMap = splitACL(acl);
      
      List<String> acl_groups = (List)aclMap.get(ResourceType.ACL.GROUP.toString());
      

      String dba = (String)groups.get("SYSDBA");
      String oper = (String)groups.get("SYSOPER");
      if ((!dba.isEmpty()) && (acl_groups.contains(dba.toLowerCase()))) {
        groupMap.put("OSDBA", dba);
      }
      if ((!oper.isEmpty()) && (acl_groups.contains(oper.toLowerCase()))) {
        groupMap.put("OSOPER", oper);
      }
      return groupMap;
Applying the same technique shows that OSDBAGRPUtil calls ${ORACLE_HOME}/bin/osdbagrp with either the "-d" or the "-o" flag, depending on which group is requested.
In my case, those commands returned dba and oper for OSDBA and OSOPER respectively:
$ osdbagrp -d
dba
$ osdbagrp -o
oper
Hence, the first part of the if condition, "(!dba.isEmpty())", is true, so the dba group must be missing because the second part, "(acl_groups.contains(dba.toLowerCase()))", is false.
So the problem is related to the ACL, which comes from "ResourcePermissionsImpl perm = (ResourcePermissionsImpl)m_crsResource.getPermissions();".
Let's use the crsctl getperm command passing the database resource to it:
$ crsctl getperm resource ora.orcl.db
Name: ora.orcl.db
owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx
That looks promising: neither the dba nor the oper group is listed. I ran the command below to add the dba group:
$ crsctl setperm resource ora.orcl.db -u group:dba:r-x
CRS-4995:  The command 'Setperm  resource' is invalid in crsctl. Use srvctl for this command.
$ crsctl setperm resource ora.orcl.db -u group:dba:r-x -unsupported
This is an Oracle Restart environment, which is why crsctl rejected the first attempt and I had to add the -unsupported flag.
Once that was done, the OSDBA group was reported properly:
$ srvctl config database -d orcl
Database unique name: orcl
Database name: orcl
Oracle home: /u01/app/oracle/product/db_19
Oracle user: oracle
Spfile: +DATA/ORCL/PARAMETERFILE/spfile.270.1010270597
Password file:
Domain:
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Disk Groups: DATA,FRA
Services:
OSDBA group: dba
OSOPER group:
Database instance: orcl
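
The check performed by DatabaseImpl can be replayed outside of Java. The sketch below is a simplified model, not Oracle's code: split_acl is a guess at what splitACL does, the "group" key is assumed to be what ResourceType.ACL.GROUP.toString() returns, and ACL_AFTER is a hypothetical ACL string (the post does not show the ACL after setperm). Whether the real code also counts the pgrp entry is unknown; here it is not counted.

```python
def split_acl(acl):
    """Group the entries of a CRS ACL string by their kind (owner, group, ...)."""
    entries = {}
    for item in acl.split(","):
        parts = item.split(":")
        if len(parts) != 3:
            continue  # ignore anything that is not kind:name:permissions
        kind, name, _perms = parts
        entries.setdefault(kind, []).append(name)
    return entries


def reported_groups(admin_groups, acl):
    """Keep a group only if it is non-empty and listed as a group in the ACL."""
    acl_groups = split_acl(acl).get("group", [])
    result = {}
    for label, key in (("OSDBA", "SYSDBA"), ("OSOPER", "SYSOPER")):
        name = admin_groups.get(key, "")
        if name and name.lower() in acl_groups:
            result[label] = name
    return result


# ACL string as printed by crsctl getperm in the post.
ACL_BEFORE = "owner:oracle:rwx,pgrp:asmdba:r-x,other::r--,group:oinstall:r-x,user:oracle:rwx"
# Hypothetical ACL after adding group:dba:r-x.
ACL_AFTER = ACL_BEFORE + ",group:dba:r-x"
# Groups returned by osdbagrp -d and osdbagrp -o in the post.
ADMIN_GROUPS = {"SYSDBA": "dba", "SYSOPER": "oper"}

print(reported_groups(ADMIN_GROUPS, ACL_BEFORE))  # {}
print(reported_groups(ADMIN_GROUPS, ACL_AFTER))   # {'OSDBA': 'dba'}
```

Under these assumptions the model reproduces both srvctl outputs: no groups before the change, and only OSDBA afterwards, since oper is still absent from the ACL.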

Tuesday, June 11, 2019

A case for DATAFILECOPY FORMAT

I was migrating several databases from non-Nitro AWS EC2 instances to Nitro-based ones when I came across an issue with Oracle Recovery Manager (RMAN). This blog post is about it.
The high-level process of the migration was as follows:
  1. Attach a new ASM diskgroup to the host that is to be migrated
  2. Make an initial level 0 copy of the database
  3. Roll forward the copy as many times as needed using an incremental level 1 backup
  4. When it's time to switch to the new server, roll forward the copy once again, switch the logfile, back up all archivelogs covering the last backup, dismount the ASM diskgroup, mount it on the new server, and open the database (there are also controlfile and spfile copies as well as some extra steps specific to that environment)
I would rather have used a physical standby or GoldenGate than the meticulously designed process I developed, but those alternatives were ruled out since they would require additional licenses.
In the end, the final downtime was less than 30 minutes, as almost everything was automated using Ansible.

I ran that procedure several times in non-Production instances without any issues. However, I ended up with a missing file when I performed the same steps in the Production instance.
Here is how that happened.
The diskgroup configuration is the following:

DATA - db_create_file_dest
FRA - db_recovery_file_dest
MIGR - the transient ASM diskgroup to keep image copies

Let's set up a test tablespace:
SQL> create tablespace test_ts;

Tablespace created.
Make a copy of it:
RMAN> backup as copy incremental level 0 format '+MIGR' tablespace pdb:test_ts tag migr;

Starting backup at 10.06.2019 21:14:51
using channel ORA_DISK_1
channel ORA_DISK_1: starting datafile copy
input datafile file number=00016 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.276.1010610875
output file name=+MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.256.1010610893 tag=MIGR RECID=6 STAMP=1010610895
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:07
Finished backup at 10.06.2019 21:14:59
Then add a datafile to that tablespace:
SQL> alter tablespace test_ts add datafile;

Tablespace altered.
The final backup/recover block:
RMAN> run {
  backup incremental level 1 format '+MIGR' for recover of copy with tag migr tablespace pdb:test_ts;
  recover copy of tablespace pdb:test_ts with tag migr;
}

Starting backup at 10.06.2019 21:16:42
using channel ORA_DISK_1
no parent backup or copy of datafile 17 found
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00016 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.276.1010610875
channel ORA_DISK_1: starting piece 1 at 10.06.2019 21:16:42
channel ORA_DISK_1: finished piece 1 at 10.06.2019 21:16:43
piece handle=+MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/BACKUPSET/2019_06_10/nnndn1_migr_0.258.1010611003 tag=MIGR comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting datafile copy
input datafile file number=00017 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.277.1010610945
output file name=+FRA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.265.1010611003 tag=MIGR RECID=7 STAMP=1010611006
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 10.06.2019 21:16:46
Even though the format was set to '+MIGR', the copy of the newly added datafile was placed in the FRA:
RMAN> list copy tag migr;

specification does not match any control file copy in the repository
specification does not match any archived log in the repository
List of Datafile Copies
=======================

Key     File S Completion Time     Ckp SCN    Ckp Time            Sparse
------- ---- - ------------------- ---------- ------------------- ------
8       16   A 10.06.2019 21:16:47 1468031    10.06.2019 21:16:42 NO
        Name: +MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.256.1010610893
        Tag: MIGR
        Container ID: 3, PDB Name: PDB

7       17   A 10.06.2019 21:16:46 1468032    10.06.2019 21:16:43 NO
        Name: +FRA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.265.1010611003
        Tag: MIGR
        Container ID: 3, PDB Name: PDB
That is pretty much the same issue that I encountered while doing a test migration of that multi-terabyte database in the Production system: a few datafiles had been added between the initial level 0 copy and the subsequent level 1 backups.
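A stray copy like this is easy to miss in a long RMAN log, so a small check over the "list copy" output may help before the diskgroup is moved. This is only a sketch: it assumes the output layout shown above (one "Name:" line per copy), and the function name misplaced_copies is made up for illustration.

```python
import re

# Matches lines such as "        Name: +MIGR/ORCL/.../test_ts.256.1010610893".
NAME_LINE = re.compile(r"^\s*Name:\s*(\S+)\s*$")


def misplaced_copies(list_copy_output, expected_prefix):
    """Return the copy names that do not start with the expected prefix."""
    misplaced = []
    for line in list_copy_output.splitlines():
        match = NAME_LINE.match(line)
        if match and not match.group(1).startswith(expected_prefix):
            misplaced.append(match.group(1))
    return misplaced


# Excerpt of the "list copy tag migr" output from the post.
SAMPLE = """
8       16   A 10.06.2019 21:16:47 1468031    10.06.2019 21:16:42 NO
        Name: +MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.256.1010610893
        Tag: MIGR
7       17   A 10.06.2019 21:16:46 1468032    10.06.2019 21:16:43 NO
        Name: +FRA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.265.1010611003
        Tag: MIGR
"""

for name in misplaced_copies(SAMPLE, "+MIGR/"):
    print(name)
```

For the excerpt above it prints only the copy of datafile 17 that landed in +FRA.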
This is where the DATAFILECOPY FORMAT clause can be used:
RMAN> run {
  backup incremental level 1 
    format '+MIGR' for recover of copy with tag migr 
    datafilecopy format '+MIGR'
    tablespace pdb:test_ts;
  recover copy of tablespace pdb:test_ts with tag migr;
}

Starting backup at 10.06.2019 21:21:57
using channel ORA_DISK_1
no parent backup or copy of datafile 19 found
channel ORA_DISK_1: starting incremental level 1 datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00018 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.277.1010611243
channel ORA_DISK_1: starting piece 1 at 10.06.2019 21:21:57
channel ORA_DISK_1: finished piece 1 at 10.06.2019 21:21:58
piece handle=+MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/BACKUPSET/2019_06_10/nnndn1_migr_0.259.1010611317 tag=MIGR comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
channel ORA_DISK_1: starting datafile copy
input datafile file number=00019 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.276.1010611299
output file name=+MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.260.1010611319 tag=MIGR RECID=10 STAMP=1010611321
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 10.06.2019 21:22:01
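That's just another bit of flexibility that Oracle provides. If you think about it, it makes complete sense: image copies and backup sets can be stored separately, so FORMAT controls where the incremental backup set pieces go, while DATAFILECOPY FORMAT controls where new image copies are created.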
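Since the procedure was automated, the run block can be rendered from a few parameters so that the DATAFILECOPY FORMAT clause is never forgotten. The sketch below simply templates the RMAN commands shown above; the function name roll_forward_block and the identifier whitelist are assumptions made for illustration, not part of the original automation.

```python
import re

# Conservative whitelist: only plain identifiers are allowed into the script.
_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")


def roll_forward_block(diskgroup, tag, tablespace):
    """Render the RMAN roll-forward block for one tablespace (pdb:name or name)."""
    parts = tablespace.split(":")
    if len(parts) > 2 or not all(_NAME.match(p) for p in parts + [diskgroup, tag]):
        raise ValueError("unexpected characters in RMAN identifiers")
    dest = "+" + diskgroup
    return "\n".join([
        "run {",
        "  backup incremental level 1",
        "    format '{}' for recover of copy with tag {}".format(dest, tag),
        "    datafilecopy format '{}'".format(dest),
        "    tablespace {};".format(tablespace),
        "  recover copy of tablespace {} with tag {};".format(tablespace, tag),
        "}",
    ])


print(roll_forward_block("MIGR", "migr", "pdb:test_ts"))
```

The rendered text can then be fed to RMAN by whatever automation is in place; rejecting unexpected characters keeps a malformed variable from turning into an unintended RMAN command.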
Another way to specify the location for image copies in that case is an explicit channel configuration:
RMAN> run {
  # allocate as many channels as needed
  allocate channel c1 device type disk format '+MIGR';
  backup incremental level 1 for recover of copy with tag migr tablespace pdb:test_ts;
  recover copy of tablespace pdb:test_ts with tag migr;
}

released channel: ORA_DISK_1
allocated channel: c1
channel c1: SID=94 device type=DISK

Starting backup at 10.06.2019 21:33:43
no parent backup or copy of datafile 21 found
channel c1: starting incremental level 1 datafile backup set
channel c1: specifying datafile(s) in backup set
input datafile file number=00020 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.276.1010611951
channel c1: starting piece 1 at 10.06.2019 21:33:44
channel c1: finished piece 1 at 10.06.2019 21:33:45
piece handle=+MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/BACKUPSET/2019_06_10/nnndn1_migr_0.256.1010612025 tag=MIGR comment=NONE
channel c1: backup set complete, elapsed time: 00:00:01
channel c1: starting datafile copy
input datafile file number=00021 name=+DATA/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.277.1010612011
output file name=+MIGR/ORCL/8AAFC31944116B0CE0554A7F9DE2B2FD/DATAFILE/test_ts.261.1010612025 tag=MIGR RECID=13 STAMP=1010612028
channel c1: datafile copy complete, elapsed time: 00:00:03
Finished backup at 10.06.2019 21:33:48

Wednesday, June 5, 2019

ORA-01031 select V$RESTORE_POINT in PL/SQL

It is a well-known fact that V$RESTORE_POINT requires special handling: SELECT_CATALOG_ROLE must be granted to a low-privileged user trying to access this view.
I had several PL/SQL units working with V$RESTORE_POINT in a 12.1 database. Those units were owned by a user that has SELECT_CATALOG_ROLE.
Once I upgraded the database to 12.2, those units stopped working and started raising the infamous ORA-01031 error.
This blog post is about how I fixed that issue for definer rights program units.

Here is a simple test case demonstrating the initial ORA-1031 error:
SYS@CDB$ROOT> create restore point rp_test;

Restore point created.

SYS@CDB$ROOT> alter session set container=pdb;

Session altered.

SYS@PDB> grant connect, create procedure to tc identified by tc;

Grant succeeded.

SYS@PDB> grant read on v_$restore_point to tc;

Grant succeeded.

SYS@PDB> conn tc/tc@localhost/pdb
Connected.
TC@PDB>
TC@PDB> create or replace procedure p_test
  2  is
  3  begin
  4    for test_rec in (
  5      select *
  6        from v$restore_point)
  7    loop
  8      dbms_output.put_line(test_rec.name);
  9    end loop;
 10  end;
 11  /

Procedure created.

TC@PDB>
TC@PDB> set serverout on
TC@PDB>
TC@PDB> exec p_test
BEGIN p_test; END;

*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "TC.P_TEST", line 4
ORA-06512: at line 1

Even though the user TC has the READ privilege on V_$RESTORE_POINT, it still cannot access the view.
Before 12.2, it was enough to grant SELECT_CATALOG_ROLE to the owner of the program unit to avoid the error:
SYS@PDB> grant select_catalog_role to tc;

Grant succeeded.

SYS@PDB> conn tc/tc@localhost/pdb
Connected.
TC@PDB>
TC@PDB> set serverout on
TC@PDB>
TC@PDB> exec p_test
RP_TEST

PL/SQL procedure successfully completed.
That is no longer the case in 12.2 and in the later versions I tested, 18c and 19c.
The output from 19c is below:
SYS@PDB> grant select_catalog_role to tc;

Grant succeeded.

SYS@PDB> conn tc/tc@localhost/pdb
Connected.
SYS@PDB>
TC@PDB> set serverout on
TC@PDB>
TC@PDB> exec p_test
BEGIN p_test; END;

*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "TC.P_TEST", line 5
ORA-06512: at "TC.P_TEST", line 5
ORA-06512: at line 1
Note that 19c reports the line 'ORA-06512: at "TC.P_TEST", line 5' twice, so its error stack differs slightly from the one in 12.1.
The following solution works from 12.2 onward:
SYS@PDB> grant select_catalog_role to procedure tc.p_test;

Grant succeeded.

SYS@PDB>
SYS@PDB> conn tc/tc@localhost/pdb
Connected.
TC@PDB>
TC@PDB> set serverout on
TC@PDB>
TC@PDB> exec p_test
RP_TEST

PL/SQL procedure successfully completed.
The 12.1 requirement to grant SELECT_CATALOG_ROLE to the user does not make much sense, since roles granted to a user are not in effect inside named definer rights PL/SQL program units (roles granted to the units themselves are a separate matter).
Therefore, the "new" behavior, which requires the role to be granted to the PL/SQL unit, appears to be more proper and logical.

While working on this issue, I tinkered with gdb a little in an attempt to explain the SELECT_CATALOG_ROLE requirement - that role does not come from the V$ views, as was said in the blog post I referred to earlier.
It turns out that the role is used in the Oracle code itself:
(gdb) disassemble kccxrsp
Dump of assembler code for function kccxrsp:
   0x000000000a225740 <+0>:     xchg   %ax,%ax
   0x000000000a225742 <+2>:     push   %rbp
   0x000000000a225743 <+3>:     mov    %rsp,%rbp
   0x000000000a225746 <+6>:     sub    $0x60,%rsp
   0x000000000a22574a <+10>:    mov    %rbx,-0x58(%rbp)
   0x000000000a22574e <+14>:    mov    %rdx,%rbx
..skip..
   0x000000000a2257e5 <+165>:   mov    $0xdda7b48,%edi
   0x000000000a2257ea <+170>:   mov    $0x13,%esi
   0x000000000a2257ef <+175>:   callq  0x859bd90 
..skip..
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) x/s 0xdda7b48
0xdda7b48:      "SELECT_CATALOG_ROLE"
GV$RESTORE_POINT is based on x$kccrsp and x$kccnrs. The former seems to be accessed through the kccxrsp function.
kccxrsp calls kzsrol to perform extra security checks and passes SELECT_CATALOG_ROLE to it.
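One detail in the listing is consistent with this reading. Under the System V AMD64 calling convention used on Linux x86-64, %rdi and %rsi carry the first two integer arguments, and the value 0x13 loaded into %esi right before the call at +175 is 19, which is exactly the length of the string stored at 0xdda7b48. The check below only verifies that arithmetic; treating the two registers as a (pointer, length) pair is an interpretation rather than something the disassembly proves, and the function name is made up for illustration.

```python
def is_pointer_length_pair(text_at_rdi, value_in_rsi):
    """True if the second argument equals the byte length of the string."""
    return len(text_at_rdi.encode("ascii")) == value_in_rsi


# Values taken from the gdb session: "x/s 0xdda7b48" and "mov $0x13,%esi".
print(is_pointer_length_pair("SELECT_CATALOG_ROLE", 0x13))  # True
```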

TL;DR: V$ views are special (for example, they provide no read consistency), and V$RESTORE_POINT has its own peculiarity among them.
Not only must SELECT_CATALOG_ROLE be granted to a non-administrative user, but a definer rights PL/SQL unit owned by such a user must have that role granted as well.

Saturday, June 1, 2019

OEM Target Version not updated after applying RU patch

After applying the 12.2.0.1.190416 Release Update (RU) patch, I noticed that the target version had not been updated.

I searched through My Oracle Support (MOS) and found a few similar issues where it was recommended to refresh the configuration of the host in question.
So I refreshed both the host configuration and the database configuration, yet the old version was still displayed.

Here are the targets that are registered on the problem host:
[oraagent@oracle-sandbox bin]$ ./emctl config agent listtargets
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation.  All rights reserved.
[oracle-sandbox.domain, host]
[oracle-sandbox.domain:3872, oracle_emd]
[+ASM_oracle-sandbox.domain, osm_instance]
[OraDB12Home1_2_oracle-sandbox.domain_3696, oracle_home]
[OraHome1Grid_1_oracle-sandbox.domain_6710, oracle_home]
[agent13c1_3_oracle-sandbox.domain_5948, oracle_home]
[has_oracle-sandbox.domain, has]
[OraGI12Home1_4_oracle-sandbox.domain_67, oracle_home]
[BOXCDB_oracle-sandbox.domain, oracle_database]
[BOXCDB_oracle-sandbox.domain_CDB$ROOT, oracle_pdb]
[BOXCDB_oracle-sandbox.domain_BOXPDB, oracle_pdb]
Running the following command resolved the issue:
[oraagent@oracle-sandbox bin]$ ./emctl reload agent dynamicproperties BOXCDB_oracle-sandbox.domain:oracle_database
Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation.  All rights reserved.
---------------------------------------------------------------
EMD recompute dynprops completed successfully
The correct version finally appeared on the database page.
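
As an aside, the target specification passed to emctl reload agent dynamicproperties is simply the name and type from the listtargets output joined by a colon. The sketch below builds such commands from that output; it assumes the bracketed "[name, type]" layout shown earlier, and the helper names parse_targets and reload_commands are made up for illustration.

```python
import re

# Matches lines such as "[BOXCDB_oracle-sandbox.domain, oracle_database]".
TARGET_LINE = re.compile(r"^\[(?P<name>.+), (?P<type>[^,\]]+)\]$")


def parse_targets(listtargets_output):
    """Return (name, type) pairs from 'emctl config agent listtargets' output."""
    targets = []
    for line in listtargets_output.splitlines():
        match = TARGET_LINE.match(line.strip())
        if match:
            targets.append((match.group("name"), match.group("type")))
    return targets


def reload_commands(listtargets_output, target_type="oracle_database"):
    """Build one reload command per target of the requested type."""
    return [
        "./emctl reload agent dynamicproperties {}:{}".format(name, kind)
        for name, kind in parse_targets(listtargets_output)
        if kind == target_type
    ]


# Excerpt of the listtargets output from the post.
SAMPLE = """Oracle Enterprise Manager Cloud Control 13c Release 3
Copyright (c) 1996, 2018 Oracle Corporation.  All rights reserved.
[oracle-sandbox.domain, host]
[oracle-sandbox.domain:3872, oracle_emd]
[BOXCDB_oracle-sandbox.domain, oracle_database]
[BOXCDB_oracle-sandbox.domain_CDB$ROOT, oracle_pdb]
"""

for command in reload_commands(SAMPLE):
    print(command)
```

For the excerpt above it prints the same command that was run manually for the BOXCDB database target.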