The MOS article Grid Infrastructure Out of Place ( OOP ) Patching using opatchauto (Doc ID 2419319.1) describes how to perform opatchauto Out of Place Patching of a Grid Infrastructure environment. The article assumes that commands are to be entered interactively. Although it is possible to automate the patching process using the Linux expect command, an alternative approach utilizing property files can be adapted to perform unattended patching.
Initial setup
I have a Grid Infrastructure Oracle Home /u01/app/oracle/product/19.4.0/grid on a Linux server. That is Grid Infrastructure Release Update 19.4.0.0.190716.
I would like to install Grid Infrastructure Release Update 19.5.0.0.191015 to a new location /u01/app/oracle/product/19.5.0/grid.
There are also a few 12.2 databases running on the same server.
I will not be using the 2 Step Method from the aforementioned MOS article, so that I will not be running the prepare-clone and switch-clone commands.
The patching will be performed by running a single command that can be easily scripted and used from any IT automation tool, such as Ansible.
Configuring Oracle Home mapping file
Here is an excerpt from the OPatch User's Guide describing how to provide original and cloned home details:
To create a property file, add all the required original and cloned home details and save the file with the .properties extension.
For example: clone.properties.
[root@emrep 30116789]# opatchauto apply -outofplace -phBaseDir /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015 \
> -oh /u01/app/oracle/product/19.4.0/grid \
> -silent /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/clone.properties \
> -analyze
OPatchauto session is initiated at Thu Dec 12 12:09:32 2019
System initialization log file is /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchautodb/systemconfig2019-12-12_12-09-36PM.log.
Session log file is /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchauto/opatchauto2019-12-12_12-09-41PM.log
The id for this session is ATZ9
Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/19.4.0/grid
Patch applicability verified successfully on home /u01/app/oracle/product/19.4.0/grid
OPatchAuto successful.
--------------------------------Summary--------------------------------
Out of place patching clone home(s) summary
____________________________________________
Host : emrep
Actual Home : /u01/app/oracle/product/19.4.0/grid
Version:19.0.0.0.0
Clone Home Path : /u01/app/oracle/product/19.5.0/grid
Analysis for applying patches has completed successfully:
Host:emrep
SIHA Home:/u01/app/oracle/product/19.4.0/grid
Version:19.0.0.0.0
==Following patches were SKIPPED:
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/29401763
Reason: This patch is already been applied, so not going to apply again.
==Following patches were SUCCESSFULLY analyzed to be applied:
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/30122149
Log: /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-12-12_12-10-00PM_1.log
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/30122167
Log: /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-12-12_12-10-00PM_1.log
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/30125133
Log: /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-12-12_12-10-00PM_1.log
OPatchauto session completed at Thu Dec 12 12:10:14 2019
Time taken to complete the session 0 minute, 42 seconds
Stopping Oracle Databases that use Oracle ASM
Since I would like to patch only the Grid Infrastructure Oracle Home, which implies its unavailability in my setup, I can stop all Oracle databases beforehand. Otherwise, they will be shutting down with the abort option. The following lines from the alert log confirm that the database was brought down using the abort option on one server where I was testing the Out of Place patching:
Thu Dec 12 12:26:00 2019
Shutting down instance (abort) (OS id: 13679)
License high water mark = 97
Thu Dec 12 12:26:00 2019
USER (ospid: 13679): terminating the instance
Thu Dec 12 12:26:01 2019
Instance terminated by USER, pid = 13679
Applying the Release Update
[root@emrep 30116789]# opatchauto apply -outofplace -phBaseDir /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015 \
> -oh /u01/app/oracle/product/19.4.0/grid \
> -silent /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/clone.properties
OPatchauto session is initiated at Thu Dec 12 12:12:57 2019
System initialization log file is /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchautodb/systemconfig2019-12-12_12-13-01PM.log.
Session log file is /u01/app/oracle/product/19.4.0/grid/cfgtoollogs/opatchauto/opatchauto2019-12-12_12-13-06PM.log
The id for this session is IMWY
Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/19.4.0/grid
Patch applicability verified successfully on home /u01/app/oracle/product/19.4.0/grid
Copying the files from the existing oracle home /u01/app/oracle/product/19.4.0/grid to a new location. Please wait...
Clone of oracle home /u01/app/oracle/product/19.4.0/grid is /u01/app/oracle/product/19.5.0/grid on host emrep
Copying the files from the existing oracle home /u01/app/oracle/product/19.4.0/grid to a new location is successful.
Unlocking CRS clone home for home /u01/app/oracle/product/19.4.0/grid.
Prepatch operation log file location: /u01/app/oracle/crsdata/emrep/crsconfig/hapatch_2019-12-12_12-16-50AM.log
Unlocked CRS clone home successfully for home /u01/app/oracle/product/19.4.0/grid.
Creating clone for oracle home /u01/app/oracle/product/19.4.0/grid.
Clone operation successful for oracle home /u01/app/oracle/product/19.4.0/grid.
Performing post clone operation for oracle home /u01/app/oracle/product/19.4.0/grid.
Performing post clone operation was successful for oracle home /u01/app/oracle/product/19.5.0/grid.
Start applying binary patch on home /u01/app/oracle/product/19.5.0/grid
Binary patch applied successfully on home /u01/app/oracle/product/19.5.0/grid
Update nodelist in the inventory for oracle home /u01/app/oracle/product/19.5.0/grid.
Update nodelist in the inventory is completed for oracle home /u01/app/oracle/product/19.5.0/grid.
Starting CRS service on home /u01/app/oracle/product/19.5.0/grid
Postpatch operation log file location: /u01/app/oracle/crsdata/emrep/crsconfig/hapatch_2019-12-12_12-25-58AM.log
CRS service started successfully on home /u01/app/oracle/product/19.5.0/grid
Confirm that all resources have been started from home /u01/app/oracle/product/19.5.0/grid.
All resources have been started successfully from home /u01/app/oracle/product/19.5.0/grid.
OPatchAuto successful.
--------------------------------Summary--------------------------------
Patching is completed successfully. Please find the summary as follows:
Host:emrep
SIHA Home:/u01/app/oracle/product/19.4.0/grid
Version:19.0.0.0.0
Summary:
==Following patches were SKIPPED:
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/29401763
Reason: This patch is already been applied, so not going to apply again.
==Following patches were SUCCESSFULLY applied:
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/30122149
Log: /u01/app/oracle/product/19.5.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-12-12_12-18-01PM_1.log
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/30122167
Log: /u01/app/oracle/product/19.5.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-12-12_12-18-01PM_1.log
Patch: /u01/app/oracle/stage/p30116789_190000_Linux-x86-64_gi_ru_19.5.0.0.191015/30116789/30125133
Log: /u01/app/oracle/product/19.5.0/grid/cfgtoollogs/opatchauto/core/opatch/opatch2019-12-12_12-18-01PM_1.log
Patching session reported following warning(s):
_________________________________________________
[Note]: Please verify the database is running from the desired Oracle home, if not then manually execute
$ORACLE_HOME/bin/srvctl modify database command to fix the problem
Out of place patching clone home(s) summary
____________________________________________
Host : emrep
Actual Home : /u01/app/oracle/product/19.4.0/grid
Version:19.0.0.0.0
Clone Home Path : /u01/app/oracle/product/19.5.0/grid
OPatchauto session completed at Thu Dec 12 12:37:27 2019
Time taken to complete the session 24 minutes, 30 seconds
[root@emrep 30116789]#
Thus, it took just 24 minutes to perform the patching on that server; it is an AWS t3.large instance.
The time can be even shortened more by preparing the cloned Oracle Home in advance.
A rough estimate on the log above can be made that the patching time may be shortened twice using the 2 Step Method.
SQL92_SECURITY has changed its default value to TRUE starting from 12.2.0.1: Upgrade Guide.
That might lead to ORA-01031: insufficient privileges errors being thrown in runtime as the following script demonstrates.
SQL> grant create session, create table, unlimited tablespace to tc_data identified by tc_data;
Grant succeeded.
SQL>
SQL> create table tc_data.t
2 as
3 select *
4 from dual;
Table created.
SQL>
SQL> grant create session, create procedure to tc_app identified by tc_app;
Grant succeeded.
SQL>
SQL> grant delete on tc_data.t to tc_app;
Grant succeeded.
SQL>
SQL> create or replace procedure tc_app.p
2 is
3 begin
4 delete tc_data.t
5 where dummy = 'X';
6 end;
7 /
Procedure created.
SQL>
SQL> exec tc_app.p
BEGIN tc_app.p; END;
*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "TC_APP.P", line 4
ORA-06512: at line 1
Although the procedure is valid, it throws an ORA-01031 error as soon as the statement refers to any table columns including pseudo-columns, such as ROWID (I have tested it only for ROWID).
I have even seen a case similar to the one below, when the code throws an error depending on its input parameters:
SQL> create or replace procedure tc_app.p_collection(p_tbl sys.odcivarchar2list)
2 is
3 begin
4 forall i in 1..p_tbl.count
5 delete tc_data.t
6 where dummy = p_tbl(i);
7 end;
8 /
Procedure created.
SQL>
SQL> exec tc_app.p_collection(sys.odcivarchar2list())
PL/SQL procedure successfully completed.
SQL> exec tc_app.p_collection(sys.odcivarchar2list('y'))
BEGIN tc_app.p_collection(sys.odcivarchar2list('y')); END;
*
ERROR at line 1:
ORA-01031: insufficient privileges
ORA-06512: at "TC_APP.P_COLLECTION", line 4
ORA-06512: at line 1
I can imagine that Oracle has not implemented the behavior when the code does not compile if it is known that there is a missing privilege due to SQL92_SECURITY=TRUE - it is just an instance parameter and it can be changed back and forth. Thus, it would make the status of the objects misleading. For instance, a valid PL/SQL unit with SQL92_SECURITY=FALSE, should either become invalid or throw a runtime ORA-01031 error when SQL92_SECURITY=TRUE.
However, it might come in handy to have a PL/SQL warning at least to identify possible missing privileges.
It can also be a good idea to change the scope of the SQL92_SECURITY parameter and make it one of the PL/SQL compiler settings of the stored objects shown in DBA_PLSQL_OBJECT_SETTINGS.
I have recently optimized one of processes working with LOBs and decided to write a blog post about it.
There was an anonymous block like the following:
declare
v_result clob;
begin
for test_rec in (
select substr(clob_col, 1, 32000) clob_part1,
substr(clob_col, 32001, 32000) clob_part2
from driving_table
where some_conditions
order by sort_column)
loop
v_result := v_result || test_rec.clob_part1 || ';' || test_rec.clob_part2;
end loop;
end;
/
In a nutshell, that block unloaded some LOB data to a CLOB. Developers identified that this block was working quite fine being run for a relatively small number of rows, but the performance was deteriorating drastically when the same block was running against a large number of rows.
To set the scene for further analysis, let's create a table with 100K rows that has a 100 byte CLOB column:
SQL> create table t100(
2 n int generated as identity,
3 c clob);
Table created.
SQL>
SQL> insert /*+ append*/into t100(c)
2 select lpad('x', 100, 'x')
3 from xmltable('1 to 100000');
100000 rows created.
SQL>
SQL> commit;
Firstly, I show what happens when I try to emulate the original cursor:
-- I am gathering some memory statistics to be used in further research
SQL> select pga_used_mem, pga_alloc_mem
2 from v$session s,
3 v$process p
4 where s.sid = sys_context('userenv', 'sid')
5 and p.addr = s.paddr;
PGA_USED_MEM PGA_ALLOC_MEM
------------ -------------
2544108 4114740
SQL>
SQL> select pm.*
2 from v$session s,
3 v$process p,
4 v$process_memory pm
5 where s.sid = sys_context('userenv', 'sid')
6 and p.addr = s.paddr
7 and pm.pid = p.pid
8 order by pm.category;
PID SERIAL# CATEGORY ALLOCATED USED MAX_ALLOCATED CON_ID
---------- ---------- --------------- ---------- ---------- ------------- ----------
77 157 Other 5667940 5667940 3
77 157 PL/SQL 30744 23680 31920 3
77 157 SQL 54456 43688 319280 3
SQL>
SQL> var rc refcursor
SQL>
SQL> begin
2 open :rc for
3 select n,
4 substr(c, 1, 40) c_part1,
5 substr(c, 41, 40) c_part2,
6 substr(c, 81, 40) c_part3
7 from t100
8 where n <= 50000
9 order by n;
10 end;
11 /
PL/SQL procedure successfully completed.
SQL> declare
2 n int;
3 c1 clob;
4 c2 c1%type;
5 c3 c1%type;
6 begin
7 fetch :rc
8 into n, c1, c2, c3;
9 end;
10 /
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL> select pm.*
2 from v$session s,
3 v$process p,
4 v$process_memory pm
5 where s.sid = sys_context('userenv', 'sid')
6 and p.addr = s.paddr
7 and pm.pid = p.pid
8 order by pm.category;
PID SERIAL# CATEGORY ALLOCATED USED MAX_ALLOCATED CON_ID
---------- ---------- --------------- ---------- ---------- ------------- ----------
77 157 Freeable 1048576 0 3
77 157 Other 1375048620 1375048620 3
77 157 PL/SQL 32856 23104 60960 3
77 157 SQL 14988592 11145152 17361192 3
Oops! The code allocated 1.3G PGA in the Other category. There are also 150K abstract LOBs created:
Let us find out what we have in V$PROCESS_MEMORY_DETAIL:
SQL> alter session set events 'immediate trace name pga_detail_get level 77';
Session altered.
SQL> select name, heap_name, bytes, allocation_count, bytes/150000
2 from v$process_memory_detail
3 where pid = 77
4 and category = 'Other'
5 and allocation_count >= 140000
6 order by allocation_count
7 /
NAME HEAP_NAME BYTES ALLOCATION_COUNT BYTES/150000
-------------------------- --------------- ---------- ---------------- ------------
kghdsx kokltcr: creat 16800000 150000 112
free permanent memory kokltcr: creat 15600216 150000 104.00144
kolrarfc:lobloc_kolrhte kolr heap ds i 9610304 150000 64.0686933
kolraloc-1 kolr heap ds i 12289592 150002 81.9306133
kokltcr: create clob koh dur heap d 53599912 300000 357.332747
kghsseg: kolaslCreateCtx kokltcr: creat 1212000000 300000 8080
That is more than 8,192 bytes per abstract LOB even for these 40 byte LOBs!
These are in-memory LOBs, however, memory is a limited resource obviously and Oracle can create non-in-memory LOBs starting with a certain number of abstract LOBs (it is around 215K LOBs in my test database). Notice that I limited the number of rows to 100K to create 300K abstract LOBs:
SQL> begin
2 open :rc for
3 select n,
4 substr(c, 1, 40) c_part1,
5 substr(c, 41, 40) c_part2,
6 substr(c, 81, 40) c_part3
7 from t100
8 where n <= 100000
9 order by n;
10 end;
11 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.20
SQL> declare
2 n int;
3 c1 clob;
4 c2 c1%type;
5 c3 c1%type;
6 begin
7 fetch :rc
8 into n, c1, c2, c3;
9 end;
10 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:08.04
SQL> select *
2 from v$temporary_lobs
3 where sid = sys_context('userenv', 'sid');
SID CACHE_LOBS NOCACHE_LOBS ABSTRACT_LOBS CON_ID
---------- ---------- ------------ ------------- ----------
435 84626 0 300000 3
Once the session reached the max heap, which seems to be 75% in my environment, it starts allocating non-in-memory temporary LOBs and there are functions such as kolaslWrite and kolaslWriteCreateTempLob on the callstack that implies that they might be writing something to the temporary tablespace.
Let us take a look now at the following PL/SQL block that emulates what the application code does:
SQL> declare
2 v_reporting_freq constant pls_integer := 20000;
3 procedure log_info(i_msg varchar2)
4 is
5 begin
6 dbms_output.put_line(to_char(sysdate, 'hh24:mi:ss')||' '||i_msg);
7 end log_info;
8 begin
9 log_info('Starting execution');
10 for test_rec in (
11 select n,
12 substr(c, 1, 40) c_part1,
13 substr(c, 41, 40) c_part2,
14 substr(c, 81, 40) c_part3
15 from t100
16 where n <= 100000
17 order by n)
18 loop
19 if mod(test_rec.n, v_reporting_freq) = 0
20 then
21 log_info('Processed '||v_reporting_freq||' rows');
22 end if;
23 end loop;
24 log_info('Finishing execution');
25 end;
26 /
06:35:34 Starting execution
06:36:58 Processed 20000 rows
06:37:10 Processed 20000 rows
06:37:18 Processed 20000 rows
06:37:21 Processed 20000 rows
06:37:22 Processed 20000 rows
06:44:52 Finishing execution
PL/SQL procedure successfully completed.
Elapsed: 00:09:18.82
There are a few observations about this output:
The SELECT was running from 35:34 to 37:22, it is 108 seconds.
The block itself was running in 9 minutes 18 seconds or in 558 seconds, which is 108(SELECT)+450(other)
What exactly happened between 06:37:22, when was the last fetch, and 06:44:52 - that is exactly 450 seconds, when the block finished?
Here is a tkprof processed tracefile:
SQL ID: btysk50atwccn Plan Hash: 1939537825
SELECT N, SUBSTR(C, 1, 40) C_PART1, SUBSTR(C, 41, 40) C_PART2, SUBSTR(C, 81,
40) C_PART3
FROM
T100 WHERE N <= 100000 ORDER BY N
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1001 106.43 107.82 5029 3341 649751 100000
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 1003 106.43 107.82 5029 3341 649751 100000
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 107 (recursive depth: 1)
Number of plan statistics captured: 1
Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
100000 100000 100000 SORT ORDER BY (cr=3341 pr=5029 pw=1695 time=108105549 us starts=1 cost=2080 size=4600000 card=100000)
100000 100000 100000 TABLE ACCESS FULL T100 (cr=3341 pr=3334 pw=0 time=474939 us starts=1 cost=923 size=4600000 card=100000)
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
PGA memory operation 37387 0.00 0.29
Disk file operations I/O 2 0.00 0.00
db file scattered read 67 0.00 0.09
CSS initialization 1 0.00 0.00
CSS operation: query 6 0.00 0.00
CSS operation: action 1 0.00 0.00
direct path write temp 26 0.00 0.02
direct path read temp 66 0.00 0.07
********************************************************************************
declare
v_reporting_freq constant pls_integer := 20000;
procedure log_info(i_msg varchar2)
is
begin
dbms_output.put_line(to_char(sysdate, 'hh24:mi:ss')||' '||i_msg);
end log_info;
begin
log_info('Starting execution');
for test_rec in (
select n,
substr(c, 1, 40) c_part1,
substr(c, 41, 40) c_part2,
substr(c, 81, 40) c_part3
from t100
where n <= 100000
order by n)
loop
if mod(test_rec.n, v_reporting_freq) = 0
then
log_info('Processed '||v_reporting_freq||' rows');
end if;
end loop;
log_info('Finishing execution');
end;
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 446.41 450.44 0 0 468361 1
Fetch 0 0.00 0.00 0 0 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 2 446.41 450.44 0 0 468361 1
Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 107
Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
PGA memory operation 419 0.11 0.13
SQL*Net message to client 1 0.00 0.00
SQL*Net message from client 1 0.10 0.10
There are no waits that can explain why the block took 450 seconds to execute.
Let us get to the raw tracefile then:
PARSING IN CURSOR #140371888210368 len=617 dep=0 uid=107 oct=47 lid=107 tim=9831631653615 hv=2253167668 ad='c9f31410' sqlid='3sfmy3y34t71n'
declare
v_reporting_freq constant pls_integer := 20000;
procedure log_info(i_msg varchar2)
is
begin
dbms_output.put_line(to_char(sysdate, 'hh24:mi:ss')||' '||i_msg);
end log_info;
begin
log_info('Starting execution');
for test_rec in (
select n,
substr(c, 1, 40) c_part1,
substr(c, 41, 40) c_part2,
substr(c, 81, 40) c_part3
from t100
where n <= 100000
order by n)
loop
if mod(test_rec.n, v_reporting_freq) = 0
then
log_info('Processed '||v_reporting_freq||' rows');
end if;
end loop;
log_info('Finishing execution');
end;
END OF STMT
PARSE #140371888210368:c=96,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=9831631653614
=====================
PARSING IN CURSOR #140371888222536 len=127 dep=1 uid=107 oct=3 lid=107 tim=9831631654062 hv=362688916 ad='c91e39a0' sqlid='btysk50atwccn'
SELECT N, SUBSTR(C, 1, 40) C_PART1, SUBSTR(C, 41, 40) C_PART2, SUBSTR(C, 81, 40) C_PART3 FROM T100 WHERE N <= 100000 ORDER BY N
END OF STMT
PARSE #140371888222536:c=95,e=95,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1939537825,tim=9831631654062
EXEC #140371888222536:c=41,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1939537825,tim=9831631654152
..skip..
FETCH #140371888222536:c=946,e=539,p=0,cr=0,cu=0,mis=0,r=100,dep=1,og=1,plh=1939537825,tim=9831739623035
FETCH #140371888222536:c=59,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1939537825,tim=9831739623185
STAT #140371888222536 id=1 cnt=100000 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=3341 pr=5029 pw=1695 str=1 time=108105549 us cost=2080 size=4600000 card=100000)'
STAT #140371888222536 id=2 cnt=100000 pid=1 pos=1 obj=158217 op='TABLE ACCESS FULL T100 (cr=3341 pr=3334 pw=0 str=1 time=474939 us cost=923 size=4600000 card=100000)'
*** 2019-10-23 06:44:52.738 (PDB(3))
CLOSE #140371888222536:c=1,e=1,dep=1,type=3,tim=9832189916721
EXEC #140371888210368:c=552847911,e=558267064,p=5029,cr=3341,cu=1118112,mis=0,r=1,dep=0,og=1,plh=0,tim=9832189920765
*** 2019-10-23 06:44:52.934 (PDB(3))
WAIT #140371888210368: nam='PGA memory operation' ela= 89 p1=0 p2=0 p3=0 obj#=158217 tim=9832190113279
WAIT #140371888210368: nam='PGA memory operation' ela= 61 p1=0 p2=0 p3=0 obj#=158217 tim=9832190113427
The tracefile does not show what was going on between the last fetch(tim=9831739623185) and the time the cursor was closed(tim=9832189916721).
Here is the V$ACTIVE_SESSION_HISTORY (ASH) output for the problem block:
SQL> select to_char(sample_time, 'hh24:mi:ss') sample_time, sql_id, sql_opname, top_level_sql_id, session_state, event,
2 trim(case when IN_CONNECTION_MGMT='Y' then ' IN_CONNECTION_MGMT' end||
3 case when IN_PARSE='Y' then ' IN_PARSE' end||
4 case when IN_HARD_PARSE='Y' then ' IN_HARD_PARSE' end||
5 case when IN_SQL_EXECUTION='Y' then ' IN_SQL_EXECUTION' end||
6 case when IN_PLSQL_EXECUTION='Y' then ' IN_PLSQL_EXECUTION' end||
7 case when IN_PLSQL_RPC='Y' then ' IN_PLSQL_RPC' end||
8 case when IN_PLSQL_COMPILATION='Y' then ' IN_PLSQL_COMPILATION' end||
9 case when IN_JAVA_EXECUTION='Y' then ' IN_JAVA_EXECUTION' end||
10 case when IN_BIND='Y' then ' IN_BIND' end||
11 case when IN_CURSOR_CLOSE='Y' then ' IN_CURSOR_CLOSE' end||
12 case when IN_SEQUENCE_LOAD='Y' then ' IN_SEQUENCE_LOAD' end||
13 case when IN_INMEMORY_QUERY='Y' then ' IN_INMEMORY_QUERY' end||
14 case when IN_INMEMORY_POPULATE='Y' then ' IN_INMEMORY_POPULATE' end||
15 case when IN_INMEMORY_PREPOPULATE='Y' then ' IN_INMEMORY_PREPOPULATE' end||
16 case when IN_INMEMORY_REPOPULATE='Y' then ' IN_INMEMORY_REPOPULATE' end||
17 case when IN_INMEMORY_TREPOPULATE='Y' then ' IN_INMEMORY_TREPOPULATE' end||
18 case when IN_TABLESPACE_ENCRYPTION='Y' then ' IN_TABLESPACE_ENCRYPTION' end) activity,
19 pga_allocated,
20 temp_space_allocated
21 from v$active_session_history
22 where session_id = 52
23 and sample_time >= timestamp'2019-10-23 06:35:33'
24 order by sample_time;
SAMPLE_T SQL_ID SQL_OPNAME TOP_LEVEL_SQL SESSION_STATE EVENT ACTIVITY PGA_ALLOCATED TEMP_SPACE_ALLOCATED
-------- ------------- ------------- ------------- ------------- ---------- ---------------- ------------- --------------------
-- the execution starts and the session steadily allocates PGA and TEMP
06:35:35 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 316826624 0
06:35:36 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 526869504 0
06:35:37 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 667640832 0
06:35:38 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 775185408 0
06:35:39 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 870933504 5242880
06:35:40 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 946693120 5242880
06:35:41 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1025664000 5242880
06:35:42 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1087202304 5242880
06:35:43 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1153000448 6291456
06:35:44 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1216242688 6291456
06:35:45 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1276404736 6291456
06:35:46 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1321624576 6291456
06:35:47 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1372152832 7340032
06:35:48 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1424450560 7340032
06:35:49 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1474192384 7340032
06:35:50 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1520133120 7340032
06:35:51 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1563255808 7340032
06:35:52 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 1593336832 8388608
..skip..
SAMPLE_T SQL_ID SQL_OPNAME TOP_LEVEL_SQL SESSION_STATE EVENT ACTIVITY PGA_ALLOCATED TEMP_SPACE_ALLOCATED
-------- ------------- ------------- ------------- ------------- ---------- ---------------- ------------- --------------------
-- the SELECT completes
06:37:19 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2461950976 715128832
06:37:20 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2461950976 715128832
06:37:21 btysk50atwccn SELECT 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2461950976 715128832
06:37:23 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
06:37:24 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
06:37:25 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
06:37:26 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
-- the session is burning up CPUs
06:44:50 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
06:44:51 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
06:44:52 3sfmy3y34t71n ON CPU IN_SQL_EXECUTION 2460902400 700448768
I would rather expect to see some PGA and TEMP deallocations between 06:37:22 (the time when the SELECT completes) and 06:44:52 (the time when the block completes) but there are none.
While I was looking at V$TEMPORARY_LOBS, I was observing that the session was constantly releasing LOBs during the period in question.
Remember there are 300K abstract LOBs created - they are being released in those 450 seconds.
However, it is not reflected in ASH as far as I am aware.
All PGA and TEMP deallocations happen at once and there is really no hint in ASH to figure out what is going on.
Let us take a look at the Flamegraph to see what Oracle functions contributed to most load:
It can be seen that there are functions such as kokegGarbageCollectOpn and kokegGarbageCollectRworo that were active when qersoFetch was active (19% of the time).
Those seem to be related to garbage collection and the koke prefix is present in the oradebug doc component output:
Most of the time (80%), though, the functions such as: kksumc, kxsFreeExecutionHeap were active - they are supposedly related to releasing memory.
Oracle code might not be updating PGA_ALLOCATED, TEMP_SPACE_ALLOCATED till the whole duration is freed as I do not see any changes in those columns of V$ACTIVE_SESSION_HISTORY.
When I found that SUBSTR on LOBs produces new abstract LOBs, I implemented the following workaround:
SQL> declare
2 v_reporting_freq constant pls_integer := 20000;
3 v_result clob;
4 procedure log_info(i_msg varchar2)
5 is
6 begin
7 dbms_output.put_line(to_char(sysdate, 'hh24:mi:ss')||' '||i_msg);
8 end log_info;
9 begin
10 log_info('Starting execution');
11 dbms_lob.createtemporary(v_result, cache=> true, dur=> dbms_lob.call);
12 for test_rec in (
13 select n,
14 dbms_lob.substr(c, 40, 1) c_part1,
15 dbms_lob.substr(c, 40, 41) c_part2,
16 dbms_lob.substr(c, 40, 81) c_part3
17 from t100
18 where n <= 100000
19 order by n)
20 loop
21 dbms_lob.append(v_result, test_rec.c_part1);
22 dbms_lob.append(v_result, test_rec.c_part2);
23 dbms_lob.append(v_result, test_rec.c_part3);
24 if mod(test_rec.n, v_reporting_freq) = 0
25 then
26 log_info('Processed '||v_reporting_freq||' rows');
27 end if;
28 end loop;
29 dbms_output.put_line('Lob size: '||dbms_lob.getlength(v_result));
30 dbms_lob.freetemporary(v_result);
31 log_info('Finishing execution');
32 end;
33 /
08:15:04 Starting execution
08:15:08 Processed 20000 rows
08:15:09 Processed 20000 rows
08:15:10 Processed 20000 rows
08:15:12 Processed 20000 rows
08:15:13 Processed 20000 rows
Lob size: 10000000
08:15:13 Finishing execution
PL/SQL procedure successfully completed.
Elapsed: 00:00:08.95
In spite of the fact that this block even writes a whole new LOB (there are no such operations in the block that was working 9 minutes), it completes just in 9 seconds. It is quite a neat optimization, isn't it?
That behavior of producing abstract LOBs of SUBSTR is documented: Database SecureFiles and Large Objects Developer's Guide: Temporary LOBs:
If you use the newly provided enhanced SQL semantics functionality in your applications, then there are many more temporary LOBs created silently in SQL and PL/SQL than before. Ensure that temporary tablespace for storing these temporary LOBs is large enough for your applications. In particular, these temporary LOBs are silently created when you use the following:
SQL functions on LOBs
PL/SQL built-in character functions on LOBs
Variable assignments from VARCHAR2/RAW to CLOBs/BLOBs, respectively.
Perform a LONG-to-LOB migration
If SQL operators are used on LOBs, the PGA memory and temporary tablespace must be large enough to accommodate the temporary LOBs generated by SQL operators.
Apart from that workaround, I can come up with a few others that also provide significant performance boost:
This query is almost a corner-case due to that sorting. If the ORDER BY is really needed, then an indexed access on the N column can be considered.
These SUBSTR calls can be moved within the LOOP part, so that only the C column is selected in the query. Although it would not be the most efficient way to process the cursor (abstract LOBs will still be allocated), those abstract LOBs will be released after each iteration. Therefore, the time-consuming garbage collection at the end will be avoided.
All tests were performed on 12.2.0.1.190716. 19c has some nice optimizations, such as one mentioned in Session spending its time IN_CURSOR_CLOSE
I was trying to reproduce that issue on 19c but I was not able to get the same results after a few attempts.
Though abstract LOBs were still allocated, they were being released rather quickly.
tl;dr
Oracle can create temporary LOBs when SQL functions are used on LOBs as this blog post demonstrates.
For example, SUBSTR behaves like that. Those calls can be replaced with DBMS_LOB.SUBSTR that does not lead to the aforementioned deficiency.