
Thursday, August 15, 2019

Starting Oracle Database on Demand Using Socket Activation

I remember watching a video about Amazon Aurora Serverless in which a database was started on demand when the first connection arrived. Although that is not something you would want for a 24x7 OLTP database, it can come in handy for development environments with an intermittent usage pattern. I decided to write a blog post showing how the same can be done for an Oracle Database server running on a systemd-based Linux operating system.

My setup uses systemd Socket Activation, which you can read more about on Lennart Poettering's website.
The following link was used as a basis for this setup: systemd: on-demand start of services like postgresql and mysql that do not yet support socket-based activation.
I used the Amazon Linux distribution for this demo with the following kernel: 4.14.123-111.109.amzn2.x86_64; however, the setup steps should be the same for any other Red Hat-based distribution.
The Oracle Database version is 19.3.

In a nutshell, we need to create several systemd units:
1. /etc/systemd/system/proxy-to-oracle.socket:
[Unit]
Description="Socket for Oracle Socket Proxy"
Documentation=https://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html

[Socket]
ListenStream=0.0.0.0:1522

[Install]
WantedBy=sockets.target
The socket listens on port 1522 and accepts client connections to the database. When a connection arrives, systemd starts the following service if it is not already running:
2. /etc/systemd/system/proxy-to-oracle.service:
[Unit]
Description="Oracle Socket Proxy"
Documentation=https://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html
Requires=dbora.service
After=dbora.service

[Service]
User=nobody
Group=nobody
ExecStart=/lib/systemd/systemd-socket-proxyd ip-172-17-31-208.ec2.internal:1521
Restart=on-failure
PrivateTmp=true
I am using systemd-socket-proxyd, which proxies the connection to the host ip-172-17-31-208.ec2.internal on port 1521 - that is simply the hostname and port of the server where the listener runs (I am hosting all services on the same machine).
There are After and Requires dependencies on dbora.service, which manages the database and the listener.
3. /etc/systemd/system/dbora.service:
[Unit]
Description="The Oracle Database Service"
After=syslog.target network.target

[Service]
LimitMEMLOCK=infinity
LimitNOFILE=65535
RemainAfterExit=yes
User=oracle
Group=dba
ExecStart=/home/oracle/scripts/start_oracle.sh
ExecStop=/u01/app/oracle/product/19/dbhome_1/bin/dbshut /u01/app/oracle/product/19/dbhome_1

[Install]
WantedBy=multi-user.target
In the service above, I used the standard stop command and a slightly modified startup script, start_oracle.sh:
#!/bin/sh

export ORACLE_SID=orcl
ORAENV_ASK=NO . /usr/local/bin/oraenv -s
sqlplus / as sysdba <<_EOF
startup
_EOF

lsnrctl start

sqlplus / as sysdba <<_EOF
alter system register;
_EOF
I did not use the dbstart command because it starts the listener before bringing up the databases, and I was getting ORA-01109: database not open errors with the standard dbstart script, so I resorted to my own version of the startup script. The TNS descriptor is below:
pdb =
    (DESCRIPTION=
      (RETRY_COUNT=10)
      (RETRY_DELAY=10)
      (ADDRESS=
        (PROTOCOL=TCP)(HOST=ip-172-17-31-208.ec2.internal)(PORT=1522)
      )
      (CONNECT_DATA=
        (SERVICE_NAME=pdb)
      )
    )
It is worth mentioning that the port is set to 1522 - that is the port the systemd proxy listens on. I use a non-standard client sqlnet.ora file that disables Out of Band Break (see What is DISABLE_OOB (Out Of Band Break)? (Doc ID 373475.1) for more information):
DISABLE_OOB=ON
Without that, I kept getting the ORA-12637: Packet receive failed error. I searched through SQL*Net traces and finally found a clue that pointed to that parameter disabling Out of Band Breaks. Curiously, I do not recall getting that error when I configured a similar setup on Oracle Database 12.1 and Oracle Linux 6 last year.
I am going to connect to the socket on port 1522; proxy-to-oracle.service should start dbora.service if it is not running and proxy the connection to the listener. Here is a little helper script that tests the database connection and writes some output so that I can estimate how long it takes to establish the connection:
#!/bin/sh
# test_db_conn.sh
export TNS_ADMIN="${HOME}/tns_admin"
echo "Started at: "$(date +"%F %T")

sqlplus -L tc/tc@pdb <<_EOF
select to_char(sysdate, 'yyyy-mm-dd hh24:mi:ss') current_date
  from dual;
_EOF

echo ""
echo "Finished at: "$(date +"%F %T")
The initial state of services is below - only the proxy-to-oracle.socket is running:
# systemctl status proxy-to-oracle.socket proxy-to-oracle.service dbora.service
● proxy-to-oracle.socket - "Socket for Oracle Socket Proxy"
   Loaded: loaded (/etc/systemd/system/proxy-to-oracle.socket; enabled; vendor preset: disabled)
   Active: active (listening) since Thu 2019-08-08 13:31:19 UTC; 20h ago
     Docs: https://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html
   Listen: 0.0.0.0:1522 (Stream)

● proxy-to-oracle.service - "Oracle Socket Proxy"
   Loaded: loaded (/etc/systemd/system/proxy-to-oracle.service; static; vendor preset: disabled)
   Active: inactive (dead) since Fri 2019-08-09 10:25:21 UTC; 3min 15s ago
     Docs: https://www.freedesktop.org/software/systemd/man/systemd-socket-proxyd.html
  Process: 16042 ExecStart=/lib/systemd/systemd-socket-proxyd ip-172-17-31-208.ec2.internal:1521 (code=killed, signal=TERM)
 Main PID: 16042 (code=killed, signal=TERM)

● dbora.service - "The Oracle Database Service"
   Loaded: loaded (/etc/systemd/system/dbora.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2019-08-09 10:25:46 UTC; 2min 50s ago
  Process: 22234 ExecStop=/u01/app/oracle/product/19/dbhome_1/bin/dbshut /u01/app/oracle/product/19/dbhome_1 (code=exited, status=0/SUCCESS)
  Process: 16032 ExecStart=/home/oracle/scripts/start_oracle.sh (code=exited, status=0/SUCCESS)
 Main PID: 16032 (code=exited, status=0/SUCCESS)
Neither the instance nor the listener is running:
[oracle@ip-172-17-31-208 ~]$ pgrep -af 'pmon|tnslsnr'
[oracle@ip-172-17-31-208 ~]$
Now I am starting the script that tests the database connection:
[oracle@ip-172-17-31-208 scripts]$ ./test_db_conn.sh
Started at: 2019-08-09 10:32:53

SQL*Plus: Release 19.0.0.0.0 - Production on Fri Aug 9 10:32:53 2019
Version 19.3.0.0.0

Copyright (c) 1982, 2019, Oracle.  All rights reserved.


Last Successful login time: Fri Aug 09 2019 09:27:45 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

SQL>   2
CURRENT_DATE
-------------------
2019-08-09 10:33:13

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0

Finished at: 2019-08-09 10:33:14
It takes less than 20 seconds to establish the connection while the instance starts (the TNS descriptor uses RETRY_DELAY=10, so the client retries the connection every 10 seconds, up to RETRY_COUNT=10 times).

Friday, August 9, 2019

That Old Restart Problem Strikes Back: Triggers to the Rescue

The previous articles in this series:
1. That Old Restart Problem Strikes Back: Setting the Stage
2. That Old Restart Problem Strikes Back: Getting What You Pay for

Database triggers are notorious for slowing down DML operations.
Yet they have a remarkable consequence when applied to the scenario from the first post of this series, in which I demonstrated statement restarts within a single session updating a non-partitioned table.

Let's create a non-partitioned table using the script from the first part of this series:
SQL> create table big(id int, pad char(100), val int);

Table created.

SQL>
SQL> insert /*+ append*/
  2    into big
  3  select rownum,
  4         'x',
  5         case
  6           when rownum <= 10000000 - 100000
  7           then 0
  8           else 1
  9         end
 10    from xmltable('1 to 10000000');

10000000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select blocks, bytes/power(2,20) mbytes
  2    from user_segments
  3   where segment_name = 'BIG';

    BLOCKS     MBYTES
---------- ----------
    164352       1284

SQL>
SQL> select val, count(*)
  2    from big
  3   group by val
  4   order by val;

       VAL   COUNT(*)
---------- ----------
         0    9900000
         1     100000
This time, though, I am adding an AFTER UPDATE row-level trigger:
SQL> create trigger big_au_trg
  2  after update
  3  on big
  4  for each row
  5  declare
  6  begin
  7    null;
  8  end;
  9  /

Trigger created.
Then I update the table and measure the consistent and current buffer get statistics:
SQL> alter session set events 'trace[dml]:sql_trace wait=true';

Session altered.

SQL>
SQL> select name, value
  2    from v$mystat natural join v$statname
  3   where name in ('db block gets', 'consistent gets')
  4   order by name;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                          45
db block gets                                                             3

SQL>
SQL> update big
  2     set val = val + 1
  3   where val = 1;

100000 rows updated.

SQL>
SQL> select name, value
  2    from v$mystat natural join v$statname
  3   where name in ('db block gets', 'consistent gets')
  4   order by name;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                      158587
db block gets                                                        103337
We got 158K consistent gets and 103K current gets for our 164K-block table - a reasonable amount of buffer gets for this scenario, roughly a single pass through the table.
The SQL Monitoring report shows that the row source operations were executed only once and that 1GB of data was read:
SQL> select dbms_sqltune.report_sql_monitor('7z5h5dg0pa0fv') from dual;

DBMS_SQLTUNE.REPORT_SQL_MONITOR('7Z5H5DG0PA0FV')
---------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
update big set val = val + 1 where val = 1

Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  TC (48:40781)
 SQL ID              :  7z5h5dg0pa0fv
 SQL Execution ID    :  16777226
 Execution Started   :  08/01/2019 12:02:01
 First Refresh Time  :  08/01/2019 12:02:05
 Last Refresh Time   :  08/01/2019 12:02:12
 Duration            :  11s
 Module/Action       :  SQL*Plus/-
 Service             :  pdb
 Program             :  sqlplus.exe

Global Stats
================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | PL/SQL  | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) |  Gets  | Reqs | Bytes |
================================================================================
|      11 |    4.58 |     6.30 |        0.00 |    0.08 |   262K | 2931 |   1GB |
================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3225303172)
=====================================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |      Activity Detail       |
|    |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |        (# samples)         |
=====================================================================================================================================================
|  0 | UPDATE STATEMENT     |      |         |       |         4 |     +8 |     1 |        0 |      |       |          |                            |
|  1 |   UPDATE             | BIG  |         |       |         5 |     +8 |     1 |        0 | 1112 |   9MB |    22.22 | Cpu (2)                    |
|  2 |    TABLE ACCESS FULL | BIG  |      5M | 43217 |        11 |     +1 |     1 |     100K | 1818 |   1GB |    77.78 | db file scattered read (7) |
=====================================================================================================================================================
Notice also that it took only 11 seconds to execute that update statement, whereas it was around 16 seconds without any triggers - your mileage may vary.
I would not jump to conclusions, though: there might still be statement restarts even when the row source operations are executed only once (as demonstrated in the second article of this series), so here is the relevant output from the trace file:
PARSING IN CURSOR #140092319088552 len=46 dep=0 uid=138 oct=6 lid=138 tim=2680019864414 hv=3243573723 ad='c7471690' sqlid='7z5h5dg0pa0fv'
update big
   set val = val + 1
 where val = 1
END OF STMT
PARSE #140092319088552:c=130,e=131,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3225303172,tim=2680019864414
updSetExecCmpColInfo: not RHS: objn=136467, cid=3
  kduukcmpf=0x7f69ce5d99ea, kduukcmpl=0x7f69ce5d99e8, kduukcmpp=(nil)
updThreePhaseExe: objn=136467 phase=NOT LOCKED
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x8000.0277b741  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  97sch: scn: 0x0000.00000000  mascn: (scn: 0
x8000.0277b72c) env: (scn: 0x0000.00000000  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  512sch: scn: 0x0000.00000000  mascn: (scn: 0x0000.00000000)
..
updrow: objn=136467 phase=NOT LOCKED
updrow: kauupd objn:136467 table:0 rowMigrated:FALSE  rowid 00021513.098a6964.34 code 0
updrow: objn=136467 phase=NOT LOCKED
updrow: kauupd objn:136467 table:0 rowMigrated:FALSE  rowid 00021513.098a6964.35 code 0
updrow: objn=136467 phase=NOT LOCKED
updrow: kauupd objn:136467 table:0 rowMigrated:FALSE  rowid 00021513.098a6964.36 code 0
.. there are many updrow calls
.. there are several ROW_RETRY as well
WAIT #140092319088552: nam='undo segment extension' ela= 15 segment#=6 p2=0 p3=0 obj#=0 tim=2680026841072
updrow: objn=136467 error=1551
updrow: qecinvsub objn=136467
updrow: setting ROW_RETRY objn=136467
updrow: retry_this_row: ROW_RETRY set, objn= 136467 phase=NOT LOCKED
WAIT #140092319088552: nam='undo segment extension' ela= 10848 segment#=6 p2=0 p3=0 obj#=0 tim=2680026852020
updrow: objn=136467 error=1551
updrow: qecinvsub objn=136467
updrow: setting ROW_RETRY objn=136467
updrow: retry_this_row: ROW_RETRY set, objn= 136467 phase=NOT LOCKED
updrow: kauupd objn:136467 table:0 rowMigrated:FALSE  rowid 00021513.098a698b.23 code 0
updrow: objn=136467 phase=NOT LOCKED
updrow: kauupd objn:136467 table:0 rowMigrated:FALSE  rowid 00021513.098a698b.24 code 0
.. 
updrow: objn=136467 phase=NOT LOCKED
updrow: kauupd objn:136467 table:0 rowMigrated:FALSE  rowid 00021513.098a6fb1.2c code 0
EXEC #140092319088552:c=4580202,e=10824372,p=158086,cr=158543,cu=103457,mis=0,r=100000,dep=0,og=1,plh=3225303172,tim=2680030688827
STAT #140092319088552 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  BIG (cr=158543 pr=158086 pw=0 str=1 time=10824071 us)'
STAT #140092319088552 id=2 cnt=100000 pid=1 pos=1 obj=136467 op='TABLE ACCESS FULL BIG (cr=158531 pr=156973 pw=0 str=1 time=6924111 us cost=43217 size=15000000 card=5000000)'
There are no updrowFastPath lines, and the output looks suspiciously similar to the corresponding output for the partitioned table from the second part of this series.
The statement no longer uses the fast-path code and follows the regular code path instead, which appears to be due to the presence of the trigger.
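A quick way to confirm which code path a particular execution took is to grep the raw trace file for the two function names - a small sketch (the trace file path is a placeholder; the real one can be taken from v$diag_info):
# count the lines produced by each code path in the DML trace
# (the trace file name below is a placeholder for this sketch)
grep -c 'updrowFastPath:' /path/to/orcl_ora_NNNNN.trc
grep -c 'updrow:' /path/to/orcl_ora_NNNNN.trc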

The ORA-01551 error can also explain the issue described in this link, where a statement restart is observed after a certain number of executions. In fact, I ran the sample code provided by (Valentin?) Nikotin on the SQL.RU forum and got the same ROW_RETRY messages in the trace file when there was a mismatch between the number of times the trigger fired and the number of rows being updated.

tl;dr.

When triggers are present, the updrowFastPath function no longer appears to be used and the execution goes through the updrow code path (the same one that is used for partitioned tables).
In this specific example, the statement executes faster and generates far less load - it does not read the table three times after all - so the UPDATE statement is actually faster with a trigger than without it.
Therefore, a suggestion to optimize a DML statement by creating a new trigger (a statement-level trigger works as well) does not sound so wacky anymore (of course, I am exploiting an Oracle deficiency here).
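For illustration only, a no-op statement-level trigger of the kind mentioned above could look like the sketch below (the trigger name is made up, and I am not seriously proposing this as a tuning technique):
-- a do-nothing AFTER UPDATE statement-level trigger (no FOR EACH ROW clause);
-- per the observations above, its mere presence pushes the UPDATE off updrowFastPath
create trigger big_au_stmt_trg
after update on big
begin
  null;
end;
/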

Friday, August 2, 2019

That Old Restart Problem Strikes Back: Getting What You Pay for

The previous article in this series:
1. That Old Restart Problem Strikes Back: Setting the Stage

During my previous research, I was also testing the same UPDATE statement against a partitioned table.
I was slightly puzzled that I did not observe any statement restarts in the performance statistics, even after I updated tens of millions of rows in the partitioned table.
It turns out that there are still statement restarts; however, they do not lead to such dreadful ramifications as reading the table three times.
Disclaimer: please bear in mind that throughout this series I am talking about statement restarts that happen without any concurrent activity. Statement restarts involving several sessions can still result in the table being read multiple times (I refer you to Tanel Poder's excellent video on this subject).

For this demo, I am going to create a partitioned table with one partition (that's right, one is enough) and populate it with the same data that I used in my previous post:
SQL> create table big_part(
  2    id int,
  3    pad char(100),
  4    val int)
  5  partition by hash(id)
  6  partitions 1;

Table created.

SQL>
SQL> insert /*+ append*/
  2    into big_part
  3  select rownum,
  4         'x',
  5         case
  6           when rownum <= 10000000 - 100000
  7           then 0
  8           else 1
  9         end
 10    from xmltable('1 to 10000000');

10000000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select partition_name, blocks, bytes/power(2,20) mbytes
  2    from user_segments
  3   where segment_name = 'BIG_PART';

PARTITION_NAME      BLOCKS     MBYTES
--------------- ---------- ----------
SYS_P3408           159744       1248

SQL> 
SQL> select val, count(*)
  2    from big_part
  3   group by val
  4   order by val;

       VAL   COUNT(*)
---------- ----------
         0    9900000
         1     100000
Let us update the table:
SQL> alter session set events 'trace[dml]:sql_trace wait=true';

Session altered.

SQL>
SQL> select name, value
  2    from v$mystat natural join v$statname
  3   where name in ('db block gets', 'consistent gets')
  4   order by name;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                        2472
db block gets                                                             6

SQL>
SQL> update big_part
  2     set val = val + 1
  3   where val = 1;

100000 rows updated.

SQL>
SQL> select name, value
  2    from v$mystat natural join v$statname
  3   where name in ('db block gets', 'consistent gets')
  4   order by name;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                      161286
db block gets                                                        103448
We performed roughly 160K consistent gets and 103K current gets, which is perfectly in line with a 160K-block table.
The SQL Monitoring report does not contain anything unusual:
SQL> select dbms_sqltune.report_sql_monitor('2k4ggcsqqwjgb') from dual;

DBMS_SQLTUNE.REPORT_SQL_MONITOR('2K4GGCSQQWJGB')
--------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
update big_part set val = val + 1 where val = 1

Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  TC (44:11676)
 SQL ID              :  2k4ggcsqqwjgb
 SQL Execution ID    :  16777218
 Execution Started   :  08/01/2019 09:26:06
 First Refresh Time  :  08/01/2019 09:26:10
 Last Refresh Time   :  08/01/2019 09:26:14
 Duration            :  8s
 Module/Action       :  SQL*Plus/-
 Service             :  pdb
 Program             :  sqlplus.exe

Global Stats
======================================================================
| Elapsed |   Cpu   |    IO    | Application | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Gets  | Reqs | Bytes |
======================================================================
|    7.80 |    3.04 |     4.76 |        0.00 |   262K | 5211 |   1GB |
======================================================================

SQL Plan Monitoring Details (Plan Hash Value=1296248929)
==============================================================================================================================================================
| Id |        Operation         |   Name   |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |       Activity Detail       |
|    |                          |          | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |         (# samples)         |
==============================================================================================================================================================
|  0 | UPDATE STATEMENT         |          |         |       |         3 |     +6 |     1 |        0 |      |       |          |                             |
|  1 |   UPDATE                 | BIG_PART |         |       |         4 |     +5 |     1 |        0 | 2724 |  21MB |    42.86 | Cpu (1)                     |
|    |                          |          |         |       |           |        |       |          |      |       |          | db file sequential read (2) |
|  2 |    PARTITION HASH SINGLE |          |    157K | 43213 |         3 |     +6 |     1 |     100K |      |       |          |                             |
|  3 |     TABLE ACCESS FULL    | BIG_PART |    157K | 43213 |         8 |     +1 |     1 |     100K | 2487 |   1GB |    57.14 | direct path read (4)        |
==============================================================================================================================================================
The exact amount of data read is 1259MB and our single partition is 1248MB in size:
SQL> select physical_read_bytes/power(2,20) read_mbytes
  2    from v$sql_monitor
  3   where sql_id = '2k4ggcsqqwjgb'
  4     and sql_exec_id = 16777218
  5     and sid = sys_context('userenv', 'sid');

READ_MBYTES
-----------
   1259.625
The same update took 16 seconds and read 3.6GB against the non-partitioned table, so this partitioned table update is twice as fast and reads a third of the data.
It might seem as if there were no statement restarts, but that would be a misconception. The relevant entries from the trace file are below:
PARSING IN CURSOR #140643497507696 len=51 dep=0 uid=138 oct=6 lid=138 tim=2670664237436 hv=762201579 ad='ba7d5268' sqlid='2k4ggcsqqwjgb'
update big_part
   set val = val + 1
 where val = 1
END OF STMT
PARSE #140643497507696:c=42246,e=64115,p=14,cr=291,cu=0,mis=1,r=0,dep=0,og=1,plh=1296248929,tim=2670664237436
updSetExecCmpColInfo: not RHS: objn=136464, cid=3
  kduukcmpf=0x7fea236d560a, kduukcmpl=0x7fea236d5608, kduukcmpp=(nil)
updThreePhaseExe: objn=136464 phase=NOT LOCKED
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x8000.02777f0c  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  97sch: scn: 0x0000.00000000  mascn: (scn: 0x8000.02777ef6) env: (scn: 0x0000.00000000  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  512sch: scn: 0x0000.00000000  mascn: (scn: 0x0000.00000000)
..
updrow: kauupd objn:136465 table:0 rowMigrated:FALSE  rowid 00021511.098b274b.34 code 0
updrow: objn=136464 phase=NOT LOCKED
updrow: kauupd objn:136465 table:0 rowMigrated:FALSE  rowid 00021511.098b274b.35 code 0
updrow: objn=136464 phase=NOT LOCKED
updrow: kauupd objn:136465 table:0 rowMigrated:FALSE  rowid 00021511.098b274b.36 code 0
.. a lot of updrow calls ..
..
updrow: objn=136464 error=1551
updrow: qecinvsub objn=136464
updrow: setting ROW_RETRY objn=136464
updrow: retry_this_row: ROW_RETRY set, objn= 136464 phase=NOT LOCKED
updrow: kauupd objn:136465 table:0 rowMigrated:FALSE  rowid 00021511.098b27ab.37 code 0
updrow: objn=136464 phase=NOT LOCKED
..
STAT #140643497507696 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  BIG_PART (cr=158523 pr=161232 pw=0 str=1 time=7769538 us)'
STAT #140643497507696 id=2 cnt=100000 pid=1 pos=1 obj=0 op='PARTITION HASH SINGLE PARTITION: 1 1 (cr=158514 pr=158508 pw=0 str=1 time=4086054 us cost=43213 size=2035865 card=156605)'
STAT #140643497507696 id=3 cnt=100000 pid=2 pos=1 obj=136464 op='TABLE ACCESS FULL BIG_PART PARTITION: 1 1 (cr=158514 pr=158508 pw=0 str=1 time=4063030 us cost=43213 size=2035865 card=156605)'
Although we got the same ORA-01551 error, it was gracefully handled using some ROW_RETRY logic without resorting to the LOCK/UPDATE sequence.
Notice also that Oracle uses the updrow function here, whereas it used updrowFastPath for the non-partitioned table in my previous post.
It seems to be because of a different code path - partitioned tables are simply processed by another routine within the Oracle kernel: updrow.
Non-partitioned tables go through the updrowFastPath function, which does not follow the cheap ROW_RETRY flow, so when an ORA-01551 error is raised the underlying routine restarts the whole statement from scratch and passes it through the LOCK/UPDATE sequence.

tl;dr.

This post demonstrates that a certain kind of statement restart, presumably caused by ORA-01551, is handled differently for partitioned tables than for non-partitioned ones.
The consequence is that updates against partitioned tables can be much faster than the same updates against non-partitioned tables (twice as fast in the example from this post).
Hence the joking title "getting what you pay for" - the Oracle Partitioning option costs money after all (I am not saying that Oracle did this deliberately - it is just an accidental consequence of the statement restart anomaly).
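As a side note, if one wanted to experiment with this on an existing heap table, Oracle 12.2 and later can convert it to a partitioned table online; a hedged sketch, not part of the original test (it requires the Partitioning option):
-- sketch only: convert the non-partitioned demo table into a single hash partition
alter table big modify
  partition by hash(id) partitions 1
  online;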

Thursday, August 1, 2019

That Old Restart Problem Strikes Back: Setting the Stage

When I was reading about statement restarts, I thought they usually happen in multi-user systems or in the presence of triggers. However, last week one of our developers asked me to help figure out why a simple UPDATE statement was reading far more data than there was in the table. This blog post shares my findings.
I created a simple non-partitioned table for this demo, which I ran on Oracle Database 12.2.0.1.190416 (the issue has also been reproduced on 19.4.0.0.190716):
SQL> create table big(id int, pad char(100), val int);

Table created.

SQL>
SQL> insert /*+ append*/
  2    into big
  3  select rownum,
  4         'x',
  5         case
  6           when rownum <= 10000000 - 100000
  7           then 0
  8           else 1
  9         end
 10    from xmltable('1 to 10000000');

10000000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select blocks, bytes/power(2,20) mbytes
  2    from user_segments
  3   where segment_name = 'BIG';

    BLOCKS     MBYTES
---------- ----------
    163840       1280

SQL>
SQL> select val, count(*)
  2    from big
  3   group by val
  4   order by val;

       VAL   COUNT(*)
---------- ----------
         0    9900000
         1     100000
It has 10M rows and 163K blocks occupying 1280MB; there are 9.9M 0's and 100K 1's.
Now I am going to update that table and set all VAL=1 to 2 (I will be using a new session because I am going to query some statistics as well):
SQL> select name, value
  2    from v$mystat natural join v$statname
  3   where name in ('db block gets', 'consistent gets')
  4   order by name;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                          45
db block gets                                                             3

SQL>
SQL> update big
  2     set val = val + 1
  3   where val = 1;

100000 rows updated.

SQL>
SQL> select name, value
  2    from v$mystat natural join v$statname
  3   where name in ('db block gets', 'consistent gets')
  4   order by name;

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
consistent gets                                                      479962
db block gets                                                        107064
That is an unusually high number of consistent gets (about 480K) given that the table has only 163K blocks.
Let's see what the SQL Monitoring report looks like:
SQL> select dbms_sqltune.report_sql_monitor('7z5h5dg0pa0fv') from dual;

DBMS_SQLTUNE.REPORT_SQL_MONITOR('7Z5H5DG0PA0FV')
------------------------------------------------------------------------------------------------------------------------------------------------------
SQL Monitoring Report

SQL Text
------------------------------
update big set val = val + 1 where val = 1

Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  TC (15:16913)
 SQL ID              :  7z5h5dg0pa0fv
 SQL Execution ID    :  16777224
 Execution Started   :  08/01/2019 07:53:30
 First Refresh Time  :  08/01/2019 07:53:34
 Last Refresh Time   :  08/01/2019 07:53:46
 Duration            :  16s
 Module/Action       :  SQL*Plus/-
 Service             :  pdb
 Program             :  sqlplus.exe

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|      17 |    5.80 |       11 |   587K | 6786 |   4GB |
========================================================

SQL Plan Monitoring Details (Plan Hash Value=3225303172)
======================================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost  |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity |       Activity Detail       |
|    |                      |      | (Estim) |       | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |         (# samples)         |
======================================================================================================================================================
|  0 | UPDATE STATEMENT     |      |         |       |        11 |     +6 |     3 |        0 |      |       |          |                             |
|  1 |   UPDATE             | BIG  |         |       |        11 |     +6 |     3 |        0 | 1502 |  12MB |    13.33 | Cpu (2)                     |
|  2 |    TABLE ACCESS FULL | BIG  |      5M | 43216 |        16 |     +1 |     3 |     205K | 5284 |   4GB |    86.67 | Cpu (3)                     |
|    |                      |      |         |       |           |        |       |          |      |       |          | db file scattered read (10) |
======================================================================================================================================================
So my session read 4GB and all plan lines were started three times. Here is the exact amount of data read:
SQL> select physical_read_bytes/power(2,20) read_mbytes
  2    from v$sql_monitor
  3   where sql_id = '7z5h5dg0pa0fv'
  4     and sql_exec_id = 16777224
  5     and sid = sys_context('userenv', 'sid');

READ_MBYTES
-----------
 3689.39063
Remember that the table is about 1280MB in size, so we read it roughly three times. This can also be confirmed by looking at the raw trace file (notice str=3, which means that the row source was started three times):
STAT #140538058031192 id=1 cnt=0 pid=0 pos=1 obj=0 op='UPDATE  BIG (cr=322910 pr=315316 pw=0 str=3 time=11011457 us)'
STAT #140538058031192 id=2 cnt=205426 pid=1 pos=1 obj=136452 op='TABLE ACCESS FULL BIG (cr=479905 pr=470740 pw=0 str=3 time=13478866 us cost=43216 size=15000000 card=5000000)'
If I set some diagnostic events in my session:
alter session set events 'trace[dml]:sql_trace wait=true';
I am able to see what is going on in the trace file:
PARSING IN CURSOR #140538058031192 len=46 dep=0 uid=138 oct=6 lid=138 tim=2665107653071 hv=3243573723 ad='c7471690' sqlid='7z5h5dg0pa0fv'
update big
   set val = val + 1
 where val = 1
END OF STMT
PARSE #140538058031192:c=1704,e=1744,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=3225303172,tim=2665107653070
updSetExecCmpColInfo: not RHS: objn=136452, cid=3
  kduukcmpf=0x7fd1968399ea, kduukcmpl=0x7fd1968399e8, kduukcmpp=(nil)
updThreePhaseExe: objn=136452 phase=NOT LOCKED
updaul: phase is NOT LOCKED snap oldsnap env: (scn: 0x8000.02774b2c  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  97sch: scn: 0x0000.00000000  mascn: (scn: 0x8000.02774b0e) env: (scn: 0x0000.00000000  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  512sch: scn: 0x0000.00000000  mascn: (scn: 0x0000.00000000)
...
updrowFastPath: objn=136452 error=1551
updrowFastPath: qecinvsub objn=136452
updrowFastPath: resignal error due to array update objn=136452updaul: objn=136452 error=1551
updThreePhaseExe: objn=136452 phase=LOCK
updaul: phase is LOCK snap oldsnap env: (scn: 0x8000.02774b2e  xid: 0x0008.009.00009487  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x8000.02774b2e  97sch: scn: 0x0000.00000000  mascn: (scn: 0x8000.02774b0e) env: (scn: 0x8000.02774b2c  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  608sch: scn: 0x0000.00000000  mascn: (scn: 0x8000.02774b0e)
...
updrowFastPath: kddlkr objn 136452 table 0  rowid 00021504.09874962.34 code 0
updrowFastPath: kddlkr objn 136452 table 0  rowid 00021504.09874962.35 code 0
updrowFastPath: kddlkr objn 136452 table 0  rowid 00021504.09874962.36 code 0
updrowFastPath: kddlkr objn 136452 table 0  rowid 00021504.09874962.37 code 0
--100K such lines
updrowFastPath: kddlkr objn 136452 table 0  rowid 00021504.09874faf.2c code 0
updThreePhaseExe: objn=136452 phase=ALL LOCKED
updaul: phase is ALL LOCKED snap oldsnap env: (scn: 0x8000.02774b2e  xid: 0x0008.009.00009487  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x8000.02774b2e  96sch: scn: 0x0000.00000000  mascn: (scn: 0
x8000.02774b0e) env: (scn: 0x8000.02774b2c  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000  608sch: scn: 0x0000.00000000  mascn: (scn: 0x8000.02774b0e)
Somehow the update hit ORA-01551 and fell back to the LOCK/UPDATE sequence (which is similar to SELECT FOR UPDATE/UPDATE in SQL).
The table was read three times:
1. the first time we read it at the NOT LOCKED stage
2. we seemed to get ORA-01551 and went through the LOCK stage to lock all the rows (the kddlkr lines)
3. once we locked all the rows, we reached the ALL LOCKED stage and started changing the column
The error appears to be internal for Oracle:
$oerr ora 1551
01551, 00000, "extended rollback segment, pinned blocks released"
// *Cause: Doing recursive extent of rollback segment, trapped internally
//        by the system
// *Action: None
It might be related to the _inplace_update_retry parameter that has the following description: "inplace update retry for ora1551", but I have not tested it yet.
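For reference, the description of a hidden parameter can be pulled from the x$ fixed tables when connected as SYS; a sketch using the standard x$ksppi/x$ksppcv query:
-- show the hidden parameter, its current value and its description (run as SYS)
select p.ksppinm  name,
       v.ksppstvl value,
       p.ksppdesc description
  from x$ksppi  p,
       x$ksppcv v
 where p.indx = v.indx
   and p.ksppinm = '_inplace_update_retry';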
The closest match that I found for this error is this old thread (2002) on Google Groups, which was left without a resolution or a detailed explanation (it is in Russian).
There is also Bug 24827102 : PERFORMANCE OF UPDATE STATEMENT IS NOT CONSISTENT, which mentions the same error.
I cannot reproduce this issue every time, and at first it had not happened twice in the same session, which made me think it is related to the undo segment assigned to the transaction (update: it can happen multiple times in one session as well; my initial observation was wrong).

tl;dr.

DML restarts are possible even in a single-user system. At least, that is how all modern versions of Oracle Database work (I tested 12.2 and 19c).
Thankfully, it's easy to identify them in SQL Monitoring reports or in trace files (when the trace[DML] or sql_trace events are set).
When you see unexpected spikes in buffer gets or reads that cannot be explained by the amount of data being processed, it might well be due to statement restarts, as this post demonstrates.
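As an illustration, statement restarts can also be spotted from the shared pool when plan-level execution statistics are collected (statistics_level=ALL or the gather_plan_statistics hint); a sketch - values of last_starts greater than 1 here correspond to the Execs=3 column seen in the SQL Monitoring report above:
-- row sources of the UPDATE that show last_starts > 1 were started more than
-- once within a single execution - a sign of a statement restart
select id, operation, object_name, last_starts
  from v$sql_plan_statistics_all
 where sql_id = '7z5h5dg0pa0fv'
 order by id;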