What about customers who do not use either of these? I used to work for a company where we used alexandria-plsql-utils to load data from S3. Coincidentally, I was recently looking for something on MOS and came across the following bug: Bug 28867698 - Inclusion of DBMS_CLOUD into "STANDARD" Oracle (Doc ID 28867698.8). It sounded promising, so I decided to investigate how this package could be used to read S3 data.
Prerequisites
The fix for the bug is included in the 19.9 DB October 2020 Release Update. I assume this package and its dependencies can be installed in earlier releases/release updates; however, I have no idea whether it will work there. I can imagine that there might be some issues, to name a few:
- DBMS_CLOUD and its dependencies use JSON, and JSON support has been gradually improving over the last few releases. Some of the functionality might not work properly in older releases.
- It is not only about the package and other supplied database objects, but also about the ORACLE_LOADER access driver. If it does not provide the required capabilities in older releases, then again it is not something that can be fixed easily.
Environment
My environment is 19.9 CDB Single Instance on Azure:

[oracle@myhostname ~]$ $ORACLE_HOME/OPatch/opatch lspatches
31772784;OCW RELEASE UPDATE 19.9.0.0.0 (31772784)
31771877;Database Release Update : 19.9.0.0.201020 (31771877)

OPatch succeeded.

I created a new PDB for this exercise:
SQL> create pluggable database tc admin user pdb_admin identified by oracle;

Pluggable database created.

SQL> alter pluggable database tc open;

Pluggable database altered.

Since I am going to read data from S3, I created a new bucket mikhail-bucket-20210104. It is non-public, so I created a new IAM user and assigned the following policy to it:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1508867055000", "Effect": "Allow", "Action": [ "s3:GetObject" ], "Resource": [ "arn:aws:s3:::mikhail-bucket-20210104/*" ] } ] }Then, I saved user's access and secret keys which will be used to establish an S3 connection from the Oracle database.
Installation Steps
I used the following script to install DBMS_CLOUD as SYS:

spo /tmp/dbms_cloud
@?/rdbms/admin/dbms_cloud_types.sql
@?/rdbms/admin/dbms_cloud.sql
@?/rdbms/admin/dbms_cloud_task_catalog.sql
@?/rdbms/admin/dbms_cloud_catalog.sql
@?/rdbms/admin/dbms_cloud_task_views.sql
@?/rdbms/admin/dbms_cloud_capability.sql
@?/rdbms/admin/prvt_cloud_core.plb
@?/rdbms/admin/prvt_cloud_core_body.plb
@?/rdbms/admin/prvt_cloud_internal.plb
@?/rdbms/admin/prvt_cloud_request.plb
@?/rdbms/admin/prvt_cloud_task.plb
@?/rdbms/admin/prvt_cloud_task_body.plb
@?/rdbms/admin/prvt_cloud_body.plb
@?/rdbms/admin/prvt_cloud_capability_body.plb
@?/rdbms/admin/prvt_cloud_internal_body.plb
@?/rdbms/admin/prvt_cloud_request_body.plb
@?/rdbms/admin/dbms_cloud_metadata.sql
spo off

These are the files that Oracle added in 19.9. Presumably they were provided by the fix for Bug 28867698 - Inclusion of DBMS_CLOUD into "STANDARD" Oracle (Doc ID 28867698.8).
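Before going further, it may be worth a quick sanity check that the newly created objects compiled cleanly. A minimal sketch (the LIKE filter is my assumption about the object naming):

-- List the DBMS_CLOUD-related SYS objects installed by the scripts above;
-- none of them should show STATUS = 'INVALID'.
select object_name, object_type, status
  from dba_objects
 where owner = 'SYS'
   and object_name like 'DBMS_CLOUD%'
 order by object_name, object_type;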
I am going to access S3 via HTTPS. Therefore, I created a new empty wallet /u01/app/oracle/wallet:
[oracle@myhostname ~]$ mkdir -p /u01/app/oracle/wallet
[oracle@myhostname ~]$ orapki wallet create -wallet /u01/app/oracle/wallet -pwd Oracle123 -auto_login
Oracle PKI Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Operation is successfully completed.

I also need to load the SSL certificates into the wallet. I use openssl to get the certificates.
[oracle@myhostname ~]$ openssl s_client -showcerts -connect s3.amazonaws.com:443 </dev/null
CONNECTED(00000003)
depth=2 C = IE, O = Baltimore, OU = CyberTrust, CN = Baltimore CyberTrust Root
verify return:1
depth=1 C = US, O = DigiCert Inc, OU = www.digicert.com, CN = DigiCert Baltimore CA-2 G2
verify return:1
depth=0 C = US, ST = Washington, L = Seattle, O = "Amazon.com, Inc.", CN = s3.amazonaws.com
verify return:1
---
Certificate chain
 0 s:/C=US/ST=Washington/L=Seattle/O=Amazon.com, Inc./CN=s3.amazonaws.com
   i:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Baltimore CA-2 G2
-----BEGIN CERTIFICATE-----
MIIIGTCCBwGgAwIBAgIQDWRQa0XzDONabC3fLBi0NzANBgkqhkiG9w0BAQsFADBk
<skip>
x+JAorfCzDKa+P1lgCh3+V5Lnqvla2hwCyCnYAy1RR0y1UEUB8FUYj1/PIDs9RJX
cVq+ZBjAtIrm6j5b+Q==
-----END CERTIFICATE-----
 1 s:/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Baltimore CA-2 G2
   i:/C=IE/O=Baltimore/OU=CyberTrust/CN=Baltimore CyberTrust Root
-----BEGIN CERTIFICATE-----
MIIEYzCCA0ugAwIBAgIQAYL4CY6i5ia5GjsnhB+5rzANBgkqhkiG9w0BAQsFADBa
<skip>
0WyzT7QrhExHkOyL4kGJE7YHRndC/bseF/r/JUuOUFfrjsxOFT+xJd1BDKCcYm1v
upcHi9nzBhDFKdT3uhaQqNBU4UtJx5g=
-----END CERTIFICATE-----
---
Server certificate
subject=/C=US/ST=Washington/L=Seattle/O=Amazon.com, Inc./CN=s3.amazonaws.com
issuer=/C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Baltimore CA-2 G2
---
No client certificate CA names sent
Peer signing digest: SHA256
Server Temp Key: ECDH, P-256, 256 bits
---
SSL handshake has read 3712 bytes and written 415 bytes
---
New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES128-GCM-SHA256
Server public key is 2048 bit
Secure Renegotiation IS supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES128-GCM-SHA256
    Session-ID: 5BE1D3D1A79399E9728D7E0B2CD4E1F041DCEC03087D4EFAEF6131AF5A8ABF80
    Session-ID-ctx:
    Master-Key: 7E05806771AF48B9A9EA7DC6B5DBC4E064F315825F4BE411C7D5FB97B189B0643BE7200463EB29D44A4CF12951AF2C2A
    Key-Arg   : None
    Krb5 Principal: None
    PSK identity: None
    PSK identity hint: None
    Start Time: 1609936308
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---
DONE

Essentially, I have to load the two certificates bounded by the BEGIN/END CERTIFICATE lines. I wrote a one-liner for that - it extracts the certificates using sed and pipes them through csplit to write each certificate to a separate file:
[oracle@myhostname ~]$ openssl s_client -showcerts -connect s3.amazonaws.com:443 </dev/null 2>/dev/null | sed '/BEGIN/,/END/!d' | csplit -f aws -b '%d.pem' - /END/+1
2870
1582
[oracle@myhostname ~]$ ll aws?.pem
-rw-r--r-- 1 oracle oinstall 2870 Jan 6 12:33 aws0.pem
-rw-r--r-- 1 oracle oinstall 1582 Jan 6 12:33 aws1.pem

Now I can load both certificates into the wallet:
[oracle@myhostname ~]$ orapki wallet add -wallet /u01/app/oracle/wallet -cert aws0.pem -trusted_cert -pwd Oracle123
Oracle PKI Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Operation is successfully completed.
[oracle@myhostname ~]$ orapki wallet add -wallet /u01/app/oracle/wallet -cert aws1.pem -trusted_cert -pwd Oracle123
Oracle PKI Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Operation is successfully completed.
[oracle@myhostname ~]$ orapki wallet display -wallet /u01/app/oracle/wallet
Oracle PKI Tool Release 21.0.0.0.0 - Production
Version 21.0.0.0.0
Copyright (c) 2004, 2020, Oracle and/or its affiliates. All rights reserved.

Requested Certificates:
User Certificates:
Trusted Certificates:
Subject:        CN=DigiCert Baltimore CA-2 G2,OU=www.digicert.com,O=DigiCert Inc,C=US
Subject:        CN=s3.amazonaws.com,O=Amazon.com\, Inc.,L=Seattle,ST=Washington,C=US

At this stage, I can fire an HTTP request:
SQL> select utl_http.request('https://mikhail-bucket-20210104.s3.amazonaws.com/sales_2020.csv',
  2                          null,
  3                          'file:/u01/app/oracle/wallet')
  4         get_request
  5    from dual;

GET_REQUEST
--------------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>9A9C3
73E50A59095</RequestId><HostId>cGMOSk5WlNt0MNa++5c/N1JdIMSYH7fsSmj9ETSUHT7d1ztYO
40CSmQDSJF2sht8W/HoX23utI8=</HostId></Error>

The AccessDenied error is expected - I have to supply the AWS credentials. However, if it were a bucket with public access, the request would work fine.
DB User and S3 Credentials
Let us now create a new database user whose schema will own the S3 credentials and any other application objects:

SQL> grant dba to tc identified by tc;

Grant succeeded.

Then create new credentials using the access and secret keys:
SQL> conn tc/tc@localhost/tc
Connected.
SQL>
SQL> exec dbms_credential.create_credential( -
>      credential_name => 'S3_CREDENTIAL', -
>      username => 'A******************3', -
>      password => '1**************************************E')

PL/SQL procedure successfully completed.
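For what it is worth, DBMS_CLOUD documents its own CREATE_CREDENTIAL procedure for Autonomous Database, so the credential could presumably be created through the package as well - a sketch, assuming the 19.9 port exposes the same signature and that EXECUTE on DBMS_CLOUD has been granted to the user (the key values are obvious placeholders):

begin
  -- Assumption: same API as documented for DBMS_CLOUD on Autonomous Database.
  dbms_cloud.create_credential(
    credential_name => 'S3_CREDENTIAL',
    username        => '<AWS access key id>',       -- placeholder
    password        => '<AWS secret access key>');  -- placeholder
end;
/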
SSL Wallet
It is about time to add something to the S3 bucket. For that, I uploaded the sales_2020.csv file to my S3 bucket:

2020-01-01,1,10,100
2020-02-01,2,5,200
2020-03-01,2,10,400

Yet I am still not able to select anything from my S3 location:
SQL> select *
  2    from external (
  3      ( sold_date date,
  4        product_id number,
  5        quantity_sold number(10,2),
  6        amount_sold number(10,2)
  7      )
  8      type oracle_loader
  9      default directory data_pump_dir
 10      access parameters (
 11        records delimited by newline
 12        nologfile
 13        nobadfile
 14        nodiscardfile
 15        readsize=10000000
 16        credential 'S3_CREDENTIAL'
 17        fields terminated by ','
 18        date_format date mask 'yyyy-mm-dd')
 19      location ('https://mikhail-bucket-20210104.s3.amazonaws.com/sales_2020.csv')
 20      reject limit unlimited);
select *
*
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-20000: Database property SSL_WALLET not found
ORA-06512: at "SYS.DBMS_CLOUD", line 917
ORA-06512: at "SYS.DBMS_CLOUD_INTERNAL", line 3823
ORA-06512: at line 1

The package tries to find the SSL_WALLET property in DATABASE_PROPERTIES. At present, I use a quick and dirty workaround by adding a new row into SYS.PROPS$ in CDB$ROOT:
SQL> insert into props$(name, value$, comment$) values ('SSL_WALLET', '/u01/app/oracle/wallet', 'SSL Wallet');

1 row created.

SQL> commit;

There is also an undocumented system parameter ssl_wallet, which I assume was intended to have something to do with this. The DBMS_CLOUD code, though, queries DATABASE_PROPERTIES directly. Although I assume the code could be rewritten to make use of the ssl_wallet parameter, my intention is to use the package as is.
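To double-check the workaround, the property can be queried back from CDB$ROOT, where the row was inserted (whether the PDB also needs its own row is something I have not verified):

-- Run in CDB$ROOT; DATABASE_PROPERTIES is backed by the props$ row added above.
select property_name, property_value
  from database_properties
 where property_name = 'SSL_WALLET';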
Having added the new SSL_WALLET property, I am now finally able to access the S3 bucket:
SQL> select *
  2    from external (
  3      ( sold_date date,
  4        product_id number,
  5        quantity_sold number(10,2),
  6        amount_sold number(10,2)
  7      )
  8      type oracle_loader
  9      default directory data_pump_dir
 10      access parameters (
 11        records delimited by newline
 12        nologfile
 13        nobadfile
 14        nodiscardfile
 15        readsize=10000000
 16        credential 'S3_CREDENTIAL'
 17        fields terminated by ','
 18        date_format date mask 'yyyy-mm-dd')
 19      location ('https://mikhail-bucket-20210104.s3.amazonaws.com/sales_2020.csv')
 20      reject limit unlimited);

SOLD_DATE PRODUCT_ID QUANTITY_SOLD AMOUNT_SOLD
--------- ---------- ------------- -----------
01-JAN-20          1            10         100
01-FEB-20          2             5         200
01-MAR-20          2            10         400
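With the wallet and credential in place, the DBMS_CLOUD API can presumably also be called directly. For example, the LIST_OBJECTS table function documented for Autonomous Database should list the bucket contents - a sketch I have not exhaustively verified in this port:

-- Assumption: LIST_OBJECTS is available here with the documented parameters
-- (credential_name, location_uri) and returns OBJECT_NAME/BYTES columns.
select object_name, bytes
  from dbms_cloud.list_objects(
         credential_name => 'S3_CREDENTIAL',
         location_uri    => 'https://mikhail-bucket-20210104.s3.amazonaws.com/');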
Hybrid Partitioned Table
As an extra exercise, I created a hybrid partitioned table - it has both external and traditional partitions:

SQL> create table sales_hybrid
  2  ( sold_date date,
  3    product_id number,
  4    quantity_sold number(10,2),
  5    amount_sold number(10,2)
  6  )
  7  external partition attributes (
  8    type oracle_loader
  9    default directory data_pump_dir
 10    access parameters (
 11      records delimited by newline
 12      nologfile
 13      nobadfile
 14      nodiscardfile
 15      credential 'S3_CREDENTIAL'
 16      fields terminated by ','
 17      date_format date mask 'yyyy-mm-dd'
 18    )
 19  )
 20  partition by range(sold_date) (
 21    partition sales_2020 values less than (date'2021-01-01')
 22      external location ('https://mikhail-bucket-20210104.s3.amazonaws.com/sales_2020.csv'),
 23    partition sales_2021 values less than (date'2022-01-01')
 24  )
 25  ;

Table created.

SQL>
SQL> insert into sales_hybrid values(date'2021-01-01',1,10,150);

1 row created.

SQL> select * from sales_hybrid;

SOLD_DATE PRODUCT_ID QUANTITY_SOLD AMOUNT_SOLD
--------- ---------- ------------- -----------
01-JAN-20          1            10         100
01-FEB-20          2             5         200
01-MAR-20          2            10         400
01-JAN-21          1            10         150
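The data dictionary should confirm that this is indeed a hybrid partitioned table; a small sketch, assuming the 19c HYBRID column of USER_PART_TABLES:

-- HYBRID is expected to show YES for hybrid partitioned tables.
select table_name, partitioning_type, hybrid
  from user_part_tables
 where table_name = 'SALES_HYBRID';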
Moreover, we can even access compressed files:

SQL> drop table sales_hybrid;

Table dropped.

SQL> create table sales_hybrid
  2  ( sold_date date,
  3    product_id number,
  4    quantity_sold number(10,2),
  5    amount_sold number(10,2)
  6  )
  7  external partition attributes (
  8    type oracle_loader
  9    default directory data_pump_dir
 10    access parameters (
 11      records delimited by newline
 12      compression gzip
 13      nologfile
 14      nobadfile
 15      nodiscardfile
 16      credential 'S3_CREDENTIAL'
 17      fields terminated by ','
 18      date_format date mask 'yyyy-mm-dd'
 19    )
 20  )
 21  partition by range(sold_date) (
 22    partition sales_2020 values less than (date'2021-01-01')
 23      external location ('https://mikhail-bucket-20210104.s3.amazonaws.com/sales_2020.csv.gz'),
 24    partition sales_2021 values less than (date'2022-01-01')
 25  )
 26  ;

Table created.

SQL>
SQL> insert into sales_hybrid values(date'2021-01-01',1,10,150);

1 row created.

SQL>
SQL> select * from sales_hybrid;

SOLD_DATE PRODUCT_ID QUANTITY_SOLD AMOUNT_SOLD
--------- ---------- ------------- -----------
01-JAN-20          1            10         100
01-FEB-20          2             5         200
01-MAR-20          2            10         400
01-JAN-21          1            10         150

The gzip, zlib, and bzip2 compression schemes are supported per the documentation. We might as well use the new ORA_PARTITION_VALIDATION SQL function to check whether the rows conform to their partition definitions:
SQL> select ora_partition_validation(rowid), sh.* from sales_hybrid sh;

ORA_PARTITION_VALIDATION(ROWID) SOLD_DATE PRODUCT_ID QUANTITY_SOLD AMOUNT_SOLD
------------------------------- --------- ---------- ------------- -----------
                              1 01-JAN-20          1            10         100
                              1 01-FEB-20          2             5         200
                              1 01-MAR-20          2            10         400
                              1 01-JAN-21          1            10         150
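Conversely, the same function can be used to hunt for rows that do not belong to the partition they are stored in - for this data set the query below should return no rows:

-- ORA_PARTITION_VALIDATION returns 1 when the row maps to its partition and
-- 0 otherwise, so this filters only the non-conforming rows.
select sh.*
  from sales_hybrid sh
 where ora_partition_validation(rowid) = 0;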