Oracle Machine Learning for Python User's Guide
User's Guide
Release 1.0
E97014-35
July 2024
Contributors: Andi Wang, Boriana Milenova, David McDermid, Feng Li, Mandeep Kaur, Mark Hornick, Qin Wang, Sherry Lamonica, Venkatanathan Varadarajan, Yu Xiang
This software and related documentation are provided under a license agreement containing restrictions on use and
disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or
allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit,
perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation
of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find
any errors, please report them to us in writing.
If this is software, software documentation, data (as defined in the Federal Acquisition Regulation), or related
documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then
the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any
programs embedded, installed, or activated on delivered hardware, and modifications of such programs) and Oracle
computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial
computer software," "commercial computer software documentation," or "limited rights data" pursuant to the applicable
Federal Acquisition Regulation and agency-specific supplemental regulations. As such, the use, reproduction,
duplication, release, display, disclosure, modification, preparation of derivative works, and/or adaptation of i) Oracle
programs (including any operating system, integrated software, any programs embedded, installed, or activated on
delivered hardware, and modifications of such programs), ii) Oracle computer documentation and/or iii) other Oracle
data, is subject to the rights and limitations specified in the license contained in the applicable contract. The terms
governing the U.S. Government's use of Oracle cloud services are defined by the applicable contract for such services.
No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not
developed or intended for use in any inherently dangerous applications, including applications that may create a risk of
personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all
appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its
affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle®, Java, MySQL, and NetSuite are registered trademarks of Oracle and/or its affiliates. Other names may be
trademarks of their respective owners.
Intel and Intel Inside are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used
under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Epyc, and the AMD logo
are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open
Group.
This software or hardware and documentation may provide access to or information about content, products, and
services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all
warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an
applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss,
costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth
in an applicable agreement between you and Oracle.
Contents
Preface
Audience viii
Documentation Accessibility viii
Related Resources viii
Conventions ix
2 Install OML4Py Client for Linux for Use With Autonomous Database on Serverless Exadata Infrastructure
3.5.3 Uninstall the OML4Py Client for On-Premises Databases 3-22
7.1.2 Select Data 7-3
7.1.3 Combine Data 7-8
7.1.4 Clean Data 7-13
7.1.5 Split Data 7-15
7.2 Explore Data 7-17
7.2.1 About the Exploratory Data Analysis Methods 7-17
7.2.2 Correlate Data 7-19
7.2.3 Cross-Tabulate Data 7-20
7.2.4 Mutate Data 7-23
7.2.5 Sort Data 7-26
7.2.6 Summarize Data 7-28
7.3 Render Graphics 7-31
9.5 Model Selection 9-15
10.6.1.4 pyqSetAuthToken Procedure 10-59
10.6.1.5 pyqIsTokenSet Function 10-59
10.6.2 Embedded Python Execution Functions (Autonomous Database) 10-60
10.6.2.1 pyqListEnvs Function (Autonomous Database) 10-61
10.6.2.2 pyqEval Function (Autonomous Database) 10-61
10.6.2.3 pyqTableEval Function (Autonomous Database) 10-64
10.6.2.4 pyqRowEval Function (Autonomous Database) 10-67
10.6.2.5 pyqGroupEval Function (Autonomous Database) 10-71
10.6.2.6 pyqIndexEval Function (Autonomous Database) 10-75
10.6.2.7 pyqGrant Function (Autonomous Database) 10-95
10.6.2.8 pyqRevoke Function (Autonomous Database) 10-96
10.6.2.9 pyqScriptCreate Procedure (Autonomous Database) 10-97
10.6.2.10 pyqScriptDrop Procedure (Autonomous Database) 10-100
10.6.3 Asynchronous Jobs (Autonomous Database) 10-100
10.6.3.1 oml_async_flag Argument 10-101
10.6.3.2 pyqJobStatus Function 10-102
10.6.3.3 pyqJobResult Function 10-103
10.6.3.4 Asynchronous Job Example 10-104
10.6.4 Special Control Arguments (Autonomous Database) 10-109
10.6.5 Output Formats (Autonomous Database) 10-110
Index
Preface
This publication describes Oracle Machine Learning for Python (OML4Py) and how to use it.
• Audience
• Documentation Accessibility
• Related Resources
• Conventions
Audience
This document is intended for those who want to run Python commands for statistical, machine
learning, and graphical analysis on data stored in or accessible through Oracle Autonomous
Database or Oracle Database on premises using a Python API. Use of Oracle Machine
Learning for Python requires knowledge of Python and of Oracle Autonomous Database or
Oracle Database on premises.
Documentation Accessibility
For information about Oracle's commitment to accessibility, visit the Oracle Accessibility
Program website at http://www.oracle.com/pls/topic/lookup?ctx=acc&id=docacc.
Related Resources
Related documentation is in the following publications:
• Oracle Machine Learning for Python API Reference
• Oracle Machine Learning for Python Known Issues
• Oracle Machine Learning for Python Licensing Information User Manual
• REST API for Embedded Python Execution
• Get Started with Notebooks for Data Analysis and Data Visualization in Using Oracle
Machine Learning Notebooks
• Oracle Machine Learning AutoML User Interface
• REST API for Oracle Machine Learning Services
Conventions
The following text conventions are used in this document:
Convention Meaning
boldface Boldface type indicates graphical user interface elements associated with an action, or terms defined in text or the glossary.
italic Italic type indicates book titles, emphasis, or placeholder variables for which you supply particular values.
monospace Monospace type indicates commands within a paragraph, URLs, code in examples, text that appears on the screen, or text that you enter.
1 About Oracle Machine Learning for Python
The following topics describe Oracle Machine Learning for Python (OML4Py) and its
advantages for the Python user.
Topics:
• What Is Oracle Machine Learning for Python?
Oracle Machine Learning for Python (OML4Py) enables you to run Python commands for data transformations and for statistical, machine learning, and graphical analysis on data stored in or accessible through an Oracle database using a Python API. OML4Py supports running user-defined Python functions in database-spawned and controlled Python engines, with optional built-in data parallelism and task parallelism. This embedded execution functionality enables invoking user-defined functions from SQL and, on Autonomous Database, from REST endpoints. OML4Py also supports Automated Machine Learning (AutoML) for algorithm and feature selection, and model tuning and selection. You can augment the included Python functionality with third-party packages from the Python ecosystem.
• Advantages of Oracle Machine Learning for Python
Using OML4Py to prepare and analyze data in or accessible to an Oracle database has
many advantages for a Python user.
• Transparently Convert Python to SQL
With the transparency layer classes, you can convert select Python objects to Oracle
database objects and also invoke a range of familiar Python functions that are overloaded
to invoke the corresponding SQL on tables in the database.
• About the Python Components and Libraries in OML4Py
OML4Py requires an installation of Python, a number of Python libraries, as well as the
OML4Py components.
Advantages of Oracle Machine Learning for Python
– Perform parallel simulations, for example, Monte Carlo analysis, using the oml.index_apply function (see the sketch following this list).
– Generate JSON, PNG image, and XML representations of both structured and image data, which can be used by Python clients and SQL-based applications. PNG images and structured data can also be used by Python clients and applications that use REST APIs.
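A minimal sketch of such a parallel simulation (the trial function and the times value are illustrative; a connection through oml.connect is assumed):

import oml
import random

def sim(index):
    # One Monte Carlo trial: average of 1,000 uniform random draws.
    random.seed(index)
    return sum(random.random() for _ in range(1000)) / 1000.0

# Run eight trials in database-spawned Python engines.
res = oml.index_apply(times=8, func=sim)
print(list(res))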
See Also: About Embedded Python Execution
Function Description
oml.create Creates a table in a database schema from a Python data set.
oml_object.pull Creates a local Python object that contains a copy of the data referenced by the oml object.
oml.push Pushes data from a Python session into an object in a database schema.
oml.sync Creates a DataFrame proxy object in Python that represents a database table or view.
oml.dir Returns the names of the oml objects in the Python session workspace.
oml.drop Drops a persistent database table or view.
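A short sketch of these functions in a session (an oml.connect connection is assumed; the table name is illustrative):

import oml
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3], 'NAME': ['a', 'b', 'c']})

oml_df = oml.push(df)               # temporary table with a proxy object
oml.create(df, table='MY_TABLE')    # persistent table from a Python data set
proxy = oml.sync(table='MY_TABLE')  # DataFrame proxy for the table
local = proxy.pull()                # local pandas copy of the referenced data
print(oml.dir())                    # names of oml objects in the workspace
oml.drop(table='MY_TABLE')          # drop the persistent table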
Transparency layer proxy classes map SQL data types or objects to corresponding Python
types. The classes provide Python functions and operators that are the same as those on the
mapped Python types. The following table lists the transparency layer data type classes.
Class Description
oml.Boolean A Boolean series data class that represents a single column of 0, 1, and NULL values in database data.
oml.Bytes A binary series data class that represents a single column of RAW or BLOB database data types.
oml.Float A numeric series data class that represents a single column of NUMBER, BINARY_DOUBLE, or BINARY_FLOAT database data types.
oml.String A character series data class that represents a single column of VARCHAR2, CHAR, or CLOB database data types.
oml.DataFrame A tabular DataFrame class that represents multiple columns of oml.Boolean, oml.Bytes, oml.Float, and oml.String data.
The following table lists the mappings of OML4Py data types for both the reading and writing of
data between Python and the database.
About the Python Components and Libraries in OML4Py
• cycler 0.10.0
• joblib 1.1.0
• kiwisolver 1.1.0
• matplotlib 3.3.3
• numpy 1.21.5
• pandas 1.3.4
• Pillow 8.2.0
• pyparsing 2.4.0
• python-dateutil 2.8.1
• pytz 2019.3
• scikit-learn 1.0.1
• scipy 1.7.3
• six 1.13.0
• threadpoolctl 2.1.0
All the above libraries are included with Python in the current release of Oracle Autonomous
Database.
For an installation of OML4Py in an on-premises Oracle Database, you must install Python and the libraries listed above. See Install OML4Py for On-Premises Databases.
2 Install OML4Py Client for Linux for Use With Autonomous Database on Serverless Exadata Infrastructure
You can install and use the OML4Py client for Linux to work with OML4Py in an Oracle
Autonomous Database on Serverless Exadata infrastructure.
OML4Py on premises runs on 64-bit platforms only. For supported platforms see OML4Py On
Premises System Requirements.
The following instructions tell you how to download and install Python, configure your environment, manage your client credentials, install Oracle Instant Client, and install the OML4Py client:
1. Download the Python 3.9.5 source and untar it:
wget https://www.python.org/ftp/python/3.9.5/Python-3.9.5.tar.xz
tar xvf Python-3.9.5.tar.xz
Note:
RPMs must be installed under sudo, or root.
3. To build Python, enter the following commands, where PREFIX is the directory containing the extracted Python-3.9.5 sources. Use make altinstall to avoid overriding the system's default Python installation.
export PREFIX=`pwd`/Python-3.9.5
cd $PREFIX
./configure --prefix=$PREFIX --enable-shared
make clean; make
make altinstall
4. Set environment variable PYTHONHOME and add it to your PATH, and set environment variable
LD_LIBRARY_PATH:
export PYTHONHOME=$PREFIX
export PATH=$PYTHONHOME/bin:$PATH
export LD_LIBRARY_PATH=$PYTHONHOME/lib:$LD_LIBRARY_PATH
Create a symbolic link in your $PYTHONHOME/bin directory. You need to link it to your
python3.9 executable, which you can do with the following commands:
cd $PYTHONHOME/bin
ln -s python3.9 python3
You can now start Python:
python3
pip returns warnings during package installation if the latest version is not installed. You can upgrade pip to avoid these warnings.
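A typical upgrade command (assuming the python3 link created above):

python3 -m pip install --upgrade pip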
5. Install Oracle Instant Client and add it to LD_LIBRARY_PATH:
wget https://download.oracle.com/otn_software/linux/instantclient/1914000/oracle-instantclient19.14-basic-19.14.0.0.0-1.x86_64.rpm
rpm -ivh oracle-instantclient19.14-basic-19.14.0.0.0-1.x86_64.rpm
export LD_LIBRARY_PATH=/usr/lib/oracle/19.14/client64/lib:$LD_LIBRARY_PATH
If you do not have root access to install an RPM on the client system, download and unzip the Basic package zip file instead:
wget https://download.oracle.com/otn_software/linux/instantclient/1914000/instantclient-basic-linux.x64-19.14.0.0.0dbru.zip
unzip instantclient-basic-linux.x64-19.14.0.0.0dbru.zip
export LD_LIBRARY_PATH=/path/to/instantclient_19_14:$LD_LIBRARY_PATH
6. Download the client credentials (wallet) from your Autonomous database. Create a
directory for the Wallet contents. Unzip the wallet zip file to the newly created directory:
Note:
An mTLS connection using the client Wallet is required. TLS connections are not
currently supported.
mkdir -p mywalletdir
unzip Wallet.name.zip -d mywalletdir
cd mywalletdir/
ls
7. Update sqlnet.ora with the wallet location. If you're working behind a proxy firewall, also set the SQLNET.USE_HTTPS_PROXY parameter to on.
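A representative sqlnet.ora after this step (the wallet directory path is illustrative):

WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/home/user/mywalletdir")))
SSL_SERVER_DN_MATCH = yes
SQLNET.USE_HTTPS_PROXY = on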
8. If you are behind a firewall, add the proxy address and port number to all service levels in tnsnames.ora. You will also need to add three new entries for the AutoML connection pools, as shown below.
Note:
If the proxy server contains a firewall to terminate connections within a set time
period, the database connection will also be terminated.
For example, myadb_medium_pool is another alias for the connection string with
SERVER=POOLED added to the corresponding one for myadb_medium.
myadb_low = (description= (retry_count=20)(retry_delay=3)
(address=(https_proxy=your proxy address here)(https_proxy_port=80)
(protocol=tcps)(port=1522)(host=adb.us-sanjose-1.oraclecloud.com))
(connect_data=(service_name=qtraya2braestch_myadb_low.adb.oraclecloud.com))
(security=(ssl_server_cert_dn="CN=adb.us-sanjose-1.oraclecloud.com,OU=Oracle ADB SANJOSE,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

myadb_medium = (description= (retry_count=20)(retry_delay=3)
(address=(https_proxy=your proxy address here)(https_proxy_port=80)
(protocol=tcps)(port=1522)(host=adb.us-sanjose-1.oraclecloud.com))
(connect_data=(service_name=qtraya2braestch_myadb_medium.adb.oraclecloud.com))
(security=(ssl_server_cert_dn="CN=adb.us-sanjose-1.oraclecloud.com,OU=Oracle ADB SANJOSE,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

myadb_high = (description= (retry_count=20)(retry_delay=3)
(address=(https_proxy=your proxy address here)(https_proxy_port=80)
(protocol=tcps)(port=1522)(host=adb.us-sanjose-1.oraclecloud.com))
(connect_data=(service_name=qtraya2braestch_myadb_high.adb.oraclecloud.com))
(security=(ssl_server_cert_dn="CN=adb.us-sanjose-1.oraclecloud.com,OU=Oracle ADB SANJOSE,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

myadb_low_pool = (description= (retry_count=20)(retry_delay=3)
(address=(https_proxy=your proxy address here)(https_proxy_port=80)
(protocol=tcps)(port=1522)(host=adb.us-sanjose-1.oraclecloud.com))
(connect_data=(service_name=qtraya2braestch_myadb_low.adb.oraclecloud.com)(SERVER=POOLED))
(security=(ssl_server_cert_dn="CN=adb.us-sanjose-1.oraclecloud.com,OU=Oracle ADB SANJOSE,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

myadb_medium_pool = (description= (retry_count=20)(retry_delay=3)
(address=(https_proxy=your proxy address here)(https_proxy_port=80)
(protocol=tcps)(port=1522)(host=adb.us-sanjose-1.oraclecloud.com))
(connect_data=(service_name=qtraya2braestch_myadb_medium.adb.oraclecloud.com)(SERVER=POOLED))
(security=(ssl_server_cert_dn="CN=adb.us-sanjose-1.oraclecloud.com,OU=Oracle ADB SANJOSE,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

myadb_high_pool = (description= (retry_count=20)(retry_delay=3)
(address=(https_proxy=your proxy address here)(https_proxy_port=80)
(protocol=tcps)(port=1522)(host=adb.us-sanjose-1.oraclecloud.com))
(connect_data=(service_name=qtraya2braestch_myadb_high.adb.oraclecloud.com)(SERVER=POOLED))
(security=(ssl_server_cert_dn="CN=adb.us-sanjose-1.oraclecloud.com,OU=Oracle ADB SANJOSE,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))
9. Set TNS_ADMIN environment variable to the wallet directory:
export TNS_ADMIN=mywalletdir
10. Install OML4Py library dependencies. The versions listed here are the versions Oracle has
tested and supports:
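A representative installation command, using the versions listed in About the Python Components and Libraries in OML4Py (the cx_Oracle version is an assumption):

pip3 install numpy==1.21.5 pandas==1.3.4 scipy==1.7.3 matplotlib==3.3.3 scikit-learn==1.0.1 cx_Oracle==8.1.0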
unzip oml4py-client-linux-x86_64-1.0.zip
perl -Iclient client/client.pl
Copyright (c) 2018, 2022 Oracle and/or its affiliates. All rights
reserved.
Checking platform .................. Pass
Checking Python .................... Pass
Checking dependencies .............. Pass
Checking OML4P version ............. Pass
Current configuration
Python Version ................... 3.9.5
PYTHONHOME ....................... /opt/Python-3.9.5
Existing OML4P module version .... None
Proceed? [yes]
Processing ./client/oml-1.0-cp39-cp39-linux_x86_64.whl
Installing collected packages: oml
Successfully installed oml-1.0
Done
To verify that the oml module is ready to use, start Python and import oml:

python3
import oml
• Create a database connection. The OML client connects using the wallet. Set the dsn
and automl arguments to the tnsnames alias in the wallet:
oml.connect(user="oml_user", password="oml_user_password",
dsn="myadb_medium", automl="myadb_medium_pool")
To connect without exposing your Oracle Machine Learning user credentials in clear text, provide empty strings for the user and password parameters:
oml.connect(user="", password="", dsn="myadb_medium",
automl="myadb_medium_pool")
3 Install OML4Py for On-Premises Databases
The following topics describe how to install and uninstall the server and client components required for using OML4Py with an on-premises Oracle Database.
Topics:
• OML4Py On Premises System Requirements
OML4Py on premises runs on 64-bit platforms only.
• Build and Install Python for Linux for On-Premises Databases
Instructions for installing Python for Linux for an on-premises Oracle database.
• Install the Required Supporting Packages for Linux for On-Premises Databases
Both the OML4Py server and client installations for an on-premises Oracle database
require that you also install a set of supporting Python packages, as described below.
• Install OML4Py Server for On-Premises Oracle Database
The following instructions tell how to install and uninstall the OML4Py server components
for an on-premises Oracle Database.
• Install OML4Py Client for On-Premises Databases
Instructions for installing and uninstalling the on-premises OML4Py client.
3.2 Build and Install Python for Linux for On-Premises Databases
Instructions for installing Python for Linux for an on-premises Oracle database.
The Python installation on the database server must be executed by the Oracle user and not
sudo, root, or any other user. This is not a requirement on the OML4Py client.
wget https://www.python.org/ftp/python/3.9.5/Python-3.9.5.tgz
mkdir -p $ORACLE_HOME/python
tar -xvzf Python-3.9.5.tgz --strip-components=1 -C $ORACLE_HOME/python
The contents of the gzipped source tarball are copied directly to $ORACLE_HOME/python.
3. Go to the new directory:
cd $ORACLE_HOME/python
Note:
RPMs must be installed under sudo, or root.
5. To build Python 3.9.5, enter the following commands, where the installation prefix is $ORACLE_HOME/python. On the Oracle Machine Learning for Python server, the commands are:
cd $ORACLE_HOME/python
./configure --enable-shared --prefix=$ORACLE_HOME/python
make clean; make
make altinstall
Note:
Be sure to use the --enable-shared flag if you are going to use Embedded
Python Execution; otherwise, using an Embedded Python Execution function
results in an extproc error.
Be sure to invoke make altinstall instead of make install to avoid overwriting
the system Python.
6. Set environment variable PYTHONHOME and add it to your PATH, and set environment variable
LD_LIBRARY_PATH:
export PYTHONHOME=$ORACLE_HOME/python
export PATH=$PYTHONHOME/bin:$PATH
export LD_LIBRARY_PATH=$PYTHONHOME/lib:$LD_LIBRARY_PATH
Note:
In order to use Python for OML4Py, the variables must be set, and these
variables must appear before system Python in PATH and LD_LIBRARY_PATH.
pip returns warnings during package installation if the latest version is not installed. You can upgrade pip to avoid these warnings.
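A typical upgrade command (assuming the python3 link created below):

python3 -m pip install --upgrade pip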
cd $ORACLE_HOME/python/bin
ln -s python3.9 python3
You can now start Python by running the command python3. To verify the directory where Python is installed, use sys.executable from the sys module. For example:
python3
import sys
print(sys.executable)
/u01/app/oracle/product/19.3/dbhome_1/python/bin/python3
3.3 Install the Required Supporting Packages for Linux for On-Premises Databases
Both the OML4Py server and client installations for an on-premises Oracle database require
that you also install a set of supporting Python packages, as described below.
Note:
The proxy server is only necessary if the user is behind a firewall.
The following command installs the cx_Oracle package using an example proxy server:
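For example (the proxy URL and package version shown are placeholders):

pip3 install cx_Oracle==8.1.0 --proxy="http://proxy.example.com:80"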
To install a package on the OML4Py server, use the following command, specifying the package and the target directory, $ORACLE_HOME/oml4py/modules:
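A representative form of the command (the package and version are illustrative):

pip3 install pandas==1.3.4 --target=$ORACLE_HOME/oml4py/modules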
To verify the packages, start Python and import them:

$ python3
import numpy
import pandas
import scipy
import matplotlib
import cx_Oracle
import sklearn
If all the packages are installed successfully, then no errors are returned.
Topics:
• Install OML4Py Server for Linux for On-Premises Oracle Database 19c
Instructions for installing the OML4Py server for Linux for an on-premises Oracle Database
19c.
• Install OML4Py Server for Linux for On-Premises Oracle Database 21c
Instructions for installing the OML4Py server for Linux for an on-premises Oracle Database
21c.
• Verify OML4Py Server Installation for On-Premises Database
Verify the installation of the OML4Py server and client components for an on-premises
database.
• Grant Users the Required Privileges for On-Premises Database
Instructions for granting the privileges required for using OML4Py with an on-premises
database.
• Create New Users for On-Premises Oracle Database
The pyquser.sql script is a convenient way to create a new OML4Py user for an on-premises database.
• Uninstall the OML4Py Server from an On-Premises Database 19c
Instructions for uninstalling the on-premises OML4Py server components from an on-
premises Oracle Database 19c.
3.4.1 Install OML4Py Server for Linux for On-Premises Oracle Database 19c
Instructions for installing the OML4Py server for Linux for an on-premises Oracle Database
19c.
To install the OML4Py server for Linux for an on-premises Oracle database, run the server
installation Perl script.
Prerequisites
To install the on-premises OML4Py server, the following are required:
• A connection to the internet.
• Python 3.9.5. For instructions on installing Python 3.9.5 see Build and Install Python for
Linux for On-Premises Databases.
• OML4Py supporting packages. For instructions on installing the required supporting
packages see Install the Required Supporting Packages for Linux for On-Premises
Databases.
• Perl 5.8 or higher installed on your system.
Note:
Perl requires the presence of the perl-Env package.
• To verify whether the perl-Env package exists on the system, type the following command:

rpm -qa perl-Env

If it is installed, the return value contains the version of the perl-Env RPM installed on your system. If perl-Env is not installed on the system, there will be no return value, and you can install the package as root or sudo using the command:

yum install perl-Env
• Write permission on the directories to which you download and install the server
components.
Create a directory for the server installation, for example:

mkdir $ORACLE_HOME/oml4py
By default, the installation script installs both the Embedded Python Execution and AutoML
components. If you do not want to install these components, then you can use the --no-embed
and/or the --no-automl flag.
If you do not specify a permanent tablespace or a temporary tablespace in the Perl command,
then the installation script prompts you for them.
If you only want to install the oml modules and Embedded Python Execution libraries with no
database configuration, use the --no-db flag. The --no-db flag is used when OML4Py is
installed in a database with multiple nodes, such as Oracle RAC. The OML4Py server requires
a complete database configuration on the first node, but the oml module and Embedded
Python Execution libraries must be installed on each node.
On each node, set PYTHONPATH to the location of the oml modules:

export PYTHONPATH=$ORACLE_HOME/oml4py/modules
2. From the $ORACLE_HOME/oml4py directory, run the server installation script in interactive mode.
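A likely invocation, by analogy with the client installation script (client.pl):

perl -Iserver server/server.pl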
Enter temporary and permanent tablespaces for the PYQSYS user when the script
prompts you for them.
3. When the installation script displays Proceed?, enter y or yes. The output of a successful installation is as follows:
Copyright (c) 2018, 2022 Oracle and/or its affiliates. All rights reserved.
Current configuration
ORACLE_HOME ...................... /u01/app/oracle/product/19.3/dbhome_1
ORACLE_SID ....................... orcl
PDB .............................. ORCLPDB
Python Version ................... 3.9.5
PYTHONHOME ....................... /u01/app/oracle/product/19.3/dbhome_1/python
Proceed? [yes]yes
Done
An OML4Py user is a database user account that has privileges for performing machine learning activities. To learn how to create a user for Oracle Machine Learning for Python, see Create New Users for On-Premises Oracle Database.
$ sqlplus oml_user/oml_user_password@PDB1
Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0
Session altered.
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
PYTHONHOME
/u01/app/oracle/product/19.3/dbhome_1/python
PYTHONPATH
/u01/app/oracle/product/19.3/dbhome_1/oml4py/modules
VERSION
1.0
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
PLATFORM
ODB
DSWLIST
oml.*;pandas.*;numpy.*;matplotlib.*;sklearn.*
3. To verify the installation of the OML4Py server for an on-premises database see Verify
OML4Py Server Installation for On-Premises Database.
3.4.2 Install OML4Py Server for Linux for On-Premises Oracle Database 21c
Instructions for installing the OML4Py server for Linux for an on-premises Oracle Database
21c.
You can install OML4Py by using a Python script included in your 21c database or by using the
Database Configuration Assistant (DBCA).
Note:
Perl requires the perl-Env package. You can install the package as root with the command yum install perl-Env.
To check for the existence of perl-Env, run the following command; the version returned will vary depending on your operating system and version:

rpm -qa perl-Env
• Write permission on the directories to which you download and install the server
components.
Note:
The following environment variables must be set up.
export ORACLE_HOME=ORACLE_HOME_HERE
export PYTHONHOME=PREFIX
export PATH=$PYTHONHOME/bin:$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$PYTHONHOME/lib:$ORACLE_HOME/lib:$LD_LIBRARY_PATH
To install the OML4Py server for Linux for an on-premises Oracle Database 21c, run the server installation SQL script pyqcfg.sql.
1. At your operating system prompt, start SQL*Plus and log in to your Oracle pluggable
database (PDB) directly.
2. Run the pyqcfg.sql script. The script is under $ORACLE_HOME/oml4py/server.
To capture the log, spool the installation steps to an external file. The following example
uses the PDB PDB1 and gives example values for the script arguments.
sqlplus / as sysdba
spool install.txt
alter session set container=PDB1;
ALTER PROFILE DEFAULT LIMIT PASSWORD_VERIFY_FUNCTION NULL;
@$ORACLE_HOME/oml4py/server/pyqcfg.sql
Alternatively, you can configure OML4Py by using DBCA:

dbca -configureOML4PY
1. In your local Python session, connect to the OML4Py server. In the following example, replace the values for the parameters with those for your database.
import oml
oml.connect(user='oml_user', password='oml_user_password', host='myhost',
port=1521, sid='mysid')
2. Create a user-defined Python function and store it in the OML4Py script repository.
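A sketch of such a function and of storing it (the function body is illustrative):

def TEST():
    return 'Hello from OML4Py'

oml.script.create("TEST", func=TEST)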
3. Run the function using embedded Python execution:

res = oml.do_eval(func='TEST')
res
4. When you are finished testing, you can drop the test.
oml.script.drop("TEST")
User Privileges
After installing the OML4Py server on an on-premises Oracle database server, grant the
following privileges to any OML4Py user.
• CREATE SESSION
• CREATE TABLE
• CREATE VIEW
• CREATE PROCEDURE
• CREATE MINING MODEL
• EXECUTE ON CTXSYS.CTX_DDL (required for using the Oracle Text processing capability in the algorithm classes in the oml.algo package)
To grant all of these privileges, on the on-premises Oracle database server start SQL as a
database administrator and run the following SQL statement, where oml_user is the OML4Py
user:
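A statement covering the privileges listed above (a sketch; run as a database administrator):

GRANT CREATE SESSION, CREATE TABLE, CREATE VIEW, CREATE PROCEDURE,
      CREATE MINING MODEL TO oml_user;
GRANT EXECUTE ON CTXSYS.CTX_DDL TO oml_user;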
The pyquser.sql script takes the following arguments:
• Username
• User's permanent tablespace
• User's temporary tablespace
• Permanent tablespace quota
• PYQADMIN role
When you run the script, it prompts you for a password for the user.
sqlplus / as sysdba
@pyquser.sql oml_user USERS TEMP unlimited pyqadmin
sqlplus / as sysdba
alter session set container=<PDBNAME>;
@pyquser.sql oml_user USERS TEMP unlimited pyqadmin
User created.
This example creates the user oml_user2 with 20 megabyte quota on the USERS tablespace,
the temporary tablespace TEMP, and without the PYQADMIN role.
sqlplus / as sysdba
@pyquser.sql oml_user2 USERS TEMP 20M FALSE
1. Verify that the PYTHONHOME environment variable is set to your Python installation directory:

echo $PYTHONHOME
2. Verify that PYTHONPATH environment variable is set to the directory in which the oml
modules are installed.
echo $PYTHONPATH
export PYTHONPATH=$ORACLE_HOME/oml4py/modules
3. Change directories to the directory containing the server installation zip file.
cd $ORACLE_HOME/oml4py
Topics:
• Install Oracle Instant Client and the OML4Py Client for Linux
Instructions for installing Oracle Instant Client and the OML4Py client for Linux for an on-
premises Oracle database.
3.5.1 Install Oracle Instant Client and the OML4Py Client for Linux
Instructions for installing Oracle Instant Client and the OML4Py client for Linux for an on-
premises Oracle database.
To connect the OML4Py client for Linux to an on-premises Oracle database, you must have
Oracle Instant Client installed on your local system.
• Install Oracle Instant Client for Linux for On-Premises Databases
Instructions for installing Oracle Instant Client for Linux for use with an on-premises Oracle
database.
• Install OML4Py Client for Linux for On-Premises Databases
Instructions for installing the OML4Py client for Linux for use with an on-premises Oracle
database.
3.5.1.1 Install Oracle Instant Client for Linux for On-Premises Databases
Instructions for installing Oracle Instant Client for Linux for use with an on-premises Oracle
database.
The OML4Py client requires Oracle Instant Client to connect to an Oracle database. See the
Oracle Support Note "Client / Server Interoperability Support Matrix for Different Oracle
Versions (Doc ID 207303.1)".
To install Oracle Instant Client, the following are required:
• A connection to the internet.
• Write permission on the directory in which you are installing the client.
To install Oracle Instant Client, do the following:
1. Download the Oracle Instant Client for your system. Go to the Oracle Instant Client
Downloads page and select Instant Client for Linux x86-64.
2. Locate the section for your version of Oracle Database. These instructions use the
19.14.0.0.0 version.
3. In the Base section, in the Download column, click the zip file for the Basic Package or
Basic Light Package and save the file in an accessible directory on your system. These
instructions use the directory /opt/oracle.
4. Go to the folder that you selected and unzip the package. For example:
cd /opt/oracle
unzip instantclient-basic-linux.x64-19.14.0.0.0dbru.zip
Extracting the package creates the subdirectory instantclient_19_14, which contains the
Oracle Instant Client files.
5. The libaio package is also required. To see whether libaio resides on the system, run the following command:

rpm -qa libaio

The version will vary based on the Linux version. If nothing is returned from this command, then the libaio RPM is not installed on the target system. To install the libaio package with sudo or as the root user, run the following command:

yum install libaio

Note:
In some Linux distributions, this package is called libaio1.
6. Add the directory that contains the Oracle Instant Client files to the beginning of your
LD_LIBRARY_PATH environment variable:
export LD_LIBRARY_PATH=/opt/oracle/instantclient_19_14:$LD_LIBRARY_PATH
3.5.1.2 Install OML4Py Client for Linux for On-Premises Databases
Instructions for installing the OML4Py client for Linux for use with an on-premises Oracle database.
Prerequisites
To download and install the on-premises OML4Py client, the following are required:
• A connection to the internet.
• Write permission on the directory in which you are installing the client.
• Perl 5.8 or higher installed on your system.
• Python 3.9.5. For more information about downloading and installing Python 3.9.5, see Build and Install Python for Linux for On-Premises Databases.
To use the OML4Py client to connect to an on-premises Oracle database, the following are
required:
• Oracle Instant Client must be installed on the client machine.
• The OML4Py server must be installed on the on-premises database server.
To download and extract the OML4Py client installation file, do the following:
1. Download the client installation zip file.
a. Go to the Oracle Machine Learning for Python Downloads page on the Oracle
Technology Network.
b. Accept the license agreement and select Oracle Machine Learning for Python
Downloads (v1.0).
c. Select Oracle Machine Learning for Python Client Install for Oracle Database on
Linux 64 bit.
d. Save the zip file to an accessible directory. These instructions use a directory named
oml4py, but you can download the zip file to any location accessible to the user
installing the oml4py client.
2. Go to the directory to which you downloaded the zip file and unzip the file.
cd oml4py
unzip oml4py-client-linux-x86_64-1.0.zip
The contents are extracted to a subdirectory named client, which contains these four
files:
• OML4PInstallShared.pm
• oml-1.0-cp39-cp39-linux_x86_64.whl
• client.pl
• oml4py.ver
Copyright (c) 2018, 2022 Oracle and/or its affiliates. All rights reserved.
Usage: client.pl [OPTION]...
Install, upgrade, or uninstall OML4P Client.
By default, the installation script installs the Embedded Python Execution and AutoML
modules. If you don't want to install these modules, then you can use the --no-embed and --
no-automl flags, respectively.
Also by default, the installation script checks for the existence and version of each of the
supporting packages that the OML4Py client requires. If a required package is missing or does
not meet the version requirement, the installation script displays an error message and exits.
You can skip the dependency checking in the client installation by using the --no-deps flag.
However, to use the oml module, you need to have installed acceptable versions of all of the
supporting packages.
For a list of the required dependencies, see Install the Required Supporting Packages for Linux
for On-Premises Databases.
Alternatively, the following command runs the Perl script with a target directory specified. The --target flag is optional; use it if you don't want to install the oml module to the current directory.
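For example (the target path is illustrative):

perl -Iclient client/client.pl --target $HOME/oml_modules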
When the script displays Proceed?, enter y or yes.
If you use the --target <dir> argument to install the oml module to the specified
directory, then add that location to environment variable PYTHONPATH so that Python can
find the module:
export PYTHONPATH=path_to_target_dir
Copyright (c) 2018, 2022 Oracle and/or its affiliates. All rights reserved.
Checking platform .................. Pass
Checking Python .................... Pass
Checking dependencies .............. Pass
Checking OML4P version ............. Pass
Current configuration
Python Version ................... 3.9.5
PYTHONHOME ....................... /opt/Python-3.9.5
Existing OML4P module version .... None
Operation ........................ Install/Upgrade
Proceed? [yes]
Processing ./client/oml-1.0-cp39-cp39-linux_x86_64.whl
Installing collected packages: oml
Successfully installed oml-1.0
2. To verify that oml modules are successfully installed and are ready to use, start Python and
import oml. At the Linux prompt, enter python3.
python3
import oml
To display the location of the installed oml module:

python3
import oml
oml.__path__
import oml
oml.connect(user='oml_user', password='oml_user_password', host=myhost,
port=1521, service_name='myservice')
After connecting, you can run any of the examples in this publication. For example, you could
run Example 6-8.
Note:
To use the Embedded Python Execution examples, you must have installed the
OML4Py client with the Embedded Python Execution option enabled.
To use the Automatic Machine Learning (AutoML) examples, you must specify a
running connection pool on the server in the automl argument in an oml.connect
invocation.
import oml
oml.connect(user='oml_user', password='oml_user_password', host='myhost',
port=1521, sid='mysid')
2. Create a user-defined Python function and store it in the OML4Py script repository.
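As in the server verification, a sketch of creating the test function (the body is illustrative):

def TEST():
    return 'Hello from OML4Py'

oml.script.create("TEST", func=TEST)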
3. Run the function using embedded Python execution:

res = oml.do_eval(func='TEST')
res
4. When you are finished testing, you can drop the test.
oml.script.drop("TEST")
Uninstalling oml-1.0:
Successfully uninstalled oml-1.0
4 Install OML4Py on Exadata
The following topics describe OML4Py on Exadata and how to configure DCLI and install Python and OML4Py across Exadata compute nodes.
Topics:
• About Oracle Machine Learning for Python on Exadata
Exadata is an ideal platform for OML4Py. Parallel Python computations in OML4Py take advantage of the massively parallel grid infrastructure of Exadata.
• Configure DCLI to install Python across Exadata compute nodes.
Using Distributed Command Line Interface (DCLI) can simplify the installation of OML4Py
on Exadata.
Note:
The version of OML4Py must be the same on the server and on each client computer. Also, the version of Python must be the same on the server and on each client computer. See Table 3-2 in OML4Py On-Premises System Requirements for supported configurations.
dcli -h
Return values:
0 -- file or command was copied and executed successfully on all cells
1 -- one or more cells could not be reached or remote execution
returned non-zero status.
2 -- An error prevented any command execution
Examples:
dcli -g mycells -k
dcli -c stsd2s2,stsd2s3 vmstat
dcli -g mycells cellcli -e alter iormplan active
dcli -g mycells -x reConfig.scl
Options:
--version show program's version number and exit
--batchsize=MAXTHDS limit the number of target cells on which to run the
command or file copy in parallel
-c CELLS comma-separated list of cells
--ctimeout=CTIMEOUT Maximum time in seconds for initial cell connect
-d DESTFILE destination directory or file
-f FILE files to be copied
Configure the Exadata environment to enable automatic authentication for DCLI on each
compute node.
1. Generate an SSH public-private key for the root user. Execute the following command as
root on any node:
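A representative command (the key type is an assumption):

ssh-keygen -t rsa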
This command generates public and private key files in the .ssh subdirectory of the home
directory of the root user.
2. In a text editor, create a file that contains the names of all the compute nodes in the rack.
Specify each node name on a separate line. For example, the nodes file for a 2-node
cluster could contain entries like the following:
cat nodes
exadb01
exadb02
3. Run the DCLI command with the -k option to establish SSH trust across all the nodes. The
-k option causes DCLI to contact each node sequentially (not in parallel) and prompts you
to enter the password for each node.
DCLI with -k establishes SSH Trust and User Equivalence. Subsequent DCLI commands
will not prompt for passwords.
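For example, using the nodes file created above:

dcli -g nodes -k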
Instructions for installing Python and OML4Py across Exadata compute nodes using DCLI are
described in the following topics.
Topics:
• Install Python across Exadata compute nodes using DCLI
Instructions for installing Python across Exadata compute nodes using DCLI.
• Install OML4Py across Exadata compute nodes using DCLI
Instructions for installing OML4Py across Exadata compute nodes using DCLI.
wget https://www.python.org/ftp/python/3.9.5/Python-3.9.5.tar.xz
tar xvf Python-3.9.5.tar.xz
4. Create a symbolic link in your $PYTHONHOME/bin directory, linking it to your python3.9 executable. For example, a dcli-wrapped form of the link command shown in Chapter 2:

dcli -t -g nodes -l oracle "ln -s $ORACLE_HOME/python/bin/python3.9 $ORACLE_HOME/python/bin/python3"
5. Set environment variable PYTHONHOME and add it to your PATH, and set environment variables LD_LIBRARY_PATH and PIP_REQUIRE_VIRTUALENV across the compute nodes, for example:

dcli -t -g nodes -l oracle "export PYTHONHOME=$ORACLE_HOME/python"
dcli -t -g nodes -l oracle "export PATH=$PYTHONHOME/bin:$PATH"
dcli -t -g nodes -l oracle "export LD_LIBRARY_PATH=$PYTHONHOME/lib:$LD_LIBRARY_PATH"
dcli -t -g nodes -l oracle "export PIP_REQUIRE_VIRTUALENV=false"
6. You can now start Python by running the command python3. For example:
2. Set the PYTHONPATH environment variable to the location of the OML4Py modules:
export PYTHONPATH=$ORACLE_HOME/oml4py/modules
To run the server script in non-interactive mode, pass the parameters for the pluggable database and the permanent and temporary tablespaces to the script.
Run the server script with the --no-db flag on all remaining compute nodes. This sets up
the OML4Py server configuration and skips the database configuration steps already
performed on the first node:
5 Install Third-Party Packages
Oracle Machine Learning Notebooks in the Autonomous Database provides a conda
interpreter to install third-party Python libraries in a conda environment for use within OML
Notebooks sessions and OML4Py embedded execution invocations. Conda is an open-source
package and environment management system that enables the use of environments
containing third-party Python libraries.
Administrators create conda environments and install packages that can then be accessed by
non-administrator users and loaded into their OML Notebooks session. The conda
environments can be used by OML4Py Python, SQL, and REST APIs.
Note:
• None of the OML features that come with ADB require the customer to install any
additional third-party software via the conda feature.
• When installing third-party software using the conda feature, vulnerability
management and license compliance of that software is the sole responsibility of
the customer who installed it, not Oracle.
Topics:
• Conda Commands
This topic contains common commands used by ADMIN while creating and testing conda
environments in Autonomous Databases. Conda is an open-source package and
environment management system that enables the use of environments containing third-
party Python libraries.
• Administrative Tasks for Creating and Saving a Conda Environment
In OML Notebooks, user ADMIN can manage the lifecycle of the OML user’s conda
environments, including creating and deleting environments and installing and deleting
packages.
• OML User Tasks for Downloading an Available Conda Environment
Once user ADMIN installs the environment in Object Storage in the Autonomous
Database, as an OML user, you can download, activate, and use it in Python paragraphs in
notebooks and with embedded execution.
• Using Conda Environments with Embedded Python Execution
This topic explains the usage of conda environments by running user-defined functions
(UDFs) in SQL and REST APIs for embedded Python execution.
Conda Commands
This topic contains common commands used by ADMIN while creating and testing conda environments in Autonomous Database. Conda is an open-source package and environment management system that enables the use of environments containing third-party Python libraries.
Refer to Conda Interpreter Commands for a table of supported conda commands.
Conda Help
To get help for conda commands, run the command name followed by the --help flag.
Note:
The conda command is not run explicitly because the %conda interpreter provides the
conda context.
%conda
--help
• Get help for a specific conda command. Run the following command to get help with the
install command:
%conda
install --help
Conda Info
The info command displays information about the conda installation, including the conda
version and available channels.
%conda
info
Conda Search
The search command allows the user to search for packages and display associated
information, including the package version and the channel where it resides.
• Search for a specific package. Run the following command to search for the package
scikit-learn.
%conda
search scikit-learn
• Search for packages whose names contain a string by using wildcards. For example:
%conda
search '*scikit*'
• Search for a specific version of a package. For example:
%conda
search 'numpy==1.12'

%conda
search 'numpy>=1.12'

• Search for a package in a specific channel. For example:
%conda
search conda-forge::numpy
Use the env-lcm command to manage the lifecycle of conda environments saved to Object Storage:
%conda
env-lcm --help
Options:
-v, --version Show the version and exit.
--help Show this message and exit.
Commands:
delete Delete a saved conda environment
download Download a saved conda environment
import Create or update a conda environment from saved metadata
list-local-envs List locally available environments for use
list-saved-envs List saved conda environments
upload Save conda environment for later use
1. Use the create command to create an environment myenv and install the Python keras
package.
2. Verify that the new environment is created, and activate the environment.
3. Install, then uninstall an additional Python package, pytorch, in the environment.
4. Deactivate and remove the environment.
Note:
The ADMIN user can access the conda environment from Python and R, but does
not have the capability to run embedded Python and R execution commands.
For help with the conda create command, enter create --help in a %conda paragraph.
List Environments
Start by listing the environments available by default. Conda contains default environments
with some core system libraries and conda dependencies. The active environment is marked
with an asterisk (*).
%conda
env list
# conda environments:
#
base * /usr
conda-pack-env /usr/envs/conda-pack-env
Create the myenv environment with the keras package (the Python version shown is illustrative):

%conda
create -n myenv python=3.12 keras
%conda
env list
# conda environments:
#
myenv /u01/.conda/envs/myenv
base * /usr
conda-pack-env /usr/envs/conda-pack-env
%conda
activate myenv
%conda
env list
# conda environments:
#
myenv * /u01/.conda/envs/myenv
base /usr
conda-pack-env /usr/envs/conda-pack-env
Note:
When conda installs a package into an environment, it also installs any required
dependencies. As shown here, it’s possible to install packages to an existing
environment. As a best practice, to avoid dependency conflicts, simultaneously install
all the packages you need in a specific environment.
Install the pytorch package in the active environment:

%conda
install pytorch
List the packages installed in the current environment, and confirm that keras and pytorch are
installed.
%conda
list
The output above has been truncated and does not show the complete list of packages.
Uninstall Package
Libraries can be uninstalled from an environment using the uninstall command. Let’s uninstall
the pytorch package from the current environment.
%conda
uninstall pytorch
List packages in current environment and verify that the pytorch package was uninstalled.
%conda
list
The output shown below does not contain the pytorch package.
xz 5.2.6 h5eee18b_0
zlib 1.2.13 h5eee18b_0
Removing Environments
If you don’t intend to upload the environment to Object Storage for the OML users in the
database, you can simply exit the notebook session and it will go out of scope. Alternatively, it
can be explicitly removed using the env remove command. Remove the myenv environment
and verify it was removed. A best practice is to deactivate the environment prior to removal.
For help on the env remove command, type env remove --help in the %conda interpreter.
%conda
deactivate
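Then remove the environment with the env remove command:

%conda
env remove -n myenv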
%conda
env list
# conda environments:
#
myrenv /u01/.conda/envs/myrenv
base * /usr
conda-pack-env /usr/envs/conda-pack-env
Administrative Tasks for Creating and Saving a Conda Environment
%conda
activate mychannelenv
%conda
install scipy=0.15.0
Note:
When conda installs a package into an environment, it also installs any required
dependencies. As a best practice, to avoid dependency conflicts, simultaneously
install all the packages you need in a specific environment.
Note:
Specify python=3.12 when creating a conda environment for a 3rd-party package to
avoid inconsistencies.
%conda
create -n sbenv -c conda-forge --strict-channel-priority python=3.12.1 seaborn

Save the environment to Object Storage; the description and application tag shown here match the saved-environment listing below:

%conda
upload sbenv --description 'Conda environment with seaborn' -t application "OML4PY"
The environment is now available for an OML user to download. The uploaded environment
will persist in Object Storage until it is deleted. The application tag is required for use with
embedded execution. For example, OML4Py embedded Python execution works with conda
environments containing the OML4Py tag, and OML4R embedded R execution works with
conda environments containing the OML4R tag.
There is one Object Storage bucket for each data center region. The conda environments are
saved to a folder in Object Storage corresponding to the tenancy and database. The folder is
managed by Autonomous Database and only available to users through OML notebooks.
There is an 8G maximum size for a single conda environment, and no size limit on Object
Storage.
Logged in as a non-administrator user, specify the conda interpreter in a notebook paragraph
using %conda. Get the list of environments saved in Object Storage using the list-saved-envs
command.
%conda
list-saved-envs
Provide the environment name as an argument to the -e parameter and request a list of
packages installed in the environment.
%conda
list-saved-envs -e sbenv
{
"name": "sbenv",
"size": "1.7 GiB",
"description": "Conda environment with seaborn",
"tags": {
"application": "OML4PY"
},
"number_of_installed_packages": 78,
"installed_packages": [
"blas-1.0-mkl",
"bottleneck-1.3.5-py39h7deecbd_0",
"brotli-1.0.9-h5eee18b_7",
"brotli-bin-1.0.9-h5eee18b_7",
"ca-certificates-2022.07.19-h06a4308_0",
"certifi-2022.9.14-py39h06a4308_0",
"cycler-0.11.0-pyhd3eb1b0_0",
"dbus-1.13.18-hb2f20db_0",
"expat-2.4.4-h295c915_0",
"fftw-3.3.9-h27cfd23_1",
"fontconfig-2.13.1-h6c09931_0",
"fonttools-4.25.0-pyhd3eb1b0_0",
"freetype-2.11.0-h70c0345_0",
"giflib-5.2.1-h7b6447c_0",
"glib-2.69.1-h4ff587b_1",
"gst-plugins-base-1.14.0-h8213a91_2",
"gstreamer-1.14.0-h28cd5cc_2",
"icu-58.2-he6710b0_3",
"intel-openmp-2021.4.0-h06a4308_3561",
"jpeg-9e-h7f8727e_0",
"kiwisolver-1.4.2-py39h295c915_0",
"lcms2-2.12-h3be6417_0",
"ld_impl_linux-64-2.38-h1181459_1",
"lerc-3.0-h295c915_0",
"libbrotlicommon-1.0.9-h5eee18b_7",
"libbrotlidec-1.0.9-h5eee18b_7",
"libbrotlienc-1.0.9-h5eee18b_7",
"libdeflate-1.8-h7f8727e_5",
"libffi-3.3-he6710b0_2",
"libgcc-ng-11.2.0-h1234567_1",
"libgfortran-ng-11.2.0-h00389a5_1",
"libgfortran5-11.2.0-h1234567_1",
"libpng-1.6.37-hbc83047_0",
"libstdcxx-ng-11.2.0-h1234567_1",
"libtiff-4.4.0-hecacb30_0",
"libuuid-1.0.3-h7f8727e_2",
"libwebp-1.2.2-h55f646e_0",
"libwebp-base-1.2.2-h7f8727e_0",
"libxcb-1.15-h7f8727e_0",
"libxml2-2.9.14-h74e7548_0",
"lz4-c-1.9.3-h295c915_1",
"matplotlib-3.5.2-py39h06a4308_0",
"matplotlib-base-3.5.2-py39hf590b9c_0",
"mkl-2021.4.0-h06a4308_640",
"mkl-service-2.4.0-py39h7f8727e_0",
"mkl_fft-1.3.1-py39hd3c417c_0",
"mkl_random-1.2.2-py39h51133e4_0",
"munkres-1.1.4-py_0",
"ncurses-6.3-h5eee18b_3",
"numexpr-2.8.3-py39h807cd23_0",
"numpy-1.22.3-py39he7a7128_0",
"numpy-base-1.22.3-py39hf524024_0",
"openssl-1.1.1q-h7f8727e_0",
"packaging-21.3-pyhd3eb1b0_0",
"pandas-1.4.4-py39h6a678d5_0",
"pcre-8.45-h295c915_0",
"pillow-9.2.0-py39hace64e9_1",
"pip-22.1.2-py39h06a4308_0",
"pyparsing-3.0.9-py39h06a4308_0",
"pyqt-5.9.2-py39h2531618_6",
"python-3.9.0-hdb3f193_2",
"python-dateutil-2.8.2-pyhd3eb1b0_0",
"pytz-2022.1-py39h06a4308_0",
"qt-5.9.7-h5867ecd_1",
"readline-8.1.2-h7f8727e_1",
"scipy-1.7.3-py39h6c91a56_2",
"seaborn-0.11.2-pyhd3eb1b0_0",
"setuptools-63.4.1-py39h06a4308_0",
"sip-4.19.13-py39h295c915_0",
"six-1.16.0-pyhd3eb1b0_1",
"sqlite-3.39.2-h5082296_0",
"tk-8.6.12-h1ccaba5_0",
"tornado-6.2-py39h5eee18b_0",
"tzdata-2022c-h04d1e81_0",
"wheel-0.37.1-pyhd3eb1b0_0",
"xz-5.2.5-h7f8727e_1",
"zlib-1.2.12-h5eee18b_3",
"zstd-1.5.2-ha4553b6_0"
]
}
Note:
Only user ADMIN can delete an environment saved in an Object Storage.
%conda
delete sbenv
OML User Tasks for Downloading an Available Conda Environment
%conda
list-saved-envs
%conda
list-saved-envs -e sbenv
{
"name": "sbenv",
"size": "1.2 GiB",
"description": "Conda environment with seaborn",
"tags": {
"application": "OML4PY"
},
"number_of_installed_packages": 60
}
Note:
The paragraph that contains the download command must be the first paragraph in
the notebook.
Download the environment, then activate it:

%conda
download sbenv

%conda
activate sbenv
%conda
list
%python
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

df = sb.load_dataset('iris')
sb.set_style("ticks")
sb.pairplot(df, hue='species', diag_kind="kde", kind="scatter", palette="husl")
plt.show()
Example 5-2 Create a string representation of the function and save it to the OML4Py
script repository
With OML4Py, functions are saved to the script repository using their string representation so
that they can be run with embedded Python execution. Create a function sb_plot and, after
verifying that the function behaves as expected, provide it as a string (within triple quotes for
formatting) and save it to the OML4Py script repository. Use the oml.script.create function
to store a single user-defined Python function in the OML4Py script repository. The parameter
"sb_plot" is a string that specifies the name of the user-defined function. The parameter
func=sb_plot is the Python function to run.
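For example, the function might be defined and provided as a string as in the following
sketch. The function body shown here is an assumption; it simply reproduces the seaborn
pairplot from earlier in this example.

%python
sb_plot = """def sb_plot():
    import seaborn as sb
    from matplotlib import pyplot as plt
    df = sb.load_dataset('iris')
    sb.set_style("ticks")
    sb.pairplot(df, hue = 'species', diag_kind = "kde",
                kind = "scatter", palette = "husl")
    plt.show()"""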
%python
oml.script.create("sb_plot", func=sb_plot)
Use the Python API for embedded Python execution to run the user-defined Python function
you saved in the script repository.
%python
oml.do_eval(func="sb_plot", graphics=True)
Note:
Only one active environment is supported at a time, so activating a new environment
replaces the previously active one. As a best practice, deactivate an environment
before logging off.
%conda
deactivate
Using Conda Environments with Embedded Python Execution
Running UDFs in the SQL and REST APIs for embedded Python execution
Conda environments can be used with the OML4Py Python, SQL, and REST APIs. To use the
SQL and REST APIs for embedded Python execution, you need the following information.
1. The token URL from the OML service console in Autonomous Database. For more
information on how to obtain the token URL and set the access token see Access and
Authorization Procedures and Functions (Autonomous Database).
2. A script containing a user-defined Python function in the Oracle Machine Learning for
Python (OML4Py) script repository. For information on creating a script and saving it to the
script repository, see About Embedded Python Execution and the Script Repository.
Note:
To use a conda environment when calling OML4Py script execution endpoints,
specify the conda environment in the env_name field when using SQL, and the
envName field when using REST.
Run the Python UDF using the SQL API for embedded Python execution -
Asynchronous mode
Run a SELECT statement that calls the pyqEval function. The PAR_LST argument sets the
special control argument oml_graphics_flag to true so that the web server can capture
images rendered in the invoked script, and sets oml_async_flag to true to submit the job
asynchronously. In the OUT_FMT argument, the string 'PNG' specifies that the response is
returned in a table with fixed columns (including an image bytes column). The SCR_NAME
parameter specifies the function sb_plot stored in the script repository. The ENV_NAME
parameter specifies the environment name, mysbenv, in which the script is run.
%script
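-- A sketch of the asynchronous pyqEval call described above. The exact
-- argument syntax is an assumption based on the parameter names in the text.
SELECT * FROM table(pyqEval(
    par_lst  => '{"oml_graphics_flag":true, "oml_async_flag":true}',
    out_fmt  => 'PNG',
    scr_name => 'sb_plot',
    env_name => 'mysbenv'));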
NAME VALUE
---------------------------
https://gcc59e2cf7a6f5f-oml4.adb-compdev1.us-
phoenix-1.oraclecloudapps.com/oml/api/py-scripts/v1/jobs/b82947a7-ec3a-4ca6-
bf86-54b3f2b3a4b0
%script
NAME VALUE
---------------------------
https://gcc59e2cf7a6f5f-oml4.adb-compdev1.us-
phoenix-1.oraclecloudapps.com/oml/api/py-scripts/v1/jobs/b82947a7-ec3a-4ca6-
bf86-54b3f2b3a4b0/result
%script
Run the Python UDF using the REST API for embedded Python execution
The following example runs the script named sb_plot using the OML4Py REST API for
embedded Python execution. The environment name parameter envName is set to mysbenv.
The graphicsFlag parameter is set to true to return the PNG image and the data from the
function in JSON format.
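A sketch of such a REST call using the Python requests package is shown below. The host
URL, the access token value, and the do-eval endpoint path are assumptions; the jobs URLs
shown earlier suggest the /oml/api/py-scripts/v1 base path.

%python
import requests

# Placeholder values; replace with your OML service host and access token.
token = '<access-token>'
url = 'https://<oml-service-host>/oml/api/py-scripts/v1/do-eval/sb_plot'

resp = requests.post(
    url,
    headers={'Authorization': 'Bearer ' + token,
             'Content-Type': 'application/json'},
    json={'envName': 'mysbenv', 'graphicsFlag': True})

print(resp.status_code)
print(resp.json())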
6
Get Started with Oracle Machine Learning for Python
Learn how to use OML4Py in Oracle Machine Learning Notebooks and how to move data
between the local Python session and the database.
These actions are described in the following topics.
Topics:
• Use OML4Py with Oracle Autonomous Database
OML4Py is available through the Python interpreter in Oracle Machine Learning Notebooks
in Oracle Autonomous Database.
• Use OML4Py with an On-Premises Oracle Database
After the OML4Py server and client components have been installed on your on-premises
Oracle database server and you have installed the OML4Py client on your local system,
you can connect your client Python session to the OML4Py server.
• Move Data Between the Database and a Python Session
With OML4Py functions, you can interact with data structures in a database schema.
• Save Python Objects in the Database
You can save Python objects in OML4Py datastores, which persist in the database.
Use OML4Py with an On-Premises Oracle Database
Note:
Before you can create an AutoML connection, a database administrator must first
activate the database-resident connection pool in your on-premises Oracle database
by issuing the following SQL statement:
EXECUTE DBMS_CONNECTION_POOL.START_POOL();
Once started, the connection pool remains in this state until a database administrator
explicitly stops it by issuing the following command:
EXECUTE DBMS_CONNECTION_POOL.STOP_POOL();
Note:
Because an AutoML connection requires more database resources than an
oml.connect connection without AutoML enabled, create an AutoML connection
only if you are going to use the AutoML classes.
Note:
• Only one type of connection can be active during a Python session: either a
connection with AutoML enabled or one without it enabled. You can, however,
terminate one type of connection and initiate the other type during the same
Python session. Terminating either type of connection results in the automatic
clean up of any temporary objects created in the session during that connection.
If you want to save any objects that you created in one type of connection before
changing to the other type, then save the objects in an OML4Py datastore before
invoking oml.connect again. You can then reload the objects after reconnecting.
• The oml.connect function uses the cx_Oracle Python package for database
connectivity. In some cases, you might want to use the cx_Oracle.connect
function of that package to connect to a database. That function has advantages
such as the following:
– Allows multiple connections to multiple databases, which might be useful
when running Embedded Python Execution functions
– Permits some SQL data manipulation language (DML) operations that are
not available in an oml.connect connection
For information on the cx_Oracle.connect function, see Connecting to Oracle
Database in the cx_Oracle documentation.
Function Description
oml.connect Establishes an OML4Py connection to an Oracle database.
oml.disconnect Terminates the Oracle database connection.
oml.isconnected Indicates whether an active Oracle database connection exists.
oml.check_embed Indicates whether Embedded Python Execution is enabled in the
connected Oracle database.
An Oracle wallet for an OML4Py connection contains a credential with a tnsnames.ora entry
for the database, such as the following:

waltcon = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=myserv.example.com)))
To be able to use an Oracle wallet to create an OML4Py connection in which you can use
Automatic Machine Learning (AutoML), the wallet must also have a credential that has a
tnsnames.ora entry for a server connection pool such as the following:
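For example, a pooled counterpart of the waltcon entry above might look like the following;
the entry name waltcon_pool is illustrative, and the (SERVER=POOLED) clause follows the
pattern of the dsn_pool example later in this chapter:

waltcon_pool = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1521))
(CONNECT_DATA=(SERVICE_NAME=myserv.example.com)(SERVER=POOLED)))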
For examples of creating a connection using an Oracle wallet, see Example 6-6 and
Example 6-7.
To create a basic connection to the database, you can specify arguments to the oml.connect
function in the following mutually exclusive combinations:
• user, password, dsn
• user, password, host, port, sid
• user, password, host, port, service_name
The arguments specify the following values:
Parameter Description
user A string specifying a username.
password A string specifying the password for the user.
host A string specifying the name of the host machine on which the OML4Py server
is installed.
port An int or a string specifying the Oracle database port number on the host
machine.
sid A string specifying the system identifier (SID) of the Oracle database.
service_name A string specifying the service name of the Oracle database.
dsn A string specifying a data source name, which can be a TNS entry for the
database or a TNS alias in an Oracle Wallet.
encoding A string specifying the encoding to use for regular database strings.
nencoding A string specifying the encoding to use for national character set database
strings.
automl A string or a boolean specifying whether to enable an Automatic Machine
Learning (AutoML) connection, which uses the database-resident connection
pool.
If there is a connection pool running for a host, port, SID (or service name),
then you can specify that host, port, SID (or service name) and automl=True.
If the dsn argument is a data source name, then the automl argument must
be a data source name for a running connection pool.
If the dsn argument is a TNS alias, then the automl argument must be a TNS
alias for a connection pool specified in an Oracle Wallet.
Only one active OML4Py connection can exist at a time during a Python session. If you call
oml.connect when an active connection already exists, then the oml.disconnect function is
implicitly invoked, any temporary objects that you created in the previous session are
discarded, and the new connection is established. Before attempting to connect, you can
discover whether an active connection exists by using the oml.isconnected function.
You explicitly end a connection with the oml.disconnect function. If you do not invoke
oml.disconnect, then the connection is automatically terminated when the Python session
ends.
Examples
In the following examples, the values of some of the arguments to the oml.connect
function are string variables that are not declared in the example. To use any of the following
examples, replace the username, password, port, and variable argument values with the
values for your user and database.
Example 6-1 Connecting with a Host, Port, and SID
This example uses the host, port, and sid arguments. It also shows the use of the
oml.isconnected, oml.check_embed, and oml.disconnect functions.
import oml
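# A minimal sketch of the rest of this example, assuming the placeholder
# credentials and instance values used elsewhere in this chapter.
oml.connect(user='oml_user', password='oml_user_password',
            host='myhost', port=1521, sid='mysid')

oml.isconnected()    # returns True while the connection is active
oml.check_embed()    # is Embedded Python Execution enabled?
oml.disconnect()
oml.isconnected()    # now returns False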
The following example connects using a dsn argument that contains a SID.

import oml
mydsn = "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)(PORT=1521))\
(CONNECT_DATA=(SID=mysid)))"
oml.connect(user='oml_user', password='oml_user_password', dsn=mydsn)
The following example connects using a dsn argument that contains a service name.

import oml
myinst = "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)\
(PORT=1521))\
(CONNECT_DATA=(SERVICE_NAME=myservice.example.com)))"
oml.connect(user='oml_user', password='oml_user_password', dsn=myinst)
Example 6-5 Creating a Connection with a DSN and with AutoML Enabled
This example creates an OML4Py connection with AutoML enabled. The example connects to
a local database.
import oml
mydsn = "(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=myhost)\
(PORT=1521))(CONNECT_DATA=(SID=mysid)))"
dsn_pool = "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=myhost)\
(PORT=1521))\
(CONNECT_DATA=(SERVICE_NAME=myservice.example.com)\
(SERVER=POOLED)))"
oml.connect(user='oml_user', password='oml_user_password',
dsn=mydsn, automl=dsn_pool)
See Also:
About Oracle Wallets
6.3 Move Data Between the Database and a Python Session
Topics:
• About Moving Data Between the Database and a Python Session
Using the functions described in this topic, you can move data between your local
Python session and an Oracle database schema.
• Push Local Python Data to the Database
Use the oml.push function to push data from your local Python session to a temporary
table in your Oracle database schema.
• Pull Data from the Database to a Local Python Session
Use the pull method of an oml proxy object to create a Python object in your local Python
session.
• Create a Python Proxy Object for a Database Object
Use the oml.sync function to create a Python object as a proxy for a database table, view,
or SQL statement.
6.3.1 About Moving Data Between the Database and a Python Session
Using the functions described in this topic, you can move data between your local Python
session and an Oracle database schema.
The following functions create proxy oml Python objects from database objects, create
database tables from Python objects, list the objects in the workspace, and drop tables and
views.
Function Definition
oml.create Creates a persistent database table from a Python data set.
oml.cursor Returns a cx_Oracle cursor object for the current OML4Py database
connection.
oml.dir Returns the names of the oml objects in the workspace.
oml.drop Drops a persistent database table or view.
oml_object.pull Creates a local Python object that contains a copy of the database data
referenced by the oml object.
oml.push Pushes data from the OML Notebooks Python session memory into a
temporary table in the database.
oml.sync Creates an oml.DataFrame proxy object in Python that represents a
database table, view, or query.
With the oml.push function, you can create a temporary database table, and its corresponding
proxy oml.DataFrame object, from a Python object in your local Python session. The temporary
table is automatically deleted when the OML Notebook or OML4Py client connection to the
database ends unless you have saved its proxy object to a datastore before disconnecting.
With the pull method of an oml object, you can create a local Python object that contains a
copy of the database data represented by an oml proxy object.
The oml.push function implicitly coerces Python data types to oml data types and the pull
method on oml objects coerces oml data types to Python data types.
With the oml.create function, you can create a persistent database table and a corresponding
oml.DataFrame proxy object from a Python data set.
With the oml.sync function, you can synchronize the metadata of a database table or view with
the oml object representing the database object.
With the oml.cursor function, you can create a cx_Oracle cursor object for the current
database connection. You can use the cursor to run queries against the database, as shown
in Example 6-13.
Push Local Python Data to the Database
Use the oml.push function to push data from your local Python session to a temporary table
in your Oracle database schema. The function returns an oml.DataFrame proxy object
that references the table in the Python session. The table exists as long as an oml object
that references it exists, either in the Python session memory or in an OML4Py datastore.
The syntax of the oml.push function is the following:
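A sketch of the signature, with defaults assumed from the parameter descriptions that follow:

oml.push(x, oranumber=True, dbtypes=None)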
The x argument may be a pandas.DataFrame or a list of tuples of equal size that contain the
data for the table. For a list of tuples, each tuple represents a row in the table and the column
names are set to COL1, COL2, and so on.
The SQL data types of the columns are determined by the following:
• OML4Py determines default column types by looking at 20 random rows sampled from the
table. For tables with fewer than 20 rows, it uses all rows to determine the column type.
If the values in a column are all None, or if a column has inconsistent data types that are
not None in the sampled rows, then a default column type cannot be determined and a
ValueError is raised unless a SQL type for the column is specified by the dbtypes
argument.
• For numeric columns, the oranumber argument, which is a bool, determines the SQL data
type. If True (the default), then the SQL data type is NUMBER. If False, then the data type is
BINARY_DOUBLE.
If the data in x contains NaN values, then you should set oranumber to False.
• For string columns, the default type is VARCHAR2(4000).
• For bytes columns, the default type is BLOB.
With the dbtypes argument, you can specify the SQL data types for the table columns. The
values of dbtypes may be either a dict that maps str to str values or a list of str values. For
a dict, the keys are the names of the columns.
import oml
import pandas as pd
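# Reconstructed input data (the values are an assumption, chosen to match
# the list-of-tuples example later in this chapter; the column names match
# the dbtypes mapping below).
pd_df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, None],
                      'string': [None, None, 'a', 'a', 'a', 'b'],
                      'bytes': [b'a', b'b', b'c', b'c', b'd', b'e']})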
# Push the data set to a database table with the specified dbtypes
# for each column.
oml_df = oml.push(pd_df, dbtypes = {'numeric': 'BINARY_DOUBLE',
'string':'CHAR(1)',
'bytes':'RAW(1)'})
# Pull the data from the database to a local pandas.DataFrame.
df = oml_df.pull()
The pull method of an oml object returns a Python object of the same type. The object
contains a copy of the database data referenced by the oml object. The Python object exists
in memory in the Python session in OML Notebooks or in your OML4Py client Python session.
Note:
You can pull data to a local pandas.DataFrame only if the data can fit into the local
Python session memory. Also, even if the data fits in memory but is still very large,
you may not be able to run many, or any, Python functions in the local Python
session.
import oml
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ['SEPAL_LENGTH','SEPAL_WIDTH',
                                       'PETAL_LENGTH','PETAL_WIDTH'])
y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor',
                                     2: 'virginica'}[x], iris.target)),
                 columns = ['SPECIES'])
iris_df = pd.concat([x, y], axis=1)

# Create the IRIS database table and its proxy object, oml_iris,
# used in the session output below.
oml_iris = oml.create(iris_df, table = 'IRIS')
>>>
>>> # Display the data type of oml_iris.
... type(oml_iris)
<class 'oml.core.frame.DataFrame'>
>>>
>>> # Pull the data from oml_iris into local memory.
... iris = oml_iris.pull()
>>>
>>> # Display the data type of iris.
... type(iris)
<class 'pandas.core.frame.DataFrame'>
>>>
>>> # Drop the IRIS database table.
... oml.drop('IRIS')
You can use the proxy oml.DataFrame object to select data from the table. When you run a
Python function that selects data from the table, the function returns the current data from the
database object. However, if some application has added a column to the table, or has
otherwise changed the metadata of the database object, the oml.DataFrame proxy object does
not reflect such a change until you again invoke oml.sync for the database object.
Tip:
To conserve memory resources and save time, you should only create proxies for the
tables that you want to use in your Python session.
You can use the oml.dir function to list the oml.DataFrame proxy objects in the environment
for a schema.
The syntax of the oml.sync function is the following:
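A sketch of the signature, with defaults assumed from the parameter descriptions that follow:

oml.sync(schema=None, regex_match=False, table=None, view=None, query=None)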
The schema argument in oml.sync specifies the name of the schema where the database
object exists. If schema=None, which is the default, then the current schema is used.
To create an oml.DataFrame object for a table, use the table parameter. To create one for a
view, use the view parameter. To create one for a SQL SELECT statement, use the query
parameter. You can only specify one of these parameters in an oml.sync invocation: the
argument for one of the parameters must be a string and the argument for each of the other
two parameters must be None.
Creating a proxy object for a query enables you to create an oml.DataFrame object without
creating a view in the database. This can be useful when you do not have the CREATE VIEW
system privilege for the current schema. You cannot use the schema parameter and the query
parameter in the same oml.sync invocation.
With the regex_match argument, you can specify whether the value of the table or view
argument is a regular expression. If regex_match=True, then oml.sync creates oml.DataFrame
objects for each database object that matches the pattern. The matched tables or views are
returned in a dict with the table or view names as keys.
import oml
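# A sketch of the lost setup: create a proxy object for a COFFEE table
# (the table name is inferred from the oml_cdat keys shown below).
oml_coffee = oml.sync(table = 'COFFEE')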
oml_coffee.head()
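# Create proxy objects for all tables whose names start with C by using
# a regular expression (a sketch consistent with the keys shown below).
oml_cdat = oml.sync(table = 'C.*', regex_match = True)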
oml_cdat.keys()
oml_cdat['COFFEE'].columns
oml_cdat['COLOR'].columns
Note:
When creating a table in Oracle Machine Learning for Python, if you use lowercase
or mixed case for the name of the table, then you must use the same lowercase or
mixed case name in double quotation marks when using the table in a SQL query or
function. If, instead, you use an all uppercase name when creating the table, then the
table name is case-insensitive: you can use uppercase, lowercase, or mixed case
when using the table without using double quotation marks. The same is true for
naming columns in a table.
You can delete the persistent table in a database schema with the oml.drop function.
Caution:
Use the oml.drop function to delete a persistent database table. Use the del
statement to remove an oml.DataFrame proxy object and its associated temporary
table; del does not delete a persistent table.
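Use the oml.create function to create a persistent database table and a corresponding
oml.DataFrame proxy object. A sketch of the signature, with defaults assumed from the
parameter descriptions that follow:

oml.create(x, table, oranumber=True, dbtypes=None, append=False)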
The x argument is a pandas.DataFrame or a list of tuples of equal size that contain the data for
the table. For a list of tuples, each tuple represents a row in the table and the column names
are set to COL1, COL2, and so on. The table argument is a string that specifies a name for
the table.
The SQL data types of the columns are determined by the following:
• OML4Py determines default column types by looking at 20 random rows sampled from the
table. For tables with fewer than 20 rows, it uses all rows to determine the column type.
If the values in a column are all None, or if a column has inconsistent data types that are
not None in the sampled rows, then a default column type cannot be determined and a
ValueError is raised unless a SQL type for the column is specified by the dbtypes
argument.
• For numeric columns, the oranumber argument, which is a bool, determines the SQL data
type. If True (the default), then the SQL data type is NUMBER. If False, then the data type
is BINARY_DOUBLE.
If the data in x contains NaN values, then you should set oranumber to False.
The append argument is a bool that specifies whether to append the x data to an existing table.
import oml
import pandas as pd
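# A sketch of the lost setup for this example. The data values are
# reconstructed from the list-of-tuples example in the session output
# below; tbl1 and the cursor cr are used later in the example.
df = pd.DataFrame({'numeric': [1, 1.4, -4, 3.145, 5, None],
                   'string': [None, None, 'a', 'a', 'a', 'b'],
                   'bytes': [b'a', b'b', b'c', b'c', b'd', b'e']})

# Create a table with default data types.
oml_df1 = oml.create(df, table = 'tbl1')

# Create a cursor to query the data types of the table columns.
cr = oml.cursor()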
'tbl3'")
cr.fetchall()
>>>
>>> # Create a table with oranumber set to False.
... oml_df2 = oml.create(df, table = 'tbl2', oranumber = False)
>>>
>>> # Show the SQL data types of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name =
'tbl2'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('BLOB',)]
>>>
>>> # Create a table with dbtypes specified as a dict mapping column names
... # to SQL data types.
... oml_df3 = oml.create(df, table = 'tbl3',
... dbtypes = {'numeric': 'BINARY_DOUBLE',
... 'bytes':'RAW(1)'})
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name =
'tbl3'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('VARCHAR2',), ('RAW',)]
>>>
>>> # Create a table with dbtypes specified as a list of SQL data types
... # matching the order of the columns.
... oml_df4 = oml.create(df, table = 'tbl4',
... dbtypes = ['BINARY_DOUBLE','CHAR(1)', 'RAW(1)'])
>>>
>>> # Show the SQL data type of the columns
... _ = cr.execute("select data_type from all_tab_columns where table_name =
'tbl4'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Create a table from a list of tuples.
... lst = [(1, None, b'a'), (1.4, None, b'b'), (-4, 'a', b'c'),
... (3.145, 'a', b'c'), (5, 'a', b'd'), (None, 'b', b'e')]
>>> oml_df5 = oml.create(lst, table ='tbl5',
... dbtypes = ['BINARY_DOUBLE','CHAR(1)','RAW(1)'])
>>>
>>> # Show the SQL data type of the columns.
... _ = cr.execute("select data_type from all_tab_columns where table_name =
'tbl5'")
>>> cr.fetchall()
[('BINARY_DOUBLE',), ('CHAR',), ('RAW',)]
>>>
>>> # Close the cursor.
... cr.close()
>>>
>>> # Drop the tables
... oml.drop('tbl1')
>>> oml.drop('tbl2')
>>> oml.drop('tbl3')
>>> oml.drop('tbl4')
>>> oml.drop('tbl5')
6.4 Save Python Objects in the Database
Topics:
• About OML4Py Datastores
In an OML4Py datastore, you can store Python objects, which you can then use in
subsequent Python sessions; you can also make them available to other users or
programs.
• Save Objects to a Datastore
The oml.ds.save function saves one or more Python objects to a datastore.
• Load Saved Objects From a Datastore
The oml.ds.load function loads one or more Python objects from a datastore into a
Python session.
• Get Information About Datastores
The oml.ds.dir function provides information about datastores.
• Get Information About Datastore Objects
The oml.ds.describe function provides information about the objects in a datastore.
• Delete Datastore Objects
The oml.ds.delete function deletes datastores or objects in a datastore.
• Manage Access to Stored Objects
The oml.grant and oml.revoke functions grant or revoke the read privilege to datastores
or to user-defined Python functions in the script repository.
Function Description
oml.ds.delete Deletes one or more datastores or Python objects from a datastore.
oml.ds.dir Lists the datastores available to the current user.
oml.ds.load Loads Python objects from a datastore into the user’s session.
oml.ds.save Saves Python objects to a named datastore in the user’s database schema.
The following table lists the Python functions for managing access to datastores and datastore
objects.
Function Description
oml.grant Grants read privilege permission to another user to a datastore or a user-
defined Python function in the script repository owned by the current user.
oml.revoke Revokes the read privilege permission that was granted to another user to a
datastore or a user-defined Python function in the script repository owned by
the current user.
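A sketch of the oml.ds.save signature, with defaults assumed from the parameter
descriptions that follow:

oml.ds.save(objs, name, description='', grantable=False,
            overwrite=False, append=False, compression=False)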
The objs argument is a dict that contains the name and object pairs to save to the datastore
specified by the name argument.
With the description argument, you can provide some descriptive text that appears when you
get information about the datastore. The description parameter has no effect when used with
the append parameter.
With the grantable argument, you can specify whether the read privilege to the datastore may
be granted to other users.
If you set the overwrite argument to True, then you can replace an existing datastore with
another datastore of the same name.
If you set the append argument to True, then you can add objects to an existing datastore. The
overwrite and append arguments are mutually exclusive.
If you set compression to True, then the serialized Python objects are compressed in the
datastore.
Example 6-14 Saving Python Objects to a Datastore
This example demonstrates creating datastores.
import oml
from sklearn import datasets
from sklearn import linear_model
import pandas as pd
# Load three data sets and create oml.DataFrame objects for them.
wine = datasets.load_wine()
x = pd.DataFrame(wine.data, columns = wine.feature_names)
y = pd.DataFrame(wine.target, columns = ['Class'])
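# Create the WINE table and its proxy object (reconstructed to match the
# pattern of the other data sets; oml_wine is saved to a datastore below).
oml_wine = oml.create(pd.concat([x, y], axis=1), table = "WINE")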
diabetes = datasets.load_diabetes()
x = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
y = pd.DataFrame(diabetes.target, columns=['disease_progression'])
oml_diabetes = oml.create(pd.concat([x, y], axis=1),
table = "DIABETES")
oml_diabetes.columns
boston = datasets.load_boston()
x = pd.DataFrame(boston.data, columns = boston.feature_names.tolist())
y = pd.DataFrame(boston.target, columns = ['Value'])
oml_boston = oml.create(pd.concat([x, y], axis=1), table = "BOSTON")
oml_boston.columns
# Save the native Python object and the oml proxy object to a datastore
# and allow the read privilege to be granted to them.
oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
name="ds_pymodel", grantable=True)
# List the datastores to which the read privilege has been granted.
oml.ds.dir(dstype="grant")
... oml.ds.save(objs={'oml_diabetes':oml_diabetes},
... name="ds_pydata", append=True)
>>>
>>> # Save the oml_wine proxy object to another datastore.
... oml.ds.save(objs={'oml_wine':oml_wine},
... name="ds_wine_data", description = "wine dataset")
>>>
>>> # Create regression models using sklearn and oml.
... # The regr1 linear model is a native Python object.
... regr1 = linear_model.LinearRegression()
>>> regr1.fit(boston.data, boston.target)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> # The regr2 GLM model is an oml proxy object.
... regr2 = oml.glm("regression")
>>> X = oml_boston.drop('Value')
>>> y = oml_boston['Value']
>>> regr2 = regr2.fit(X, y)
>>>
>>> # Save the native Python object and the oml proxy object to a datastore
... # and allow the read privilege to be granted to them.
... oml.ds.save(objs={'regr1':regr1, 'regr2':regr2},
... name="ds_pymodel", grantable=True)
>>>
>>> # Grant the read privilege to the ds_pymodel datastore to every user.
... oml.grant(name="ds_pymodel", typ="datastore", user=None)
>>>
>>> # List the datastores to which the read privilege has been granted.
... oml.ds.dir(dstype="grant")
datastore_name grantee
0 ds_pymodel PUBLIC
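Use the oml.ds.load function to load one or more Python objects from a datastore into your
Python session. A sketch of the signature, with defaults assumed from the parameter
descriptions that follow:

oml.ds.load(name, objs=None, owner=None, to_globals=True)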
The name argument specifies the datastore that contains the objects to load.
With the objs argument, you identify a specific object or a list of objects to load.
With the boolean to_globals parameter, you can specify whether the objects are loaded to a
global workspace or to a dictionary object. If the argument to to_globals is True, then the
oml.ds.load function loads the objects into the global workspace. If the argument is False,
then the function returns a dict object that contains pairs of object names and values.
The oml.ds.load function raises a ValueError if the name argument is an empty string or if the
owner of the datastore is not the current user and the read privilege for the datastore has not
been granted to the current user.
import oml
# Load the named Python object from the datastore to the global workspace.
oml.ds.load(name="ds_pymodel", objs=["regr2"])
# Load the named Python object from the datastore to the user's workspace.
oml.ds.load(name="ds_pymodel", objs=["regr1"], to_globals=False)
Optionally, you can use the regex_match and dstype parameters to get information about
datastores with certain characteristics. The valid arguments for dstype are the following:
Argument Description
all Lists all of the datastores to which the current user has the read
privilege.
grant Lists the datastores for which the current user has granted read
privilege to other users.
granted Lists the datastores for which other users have granted read
privilege to the current user.
grantable Lists the datastores that the current user can grant the read privilege
to.
user Lists the datastores created by current user.
private Lists the datastores that the current user cannot grant the read
privileges to.
The oml.ds.dir function returns a pandas.DataFrame object that contains different columns
depending on which dstype argument you use. The following table lists the arguments and the
columns returned for the values supplied.
import oml
# Show datastores to which other users have been granted the read
# privilege.
oml.ds.dir(dstype="grant")
oml.ds.describe(name, owner=None)
The owner argument is a string that specifies the owner of the datastore or None (the default). If
you do not specify the owner, then the function returns information about the datastore if it is
owned by the current user.
The oml.ds.describe function returns a pandas.DataFrame object, each row of which
represents an object in the datastore. The function raises a ValueError if either of the
following is true:
• The current user is not the owner of the datastore and has not been granted read privilege
for the datastore.
• The datastore does not exist.
Example 6-17 Getting Information About Datastore Objects
This example demonstrates the use of the oml.ds.describe function. For the creation of the
datastore used in this example, see Example 6-14.
import oml
Use the oml.ds.delete function to delete one or more datastores in your database schema or
to delete objects in a datastore.
The syntax of oml.ds.delete is the following:
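A sketch of the signature, with defaults assumed from the parameter descriptions that follow:

oml.ds.delete(name, objs=None, regex_match=False)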
The argument to the name parameter may be one of the following:
• A string that specifies the name of the datastore to modify or delete, or a regular
expression that matches the datastores to delete.
• A list of str objects that name the datastores from which to delete objects.
The objs parameter specifies the objects to delete from a datastore. The argument to the objs
parameter may be one of the following:
• A string that specifies the object to delete from one or more datastores, or a regular
expression that matches the objects to delete.
• None (the default), which deletes the entire datastore or datastores.
The regex_match parameter is a bool that indicates whether the name or objs arguments are
regular expressions. The default value is False. The regex_match parameter operates as
follows:
• If regex_match=False and if name is not None, and:
– If objs=None, then oml.ds.delete deletes the datastore or datastores specified in the
name argument.
– If you specify one or more datastores with the name argument and one or more
datastore objects with the objs argument, then oml.ds.delete deletes the specified
Python objects from the datastores.
• If regex_match=True and:
– If objs=None, then oml.ds.delete deletes the datastores you specified in the name
argument.
– If the name argument is a string and you specify one or more datastore objects with the
objs argument, then oml.ds.delete deletes from the datastore the objects whose
names match the regular expression specified in the objs argument.
– If the name argument is a list of str objects, then the objs argument must be a list of
str objects of the same length as name, and oml.ds.delete deletes from the
datastores the objects whose names match the regular expressions specified in objs.
This function raises an error if any of the following occur:
• A specified datastore does not exist.
• Argument regex_match is False, argument name is a list of more than one str object, and
argument objs is not None.
• Argument regex_match is True and arguments name and objs are lists that are not the
same length.
Example 6-18 Deleting Datastore Objects
This example demonstrates the use of the oml.ds.delete function. For the creation of the
datastores used in this example, see Example 6-14.
import oml
# Delete a datastore.
oml.ds.delete(name="ds_pydata")
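Sketches of the oml.grant and oml.revoke signatures, with defaults assumed from the
parameter descriptions that follow:

oml.grant(name, typ='datastore', user=None)
oml.revoke(name, typ='datastore', user=None)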
The name argument is a string that specifies the name of the user-defined Python function in
the script repository or the name of a datastore.
The typ parameter must be specified. The argument is a string that is either 'datastore' or
'pyqscript'.
The user argument is a string that specifies the user to whom read privilege to the named
datastore or user-defined Python function is granted or from whom it is revoked, or None (the
default). If you specify None, then the read privilege is granted to or revoked from all users.
This example grants and revokes read privilege on datastores and user-defined Python
functions, and lists the datastores to which read privilege has been granted. For the creation
of the datastores used in this example, see Example 6-14.
import oml
# Show datastores to which other users have been granted read privilege.
oml.ds.dir(dstype="grant")
oml.ds.dir(dstype="grant")
# List the user-defined Python functions available only to the current user.
oml.script.dir(sctype='user')
# Grant the read privilege to the MYLM user-defined Python function to the user SH.
oml.grant(name="MYLM", typ="pyqscript", user="SH")
# List the user-defined Python functions to which read privilege has been granted.
oml.script.dir(sctype="grant")

# Revoke the read privilege to the MYLM user-defined Python function from the user SH.
oml.revoke(name="MYLM", typ="pyqscript", user="SH")
>>> # List the user-defined Python functions available only to the current
user.
oml.script.dir(sctype='user')
name script
0 MYLM def build_lm1(dat):\n from sklearn import lin...
>>>
>>># Grant the read privilege to the MYLM user-defined Python function to the
user SH.
...oml.grant(name="MYLM", typ="pyqscript", user="SH")
>>>
>>> # List the user-defined Python functions to which read privilege has been
granted.
... oml.script.dir(sctype="grant")
name grantee
0 MYLM SH
>>>
>>> # Revoke the read privilege to the MYLM user-defined Python function from
the user SH.
... oml.revoke(name="MYLM", typ="pyqscript", user="SH")
>>>
>>> # List the granted user-defined Python functions again to see if the
revocation was successful.
... oml.script.dir(sctype="grant")
Empty DataFrame
Columns: [name, grantee]
Index: []
7
Prepare and Explore Data
Use OML4Py methods to prepare data for analysis and to perform exploratory analysis of the
data.
Methods of the OML4Py data type classes make it easier for you to prepare very large
enterprise database-resident data for modeling. These methods are described in the following
topics.
Topics:
• Prepare Data
Using methods of OML4Py data type classes, you can prepare data for analysis in the
database, as described in the following topics.
• Explore Data
OML4Py provides methods that enable you to perform exploratory data analysis and
common statistical operations.
• Render Graphics
OML4Py provides functions for rendering graphical displays of data.
Prepare Data
Using methods of OML4Py data type classes, you can prepare data for analysis in the
database, and then pull the prepared data to your local
Python session where, for example, you can use third-party Python packages to perform other
operations.
The following table lists methods with which you can perform common data preparation tasks
and indicates whether the OML4Py data type class supports the method.
The examples in this section demonstrate selecting data from an oml.DataFrame object by
rows, by columns, and by value.
The examples use the oml_iris object created by the following code, which imports the
sklearn.datasets package and loads the iris data set. It creates the x and y variables, and
then creates the persistent database table IRIS and the oml.DataFrame object oml_iris as a
proxy for the table.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x: {0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Display the last 10 rows.
... oml_iris.tail(10)
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
0 6.7 3.1 5.6 2.4 virginica
1 6.9 3.1 5.1 2.3 virginica
2 5.8 2.7 5.1 1.9 virginica
3 6.8 3.2 5.9 2.3 virginica
4 6.7 3.3 5.7 2.5 virginica
5 6.7 3.0 5.2 2.3 virginica
6 6.3 2.5 5.0 1.9 virginica
7 6.5 3.0 5.2 2.0 virginica
8 6.2 3.4 5.4 2.3 virginica
9 5.9 3.0 5.1 1.8 virginica
The following example selects columns from oml_iris to create projected data sets. It selects
columns by name, creating iris_projected1, and displays its first three rows. It then selects
a range of columns by index, creating iris_projected2, and displays its first three rows.
Finally, the example selects columns from oml_iris by data types, creates iris_projected3,
and displays its first three rows.
# Select all rows with columns whose indices are in the range [1, 4).
iris_projected2 = oml_iris[:, 1:4]
iris_projected2.head(3)
["Sepal_Length", "Petal_Length"]]
len(oml_iris_filtered1)
oml_iris_filtered1.head(3)
# Select all rows in which petal length is less than 1.5 and
# sepal length is larger than 5.0.
oml_iris_filtered3 = oml_iris[(oml_iris["Petal_Length"] < 1.5) &
(oml_iris["Sepal_Length"] > 5.0), :]
len(oml_iris_filtered3)
oml_iris_filtered3.head()
>>> # Select sepal length and petal length where petal length
... # is less than 1.5.
... oml_iris_filtered1 = oml_iris[oml_iris["Petal_Length"] < 1.5,
... ["Sepal_Length", "Petal_Length"]]
>>> len(oml_iris_filtered1)
24
>>> oml_iris_filtered1.head(3)
Sepal_Length Petal_Length
0 5.1 1.4
1 4.9 1.4
2 4.7 1.3
>>>
>>> ### Using the AND and OR conditions in filtering.
... # Select all rows in which petal length is less than 1.5 or
... # sepal length is 5.0.
... oml_iris_filtered2 = oml_iris[(oml_iris["Petal_Length"] < 1.5) |
... (oml_iris["Sepal_Length"] == 5.0), :]
>>> len(oml_iris_filtered2)
30
>>> oml_iris_filtered2.head(3)
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
0 5.1 3.5 1.4 0.2 setosa
1 4.9 3.0 1.4 0.2 setosa
2 4.7 3.2 1.3 0.2 setosa
>>>
>>> # Select all rows in which petal length is less than 1.5
... # and sepal length is larger than 5.0.
... oml_iris_filtered3 = oml_iris[(oml_iris["Petal_Length"] < 1.5) &
... (oml_iris["Sepal_Length"] > 5.0), :]
>>> len(oml_iris_filtered3)
7
>>> oml_iris_filtered3.head()
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
0 5.1 3.5 1.4 0.2 setosa
import oml
import pandas as pd
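# Reconstructed setup (column names and values are inferred from the
# session output below).
df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'val': ['a', 'b', 'c', 'd', 'e'],
                   'num': [4, 3, 6.7, 7.2, 5],
                   'ch': ['p', 'q', 'r', 'a', 'b']})
oml_df = oml.push(df)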
>>>
>>> # Append an oml.Float series object to another.
... num1 = oml_df['id']
>>> num2 = oml_df['num']
>>> num1.append(num2)
[1, 2, 3, 4, 5, 4, 3, 6.7, 7.2, 5]
>>>
>>> # Append an oml.DataFrame object to another.
... x = oml_df[['id', 'val']] # 1st column oml.Float, 2nd column oml.String
>>> y = oml_df[['num', 'ch']] # 1st column oml.Float, 2nd column oml.String
>>> x.append(y)
id val
0 1.0 a
1 2.0 b
2 3.0 c
3 4.0 d
4 5.0 e
5 4.0 p
6 3.0 q
7 6.7 r
8 7.2 a
9 5.0 b
import oml
import pandas as pd
from collections import OrderedDict
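# This example assumes the oml_df proxy object created in the previous
# example (columns id, val, num, ch); x and y are the column projections
# shown in the session output below.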
w = (oml_df['num']*2).exp().round(decimals=2)
y.concat({'round(exp(2*num))':w})
>>> # Create two oml.DataFrame objects and combine the objects column-wise.
... x = oml_df[['id', 'val']]
>>> y = oml_df[['num', 'ch']]
>>> x.concat(y)
id val num ch
0 1 a 4.0 p
1 2 b 3.0 q
2 3 c 6.7 r
3 4 d 7.2 a
4 5 e 5.0 b
>>>
>>> # Create an oml.Float object with the rounded exponential of two times
... # the values in the num column of the oml_df object, then
... # concatenate it with the oml.DataFrame object y using a new column name.
... w = (oml_df['num']*2).exp().round(decimals=2)
>>> y.concat({'round(exp(2*num))':w})
num ch round(exp(2*num))
0 4.0 p 2980.96
1 3.0 q 403.43
2 6.7 r 660003.22
3 7.2 a 1794074.77
4 5.0 b 22026.47
>>>
>>> # Concatenate object x with multiple objects and turn on automatic
... # name conflict resolution.
... z = oml_df[:,'id']
>>> x.concat([z, w, y], auto_name=True)
id val id3 num num5 ch
0 1 a 1 2980.96 4.0 p
1 2 b 2 403.43 3.0 q
2 3 c 3 660003.22 6.7 r
3 4 d 4 1794074.77 7.2 a
4 5 e 5 22026.47 5.0 b
>>>
>>> # Concatenate multiple oml data objects and perform customized renaming.
... x.concat(OrderedDict([('ID',z), ('round(exp(2*num))',w), ('New_',y)]))
id val ID round(exp(2*num)) New_num New_ch
0 1 a 1 2980.96 4.0 p
1 2 b 2 403.43 3.0 q
2 3 c 3 660003.22 6.7 r
3 4 d 4 1794074.77 7.2 a
4 5 e 5 22026.47 5.0 b
import oml
import pandas as pd
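# This example assumes the oml_df proxy object created in the earlier
# append example (columns id, val, num, ch).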
x = oml_df[['id', 'val']]
y = oml_df[['num', 'ch']]
import pandas as pd
import oml
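# Reconstructed setup (values inferred from the session output below):
# a data set with missing values in both numeric and string columns.
df = pd.DataFrame({'numeric': [1, 1.4, -4, -4, 5.432, None, None],
                   'string1': [None, None, 'a', 'a', 'a', 'b', None],
                   'string2': ['x', None, 'z', 'z', 'z', 'x', None]})
oml_df = oml.push(df, dbtypes = {'numeric': 'BINARY_DOUBLE',
                                 'string1': 'CHAR(1)',
                                 'string2': 'CHAR(1)'})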
# Drop rows that have the same value in columns 'string1' and 'string2'.
oml_df.drop_duplicates(subset=['string1', 'string2'])
>>>
>>> # Drop rows in which all column values are missing.
... oml_df.dropna(how='all')
numeric string1 string2
0 1.000 None x
1 1.400 None None
2 -4.000 a z
3 -4.000 a z
4 5.432 a z
5 NaN b x
>>>
>>> # Drop rows in which any numeric column values are missing.
... oml_df.dropna(how='any', subset=['numeric'])
numeric string1 string2
0 1.000 None x
1 1.400 None None
2 -4.000 a z
3 -4.000 a z
4 5.432 a z
>>>
>>> # Drop duplicate rows.
... oml_df.drop_duplicates()
numeric string1 string2
0 5.432 a z
1 1.000 None x
2 -4.000 a z
3 NaN b x
4 1.400 None None
5 NaN None None
>>>
>>> # Drop rows that have the same value in columns 'string1' and 'string2'.
... oml_df.drop_duplicates(subset=['string1', 'string2'])
numeric string1 string2
0 -4.0 a z
1 1.4 None None
2 1.0 None x
3 NaN b x
>>>
>>> # Drop the column 'string2'.
... oml_df.drop('string2')
numeric string1
0 1.000 None
1 1.400 None
2 -4.000 a
3 -4.000 a
4 5.432 a
5 NaN b
6 NaN None
In analyzing large data sets, a typical operation is to randomly partition the data set into
subsets for training and testing purposes, which you can do with these methods. You can also
sample data with the split method.
import oml
import pandas as pd
from sklearn import datasets
digits = datasets.load_digits()
pd_digits = pd.DataFrame(digits.data,
columns=['IMG'+str(i) for i in
range(digits['data'].shape[1])])
pd_digits = pd.concat([pd_digits,
pd.Series(digits.target,
name = 'target')],
axis = 1)
oml_digits = oml.push(pd_digits)
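# Split the data into two buckets, hashing on the target column so that
# each digit category lands in exactly one bucket (a sketch; the ratio
# and hash_cols arguments are assumptions).
splits = oml_digits.split(ratio=(.8, .2), hash_cols='target')
[len(split) for split in splits]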
# Verify that the different categories of digits (digits 0~9) are present
# in only one of the splits generated by hashing on the category column.
[split['target'].drop_duplicates().sort_values().pull()
for split in splits]
Explore Data
For details about the function arguments, invoke help(oml.DataFrame.corr) or see Oracle
Machine Learning for Python API Reference.
Example 7-9 Performing Basic Correlation Calculations
This example first creates a temporary database table, with its corresponding proxy
oml.DataFrame object oml_df1, from the pandas.DataFrame object df. It then verifies the
correlation computed between columns A and B, which gives 1, as expected. The values in B
are twice the values in A element-wise. The example also changes a value field in df and
creates a NaN entry. It then creates a temporary database table, with the corresponding proxy
oml.DataFrame object oml_df2. Finally, it invokes the corr method on oml_df2 with skipna set
to True ( the default) and then False to compare the results.
import oml
import pandas as pd
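# A sketch of the lost example body, following the description above.
# The data values are assumptions; B is twice A element-wise.
df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [2, 4, 6, 8, 10]})
oml_df1 = oml.push(df)
oml_df1.corr()                 # corr(A, B) is 1, as expected

# Change a value field to create a NaN entry, then push again.
df.loc[2, 'A'] = None
oml_df2 = oml.push(df)
oml_df2.corr()                 # skipna=True is the default
oml_df2.corr(skipna=False)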
The pivot_table method converts a data set into a pivot table. Due to the database 1000
column limit, pivot tables with more than 1000 columns are automatically truncated to display
the categories with the most entries for each column value.
For details about the method arguments, invoke help(oml.DataFrame.crosstab) or
help(oml.DataFrame.pivot_table), or see Oracle Machine Learning for Python API
Reference.
Example 7-10 Producing Cross-Tabulation and Pivot Tables
This example demonstrates the use of the crosstab and pivot_table methods.
import pandas as pd
import oml
x = pd.DataFrame({
'GENDER': ['M', 'M', 'F', 'M', 'F', 'M', 'F', 'F',
None, 'F', 'M', 'F'],
'HAND': ['L', 'R', 'R', 'L', 'R', None, 'L', 'R',
'R', 'R', 'R', 'R'],
'SPEED': [40.5, 30.4, 60.8, 51.2, 54, 29.3, 34.1,
39.6, 46.4, 12, 25.3, 37.5],
'ACCURACY': [.92, .94, .87, .9, .85, .97, .96, .93,
.89, .84, .91, .95]
})
x = oml.push(x)
# For each gender value and across all entries, find the ratio of entries
# with different hand values.
x.crosstab('GENDER', 'HAND', pivot = True, margins = True, normalize = 0)
# Find the mean speed across all gender and hand combinations.
x.pivot_table('GENDER', 'HAND', 'SPEED')
# Find the median accuracy and speed for every gender and hand combination.
x.pivot_table('GENDER', 'HAND', aggfunc = oml.DataFrame.median)
# Find the max and min speeds for every gender and hand combination and
# across all combinations.
x.pivot_table('GENDER', 'HAND', 'SPEED',
aggfunc = [oml.DataFrame.max, oml.DataFrame.min],
margins = True)
GENDER max(SPEED)_L max(SPEED)_R max(SPEED)_None max(SPEED)_(All)
0 None NaN 46.4 NaN 46.4
1 F 34.1 60.8 NaN 60.8
2 M 51.2 30.4 29.3 51.2
3 All 51.2 60.8 29.3 60.8
import pandas as pd
import oml
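# Reconstructed setup (item names, types, and quantities are inferred
# from the session output below).
cart = pd.DataFrame({
    'Item_name': ['paper_towel', 'ground_pork', 'tofu', 'eggs',
                  'pork_loin', 'whole_milk', 'egg_custard'],
    'Item_type': ['grocery', 'meat', 'grocery', 'dairy',
                  'meat', 'dairy', 'bakery'],
    'Quantity': [1.0, 2.6, 4.0, 1.0, 1.9, 1.0, 1.0],
    'Unit_price': [1.19, 2.79, 0.99, 2.49, 3.19, 2.50, 3.99]})
cart['Price'] = (cart['Quantity'] * cart['Unit_price']).round(2)
oml_cart = oml.push(cart)

# Count occurrences of 'egg' in item names (reconstructed; the type
# check that follows refers to this object).
egg_pattern = oml_cart['Item_name'].count('egg')
type(egg_pattern)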
<class 'oml.core.float.Float'>
>>> oml_cart.concat({'Egg_pattern': egg_pattern})
Item_name Item_type Quantity Unit_price Price Egg_pattern
0 paper_towel grocery 1.0 1.19 1.19 0
1 ground_pork meat 2.6 2.79 7.25 0
2 tofu grocery 4.0 0.99 3.96 0
3 eggs dairy 1.0 2.49 2.49 1
4 pork_loin meat 1.9 3.19 6.06 0
5 whole_milk dairy 1.0 2.50 2.50 0
6 egg_custard bakery 1.0 3.99 3.99 1
>>>
>>> # Find the start index of substring 'pork' in the 'Item_name' column.
... pork_startInd = oml_cart['Item_name'].find('pork')
>>> type(pork_startInd)
<class 'oml.core.float.Float'>
>>> oml_cart.concat({'Pork_startInd': pork_startInd})
Item_name Item_type Quantity Unit_price Price Pork_startInd
0 paper_towel grocery 1.0 1.19 1.19 -1
1 ground_pork meat 2.6 2.79 7.25 7
2 tofu grocery 4.0 0.99 3.96 -1
3 eggs dairy 1.0 2.49 2.49 -1
4 pork_loin meat 1.9 3.19 6.06 0
5 whole_milk dairy 1.0 2.50 2.50 -1
6 egg_custard bakery 1.0 3.99 3.99 -1
>>>
>>> # Check whether items are of grocery category.
... is_grocery=oml_cart['Item_type']=='grocery'
>>> type(is_grocery)
<class 'oml.core.boolean.Boolean'>
>>> oml_cart.concat({'Is_grocery': is_grocery})
Item_name Item_type Quantity Unit_price Price Is_grocery
0 paper_towel grocery 1.0 1.19 1.19 True
1 ground_pork meat 2.6 2.79 7.25 False
2 tofu grocery 4.0 0.99 3.96 True
3 eggs dairy 1.0 2.49 2.49 False
4 pork_loin meat 1.9 3.19 6.06 False
5 whole_milk dairy 1.0 2.50 2.50 False
6 egg_custard bakery 1.0 3.99 3.99 False
>>>
>>> # Calculate the length of item names.
... name_length=oml_cart['Item_name'].len()
>>> type(name_length)
<class 'oml.core.float.Float'>
>>> oml_cart.concat({'Name_length': name_length})
Item_name Item_type Quantity Unit_price Price Name_length
0 paper_towel grocery 1.0 1.19 1.19 11
1 ground_pork meat 2.6 2.79 7.25 11
2 tofu grocery 4.0 0.99 3.96 4
3 eggs dairy 1.0 2.49 2.49 4
4 pork_loin meat 1.9 3.19 6.06 9
5 whole_milk dairy 1.0 2.50 2.50 10
6 egg_custard bakery 1.0 3.99 3.99 11
>>>
>>> # Get the ceiling, floor, exponential, logarithm and square root
... # of the 'Price' column.
... oml_cart['Price'].ceil()
[2, 8, 4, 3, 7, 3, 4]
>>> oml_cart['Price'].floor()
[1, 7, 3, 2, 6, 2, 3]
>>> oml_cart['Price'].exp()
[3.2870812073831184, 1408.1048482046956, 52.45732594909905,
12.061276120444719, 428.37543685928694, 12.182493960703473, 54.05488936332659]
>>> oml_cart['Price'].log()
[0.173953307123438, 1.9810014688665833, 1.3762440252663892,
0.9122827104766162, 1.801709800081223, 0.9162907318741551, 1.3837912309017721]
>>> oml_cart['Price'].sqrt()
[1.0908712114635715, 2.692582403567252, 1.98997487421324, 1.57797338380595,
2.4617067250182343, 1.5811388300841898, 1.997498435543818]
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Modify the data set by replacing a few entries with NaNs to test
# how the na_position parameter works in the sort_values method.
Iris = oml_iris.pull()
Iris['Sepal_Width'].replace({3.5: None}, inplace=True)
Iris['Petal_Length'].replace({1.5: None}, inplace=True)
Iris['Petal_Width'].replace({2.3: None}, inplace=True)
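# Create the IRIS2 table and proxy object from the modified data
# (reconstructed; oml_iris2 and IRIS2 are used and dropped below).
oml_iris2 = oml.create(Iris, table = 'IRIS2')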
# Sort the last 5 rows of the iris data set first by Petal_Length
# then by Petal_Width. By default, rows with NaNs are placed
# after the other rows when the sort keys are the same.
oml_iris2.tail().sort_values(by = ['Petal_Length', 'Petal_Width'])
# Sort the last 5 rows of the iris data set first by Petal_Length
# and then by Petal_Width. When the values in these two columns
# are the same, place the row with a NaN before the other row.
oml_iris2.tail().sort_values(by = ['Petal_Length', 'Petal_Width'],
na_position = 'first')
oml.drop('IRIS')
oml.drop('IRIS2')
Render Graphics
For the parameters of the oml.boxplot and oml.hist functions, invoke help(oml.boxplot) or
help(oml.hist), or see Oracle Machine Learning for Python API Reference.
import oml
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
wine = datasets.load_wine()
wine_data = pd.DataFrame(wine.data, columns = wine.feature_names)
oml_wine = oml.push(wine_data)
oml.graphics.boxplot(oml_wine[:,8:12], showmeans=True,
meanline=True, patch_artist=True,
labels=oml_wine.columns[8:12])
plt.title('Distribution of Wine Attributes')
plt.show()
The image shows a box and whisker plot for each of the four columns of the wine data set:
Proanthocyanins, Color intensity, Hue, and OD280/OD315 of diluted wines. The boxes extend
from the lower to upper quartile values of the data, with a solid orange line at the median. The
whiskers that extend from the box show the range of the data. The caps are the horizontal
lines at the ends of the whiskers. Flier or outlier points are those past the ends of the whiskers.
The mean is shown as a green dotted line spanning the width of each box.
Generate a Histogram
Use the oml.hist function to compute and draw a histogram for every data set column
contained in x.
Example 7-14 Using the oml.hist Function
This example first loads the wine data set from sklearn and creates the pandas.DataFrame
object wine_data. It then creates a temporary database table, with its corresponding proxy
oml.DataFrame object oml_wine, from wine_data. Next it draws a histogram on the proline
column of oml_wine. The bins argument specifies generating ten equal-width bins. The color
argument specifies filling the bars with the color red. The linestyle and edgecolor arguments
are set to draw the bar edges as solid lines in white.
import oml
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
wine = load_wine()
wine_data = pd.DataFrame(wine.data, columns = wine.feature_names)
oml_wine = oml.push(wine_data)
oml.graphics.hist(oml_wine['proline'], bins=10, color='red',
linestyle='solid', edgecolor='white')
plt.title('Proline content in Wine')
plt.xlabel('proline content')
plt.ylabel('# of wine instances')
plt.show()
The image shows a traditional bar-type histogram for the Proline column of the wine data set.
The range of proline values is divided into 10 bins of equal size. The height of the rectangular
bar for each bin indicates the number of wine instances in each bin. The bars are red with solid
white edges.
8
OML4Py Classes That Provide Access to In-Database Machine Learning Algorithms
OML4Py has classes that provide access to in-database Oracle Machine Learning algorithms.
These classes are described in the following topics.
• About Machine Learning Classes and Algorithms
These classes provide access to in-database machine learning algorithms.
• About Model Settings
You can specify settings that affect the characteristics of a model.
• Shared Settings
These settings are common to all of the OML4Py machine learning classes.
• Export Oracle Machine Learning for Python Models
You can export an oml model from Python and then score it in SQL.
• Automatic Data Preparation
Oracle Machine Learning for Python supports Automatic Data Preparation (ADP) and user-
directed general data preparation.
• Model Explainability
Use the OML4Py Explainability module to identify the important features that impact a
trained model’s predictions.
• Attribute Importance
The oml.ai class computes the relative attribute importance, which ranks attributes
according to their significance in predicting a classification or regression target.
• Association Rules
The oml.ar class implements the Apriori algorithm to find frequent itemsets and
association rules, all as part of an association model object.
• Decision Tree
The oml.dt class uses the Decision Tree algorithm for classification.
• Expectation Maximization
The oml.em class uses the Expectation Maximization (EM) algorithm to create a clustering
model.
• Explicit Semantic Analysis
The oml.esa class extracts text-based features from a corpus of documents and performs
document similarity comparisons.
• Generalized Linear Model
The oml.glm class builds a Generalized Linear Model (GLM) model.
• k-Means
The oml.km class uses the k-Means (KM) algorithm, which is a hierarchical, distance-
based clustering algorithm that partitions data into a specified number of clusters.
• Naive Bayes
The oml.nb class creates a Naive Bayes (NB) model for classification.
• Neural Network
The oml.nn class creates a Neural Network (NN) model for classification and regression.
• Random Forest
The oml.rf class creates a Random Forest (RF) model that provides an ensemble
learning technique for classification.
• Singular Value Decomposition
Use the oml.svd class to build a model for feature extraction.
• Support Vector Machine
The oml.svm class creates a Support Vector Machine (SVM) model for classification,
regression, or anomaly detection.
Repeatable Results
You can use the case_id parameter in the fit method of the OML4Py machine learning
algorithm classes to achieve repeatable sampling, data splits (train and held aside), and
random data shuffling.
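For example, a minimal sketch (assuming a proxy oml.DataFrame dat with a Species target
column and an ID case identifier column):

# Passing case_id makes the sampling and the train/test split
# repeatable across sessions.
dt_mod = oml.dt().fit(dat.drop('Species'), dat['Species'], case_id = 'ID')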
Persisting Models
In-database models created through the OML4Py API exist as temporary objects that are
dropped when the database connection ends unless you take one of the following actions:
• Save a default-named model object in a datastore, as in the following example:
regr2 = oml.glm("regression")
oml.ds.save(regr2, 'regression2')
• Use the model_name parameter in the fit function when building the model, as in the
following example:
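regr2 = oml.glm("regression")
regr2 = regr2.fit(x, y, model_name = 'regression2')  # x, y: build data (a sketch)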
• Change the name of an existing model using the model_name function of the model, as in
the following example:
regr2(model_name = 'myRegression2')
An OML4Py model created this way persists until you drop it with the oml.drop function.
For most of the OML4Py machine learning classes, you can use the predict and
predict_proba methods of the model object to score new data.
For in-database models, you can use the SQL PREDICTION function on model proxy objects,
which scores directly in the database. You can use in-database models directly from SQL if you
prepare the data properly. For open source models, you can use Embedded Python Execution
and enable data-parallel execution for performance and scalability.
About Model Settings
For the __init__ method, the argument can be key-value pairs or a dict. Each key and its
value refer to a machine learning algorithm parameter setting name and value, respectively.
The setting value must be numeric or a string.
The argument for the **params parameter of the set_params method is a dict object mapping
a str to a str. The key should be the name of the setting, and the value should be the new
setting.
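For example, to change one setting on an existing model (this pattern also appears in the
Expectation Maximization example later in this chapter):

em_mod.set_params(EMCS_RANDOM_SEED = '5')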
Example 8-1 Specifying Model Settings
This example shows the creation of an Expectation Maximization (EM) model and the
changing of a setting. For the complete code of the EM model example, see Example 8-10.
# Specify settings.
setting = {'emcs_num_iterations': 100}
# Create an EM model object.
em_mod = oml.em(n_clusters = 2, **setting)
Shared Settings
The following settings are common to all of the OML4Py machine learning classes.
ODMS_TEXT_MAX_FEATURES
Value: 1 <= value
The maximum number of distinct features, across all text attributes, to use from a document set passed to the model. The default is 3000. An oml.esa model has the default value of 300000.

ODMS_TEXT_MIN_DOCUMENTS
Value: A non-negative value
This text processing setting controls how many documents a token needs to appear in to be used as a feature. The default is 1. An oml.esa model has the default value of 3.

ODMS_TEXT_POLICY_NAME
Value: The name of an Oracle Text POLICY created using CTX_DDL.CREATE_POLICY
Affects how individual tokens are extracted from unstructured text. For details about CTX_DDL.CREATE_POLICY, see Oracle Text Reference.

PREP_AUTO
Values: PREP_AUTO_ON, PREP_AUTO_OFF
This data preparation setting enables fully automated data preparation. The default is PREP_AUTO_ON.

PREP_SCALE_2DNUM
Values: PREP_SCALE_STDDEV, PREP_SCALE_RANGE
This data preparation setting enables scaling data preparation for two-dimensional numeric columns. PREP_AUTO must be OFF for this setting to take effect. The following are the possible values:
PREP_SCALE_STDDEV: A request to divide the column values by the standard deviation of the column; it is often specified together with PREP_SHIFT_MEAN to yield z-score normalization.
PREP_SCALE_RANGE: A request to divide the column values by the range of values; it is often specified together with PREP_SHIFT_MIN to yield a range of [0,1].

PREP_SCALE_NNUM
Value: PREP_SCALE_MAXABS
This data preparation setting enables scaling data preparation for nested numeric columns. PREP_AUTO must be OFF for this setting to take effect. If specified, then the valid value for this setting is PREP_SCALE_MAXABS, which yields data in the range of [-1,1].
Export Oracle Machine Learning for Python Models
Export a Model
With the export_sermodel function of an OML4Py algorithm model, you can export the model
in a serialized format. You can then score that model in SQL. To save a model to a permanent
table, you must pass in a name for the new table. If the model is partitioned, then you can
optionally select an individual partition to export; otherwise all partitions are exported.
Note:
Any data transformations that you apply to the build data must also be applied to the data
that you score with the imported model.
This example preprocesses the iris data set and splits the data set into training data and test
data. It then fits an oml.svm model according to the training data of the data set, and saves the
fitted model in a serialized format to a new table named svm_sermod in the database.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
oml.drop('IRIS_TEST_DATA')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
df = oml.sync(table = "IRIS").pull()
>>> try:
... oml.drop('IRIS')
... oml.drop('IRIS_TEST_DATA')
... except:
... pass
>>> # Create the IRIS database table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> df = oml.sync(table = "IRIS").pull()
>>>
>>> # Add a case identifier column.
... df.insert(0, 'ID', range(0,len(df)))
>>>
>>> # Create training data and test data.
... IRIS_TMP = oml.push(df).split()
>>> train_x = IRIS_TMP[0].drop('Species')
>>> train_y = IRIS_TMP[0]['Species']
>>> test_dat = IRIS_TMP[1]
>>>
>>> # Create the iris_test_data database table.
... oml_test_dat = oml.create(test_dat.pull(), table = "IRIS_TEST_DATA")
>>>
>>> # Create an oml SVM model object.
... svm_mod = oml.svm('classification',
... svms_kernel_function =
'dbms_data_mining.svms_linear')
>>>
>>> # Fit the SVM model with the training data.
... svm_mod = svm_mod.fit(train_x, train_y, case_id='ID')
>>>
>>> # Export the oml.svm model to a new table named 'svm_sermod'
... # in the database.
... svm_export = svm_mod.export_sermodel(table='svm_sermod')
>>> type(svm_export)
<class 'oml.core.bytes.Bytes'>
>>>
>>> # Show the first 10 characters of the BLOB content from the
... # model export.
... svm_export.pull()[0][1:10]
b'\xff\xfc|\x00\x00\x02\x9c\x00\x00'
Import a Model
In SQL, you can import the serialized format of an OML4Py model into an Oracle Machine
Learning for SQL model with the DBMS_DATA_MINING.IMPORT_SERMODEL procedure. To that
procedure, you pass the BLOB content from the table to which the model was exported and
the name of the model to be created. The import procedure provides the ability to score the
model. It does not create model views or tables that are needed for querying model details.
You can use the SQL function PREDICTION to apply the imported model to the test data and get
the prediction results.
Example 8-3 Import a Serialized SVM Model as an OML4SQL Model in SQL
This example retrieves the serialized content of the SVM classification model from the
svm_sermod table. It uses the IMPORT_SERMODEL procedure to create a model named
my_iris_svm_classifier with the content from the table. It also predicts test data saved in the
iris_test_data table with the newly imported model my_iris_svm_classifier, and compares the
prediction results with the target classes.
DECLARE
v_blob blob;
BEGIN
SELECT SERVAL INTO v_blob FROM "svm_sermod";
dbms_data_mining.import_sermodel(v_blob, 'my_iris_svm_classifier');
END;
/
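For example, a query like the following returns the predictions shown in the listing below (a
sketch issued from Python through an oml cursor; the exact projection and any row filtering
are assumptions):

cur = oml.cursor()
cur.execute('SELECT id, species AS target_species, '
            'PREDICTION(my_iris_svm_classifier USING *) AS predict_species '
            'FROM IRIS_TEST_DATA ORDER BY id')
for row in cur.fetchall():  # standard DB API cursor methods assumed
    print(row)
cur.close()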
ID TARGET_SPECIES PREDICT_SPECIES
-- -------------- ---------------
0 setosa setosa
24 setosa setosa
27 setosa setosa
33 setosa setosa
36 setosa setosa
39 setosa setosa
48 setosa setosa
54 versicolor versicolor
57 versicolor versicolor
93 versicolor versicolor
114 virginica virginica
120 virginica virginica
132 virginica virginica
13 rows selected.
Automatic Data Preparation
PREP_* Settings
The values for the PREP_* settings are described in the following table.
Model Explainability
import oml
from oml.mlx import GlobalFeatureImportance
import pandas as pd
import numpy as np
from sklearn import datasets
bc_ds = datasets.load_breast_cancer()
bc_data = bc_ds.data.astype(float)
X = pd.DataFrame(bc_data, columns=bc_ds.feature_names)
y = pd.DataFrame(bc_ds.target, columns=['TARGET'])
row_id = pd.DataFrame(np.arange(bc_data.shape[0]),
columns=['CASE_ID'])
df = oml.create(pd.concat([X, y, row_id], axis=1),
table='BreastCancer')
Create the MLX Global Feature Importance explainer, using the binary f1 metric.
gfi = GlobalFeatureImportance(mining_function='classification',
score_metric='f1', random_state=32,
parallel=4)
Run the explainer to generate the global feature importance. Here we construct an explanation
using the train data set and then display the explanation.
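A hedged sketch of that step (the explain signature shown is an assumption; invoke
help(gfi.explain) for the exact API, and model is assumed to be the fitted in-database model
proxy for this data set):

explanation = gfi.explain(model, df, 'TARGET', case_id = 'CASE_ID', n_iter = 10)
explanation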
oml.drop('BreastCancer')
import oml
from oml.mlx import GlobalFeatureImportance
import pandas as pd
import numpy as np
from sklearn import datasets
iris_ds = datasets.load_iris()
iris_data = iris_ds.data.astype(float)
X = pd.DataFrame(iris_data, columns=iris_ds.feature_names)
y = pd.DataFrame(iris_ds.target, columns=['TARGET'])
row_id = pd.DataFrame(np.arange(iris_data.shape[0]),
columns=['CASE_ID'])
df = oml.create(pd.concat([X, y, row_id], axis=1), table='Iris')
Create the MLX Global Feature Importance explainer, using the f1_weighted metric.
gfi = GlobalFeatureImportance(mining_function='classification',
score_metric='f1_weighted',
random_state=32, parallel=4)
Run the explainer to generate the global feature importance. Here, we use the test data set.
Display the explanation.
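A hedged sketch, following the same assumed explain signature as in the preceding example:

explanation = gfi.explain(model, df, 'TARGET', case_id = 'CASE_ID', n_iter = 10)
explanation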
oml.drop('Iris')
import oml
from oml.mlx import GlobalFeatureImportance
import pandas as pd
import numpy as np
from sklearn import datasets
boston_ds = datasets.load_boston()
boston_data = boston_ds.data
X = pd.DataFrame(boston_data, columns=boston_ds.feature_names)
y = pd.DataFrame(boston_ds.target, columns=['TARGET'])
row_id = pd.DataFrame(np.arange(boston_data.shape[0]),
columns=['CASE_ID'])
df = oml.create(pd.concat([X, y, row_id], axis=1), table='Boston')
model = oml.algo.nn(mining_function='regression',
ODMS_RANDOM_SEED=32).fit(X, y, case_id='CASE_ID')
"NN R^2 score = {:.2f}".format(model.score(X_test, y_test))
Create the MLX Global Feature Importance explainer, using the r2 metric.
gfi = GlobalFeatureImportance(mining_function='regression',
score_metric='r2', random_state=32,
parallel=4)
Run the explainer to generate the global feature importance. Here, we use the test data set.
Display the explanation.
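A hedged sketch, again following the assumed explain signature used above:

explanation = gfi.explain(model, df, 'TARGET', case_id = 'CASE_ID', n_iter = 10)
explanation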
oml.drop('Boston')
Attribute Importance
Note:
Oracle Machine Learning does not support the scoring operation for oml.ai.
The results of oml.ai are the attributes of the build data ranked according to their predictive
influence on a specified target attribute. You can use the ranking and the measure of
importance for selecting attributes.
For information on the oml.ai class attributes and methods, invoke help(oml.ai) or see
Oracle Machine Learning for Python API Reference.
This example demonstrates the use of various methods of the oml.ai class.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Specify settings.
setting = {'ODMS_SAMPLING':'ODMS_SAMPLING_DISABLE'}
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']

# Create an AI model object and fit it (ai_obj is an illustrative name).
ai_obj = oml.ai(**setting).fit(train_x, train_y)
Settings:
setting name setting value
0 ALGO_NAME ALGO_AI_MDL
1 ODMS_DETAILS ODMS_ENABLE
2 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
3 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
4 PREP_AUTO ON
Global Statistics:
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Partition: NO
Importance:
Association Rules
import pandas as pd
from sklearn import datasets
import oml
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Specify settings.
setting = {'asso_min_support':'0.1', 'asso_min_confidence':'0.1'}
# Create an AR model object and fit it (ar_mod is an illustrative name).
ar_mod = oml.ar(**setting).fit(oml_iris)
Settings:
setting name setting value
0 ALGO_NAME ALGO_APRIORI_ASSOCIATION_RULES
1 ASSO_MAX_RULE_LENGTH 4
2 ASSO_MIN_CONFIDENCE 0.1
3 ASSO_MIN_REV_CONFIDENCE 0
4 ASSO_MIN_SUPPORT 0.1
5 ASSO_MIN_SUPPORT_INT 1
6 ODMS_DETAILS ODMS_ENABLE
7 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
8 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
9 PREP_AUTO ON
Global Statistics:
attribute name attribute value
0 ITEMSET_COUNT 6.000000
1 MAX_SUPPORT 0.333333
2 NUM_ROWS 150.000000
3 RULE_COUNT 2.000000
4 TRANSACTION_COUNT 150.000000
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
Itemsets:
   ITEMSET_ID   SUPPORT  NUMBER_OF_ITEMS    ITEM_NAME          ITEM_VALUE
1           2  0.173333                1  Sepal_Width                   3
2           3  0.333333                1      Species              setosa
3           4  0.333333                1      Species          versicolor
4           5  0.333333                1      Species           virginica
5           6  0.193333                2  Petal_Width  .20000000000000001
6           6  0.193333                2      Species              setosa
Rules:
Decision Tree
Decision Tree models are classification models that contain axis-parallel rules. A rule is a
conditional statement that can be understood by humans and may be used within a database
to identify a set of records.
A decision tree predicts a target value by asking a sequence of questions. At a given stage in
the sequence, the question that is asked depends upon the answers to the previous questions.
The goal is to ask questions that, taken together, uniquely identify specific target values.
Graphically, this process forms a tree structure.
During the training process, the Decision Tree algorithm must repeatedly find the most efficient
way to split a set of cases (records) into two child nodes. The oml.dt class offers two
homogeneity metrics, gini and entropy, for calculating the splits. The default metric is gini.
For information on the oml.dt class attributes and methods, invoke help(oml.dt) or see
Oracle Machine Learning for Python API Reference.
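For example, a sketch that requests the entropy metric instead of the default
(TREE_IMPURITY_ENTROPY is the alternative value of the TREE_IMPURITY_METRIC setting shown
in the model output below):

dt_mod = oml.dt(tree_impurity_metric = 'TREE_IMPURITY_ENTROPY')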
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('COST_MATRIX')
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]
# Specify settings.
setting = {'TREE_TERM_MAX_DEPTH':'2'}
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_DECISION_TREE
1 CLAS_COST_TABLE_NAME "OML_USER"."COST_MATRIX"
2 CLAS_MAX_SUP_BINS 32
3 CLAS_WEIGHTS_BALANCED OFF
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
7 PREP_AUTO ON
8 TREE_IMPURITY_METRIC TREE_IMPURITY_GINI
9 TREE_TERM_MAX_DEPTH 2
10 TREE_TERM_MINPCT_NODE .05
11 TREE_TERM_MINPCT_SPLIT .1
12 TREE_TERM_MINREC_NODE 10
13 TREE_TERM_MINREC_SPLIT 20
Global Statistics:
attribute name attribute value
0 NUM_ROWS 104
Attributes:
Petal_Length
Petal_Width
Partition: NO
Distributions:
Nodes:
split \
0 (Petal_Length <=(2.4500000000000002E+000))
1 (Petal_Length >(2.4500000000000002E+000))
2 None
surrogate \
0 Petal_Width <=(8.0000000000000004E-001))
1 Petal_Width >(8.0000000000000004E-001))
2 None
full.splits
0 (Petal_Length <=(2.4500000000000002E+000))
1 (Petal_Length >(2.4500000000000002E+000))
2 (
>>>
>>> # Use the model to make predictions on the test data.
... dt_mod.predict(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Sepal_Width',
... 'Petal_Length',
... 'Species']])
Sepal_Length Sepal_Width Petal_Length Species PREDICTION
0 4.9 3.0 1.4 setosa setosa
1 4.9 3.1 1.5 setosa setosa
2 4.8 3.4 1.6 setosa setosa
3 5.8 4.0 1.2 setosa setosa
... ... ... ... ... ...
44 6.7 3.3 5.7 virginica versicolor
45 6.7 3.0 5.2 virginica versicolor
46 6.5 3.0 5.2 virginica versicolor
47 5.9 3.0 5.1 virginica versicolor
>>>
>>> # Return the prediction probability.
... dt_mod.predict(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Sepal_Width',
... 'Species']],
... proba = True)
Sepal_Length Sepal_Width Species PREDICTION PROBABILITY
0 4.9 3.0 setosa setosa 1.000000
1 4.9 3.1 setosa setosa 1.000000
2 4.8 3.4 setosa setosa 1.000000
3 5.8 4.0 setosa setosa 1.000000
... ... ... ... ... ...
44 6.7 3.3 virginica versicolor 0.514706
45 6.7 3.0 virginica versicolor 0.514706
46 6.5 3.0 virginica versicolor 0.514706
47 5.9 3.0 virginica versicolor 0.514706
>>> # Make predictions and return the probability for each class
>>> # on new data.
>>> dt_mod.predict_proba(test_dat.drop('Species'),
... supplemental_cols = test_dat[:,
... ['Sepal_Length',
... 'Species']]).sort_values(by = ['Sepal_Length',
... 'Species'])
Sepal_Length Species PROBABILITY_OF_SETOSA \
0 4.4 setosa 1.0
1 4.4 setosa 1.0
2 4.5 setosa 1.0
PROBABILITY_OF_VERSICOLOR PROBABILITY_OF_VIRGINICA
0 0.000000 0.000000
1 0.000000 0.000000
2 0.000000 0.000000
3 0.000000 0.000000
... ... ...
42 0.514706 0.485294
43 0.514706 0.485294
44 0.514706 0.485294
45 0.514706 0.485294
>>>
>>> dt_mod.score(test_dat.drop('Species'), test_dat[:, ['Species']])
0.645833
Expectation Maximization
Table 8-5 Expectation Maximization Settings for Data Preparation and Analysis
Note: Several settings in this table apply only to attributes that are not nested (2D).
The following table lists the settings for component clustering for EM models.
The following table lists the settings for cluster statistics for EM models.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_dat = dat[0]
test_dat = dat[1]
# Specify settings.
setting = {'emcs_num_iterations': 100}
# Create an EM model object and fit it.
em_mod = oml.em(n_clusters = 2, **setting).fit(train_dat)
Settings:
setting name setting value
0 ALGO_NAME ALGO_EXPECTATION_MAXIMIZATION
1 CLUS_NUM_CLUSTERS 2
2 EMCS_CLUSTER_COMPONENTS EMCS_CLUSTER_COMP_ENABLE
3 EMCS_CLUSTER_STATISTICS EMCS_CLUS_STATS_ENABLE
4 EMCS_CLUSTER_THRESH 2
5 EMCS_LINKAGE_FUNCTION EMCS_LINKAGE_SINGLE
6 EMCS_LOGLIKE_IMPROVEMENT .001
7 EMCS_MAX_NUM_ATTR_2D 50
8 EMCS_MIN_PCT_ATTR_SUPPORT .1
9 EMCS_MODEL_SEARCH EMCS_MODEL_SEARCH_DISABLE
10 EMCS_NUM_COMPONENTS 20
11 EMCS_NUM_DISTRIBUTION EMCS_NUM_DISTR_SYSTEM
12 EMCS_NUM_EQUIWIDTH_BINS 11
13 EMCS_NUM_ITERATIONS 100
14 EMCS_NUM_PROJECTIONS 50
15 EMCS_RANDOM_SEED 0
16 EMCS_REMOVE_COMPONENTS EMCS_REMOVE_COMPS_ENABLE
17 ODMS_DETAILS ODMS_ENABLE
18 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
19 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
20 PREP_AUTO ON
Computed Settings:
setting name setting value
0 EMCS_ATTRIBUTE_FILTER EMCS_ATTR_FILTER_DISABLE
1 EMCS_CONVERGENCE_CRITERION EMCS_CONV_CRIT_BIC
2 EMCS_NUM_QUANTILE_BINS 3
3 EMCS_NUM_TOPN_BINS 3
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 LOGLIKELIHOOD -2.10044
2 NUM_CLUSTERS 2
3 NUM_COMPONENTS 8
4 NUM_ROWS 104
5 RANDOM_SEED 0
6 REMOVED_COMPONENTS 12
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
Clusters:
Taxonomy:
PARENT_CLUSTER_ID CHILD_CLUSTER_ID
0 1 2.0
1 1 3.0
2 2 NaN
3 3 NaN
Centroids:
CLUSTER_ID CNT
0 2 68
1 3 36
Attribute Importance:
Components:
Cluster Hists:
label count
0 1:1.59 25
1 1.59:2.18 11
2 2.18:2.77 0
3 2.77:3.36 3
... ... ...
137 : 0
138 : 0
139 : 36
140 : 0
Rules:
>>> # Make predictions and return the probability for each class
... # on new data.
>>> em_mod.predict_proba(test_dat,
... supplemental_cols = test_dat[:,
... ['Sepal_Length', 'Sepal_Width',
... 'Petal_Length']]).sort_values(by = ['Sepal_Length',
... 'Sepal_Width', 'Petal_Length',
... 'PROBABILITY_OF_2', 'PROBABILITY_OF_3'])
Sepal_Length Sepal_Width Petal_Length PROBABILITY_OF_2 \
0 4.4 3.0 1.3 4.680788e-20
1 4.4 3.2 1.3 1.052071e-20
2 4.5 2.3 1.3 7.751240e-06
3 4.8 3.4 1.6 5.363418e-19
... ... ... ... ...
43 6.9 3.1 4.9 1.000000e+00
44 6.9 3.1 5.4 1.000000e+00
45 7.0 3.2 4.7 1.000000e+00
PROBABILITY_OF_3
0 1.000000e+00
1 1.000000e+00
2 9.999922e-01
3 1.000000e+00
... ...
43 3.295578e-97
44 6.438740e-137
45 3.853925e-89
>>>
>>> # Change the random seed and refit the model.
... em_mod.set_params(EMCS_RANDOM_SEED = '5').fit(train_dat)
Settings:
setting name setting value
0 ALGO_NAME ALGO_EXPECTATION_MAXIMIZATION
1 CLUS_NUM_CLUSTERS 2
2 EMCS_CLUSTER_COMPONENTS EMCS_CLUSTER_COMP_ENABLE
3 EMCS_CLUSTER_STATISTICS EMCS_CLUS_STATS_ENABLE
4 EMCS_CLUSTER_THRESH 2
5 EMCS_LINKAGE_FUNCTION EMCS_LINKAGE_SINGLE
6 EMCS_LOGLIKE_IMPROVEMENT .001
7 EMCS_MAX_NUM_ATTR_2D 50
8 EMCS_MIN_PCT_ATTR_SUPPORT .1
9 EMCS_MODEL_SEARCH EMCS_MODEL_SEARCH_DISABLE
10 EMCS_NUM_COMPONENTS 20
11 EMCS_NUM_DISTRIBUTION EMCS_NUM_DISTR_SYSTEM
12 EMCS_NUM_EQUIWIDTH_BINS 11
13 EMCS_NUM_ITERATIONS 100
14 EMCS_NUM_PROJECTIONS 50
15 EMCS_RANDOM_SEED 5
16 EMCS_REMOVE_COMPONENTS EMCS_REMOVE_COMPS_ENABLE
17 ODMS_DETAILS ODMS_ENABLE
18 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
19 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
20 PREP_AUTO ON
Computed Settings:
setting name setting value
0 EMCS_ATTRIBUTE_FILTER EMCS_ATTR_FILTER_DISABLE
1 EMCS_CONVERGENCE_CRITERION EMCS_CONV_CRIT_BIC
2 EMCS_NUM_QUANTILE_BINS 3
3 EMCS_NUM_TOPN_BINS 3
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 LOGLIKELIHOOD -1.75777
2 NUM_CLUSTERS 2
3 NUM_COMPONENTS 9
4 NUM_ROWS 104
5 RANDOM_SEED 5
6 REMOVED_COMPONENTS 11
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
Clusters:
RIGHT_CHILD_ID
0 3.0
1 NaN
2 NaN
Taxonomy:
PARENT_CLUSTER_ID CHILD_CLUSTER_ID
0 1 2.0
1 1 3.0
2 2 NaN
3 3 NaN
Centroids:
CLUSTER_ID CNT
0 2 36
1 3 68
Attribute Importance:
Components:
Cluster Hists:
label count
0 1:1.59 25
1 1.59:2.18 11
2 2.18:2.77 0
3 2.77:3.36 3
... ... ...
137 : 0
138 : 33
139 : 0
140 : 35
Rules:
Explicit Semantic Analysis
import oml
from oml import cursor
import pandas as pd
# Specify settings.
cur = cursor()
cur.execute("Begin ctx_ddl.create_policy('DMDEMO_ESA_POLICY'); End;")
cur.close()
ctx_settings = {'COMMENTS':
'TEXT(POLICY_NAME:DMDEMO_ESA_POLICY)(TOKEN_TYPE:STEM)'}
# Fit the ESA model according to the training data and parameter settings.
esa_mod = esa_mod.fit(train_dat, case_id = 'ID',
ctx_settings = ctx_settings)
esa_mod.transform(test_dat,
supplemental_cols = test_dat[:, ['ID', 'COMMENTS']],
topN = 2).sort_values(by = ['ID'])
esa_mod.feature_compare(test_dat,
compare_cols = 'COMMENTS',
supplemental_cols = ['ID'])
esa_mod.feature_compare(test_dat,
compare_cols = ['COMMENTS', 'YEAR'],
supplemental_cols = ['ID'])
cur = cursor()
cur.execute("Begin ctx_ddl.drop_policy('DMDEMO_ESA_POLICY'); End;")
cur.close()
Settings:
setting name setting value
0 ALGO_NAME ALGO_EXPLICIT_SEMANTIC_ANALYS
1 ESAS_MIN_ITEMS 1
2 ESAS_TOPN_FEATURES 1000
3 ESAS_VALUE_THRESHOLD .00000001
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
7 ODMS_TEXT_MAX_FEATURES 300000
8 ODMS_TEXT_MIN_DOCUMENTS 1
9 ODMS_TEXT_POLICY_NAME DMDEMO_ESA_POLICY
10 PREP_AUTO ON
Global Statistics:
attribute name attribute value
0 NUM_ROWS 4
Attributes:
COMMENTS
YEAR
Partition: NO
Features:
TOP_2 TOP_2_VAL
0 1 0.590565
1 2 0.616672
2 2 0.632604
>>>
>>> esa_mod.feature_compare(test_dat,
compare_cols = 'COMMENTS',
supplemental_cols = ['ID'])
ID_A ID_B SIMILARITY
0 4 6 0.946469
1 4 7 0.871994
2 6 7 0.954565
>>> esa_mod.feature_compare(test_dat,
... compare_cols = ['COMMENTS', 'YEAR'],
... supplemental_cols = ['ID'])
ID_A ID_B SIMILARITY
0 4 6 0.467644
1 4 7 0.377144
2 6 7 0.952857
Settings:
setting name setting value
0 ALGO_NAME ALGO_EXPLICIT_SEMANTIC_ANALYS
1 ESAS_MIN_ITEMS 1
2 ESAS_TOPN_FEATURES 2
3 ESAS_VALUE_THRESHOLD 0.01
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
7 ODMS_TEXT_MAX_FEATURES 2
8 ODMS_TEXT_MIN_DOCUMENTS 1
9 ODMS_TEXT_POLICY_NAME DMDEMO_ESA_POLICY
10 PREP_AUTO ON
Global Statistics:
attribute name attribute value
0 NUM_ROWS 4
Attributes:
COMMENTS
YEAR
Partition: NO
Features:
>>>
>>> cur = cursor()
>>> cur.execute("Begin ctx_ddl.drop_policy('DMDEMO_ESA_POLICY'); End;")
>>> cur.close()
Generalized Linear Model
GLM models include and extend the class of linear models. They relax the restrictions on linear
models, which are often violated in practice. For example, binary (yes/no or 0/1) responses do
not have the same variance across classes.
GLM is a parametric modeling technique. Parametric models make assumptions about the
distribution of the data. When the assumptions are met, parametric models can be more
efficient than non-parametric models.
The challenge in developing models of this type involves assessing the extent to which the
assumptions are met. For this reason, quality diagnostics are key to developing quality
parametric models.
In addition to the classical weighted least squares estimation for linear regression and
iteratively re-weighted least squares estimation for logistic regression, both solved through
Cholesky decomposition and matrix inversion, Oracle Machine Learning GLM provides a
conjugate gradient-based optimization algorithm that does not require matrix inversion and is
very well suited to high-dimensional data. The choice of algorithm is handled internally and is
transparent to the user.
GLM can be used to build classification or regression models as follows:
• Classification: Binary logistic regression is the GLM classification algorithm. The
algorithm uses the logit link function and the binomial variance function.
• Regression: Linear regression is the GLM regression algorithm. The algorithm assumes
no target transformation and constant variance over the range of target values.
The oml.glm class allows you to build two different types of models. Some arguments apply to
classification models only and some to regression models only.
For information on the oml.glm class attributes and methods, invoke help(oml.glm) or see
Oracle Machine Learning for Python API Reference.
GLMS_FTR_SEL_CRIT
Values: GLMS_FTR_SEL_AIC, GLMS_FTR_SEL_ALPHA_INV, GLMS_FTR_SEL_RIC, GLMS_FTR_SEL_SBIC
Feature selection penalty criterion for adding a feature to the model. When feature selection is enabled, the algorithm automatically chooses the penalty criterion based on the data.

GLMS_FTR_SELECTION
Value: GLMS_FTR_SELECTION_DISABLE
Enable or disable feature selection for GLM. By default, feature selection is not enabled.

GLMS_MAX_FEATURES
Value: TO_CHAR(0 < numeric_expr <= 2000)
When feature selection is enabled, this setting specifies the maximum number of features that can be selected for the final model. By default, the algorithm limits the number of features to ensure sufficient memory.

GLMS_NUM_ITERATIONS
Value: A positive integer
Maximum number of iterations for the GLM algorithm. The default value is system-determined.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Petal_Width')
train_y = dat[0]['Petal_Width']
test_dat = dat[1]

# Specify settings.
setting = {'GLMS_SOLVER': 'dbms_data_mining.GLMS_SOLVER_QR'}

# Create a GLM model object.
glm_mod = oml.glm("regression", **setting)

# Fit the GLM model according to the training data and parameter
# settings.
glm_mod = glm_mod.fit(train_x, train_y)
# Use the model to make predictions on the test data.
glm_mod.predict(test_dat.drop('Petal_Width'),
                supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
                                                 'Petal_Length', 'Species']],
                proba = True)
glm_mod.score(test_dat.drop('Petal_Width'),
test_dat[:, ['Petal_Width']])
Target: Petal_Width
Settings:
setting name setting value
0 ALGO_NAME ALGO_GENERALIZED_LINEAR_MODEL
1 GLMS_CONF_LEVEL .95
2 GLMS_FTR_GENERATION GLMS_FTR_GENERATION_DISABLE
3 GLMS_FTR_SELECTION GLMS_FTR_SELECTION_DISABLE
4 GLMS_SOLVER GLMS_SOLVER_QR
5 ODMS_DETAILS ODMS_ENABLE
6 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
Computed Settings:
setting name setting value
0 GLMS_CONV_TOLERANCE .0000050000000000000004
1 GLMS_NUM_ITERATIONS 30
2 GLMS_RIDGE_REGRESSION GLMS_RIDGE_REG_ENABLE
Global Statistics:
attribute name attribute value
0 ADJUSTED_R_SQUARE 0.949634
1 AIC -363.888
2 COEFF_VAR 14.6284
3 CONVERGED YES
4 CORRECTED_TOTAL_DF 103
5 CORRECTED_TOT_SS 58.4565
6 DEPENDENT_MEAN 1.15577
7 ERROR_DF 98
8 ERROR_MEAN_SQUARE 0.028585
9 ERROR_SUM_SQUARES 2.80131
10 F_VALUE 389.405
11 GMSEP 0.030347
12 HOCKING_SP 0.000295
13 J_P 0.030234
14 MODEL_DF 5
15 MODEL_F_P_VALUE 0
16 MODEL_MEAN_SQUARE 11.131
17 MODEL_SUM_SQUARES 55.6552
18 NUM_PARAMS 6
19 NUM_ROWS 104
20 RANK_DEFICIENCY 0
21 ROOT_MEAN_SQ 0.16907
22 R_SQ 0.952079
23 SBIC -348.021
24 VALID_COVARIANCE_MATRIX YES
[1 rows x 25 columns]
Attributes:
Petal_Length
Sepal_Length
Sepal_Width
Species
Partition: NO
Coefficients:
Fit Details:
name value
0 ADJUSTED_R_SQUARE 9.496338e-01
1 AIC -3.638876e+02
2 COEFF_VAR 1.462838e+01
3 CORRECTED_TOTAL_DF 1.030000e+02
...
21 ROOT_MEAN_SQ 1.690704e-01
22 R_SQ 9.520788e-01
23 SBIC -3.480213e+02
24 VALID_COVARIANCE_MATRIX 1.000000e+00
Rank:
Deviance:
2.801309
AIC:
-364
Null Deviance:
58.456538
DF Residual:
98.0
DF Null:
103.0
Converged:
True
>>>
>>> # Use the model to make predictions on the test data.
... glm_mod.predict(test_dat.drop('Petal_Width'),
Target: Petal_Width
Settings:
setting name setting value
0 ALGO_NAME ALGO_GENERALIZED_LINEAR_MODEL
1 GLMS_CONF_LEVEL .95
2 GLMS_FTR_GENERATION GLMS_FTR_GENERATION_DISABLE
3 GLMS_FTR_SELECTION GLMS_FTR_SELECTION_DISABLE
4 GLMS_SOLVER GLMS_SOLVER_SGD
5 ODMS_DETAILS ODMS_ENABLE
6 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
Computed Settings:
setting name setting value
0 GLMS_BATCH_ROWS 2000
1 GLMS_CONV_TOLERANCE .0001
2 GLMS_NUM_ITERATIONS 500
3 GLMS_RIDGE_REGRESSION GLMS_RIDGE_REG_ENABLE
4 GLMS_RIDGE_VALUE .01
Global Statistics:
attribute name attribute value
0 ADJUSTED_R_SQUARE 0.94175
1 AIC -348.764
2 COEFF_VAR 15.7316
3 CONVERGED NO
4 CORRECTED_TOTAL_DF 103
5 CORRECTED_TOT_SS 58.4565
6 DEPENDENT_MEAN 1.15577
7 ERROR_DF 98
8 ERROR_MEAN_SQUARE 0.033059
9 ERROR_SUM_SQUARES 3.23979
10 F_VALUE 324.347
11 GMSEP 0.035097
12 HOCKING_SP 0.000341
13 J_P 0.034966
14 MODEL_DF 5
15 MODEL_F_P_VALUE 0
16 MODEL_MEAN_SQUARE 10.7226
17 MODEL_SUM_SQUARES 53.613
18 NUM_PARAMS 6
19 NUM_ROWS 104
20 RANK_DEFICIENCY 0
21 ROOT_MEAN_SQ 0.181821
22 R_SQ 0.944578
23 SBIC -332.898
24 VALID_COVARIANCE_MATRIX NO
[1 rows x 25 columns]
Attributes:
Petal_Length
Sepal_Length
Sepal_Width
Species
Partition: NO
Coefficients:
Fit Details:
name value
0 ADJUSTED_R_SQUARE 9.417502e-01
1 AIC -3.487639e+02
2 COEFF_VAR 1.573164e+01
3 CORRECTED_TOTAL_DF 1.030000e+02
... ... ...
21 ROOT_MEAN_SQ 1.818215e-01
22 R_SQ 9.445778e-01
23 SBIC -3.328975e+02
24 VALID_COVARIANCE_MATRIX 0.000000e+00
Rank:
Deviance:
3.239787
AIC:
-349
Null Deviance:
58.456538
Prior Weights:
DF Residual:
98.0
DF Null:
103.0
Converged:
False
8.13 k-Means
The oml.km class uses the k-Means (KM) algorithm, which is a hierarchical, distance-based
clustering algorithm that partitions data into a specified number of clusters.
The algorithm has the following features:
• Several distance functions: Euclidean, Cosine, and Fast Cosine distance functions. The
default is Euclidean.
• For each cluster, the algorithm returns the centroid, a histogram for each attribute, and a
rule describing the hyperbox that encloses the majority of the data assigned to the cluster.
The centroid reports the mode for categorical attributes and the mean and variance for
numeric attributes.
For information on the oml.km class attributes and methods, invoke help(oml.km) or see
Oracle Machine Learning for Python API Reference.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_dat = dat[0]
test_dat = dat[1]

# Specify settings.
setting = {'kmns_iterations': 20}

# Create a KM model object and fit it.
km_mod = oml.km(n_clusters = 3, **setting).fit(train_dat)

km_mod.transform(test_dat)
km_mod.score(test_dat)
>>>
>>> # Create the IRIS database table and the proxy object for the table.
... oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
>>>
>>> # Create training and test data.
... dat = oml.sync(table = 'IRIS').split()
>>> train_dat = dat[0]
>>> test_dat = dat[1]
>>>
>>> # Specify settings.
... setting = {'kmns_iterations': 20}
>>>
>>> # Create a KM model object and fit it.
... km_mod = oml.km(n_clusters = 3, **setting).fit(train_dat)
>>>
>>> # Show model details.
... km_mod
Settings:
setting name setting value
0 ALGO_NAME ALGO_KMEANS
1 CLUS_NUM_CLUSTERS 3
2 KMNS_CONV_TOLERANCE .001
3 KMNS_DETAILS KMNS_DETAILS_HIERARCHY
4 KMNS_DISTANCE KMNS_EUCLIDEAN
5 KMNS_ITERATIONS 20
6 KMNS_MIN_PCT_ATTR_SUPPORT .1
7 KMNS_NUM_BINS 11
8 KMNS_RANDOM_SEED 0
9 KMNS_SPLIT_CRITERION KMNS_VARIANCE
10 ODMS_DETAILS ODMS_ENABLE
11 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
12 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
13 PREP_AUTO ON
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 NUM_ROWS 104.0
Attributes: Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
Clusters:
Taxonomy:
PARENT_CLUSTER_ID CHILD_CLUSTER_ID
0 1 2.0
1 1 3.0
2 2 4.0
3 2 5.0
4 3 NaN
5 4 NaN
6 5 NaN
CLUSTER_ID CNT
0 3 50
1 4 53
2 5 47
>>>
>>> # Use the model to make predictions on the test data.
... km_mod.predict(test_dat,
...                supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
...                                                 'Petal_Length', 'Species']])
Sepal_Length Sepal_Width Petal_Length Species CLUSTER_ID
0 4.9 3.0 1.4 setosa 3
1 4.9 3.1 1.5 setosa 3
2 4.8 3.4 1.6 setosa 3
3 5.8 4.0 1.2 setosa 3
... ... ... ... ... ...
38 6.4 2.8 5.6 virginica 5
39 6.9 3.1 5.4 virginica 5
40 6.7 3.1 5.6 virginica 5
41 5.8 2.7 5.1 virginica 5
>>>
>>> km_mod.predict_proba(test_dat,
... supplemental_cols =
... test_dat[:, ['Species']]).sort_values(by =
... ['Species', 'PROBABILITY_OF_3'])
Species PROBABILITY_OF_3 PROBABILITY_OF_4 PROBABILITY_OF_5
0 setosa 0.791267 0.208494 0.000240
1 setosa 0.971498 0.028350 0.000152
2 setosa 0.981020 0.018499 0.000481
3 setosa 0.981907 0.017989 0.000104
... ... ... ... ...
42 virginica 0.000655 0.316671 0.682674
43 virginica 0.001036 0.413744 0.585220
44 virginica 0.001036 0.413744 0.585220
45 virginica 0.002452 0.305021 0.692527
>>>
>>> km_mod.transform(test_dat)
CLUSTER_DISTANCE
0 1.050234
1 0.859817
2 0.321065
3 1.427080
... ...
42 0.837757
43 0.479313
44 0.448562
45 1.123587
>>>
>>> km_mod.score(test_dat)
-47.487712
Naive Bayes
The Naive Bayes algorithm is based on conditional probabilities. Naive Bayes looks at the
historical data and calculates conditional probabilities for the target values by observing the
frequency of attribute values and of combinations of attribute values.
Naive Bayes assumes that each predictor is conditionally independent of the others given the
target; this simplifying assumption is what makes the algorithm "naive."
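In symbols, the conditional independence assumption lets the model evaluate a candidate
target value t as P(t | x_1, ..., x_n) ∝ P(t) * P(x_1 | t) * ... * P(x_n | t), with each
factor estimated from the observed value frequencies in the build data.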
For information on the oml.nb class attributes and methods, invoke help(oml.nb) or see
Oracle Machine Learning for Python API Reference.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop(table = 'NB_PRIOR_PROBABILITY_DEMO')
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]
# Create a NB model object.
nb_mod = oml.nb(clas_weights_balanced = 'ON')

# Fit the NB model according to the training data and parameter
# settings.
nb_mod = nb_mod.fit(train_x, train_y)
nb_mod.predict_proba(test_dat.drop('Species'),
supplemental_cols = test_dat[:,
['Sepal_Length',
'Species']]).sort_values(by =
['Sepal_Length',
'Species',
'PROBABILITY_OF_setosa',
'PROBABILITY_OF_versicolor'])
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_NAIVE_BAYES
1 CLAS_WEIGHTS_BALANCED ON
2 NABS_PAIRWISE_THRESHOLD 0
3 NABS_SINGLETON_THRESHOLD 0
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
7 PREP_AUTO ON
Global Statistics:
attribute name attribute value
0 NUM_ROWS 104
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Partition: NO
Priors:
Conditionals:
CONDITIONAL_PROBABILITY COUNT
0 0.027778 1
1 0.027778 1
2 0.083333 3
3 0.277778 10
... ... ...
152 0.030303 1
153 0.060606 2
154 0.030303 1
155 0.060606 2
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_NAIVE_BAYES
1 CLAS_PRIORS_TABLE_NAME "OML_USER"."NB_PRIOR_PROBABILITY_DEMO"
2 CLAS_WEIGHTS_BALANCED OFF
3 NABS_PAIRWISE_THRESHOLD 0
4 NABS_SINGLETON_THRESHOLD 0
5 ODMS_DETAILS ODMS_ENABLE
6 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
Global Statistics:
attribute name attribute value
0 NUM_ROWS 104
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Partition: NO
Priors:
Conditionals:
CONDITIONAL_PROBABILITY COUNT
0 0.027778 1
1 0.027778 1
2 0.083333 3
3 0.277778 10
... ... ...
152 0.030303 1
153 0.060606 2
154 0.030303 1
155 0.060606 2
>>> # Return the top two most influential attributes of the highest
... # probability class.
>>> nb_mod.predict(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Sepal_Width',
... 'Petal_Length',
... 'Species']],
... topN_attrs = 2)
Sepal_Length Sepal_Width Petal_Length Species PREDICTION \
0 4.9 3.0 1.4 setosa setosa
1 4.9 3.1 1.5 setosa setosa
2 4.8 3.4 1.6 setosa setosa
3 5.8 4.0 1.2 setosa setosa
... ... ... ... ... ...
42 6.7 3.3 5.7 virginica virginica
43 6.7 3.0 5.2 virginica virginica
44 6.5 3.0 5.2 virginica virginica
45 5.9 3.0 5.1 virginica virginica
TOP_N_ATTRIBUTES
0 <Details algorithm="Naive Bayes" class="setosa...
1 <Details algorithm="Naive Bayes" class="setosa...
2 <Details algorithm="Naive Bayes" class="setosa...
3 <Details algorithm="Naive Bayes" class="setosa...
...
42 <Details algorithm="Naive Bayes" class="virgin...
43 <Details algorithm="Naive Bayes" class="virgin...
44 <Details algorithm="Naive Bayes" class="virgin...
45 <Details algorithm="Naive Bayes" class="virgin...
>>> # Make predictions and return the probability for each class
... # on new data.
>>> nb_mod.predict_proba(test_dat.drop('Species'),
... supplemental_cols = test_dat[:,
... ['Sepal_Length',
... 'Species']]).sort_values(by =
... ['Sepal_Length',
... 'Species',
... 'PROBABILITY_OF_setosa',
... 'PROBABILITY_OF_versicolor'])
Sepal_Length Species PROBABILITY_OF_SETOSA \
0 4.4 setosa 1.000000e+00
1 4.4 setosa 1.000000e+00
2 4.5 setosa 1.000000e+00
3 4.8 setosa 1.000000e+00
... ... ... ...
PROBABILITY_OF_VERSICOLOR PROBABILITY_OF_VIRGINICA
0 9.327306e-21 7.868301e-20
1 3.497737e-20 1.032715e-19
2 2.238553e-13 2.360490e-19
3 6.995487e-22 2.950617e-21
... ... ...
42 4.741700e-13 1.000000e+00
43 1.778141e-07 9.999998e-01
44 2.963565e-20 1.000000e+00
45 4.156340e-01 5.843660e-01
>>> # Make predictions on new data and return the mean accuracy.
... nb_mod.score(test_dat.drop('Species'), test_dat[:, ['Species']])
0.934783
Neural Network
Neural Network models can be used to capture intricate nonlinear relationships between inputs
and outputs or to find patterns in data.
The oml.nn class methods build a feed-forward neural network for regression on
oml.DataFrame data. It supports multiple hidden layers with a specifiable number of nodes.
Each layer can have one of several activation functions.
The output layer is a single numeric or binary categorical target. The output layer can have any
of the activation functions. It has the linear activation function by default.
Modeling with the oml.nn class is well-suited for noisy and complex data such as sensor data.
Such problems often have the following characteristics:
• Potentially many (numeric) predictors, for example, pixel values
• The target may be discrete-valued, real-valued, or a vector of such values
• Training data may contain errors – robust to noise
• Fast scoring
• Model transparency is not required; models difficult to interpret
Typical steps in Neural Network modeling are the following:
1. Specifying the architecture
2. Preparing the data
3. Building the model
4. Specifying the stopping criteria: iterations, error on a validation set within tolerance
5. Viewing statistical results from the model
6. Improving the model
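For example, steps 1 and 3 as a minimal sketch (the parameter names mirror the NNET_*
settings shown in the example output later in this section; train_x and train_y are assumed
proxy objects):

# Specify the architecture, one hidden layer of 30 nodes with the
# log-sigmoid activation, and build the model.
nn_mod = oml.nn(nnet_hidden_layers = 1,
                nnet_nodes_per_layer = '30',
                nnet_activations = "'NNET_ACTIVATIONS_LOG_SIG'")
nn_mod = nn_mod.fit(train_x, train_y)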
For information on the oml.nn class attributes and methods, invoke help(oml.nn) or see
Oracle Machine Learning for Python API Reference.
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Create training and test data.
dat = oml.sync(table = 'IRIS').split()
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]

# Create a NN model object and fit it.
nn_mod = oml.nn().fit(train_x, train_y)
nn_mod.predict(test_dat.drop('Species'),
supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
'Species']], proba = True)
nn_mod.predict_proba(test_dat.drop('Species'),
supplemental_cols = test_dat[:, ['Sepal_Length',
'Species']]).sort_values(by = ['Sepal_Length', 'Species',
'PROBABILITY_OF_setosa', 'PROBABILITY_OF_versicolor'])
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_NEURAL_NETWORK
1 CLAS_WEIGHTS_BALANCED OFF
2 LBFGS_GRADIENT_TOLERANCE .000000001
3 LBFGS_HISTORY_DEPTH 20
4 LBFGS_SCALE_HESSIAN LBFGS_SCALE_HESSIAN_ENABLE
5 NNET_ACTIVATIONS 'NNET_ACTIVATIONS_LOG_SIG'
6 NNET_HELDASIDE_MAX_FAIL 6
7 NNET_HELDASIDE_RATIO .25
8 NNET_HIDDEN_LAYERS 1
9 NNET_ITERATIONS 200
10 NNET_NODES_PER_LAYER 30
11 NNET_TOLERANCE .000001
12 ODMS_DETAILS ODMS_ENABLE
13 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
14 ODMS_RANDOM_SEED 0
15 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
16 PREP_AUTO ON
Computed Settings:
setting name setting value
0 NNET_REGULARIZER NNET_REGULARIZER_NONE
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 ITERATIONS 60.0
2 LOSS_VALUE 0.0
3 NUM_ROWS 102.0
Attributes:
Sepal_Length
Sepal_Width
Petal_Length
Petal_Width
Partition: NO
Topology:
Weights:
TARGET_VALUE WEIGHT
0 None -39.836487
1 None 32.604824
2 None 0.953903
3 None 0.714064
... ... ...
239 virginica -22.650606
240 setosa 2.402457
241 versicolor 7.647615
242 virginica -9.493982
>>> nn_mod.predict(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length', 'Sepal_Width',
... 'Species']], proba = True)
Sepal_Length Sepal_Width Species PREDICTION PROBABILITY
0 4.9 3.0 setosa setosa 1.000000
1 4.9 3.1 setosa setosa 1.000000
2 4.8 3.4 setosa setosa 1.000000
3 5.8 4.0 setosa setosa 1.000000
... ... ... ... ... ...
44 6.7 3.3 virginica virginica 1.000000
45 6.7 3.0 virginica virginica 1.000000
46 6.5 3.0 virginica virginica 1.000000
47 5.9 3.0 virginica virginica 1.000000
>>> nn_mod.predict_proba(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Species']]).sort_values(by = ['Sepal_Length', 'Species',
... 'PROBABILITY_OF_setosa', 'PROBABILITY_OF_versicolor'])
Sepal_Length Species PROBABILITY_OF_SETOSA \
0 4.4 setosa 1.000000e+00
1 4.4 setosa 1.000000e+00
2 4.5 setosa 1.000000e+00
3 4.8 setosa 1.000000e+00
... ... ... ...
44 6.7 virginica 4.567318e-218
45 6.9 versicolor 3.028266e-177
46 6.9 virginica 1.203417e-215
47 7.0 versicolor 3.382837e-148
PROBABILITY_OF_VERSICOLOR PROBABILITY_OF_VIRGINICA
0 3.491272e-67 3.459448e-283
1 8.038930e-58 2.883999e-288
2 5.273544e-64 2.243282e-293
3 1.332150e-78 2.040723e-283
... ... ...
44 1.328042e-36 1.000000e+00
45 1.000000e+00 5.063405e-55
46 4.000953e-31 1.000000e+00
47 1.000000e+00 2.593761e-121
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_NEURAL_NETWORK
1 CLAS_WEIGHTS_BALANCED OFF
2 LBFGS_GRADIENT_TOLERANCE .000000001
3 LBFGS_HISTORY_DEPTH 20
4 LBFGS_SCALE_HESSIAN LBFGS_SCALE_HESSIAN_ENABLE
5 NNET_ACTIVATIONS 'NNET_ACTIVATIONS_LOG_SIG'
6 NNET_HELDASIDE_MAX_FAIL 6
7 NNET_HELDASIDE_RATIO .25
8 NNET_HIDDEN_LAYERS 1
9 NNET_ITERATIONS 200
10 NNET_NODES_PER_LAYER 50
11 NNET_TOLERANCE .000001
12 ODMS_DETAILS ODMS_ENABLE
13 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
14 ODMS_RANDOM_SEED 0
15 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
16 PREP_AUTO ON
Computed Settings:
setting name setting value
0 NNET_REGULARIZER NNET_REGULARIZER_NONE
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 ITERATIONS 68.0
2 LOSS_VALUE 0.0
3 NUM_ROWS 102.0
Attributes:
Sepal_Length
Sepal_Width
Petal_Length
Petal_Width
Partition: NO
Topology:
Weights:
TARGET_VALUE WEIGHT
0 None 10.606389
1 None -37.256485
2 None -14.263772
3 None -17.945173
... ... ...
399 virginica -22.179815
400 setosa -6.452953
401 versicolor 13.186332
402 virginica -6.973605
Random Forest
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
oml.drop(table = 'RF_COST')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
# Use the model to make predictions on the test data.
rf_mod.predict(test_dat.drop('Species'),
               supplemental_cols = test_dat[:, ['Sepal_Length',
                                                'Sepal_Width',
                                                'Petal_Length',
                                                'Species']])
>>>
>>> # Create a cost matrix table in the database.
... cost_matrix = [['setosa', 'setosa', 0],
... ['setosa', 'virginica', 0.2],
... ['setosa', 'versicolor', 0.8],
... ['virginica', 'virginica', 0],
... ['virginica', 'setosa', 0.5],
... ['virginica', 'versicolor', 0.5],
... ['versicolor', 'versicolor', 0],
... ['versicolor', 'setosa', 0.4],
... ['versicolor', 'virginica', 0.6]]
>>> cost_matrix = \
... oml.create(pd.DataFrame(cost_matrix,
... columns = ['ACTUAL_TARGET_VALUE',
... 'PREDICTED_TARGET_VALUE',
... 'COST']),
... table = 'RF_COST')
>>>
>>> # Create an RF model object.
... rf_mod = oml.rf(tree_term_max_depth = '2')
>>>
>>> # Fit the RF model according to the training data and parameter
... # settings.
>>> rf_mod = rf_mod.fit(train_x, train_y, cost_matrix = cost_matrix)
>>>
>>> # Show details of the model.
... rf_mod
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_RANDOM_FOREST
1 CLAS_COST_TABLE_NAME "OML_USER"."RF_COST"
2 CLAS_MAX_SUP_BINS 32
3 CLAS_WEIGHTS_BALANCED OFF
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_RANDOM_SEED 0
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
9 RFOR_NUM_TREES 20
10 RFOR_SAMPLING_RATIO .5
11 TREE_IMPURITY_METRIC TREE_IMPURITY_GINI
12 TREE_TERM_MAX_DEPTH 2
13 TREE_TERM_MINPCT_NODE .05
14 TREE_TERM_MINPCT_SPLIT .1
15 TREE_TERM_MINREC_NODE 10
16 TREE_TERM_MINREC_SPLIT 20
Computed Settings:
setting name setting value
0 RFOR_MTRY 2
Global Statistics:
attribute name attribute value
0 AVG_DEPTH 2
1 AVG_NODECOUNT 3
2 MAX_DEPTH 2
3 MAX_NODECOUNT 2
4 MIN_DEPTH 2
5 MIN_NODECOUNT 2
6 NUM_ROWS 104
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Partition: NO
Importance:
>>> # Return the two most likely classes and their probabilities
... # for each row of the test data.
>>> rf_mod.predict_proba(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Species']],
... topN = 2).sort_values(by = ['Sepal_Length', 'Species'])
Sepal_Length Species TOP_1 TOP_1_VAL TOP_2 TOP_2_VAL
0 4.4 setosa setosa 0.989130 versicolor 0.010870
1 4.4 setosa setosa 0.989130 versicolor 0.010870
2 4.5 setosa setosa 0.989130 versicolor 0.010870
3 4.8 setosa setosa 0.989130 versicolor 0.010870
... ... ... ... ... ... ...
42 6.7 virginica virginica 0.501016 versicolor 0.498984
43 6.9 versicolor virginica 0.501016 versicolor 0.498984
44 6.9 virginica virginica 0.501016 versicolor 0.498984
45 7.0 versicolor virginica 0.501016 versicolor 0.498984
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_RANDOM_FOREST
1 CLAS_COST_TABLE_NAME "OML_USER"."RF_COST"
2 CLAS_MAX_SUP_BINS 32
3 CLAS_WEIGHTS_BALANCED OFF
4 ODMS_DETAILS ODMS_ENABLE
5 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
6 ODMS_RANDOM_SEED 0
7 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
8 PREP_AUTO ON
9 RFOR_NUM_TREES 20
10 RFOR_SAMPLING_RATIO .5
11 TREE_IMPURITY_METRIC TREE_IMPURITY_GINI
12 TREE_TERM_MAX_DEPTH 3
13 TREE_TERM_MINPCT_NODE .05
14 TREE_TERM_MINPCT_SPLIT .1
15 TREE_TERM_MINREC_NODE 10
16 TREE_TERM_MINREC_SPLIT 20
Computed Settings:
setting name setting value
0 RFOR_MTRY 2
Global Statistics:
attribute name attribute value
0 AVG_DEPTH 3
1 AVG_NODECOUNT 5
2 MAX_DEPTH 3
3 MAX_NODECOUNT 6
4 MIN_DEPTH 3
5 MIN_NODECOUNT 4
6 NUM_ROWS 104
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Partition: NO
Importance:
Singular Value Decomposition
The oml.svd class creates a model that uses the Singular Value Decomposition (SVD)
algorithm for feature extraction. SVD performs orthogonal linear transformations that capture
the underlying variance of the data by decomposing a rectangular matrix into three matrices:
U, V, and D. Columns of matrix V contain the right singular vectors and columns of matrix U
contain the left singular vectors. Matrix D is a diagonal matrix and its singular values reflect the
amount of data variance captured by the bases.
The SVDS_MAX_NUM_FEATURES constant specifies the maximum number of features supported
by SVD. The value of the constant is 2500.
For information on the oml.svd class attributes and methods, invoke help(oml.svd) or see
Oracle Machine Learning for Python API Reference.
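A minimal build-and-apply sketch (assuming train_dat and test_dat proxy objects like those in
the example that follows):

# Fit an SVD model with default settings, then project new data onto
# the extracted features.
svd_mod = oml.svd().fit(train_dat)
svd_mod.transform(test_dat)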
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
Settings:
setting name setting value
0 ALGO_NAME ALGO_SINGULAR_VALUE_DECOMP
1 ODMS_DETAILS ODMS_ENABLE
2 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
3 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
4 PREP_AUTO ON
5 SVDS_SCORING_MODE SVDS_SCORING_SVD
6 SVDS_U_MATRIX_OUTPUT SVDS_U_MATRIX_DISABLE
Computed Settings:
setting name setting value
0 FEAT_NUM_FEATURES 8
1 SVDS_SOLVER SVDS_SOLVER_TSEIGEN
2 SVDS_TOLERANCE .000000000000024646951146678475
Global Statistics:
attribute name attribute value
0 NUM_COMPONENTS 8
1 NUM_ROWS 111
2 SUGGESTED_CUTOFF 1
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Species
Partition: NO
Features:
D:
FEATURE_ID VALUE
0 1 886.737809
1 2 32.736792
2 3 10.043389
3 4 5.270496
4 5 2.708602
5 6 1.652340
6 7 0.938640
7 8 0.452170
V:
>>> # Perform dimensionality reduction and return values for the two
... # features that have the highest topN values.
>>> svd_mod.transform(test_dat,
... supplemental_cols = test_dat[:, ['Sepal_Length']],
... topN = 2).sort_values(by = ['Sepal_Length',
... 'TOP_1',
... 'TOP_1_VAL'])
Sepal_Length TOP_1 TOP_1_VAL TOP_2 TOP_2_VAL
0 4.4 7 0.153125 3 -0.130778
1 4.4 8 0.171819 2 0.147070
2 4.8 2 0.159324 6 -0.085194
3 4.8 7 0.157187 3 -0.141668
... ... ... ... ... ...
35 7.2 6 -0.167688 1 0.142545
36 7.2 7 -0.176290 6 -0.175527
37 7.6 4 0.205779 3 0.141533
38 7.9 8 -0.253194 7 -0.166967
import oml
import pandas as pd
from sklearn import datasets
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
try:
oml.drop('IRIS')
except:
pass
# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
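The intervening steps follow as a minimal sketch: the split ratio is an assumption, and the names train_x, train_y, test_dat, and svm_mod are the ones used by the fit and predict_proba calls in this example. The model output below shows SVMS_LINEAR as the kernel.

# Split the data into train and test sets (sketch).
dat = oml_iris.split(ratio=(0.8, 0.2))
train_x = dat[0].drop('Species')
train_y = dat[0]['Species']
test_dat = dat[1]

# Create the SVM classification model object.
svm_mod = oml.svm(mining_function='classification')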
# Fit the SVM Model according to the training data and parameter
# settings.
svm_mod.fit(train_x, train_y)
Target: Species
Settings:
setting name setting value
0 ALGO_NAME ALGO_SUPPORT_VECTOR_MACHINES
1 CLAS_WEIGHTS_BALANCED OFF
2 ODMS_DETAILS ODMS_ENABLE
3 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
4 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
5 PREP_AUTO ON
6 SVMS_CONV_TOLERANCE .0001
7 SVMS_KERNEL_FUNCTION SVMS_LINEAR
Computed Settings:
setting name setting value
0 SVMS_COMPLEXITY_FACTOR 10
1 SVMS_NUM_ITERATIONS 30
2 SVMS_SOLVER SVMS_SOLVER_IPM
Global Statistics:
attribute name attribute value
0 CONVERGED YES
1 ITERATIONS 14
2 NUM_ROWS 104
Attributes:
Petal_Length
Petal_Width
Sepal_Length
Sepal_Width
Partition: NO
COEFFICIENTS:
>>> # Make predictions and return the probability for each class
... # on new data.
>>> svm_mod.predict_proba(test_dat.drop('Species'),
... supplemental_cols = test_dat[:, ['Sepal_Length',
... 'Sepal_Width',
... 'Species']],
... topN = 1).sort_values(by = ['Sepal_Length', 'Sepal_Width'])
Sepal_Length Sepal_Width Species TOP_1 TOP_1_VAL
0 4.4 3.0 setosa setosa 0.698067
1 4.4 3.2 setosa setosa 0.815643
2 4.5 2.3 setosa versicolor 0.605105
3 4.8 3.4 setosa setosa 0.920317
... ... ... ... ... ...
44 6.7 3.3 virginica virginica 0.927706
45 6.9 3.1 versicolor versicolor 0.378391
46 6.9 3.1 virginica virginica 0.881118
47 7.0 3.2 versicolor setosa 0.586393
9
Automated Machine Learning
Use the automated algorithm selection, feature selection, and hyperparameter tuning of
Automated Machine Learning to accelerate the machine learning modeling process.
Automated Machine Learning in OML4Py is described in the following topics:
• About Automated Machine Learning
Automated Machine Learning (AutoML) provides built-in data science expertise about data
analytics and modeling that you can employ to build machine learning models.
• Algorithm Selection
The oml.automl.AlgorithmSelection class uses the characteristics of the data set and
the task to rank algorithms from the set of supported Oracle Machine Learning algorithms.
• Feature Selection
The oml.automl.FeatureSelection class identifies the most relevant feature subsets for a
training data set and an Oracle Machine Learning algorithm.
• Model Tuning
The oml.automl.ModelTuning class tunes the hyperparameters for the specified
classification or regression algorithm and training data.
• Model Selection
The oml.automl.ModelSelection class automatically selects an Oracle Machine Learning
algorithm according to the selected score metric and then tunes that algorithm.
Note:
Like the fit method of the machine learning classes, the AutoML functions reduce, select, and tune provide a case_id parameter that you can use to achieve repeatable data sampling and data shuffling during model building.
The AutoML functionality is also available in a no-code user interface alongside OML
Notebooks on Oracle Autonomous Database. For more information, see Oracle Machine
Learning AutoML User Interface.
Class Description
• oml.automl.AlgorithmSelection: Using only the characteristics of the data set and the task, automatically selects the best algorithms from the set of supported Oracle Machine Learning algorithms. Supports classification and regression functions.
• oml.automl.FeatureSelection: Uses meta-learning to quickly identify the most relevant feature subsets given a training data set and an Oracle Machine Learning algorithm. Supports classification and regression functions.
• oml.automl.ModelTuning: Uses a highly parallel, asynchronous gradient-based hyperparameter optimization algorithm to tune the algorithm hyperparameters. Supports classification and regression functions.
• oml.automl.ModelSelection: Selects the best Oracle Machine Learning algorithm and then tunes that algorithm. Supports classification and regression functions.
The Oracle Machine Learning algorithms supported by AutoML span classification and regression. AutoML evaluates candidate models with score metrics based on scikit-learn, including the following:
neg_mean_absolute_error: Calculates the mean of the absolute difference between predicted and true targets.
sklearn.metrics.mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average')
Formula: $\mathrm{MAE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$
neg_mean_squared_log_error: Calculates the mean of the squared difference between the natural logs of predicted and true targets.
sklearn.metrics.mean_squared_log_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average')
Formula: $\mathrm{MSLE}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \left( \ln(1 + y_i) - \ln(1 + \hat{y}_i) \right)^2$
neg_median_absolute_error: Calculates the median of the absolute difference between predicted and true targets.
sklearn.metrics.median_absolute_error(y_true, y_pred)
Formula: $\mathrm{MedAE}(y, \hat{y}) = \mathrm{median}\left( \lvert y_1 - \hat{y}_1 \rvert, \ldots, \lvert y_n - \hat{y}_n \rvert \right)$
import oml
from oml import automl
import pandas as pd
from sklearn import datasets
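The body of the example follows as a minimal sketch; the BreastCancer table name is taken from the Model Selection example later in this chapter, while the 80/20 split and the k=3 ranking depth are assumptions.

# Load the breast cancer data set and create a database table for it.
bc = datasets.load_breast_cancer()
X = pd.DataFrame(bc.data.astype(float), columns = bc.feature_names)
y = pd.DataFrame(bc.target, columns = ['TARGET'])
try:
    oml.drop('BreastCancer')
except:
    pass
oml_df = oml.create(pd.concat([X, y], axis=1), table = 'BreastCancer')

# Split the data and separate the predictors from the target.
train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
X_train, y_train = train.drop('TARGET'), train['TARGET']

# Rank the supported algorithms for this classification task and data.
asel = automl.AlgorithmSelection(mining_function='classification',
                                 score_metric='accuracy', parallel=4)
algo_ranking = asel.select(X_train, y_train, k=3)
algo_ranking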
This example uses the oml.automl.FeatureSelection class. The example builds a model on
the full data set and computes predictive accuracy. It performs automated feature selection,
filters the columns according to the determined set, and rebuilds the model. It then recomputes
predictive accuracy.
import oml
from oml import automl
import pandas as pd
import numpy as np
from sklearn import datasets
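The data setup, the baseline model, and the feature-selection call follow as a minimal sketch; the 80/20 split and the 'svm_linear' algorithm key are assumptions.

# Load the breast cancer data set and create a database table for it.
bc = datasets.load_breast_cancer()
X = pd.DataFrame(bc.data.astype(float), columns = bc.feature_names)
y = pd.DataFrame(bc.target, columns = ['TARGET'])
try:
    oml.drop('BreastCancer')
except:
    pass
oml_df = oml.create(pd.concat([X, y], axis=1), table = 'BreastCancer')

# Split the data and separate the predictors from the target.
train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
X_train, y_train = train.drop('TARGET'), train['TARGET']
X_test, y_test = test.drop('TARGET'), test['TARGET']

# Build a model on the full data set and compute predictive accuracy.
mod_full = oml.svm(mining_function='classification').fit(X_train, y_train)
"{:.2}".format(mod_full.score(X_test, y_test))

# Identify the most relevant feature subset for a linear SVM.
fs = automl.FeatureSelection(mining_function='classification',
                             score_metric='accuracy')
subset = fs.reduce('svm_linear', X_train, y_train)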
# Use the subset to select the features and create a model on the
# new reduced data set.
X_new = X_train[:,subset]
X_test_new = X_test[:,subset]
mod = oml.svm(mining_function='classification').fit(X_new, y_train)
"{:.2} with {:.1f}x feature reduction".format(
mod.score(X_test_new, y_test),
len(X_train.columns)/len(X_new.columns))
import oml
from oml import automl
import pandas as pd
from sklearn import datasets
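The data setup and the tuning call follow as a minimal sketch; the 80/20 split and the 'dt' (decision tree) algorithm key are assumptions, the latter chosen to match the ALGO_DECISION_TREE settings shown below.

# Load the breast cancer data set and create a database table for it.
bc = datasets.load_breast_cancer()
X = pd.DataFrame(bc.data.astype(float), columns = bc.feature_names)
y = pd.DataFrame(bc.target, columns = ['TARGET'])
try:
    oml.drop('BreastCancer')
except:
    pass
oml_df = oml.create(pd.concat([X, y], axis=1), table = 'BreastCancer')

# Split the data and separate the predictors from the target.
train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
X_train, y_train = train.drop('TARGET'), train['TARGET']
X_test, y_test = test.drop('TARGET'), test['TARGET']

# Tune the hyperparameters of a decision tree for this data; the result
# includes the best model and all evaluated hyperparameter settings.
at = automl.ModelTuning(mining_function='classification', parallel=4)
results = at.tune('dt', X_train, y_train, score_metric='accuracy')
tuned_model = results['best_model']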
# Use the tuned model to get the score on the test set.
"{:.2}".format(tuned_model.score(X_test, y_test))
Target: TARGET
Settings:
setting name setting value
0 ALGO_NAME ALGO_DECISION_TREE
1 CLAS_MAX_SUP_BINS 32
2 CLAS_WEIGHTS_BALANCED OFF
3 ODMS_DETAILS ODMS_DISABLE
4 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
5 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
6 PREP_AUTO ON
7 TREE_IMPURITY_METRIC TREE_IMPURITY_GINI
8 TREE_TERM_MAX_DEPTH 8
9 TREE_TERM_MINPCT_NODE 3.34
10 TREE_TERM_MINPCT_SPLIT 0.1
11 TREE_TERM_MINREC_NODE 10
12 TREE_TERM_MINREC_SPLIT 20
Attributes:
mean radius
mean texture
mean perimeter
mean area
mean smoothness
mean compactness
mean concavity
mean concave points
mean symmetry
mean fractal dimension
radius error
texture error
perimeter error
area error
smoothness error
compactness error
concavity error
concave points error
symmetry error
fractal dimension error
worst radius
worst texture
worst perimeter
worst area
worst smoothness
worst compactness
worst concavity
worst concave points
worst symmetry
worst fractal dimension
Partition: NO
>>>
>>> # Show the best tuned model train score and the
... # corresponding hyperparameters.
... score, params = results['all_evals'][0]
>>> "{:.2}".format(score), ["{}:{}".format(k, params[k])
... for k in sorted(params)]
This example creates an oml.automl.ModelSelection object and then uses the object to
select and tune the best model.
import oml
from oml import automl
import pandas as pd
from sklearn import datasets
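The body of the example follows as a minimal sketch; the BreastCancer table name comes from the oml.drop call at the end of this example, while the 80/20 split is an assumption. The name select_model is the one used by the score call below.

# Load the breast cancer data set and create a database table for it.
bc = datasets.load_breast_cancer()
X = pd.DataFrame(bc.data.astype(float), columns = bc.feature_names)
y = pd.DataFrame(bc.target, columns = ['TARGET'])
try:
    oml.drop('BreastCancer')
except:
    pass
oml_df = oml.create(pd.concat([X, y], axis=1), table = 'BreastCancer')

# Split the data and separate the predictors from the target.
train, test = oml_df.split(ratio=(0.8, 0.2), seed = 1234)
X_train, y_train = train.drop('TARGET'), train['TARGET']
X_test, y_test = test.drop('TARGET'), test['TARGET']

# Select the best algorithm for the data and return it as a tuned model.
ms = automl.ModelSelection(mining_function='classification',
                           score_metric='accuracy', parallel=4)
select_model = ms.select(X_train, y_train, k=1)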
Target: TARGET
Settings:
setting name setting value
0 ALGO_NAME ALGO_SUPPORT_VECTOR_MACHINES
1 CLAS_WEIGHTS_BALANCED OFF
2 ODMS_DETAILS ODMS_DISABLE
3 ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
4 ODMS_SAMPLING ODMS_SAMPLING_DISABLE
5 PREP_AUTO ON
6 SVMS_COMPLEXITY_FACTOR 10
7 SVMS_CONV_TOLERANCE .0001
8 SVMS_KERNEL_FUNCTION SVMS_GAUSSIAN
9 SVMS_NUM_PIVOTS ...
10 SVMS_STD_DEV 5.3999999999999995
Attributes:
area error
compactness error
concave points error
concavity error
fractal dimension error
mean area
mean compactness
mean concave points
mean concavity
mean fractal dimension
mean perimeter
mean radius
mean smoothness
mean symmetry
mean texture
perimeter error
radius error
smoothness error
symmetry error
texture error
worst area
worst compactness
worst concave points
worst concavity
worst fractal dimension
worst perimeter
worst radius
worst smoothness
worst symmetry
worst texture
Partition: NO
>>>
>>> # Score on the selected and tuned model.
... "{:.2}".format(select_model.score(X_test, y_test))
'0.99'
>>>
>>> # Drop the database table.
... oml.drop('BreastCancer')
10
Embedded Python Execution
Embedded Python Execution is a feature of Oracle Machine Learning for Python that allows
you to invoke user-defined Python functions directly in an Oracle database instance.
Embedded Python Execution is available on:
• Oracle Autonomous Database, where pre-installed Python packages can be used, via Python, REST, and SQL APIs.
• Oracle Database on premises, ExaCS, ExaC@C, DBCS, and Oracle Database deployed in a compute instance, where you can install third-party packages to use with Embedded Python Execution, via Python and SQL APIs.
Embedded Python Execution is described in the following topics:
Topics:
• About Embedded Python Execution
With Embedded Python Execution, you can invoke user-defined Python functions in
Python engines spawned and managed by the Oracle database instance.
• Parallelism with OML4Py Embedded Python Execution
OML4Py embedded Python execution allows users to invoke user-defined functions from
Python, SQL, and REST interfaces using Python engines spawned and controlled by the
Oracle Autonomous Database environment.
• Embedded Python Execution Views
OML4Py includes a number of database views that contain information about datastores and about the user-defined Python functions in the script repository. You can use these views with the Embedded Python Execution APIs to work with the datastores and their contents.
• Python API for Embedded Python Execution
You can invoke user-defined Python functions directly in an Oracle database instance by
using Embedded Python Execution functions.
• SQL API for Embedded Python Execution with On-premises Database
SQL API for Embedded Python Execution with On-premises Database has SQL interfaces
for Embedded Python Execution and for datastore and script repository management.
• SQL API for Embedded Python Execution with Autonomous Database
The SQL API for Embedded Python Execution with Autonomous Database provides SQL
interfaces for setting authorization tokens, managing access control list (ACL) privileges,
executing Python scripts, and synchronously and asynchronously running jobs.
About Embedded Python Execution
In Oracle Autonomous Database, you can use:
• An OML Notebooks Python interpreter session (see Use the Python Interpreter in a Notebook Paragraph)
• The REST API for Embedded Python Execution
• The SQL API for Embedded Python Execution with Autonomous Database
In an on-premises Oracle Database, you can use:
• The Python API for Embedded Python Execution
• The SQL API for Embedded Python Execution with On-premises Database
The following topic compares the four Embedded Python Execution APIs.
Topics:
• Comparison of the Embedded Python Execution APIs
The table below compares the four Embedded Python Execution APIs.
Each category below lists the corresponding Python API for Embedded Python Execution, REST API for Embedded Python Execution, and SQL APIs for Embedded Python Execution.

Embedded Python Execution function (run a function):
• Python API: oml.do_eval function. See Run a User-Defined Python Function.
• REST API: POST /api/py-scripts/v1/do-eval/{scriptName} (see Run a Python Function) and POST /api/py-scripts/v1/do-eval/{scriptName}/{ownerName} (see Run a Python Function with Script Owner Specified).
• SQL APIs: pyqEval Function (Autonomous Database) and pyqEval Function (On-Premises Database).

Embedded Python Execution function (run a function on specified data):
• Python API: oml.table_apply function. See Run a User-Defined Python Function on the Specified Data.
• REST API: POST /api/py-scripts/v1/table-apply/{scriptName} (see Run a Python Function on Specified Data) and POST /api/py-scripts/v1/table-apply/{scriptName}/{ownerName} (see Run a Python Function on Specified Data with Script Owner Specified).
• SQL APIs: pyqTableEval Function (Autonomous Database) and pyqTableEval Function (On-Premises Database).

Embedded Python Execution function (run a function on grouped data):
• Python API: oml.group_apply function. See Run a Python Function on Data Grouped By Column Values.
• REST API: POST /api/py-scripts/v1/group-apply/{scriptName} (see Run a Python Function on Grouped Data) and POST /api/py-scripts/v1/group-apply/{scriptName}/{ownerName} (see Run a Python Function on Grouped Data with Script Owner Specified).
• SQL APIs: pyqGroupEval Function (Autonomous Database) and pyqGroupEval Function (On-Premises Database).

Embedded Python Execution function (run a function on chunks of rows):
• Python API: oml.row_apply function. See Run a User-Defined Python Function on Sets of Rows.
• REST API: POST /api/py-scripts/v1/row-apply/{scriptName} (see Run a Python Function on Chunks of Rows) and POST /api/py-scripts/v1/row-apply/{scriptName}/{ownerName} (see Run a Python Function on Chunks of Rows with Script Owner Specified).
• SQL APIs: pyqRowEval Function (Autonomous Database) and pyqRowEval Function (On-Premises Database).

Embedded Python Execution function (run a function multiple times):
• Python API: oml.index_apply function. See Run a User-Defined Python Function Multiple Times.
• REST API: POST /api/py-scripts/v1/index-apply/{scriptName} (see Run a Python Function Multiple Times) and POST /api/py-scripts/v1/index-apply/{scriptName}/{ownerName} (see Run a Python Function Multiple Times with Script Owner Specified).
• SQL APIs: pyqIndexEval Function (Autonomous Database). The API for an on-premises Oracle Database has no pyqIndexEval function; use pyqGroupEval Function (On-Premises Database) instead.

Job status API:
• Python API: NA.
• REST API: GET /api/py-scripts/v1/jobs/{jobId} (see Retrieve Asynchronous Job Status).
• SQL APIs: pyqJobStatus Function (Autonomous Database); NA for an on-premises database.

Job result API:
• Python API: NA.
• REST API: GET /api/py-scripts/v1/jobs/{jobId}/result (see Retrieve Asynchronous Job Result).
• SQL APIs: pyqJobResult Function (Autonomous Database); NA for an on-premises database.

Script repository (list functions):
• Python API: oml.script.dir function. See List Available User-Defined Python Functions.
• REST API: GET /api/py-scripts/v1/scripts (see List Scripts).
• SQL APIs: list the scripts by querying the ALL_PYQ_SCRIPTS View and the USER_PYQ_SCRIPTS View.

Script repository (create a function):
• Python API: oml.script.create function. See Create and Store a User-Defined Python Function.
• REST API: NA.
• SQL APIs: pyqScriptCreate Procedure (Autonomous Database) and pyqScriptCreate Procedure (On-Premises Database).

Script repository (drop a function):
• Python API: oml.script.drop function. See Drop a User-Defined Python Function from the Repository.
• REST API: NA.
• SQL APIs: pyqScriptDrop Procedure (Autonomous Database) and pyqScriptDrop Procedure (On-Premises Database).

Script repository (load a function):
• Python API: oml.script.load function. See Load a User-Defined Python Function.
• REST API: NA.
• SQL APIs: NA (scripts are loaded in the SQL APIs when the function is called).

Script repository (views):
• Python API: NA.
• REST API: NA.
• SQL APIs: ALL_PYQ_SCRIPTS View and USER_PYQ_SCRIPTS View.

Script repository and datastore (grant read privilege):
• Python API: oml.grant function. See About the Script Repository.
• REST API: NA.
• SQL APIs: pyqGrant procedure (Oracle Autonomous Database) and pyqGrant procedure (on-premises database); see About the SQL API for Embedded Python Execution with On-Premises Database.

Script repository and datastore (revoke read privilege):
• Python API: oml.revoke function. See About the Script Repository.
• REST API: NA.
• SQL APIs: pyqRevoke procedure (Autonomous Database) and pyqRevoke procedure (on-premises database); see About the SQL API for Embedded Python Execution with On-Premises Database.

Datastore (views):
• Python API: NA.
• REST API: NA.
• SQL APIs: ALL_PYQ_DATASTORES View, ALL_PYQ_DATASTORE_CONTENTS View, and USER_PYQ_DATASTORES View.

Authorization - Access Control Lists:
• Python API: NA.
• REST API: NA.
• SQL APIs: pyqAppendHostACE Procedure, pyqRemoveHostACE Procedure, and pyqGetHostACE Function (Autonomous Database); NA for an on-premises database (on premises, authorization is tied to logging in to the user schema).

Authorization - Tokens:
• Python API: NA.
• REST API: See Authenticate.
• SQL APIs: pyqSetAuthToken Procedure and pyqIsTokenSet Function (Autonomous Database); NA for an on-premises database.
Note:
An output limit exists on the length of the result returned by the REST API and SQL APIs for Embedded Python Execution. A result whose length exceeds 5000 produces an error with error code 1024 and the error message "Output exceeds maximum length 5000". The limit is set on the len() result of the returned Python object: for example, len() of a pandas.DataFrame is its number of rows, and len() of a list is the number of its items. So a returned pandas.DataFrame cannot have more than 5000 rows, and a returned list cannot contain more than 5000 items. You can extend this limit by updating OML_OUTPUT_SZLIMIT in a %script paragraph:
%script
Note:
pyqIndexEval is available on Oracle Autonomous Database only.
View Description
ALL_PYQ_DATASTORES View Contains information about the datastores available
to the current user.
ALL_PYQ_DATASTORE_CONTENTS View Contains information about the objects in the
datastores available to the current user.
USER_PYQ_DATASTORES View Contains information about the datastores owned
by the current user.
ALL_PYQ_SCRIPTS View Describes the scripts that are available to the
current user.
USER_PYQ_SCRIPTS View Describes the user-defined Python functions in the
script repository that are owned by the current user.
Topics:
• ALL_PYQ_DATASTORE_CONTENTS View
The ALL_PYQ_DATASTORE_CONTENTS view contains information about the contents of
datastores that are available to the current user.
• ALL_PYQ_DATASTORES View
The ALL_PYQ_DATASTORES view contains information about the datastores that are available
to the current user.
• ALL_PYQ_SCRIPTS View
The ALL_PYQ_SCRIPTS view contains information about the user-defined Python functions in
the OML4Py script repository that are available to the current user.
• USER_PYQ_DATASTORES View
The USER_PYQ_DATASTORES view contains information about the datastores that are owned
by the current user.
• USER_PYQ_SCRIPTS View
This view contains information about the user-defined Python functions in the OML4Py
script repository that are owned by the current user.
This example selects only the DSNAME and GRANTABLE columns from the view.
DSNAME G
---------- -
ds_pydata F
ds_pymodel T
ds_wine_data F
OWNER NAME
-------- -----------------
OML_USER create_iris_table
OML_USER tmpqfun2
PYQSYS tmpqfun2
This example selects the name of the user-defined Python function and the function definition
from the view.
NAME SCRIPT
-----------------
---------------------------------------------------------------------
create_iris_table "def create_iris_table(): from sklearn.datasets import
load_iris ...
This example selects only the DSNAME and GRANTABLE columns from the view.
DSNAME G
---------- -
ds_wine_data F
ds_pydata F
ds_pymodel T
NAME SCRIPT
-----------------
-------------------------------------------------------------------
create_iris_table "def create_iris_table(): from sklearn.datasets import
load_iris ...
tmpqfun2 "def return_frame(): import numpy as np import
pickle ...
Python API for Embedded Python Execution
Topics:
• About Embedded Python Execution
• Run a User-Defined Python Function
Use the oml.do_eval function to run a user-defined input function that explicitly retrieves
data or for which external data is not required.
• Run a User-Defined Python Function on the Specified Data
Use the oml.table_apply function to run a Python function on data that you specify with
the data parameter.
• Run a Python Function on Data Grouped By Column Values
Use the oml.group_apply function to group the values in a database table by one or more
columns and then run a user-defined Python function on each group.
• Run a User-Defined Python Function on Sets of Rows
Use the oml.row_apply function to chunk data into sets of rows and then run a user-
defined Python function on each chunk.
• Run a User-Defined Python Function Multiple Times
Use the oml.index_apply function to run a Python function multiple times in Python
engines spawned by the database environment.
• Save and Manage User-Defined Python Functions in the Script Repository
The OML4Py script repository stores user-defined Python functions for use with Embedded
Python Execution functions.
Function Description
oml.do_eval Runs a user-defined Python function in a Python engine spawned and
managed by the database environment.
oml.group_apply Partitions a database table by the values in one or more columns and runs
the provided user-defined Python function on each partition.
oml.index_apply Runs a Python function multiple times, passing in a unique index of the
invocation to the user-defined function.
oml.row_apply Partitions a database table into sets of rows and runs the provided user-
defined Python function on the data in each set.
oml.table_apply Runs a Python function on data in the database as a single
pandas.DataFrame in a single Python engine.
Argument Description
oml_input_type Identifies the type of input data object that you are supplying
to the func argument.
The input types are the following:
• pandas.DataFrame
• numpy.recarray
• 'default' (the default value)
If all columns are numeric, then default type is a 2-
dimensional numpy.ndarray of type numpy.float64.
Otherwise, the default type is a pandas.DataFrame.
oml_na_omit Controls the handling of missing values in the input data. If
you specify oml_na_omit = True, then rows that contain
missing values are removed from the input data. If all of the
rows contain missing values, then the input data is an empty
oml.DataFrame. The default value is False.
About Output
When a user-defined Python function runs in OML4Py, by default it returns the Python objects
returned by the function. Also, OML4Py captures all matplotlib.figure.Figure objects
created by the user-defined Python function and converts them into PNG format.
If graphics = True, the Embedded Python Execution functions return
oml.embed.data_image._DataImage objects. The oml.embed.data_image._DataImage class
contains Python objects and PNG images. Calling the method __repr__() displays the PNG
images and prints out the Python object. By default, .dat returns the Python object that the
user-defined Python function returned; .img returns a list containing PNG image data for each
figure.
In Embedded Python Execution, a user-defined Python function runs in one or more Python
engines spawned and managed by the database environment. The engines are dynamically
started and managed by the database. From the same user-defined Python function you can
get structured data and PNG images.
You can make the user-defined Python function either private or global. A global function is
available to any user. A private function is available only to the owner or to users to whom the
owner of the function has granted the read privilege.
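The syntax of the function is as follows (a signature sketch inferred from the arguments described below):

oml.do_eval(func, func_owner=None, graphics=False, **kwargs)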
The func argument is the function to run. It may be one of the following:
• A Python function
• A string that is the name of a user-defined Python function in the OML4Py script repository
• A string that defines a Python function
• An oml.script.script.Callable object returned by the oml.script.load function
The optional func_owner argument is a string or None (the default) that specifies the owner of
the registered user-defined Python function when argument func is a registered user-defined
Python function name.
The graphics argument is a boolean that specifies whether to look for images. The default
value is False.
With the **kwargs parameter, you can pass additional arguments to the func function. Special
control arguments, which start with oml_, are not passed to the function specified by func, but
instead control what happens before or after the running of the function.
See Also: About Special Control Arguments
The oml.do_eval function returns a Python object or an oml.embed.data_image._DataImage. If
no image is rendered in the user-defined Python function, oml.do_eval returns whatever
Python object is returned by the function. Otherwise, it returns an
oml.embed.data_image._DataImage object.
import pandas as pd
import oml
# The function definition and the oml.do_eval call are reconstructed
# from the body below; the argument values are illustrative.
def return_df(num, scale):
    import pandas as pd
    id = list(range(0, int(num)))
    res = [i/scale for i in id]
    return pd.DataFrame({"ID":id, "RES":res})

res = oml.do_eval(func=return_df, scale=100, num=10)
res
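The syntax of the function is as follows (a signature sketch inferred from the arguments described below):

oml.table_apply(data, func, func_owner=None, graphics=False, **kwargs)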
The data argument is an oml.DataFrame that contains the data that the func function operates
on.
The func argument is the function to run. It may be one of the following:
• A Python function
• A string that is the name of a user-defined Python function in the OML4Py script repository
• A string that defines a Python function
• An oml.script.script.Callable object returned by the oml.script.load function
The optional func_owner argument is a string or None (the default) that specifies the owner of
the registered user-defined Python function when argument func is a registered user-defined
Python function name.
The graphics argument is a boolean that specifies whether to look for images. The default
value is False.
With the **kwargs parameter, you can pass additional arguments to the func function. Special
control arguments, which start with oml_, are not passed to the function specified by func, but
instead control what happens before or after the execution of the function.
See Also: About Special Control Arguments
The oml.table_apply function returns a Python object or an
oml.embed.data_image._DataImage. If no image is rendered in the user-defined Python
function, oml.table_apply returns whatever Python object is returned by the function.
Otherwise, it returns an oml.embed.data_image._DataImage object.
import oml
import pandas as pd
from sklearn import datasets
from sklearn import linear_model
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
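The middle of the example follows as a minimal sketch; the steps mirror those shown in the transcript later in this section.

# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Build a regression model on in-memory data pulled from the database
# (sketch; the fitted columns match the transcript below).
iris_mem = oml_iris.pull()
regr = linear_model.LinearRegression()
regr.fit(iris_mem[['Sepal_Width', 'Petal_Length', 'Petal_Width']],
         iris_mem[['Sepal_Length']])

# Define a function that predicts Sepal_Length and appends the
# predictions to the input data.
def predict(dat, regr):
    import pandas as pd
    pred = regr.predict(dat[['Sepal_Width', 'Petal_Length',
                             'Petal_Width']])
    return pd.concat([dat, pd.DataFrame(pred)], axis=1)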
res = oml.table_apply(data=oml_iris.head(n=10),
func=predict, regr=regr)
res
normalize=False)
>>> regr.coef_
array([[ 0.65083716, 0.70913196, -0.55648266]])
>>>
>>> # Use oml.table_apply to predict using the model on the first 10
... # rows of the IRIS table.
... def predict(dat, regr):
... import pandas as pd
... pred = regr.predict(dat[['Sepal_Width', 'Petal_Length',
... 'Petal_Width']])
... return pd.concat([dat,pd.DataFrame(pred)], axis=1)
...
>>> res = oml.table_apply(data=oml_iris.head(n=10),
... func=predict, regr=regr)
>>> res
Sepal_Length Sepal_Width Petal_Length Petal_Width
0 4.6 3.6 1 0.2
1 5.1 2.5 3 1.1
2 6.0 2.2 4 1.0
3 5.8 2.6 4 1.2
4 5.5 2.3 4 1.3
5 5.5 2.5 4 1.3
6 6.1 2.8 4 1.3
7 5.7 2.5 5 2.0
8 6.0 2.2 5 1.5
9 6.3 2.5 5 1.9
Species 0
0 setosa 4.796847
1 versicolor 4.998355
2 versicolor 5.567884
3 versicolor 5.716923
4 versicolor 5.466023
5 versicolor 5.596191
6 virginica 5.791442
7 virginica 5.915785
8 virginica 5.998775
9 virginica 5.971433
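The syntax of the function is as follows (a signature sketch inferred from the arguments described below):

oml.group_apply(data, index, func, func_owner=None, parallel=None, orderby=None, graphics=False, **kwargs)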
The data argument is an oml.DataFrame that contains the in-database data that the func
function operates on.
The index argument is an oml.DataFrame object, the columns of which are used to group the
data before sending it to the func function.
The func argument is the function to run. It may be one of the following:
• A Python function
• A string that is the name of a user-defined Python function in the OML4Py script repository
• A string that defines a Python function
• An oml.script.script.Callable object returned by the oml.script.load function
The optional func_owner argument is a string or None (the default) that specifies the owner of
the registered user-defined Python function when argument func is a registered user-defined
Python function name.
The parallel argument is a boolean, an int, or None (the default) that specifies the preferred
degree of parallelism to use in the Embedded Python Execution job. The value may be one of
the following:
• A positive integer greater than or equal to 1 for a specific degree of parallelism
• False, None, or 0 for no parallelism
• True for the default data parallelism
The optional orderby argument is an oml.DataFrame, oml.Float, or oml.String that specifies
the ordering of the group partitions.
The graphics argument is a boolean that specifies whether to look for images. The default
value is False.
With the **kwargs parameter, you can pass additional arguments to the func function. Special
control arguments, which start with oml_, are not passed to the function specified by func, but
instead control what happens before or after the running of the function.
See Also: About Special Control Arguments
The oml.group_apply function returns a dict of Python objects or a dict of oml.embed.data_image._DataImage objects. If no image is rendered in the user-defined Python function, oml.group_apply returns a dict of the Python objects returned by the function. Otherwise, it returns a dict of oml.embed.data_image._DataImage objects.
import pandas as pd
from sklearn import datasets
import oml
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
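A minimal sketch of the table creation, the grouping index, and the group_count function assumed by the call below:

# Create the IRIS database table and the proxy object for the table.
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')

# Define the grouping index on the Species column.
index = oml_iris[:, ['Species']]

# Define a function that counts the rows in each group (sketch).
def group_count(dat):
    import pandas as pd
    return pd.DataFrame([(dat['Species'][0], dat.shape[0])],
                        columns = ['Species', 'CNT'])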
# Group the data by the Species column and run the user-defined
# function for each species.
res = oml.group_apply(oml_iris, index, func=group_count,
oml_input_type="pandas.DataFrame")
res
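The transcript below also uses a function, build_lm, that fits a per-group linear model. A minimal sketch follows, with the column choices taken from the oml.group_apply call shown in the transcript:

# Fit a LinearRegression of Petal_Width on Petal_Length per group (sketch).
def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    return lm.fit(dat[['Petal_Length']], dat[['Petal_Width']])

mod = oml.group_apply(oml_iris[:, ['Petal_Length', 'Petal_Width', 'Species']],
                      index, func=build_lm,
                      oml_input_type='pandas.DataFrame')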
# The output is a dict of key-value pairs for each species and model.
type(mod)
...
>>> # Run the model for each species and return an objectList in
... # dict format with a model for each species.
... mod = oml.group_apply(oml_iris[:,["Petal_Length", "Petal_Width",
... "Species"]], index, func=build_lm)
>>>
>>> # The output is a dict of key-value pairs for each species and model.
... type(mod)
<class 'dict'>
>>>
>>> # Sort dict by the key species.
... {k: mod[k] for k in sorted(mod.keys())}
{'setosa': LinearRegression(copy_X=True, fit_intercept=True,
n_jobs=None,normalize=False), 'versicolor': LinearRegression(copy_X=True,
fit_intercept=True, n_jobs=None, normalize=False), 'virginica':
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)}
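The syntax of the function is as follows (a signature sketch inferred from the arguments described below; the default for rows is an assumption):

oml.row_apply(data, func, func_owner=None, rows=1, parallel=None, graphics=True, **kwargs)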
The data argument is an oml.DataFrame that contains the data that the func function operates
on.
The func argument is the function to run. It may be one of the following:
• A Python function
• A string that is the name of a user-defined Python function in the OML4Py script repository
• A string that defines a Python function
• An oml.script.script.Callable object returned by the oml.script.load function
The optional func_owner argument is a string or None (the default) that specifies the owner of
the registered user-defined Python function when argument func is a registered user-defined
Python function name.
The rows argument is an int that specifies the maximum number of rows to include in each
chunk.
The parallel argument is a boolean, an int, or None (the default) that specifies the preferred
degree of parallelism to use in the Embedded Python Execution job. The value may be one of
the following:
• A positive integer greater than or equal to 1 for a specific degree of parallelism
• False, None, or 0 for no parallelism
• True for the default data parallelism
The graphics argument is a boolean that specifies whether to look for images. The default
value is True.
With the **kwargs parameter, you can pass additional arguments to the func function. Special
control arguments, which start with oml_, are not passed to the function specified by func, but
instead control what happens before or after the running of the function.
See Also: About Special Control Arguments
The oml.row_apply function returns a pandas.DataFrame or a list of
oml.embed.data_image._DataImage objects. If no image is rendered in the user-defined
Python function, oml.row_apply returns a pandas.DataFrame. Otherwise, it returns a list of
oml.embed.data_image._DataImage objects.
The example builds a regression model based on iris data. It defines a function that predicts
the Petal_Width values based on the Sepal_Length, Sepal_Width, and Petal_Length columns
of the input data. It then concatenates the Species column, the Petal_Width column, and the
predicted Petal_Width as the object to return. Finally, the example calls the oml.row_apply
function to apply the make_pred() function on each 4-row chunk of the input data.
import oml
import pandas as pd
from sklearn import datasets
from sklearn import linear_model
# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data,
columns = ['Sepal_Length','Sepal_Width',
'Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda x:
{0: 'setosa', 1: 'versicolor',
2:'virginica'}[x], iris.target)),
columns = ['Species'])
# Create the IRIS database table and the proxy object for the table.
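# (Reconstructed line; identical to this step in the earlier examples.)
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')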
>>>
>>> # Build a regression model to predict Petal_Width using in-memory
... # data.
... iris = oml_iris.pull()
>>> regr = linear_model.LinearRegression()
>>> regr.fit(iris[['Sepal_Length', 'Sepal_Width', 'Petal_Length']],
... iris[['Petal_Width']])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
>>> regr.coef_
array([[-0.20726607, 0.22282854, 0.52408311]])
>>>
>>> # Define a Python function.
... def make_pred(dat, regr):
... import pandas as pd
... import numpy as np
... pred = regr.predict(dat[['Sepal_Length',
... 'Sepal_Width',
... 'Petal_Length']])
... return pd.concat([dat[['Species', 'Petal_Width']],
... pd.DataFrame(pred,
... columns=['Pred_Petal_Width'])],
... axis=1)
>>>
>>> input_data = oml_iris.split(ratio=(0.9, 0.1), strata_cols='Species')[1]
>>> input_data.crosstab(index = 'Species').sort_values('Species')
SPECIES count
0 setosa 7
1 versicolor 8
2 virginica 4
>>> res = oml.row_apply(input_data, rows=4, func=make_pred, regr=regr,
... columns=['Species',
... 'Petal_Width',
... 'Pred_Petal_Width'])
>>> res = oml.row_apply(input_data, rows=4, func=make_pred,
... regr=regr, parallel=2)
>>> type(res)
<class 'pandas.core.frame.DataFrame'>
>>> res
Species Petal_Width Pred_Petal_Width
0 setosa 0.4 0.344846
1 setosa 0.3 0.335509
2 setosa 0.2 0.294117
3 setosa 0.2 0.220982
4 setosa 0.2 0.080937
5 versicolor 1.5 1.504615
6 versicolor 1.3 1.560570
7 versicolor 1.0 1.008352
8 versicolor 1.3 1.131905
9 versicolor 1.3 1.215622
10 versicolor 1.3 1.272388
11 virginica 1.8 1.623561
12 virginica 1.8 1.878132
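The syntax of the function is as follows (a signature sketch inferred from the arguments described below):

oml.index_apply(times, func, func_owner=None, parallel=None, graphics=True, **kwargs)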
The times argument is an int that specifies the number of times to run the func function.
The func argument is the function to run. It may be one of the following:
• A Python function
• A string that is the name of a user-defined Python function in the OML4Py script repository
• A string that defines a Python function
• An oml.script.script.Callable object returned by the oml.script.load function
The optional func_owner argument is a string or None (the default) that specifies the owner of
the registered user-defined Python function when argument func is a registered user-defined
Python function name.
The parallel argument is a boolean, an int, or None (the default) that specifies the preferred
degree of parallelism to use in the Embedded Python Execution job. The value may be one of
the following:
• A positive integer greater than or equal to 1 for a specific degree of parallelism
• False, None, or 0 for no parallelism
• True for the default data parallelism
The graphics argument is a boolean that specifies whether to look for images. The default
value is True.
With the **kwargs parameter, you can pass additional arguments to the func function. Special
control arguments, which start with oml_, are not passed to the function specified by func, but
instead control what happens before or after the running of the function.
See Also: About Special Control Arguments
The oml.index_apply function returns a list of Python objects or a list of
oml.embed.data_image._DataImage objects. If no image is rendered in the user-defined
Python function, oml.index_apply returns a list of the Python objects returned by the user-
defined Python function. Otherwise, it returns a list of oml.embed.data_image._DataImage
objects.
See Also: About Output
Example 10-10 Using the oml.index_apply Function
This example defines a function that returns the mean of a set of random numbers. The oml.index_apply function runs it the specified number of times.
import oml
import pandas as pd
def compute_random_mean(index):
import numpy as np
import scipy
from statistics import mean
np.random.seed(index)
res = np.random.random((100,1))*10
return mean(res[1])
res = oml.index_apply(times=10, func=compute_random_mean)
type(res)
res
Note:
User-defined Python functions can also be used outside of Embedded Python Execution. You can store functions in the script repository and load them back into notebooks or into other user-defined functions.
Topics:
• About the Script Repository
Use these functions to store, manage, and use user-defined Python functions in the script
repository.
• Create and Store a User-Defined Python Function
Use the oml.script.create function to add a user-defined Python function to the script
repository.
• List Available User-Defined Python Functions
Use the oml.script.dir function to list the user-defined Python functions in the OML4Py
script repository.
• Load a User-Defined Python Function
Use the oml.script.load function to load a user-defined Python function from the script
repository into a Python session.
• Drop a User-Defined Python Function from the Repository
Use the oml.script.drop function to remove a user-defined Python function from the
script repository.
Function Description
oml.script.create Registers a single user-defined Python function in the script repository.
oml.script.dir Lists the user-defined Python functions present in the script repository.
oml.script.drop Drops a user-defined Python function from the script repository.
oml.script.load Loads a user-defined Python function from the script repository into a
Python session.
The following table lists the Python functions for managing access to user-defined Python
functions in the script repository, and to datastores and datastore objects.
Function Description
oml.grant Grants read privilege permission to another user to a datastore or user-
defined Python function owned by the current user.
oml.revoke Revokes the read privilege permission that was granted to another user to a
datastore or user-defined Python function owned by the current user.
You can make the user-defined Python function either private or global. A private user-defined
Python function is available only to the owner, unless the owner grants the read privilege to
other users. A global user-defined Python function is available to any user.
The syntax of oml.script.create is the following:
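oml.script.create(name, func, is_global=False, overwrite=False)
(This signature is a sketch inferred from the argument descriptions that follow.)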
The name argument is a string that specifies a name for the user-defined Python function in the
Python script repository.
The func argument is the Python function to run. The argument can be a Python function or a
string that contains the definition of a Python function. You must specify a string in an
interactive session if readline cannot get the command history.
The is_global argument is a boolean that specifies whether to create a global user-defined
Python function. The default value is False, which indicates that the user-defined Python
function is a private function available only to the current session user. When is_global is
True, it specifies that the function is global and every user has the read privilege and the
execute privilege to it.
The overwrite argument is a boolean that specifies whether to overwrite the user-defined
Python function if it already exists. The default value is False.
Load the iris data set and create a pandas.DataFrame for it. Use the oml.create function to create the IRIS database table and the proxy object for the table.
%python
import oml
import pandas as pd
from sklearn import datasets

# Load the iris data set and create a pandas.DataFrame for it.
iris = datasets.load_iris()
x = pd.DataFrame(iris.data, columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width'])
y = pd.DataFrame(list(map(lambda t: {0:'setosa', 1:'versicolor', 2:'virginica'}[t], iris.target)), columns = ['Species'])

# Create the IRIS database table and the proxy object for the table.
try:
    oml.drop(table="IRIS")
except:
    pass
oml_iris = oml.create(pd.concat([x, y], axis=1), table = 'IRIS')
Define a user-defined function, build_lm, that fits a linear regression model; store it in the script repository; and run it in Embedded Python Execution, as sketched below.
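A minimal sketch, assuming build_lm fits scikit-learn's LinearRegression of Petal_Width on Petal_Length (the same model as the oml.group_apply example earlier in this chapter):

%python

# Define a function that fits a linear model to its input data.
def build_lm(dat):
    from sklearn import linear_model
    lm = linear_model.LinearRegression()
    lm.fit(dat[['Petal_Length']], dat[['Petal_Width']])
    return lm

# Store the function in the script repository as a private script.
oml.script.create("build_lm", func=build_lm, overwrite=True)

# Run the stored function on the IRIS proxy table; the result is the
# fitted model returned by build_lm.
res = oml.table_apply(oml_iris, func="build_lm",
                      oml_input_type="pandas.DataFrame")
res
res.coef_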
Define another user-defined function, build_lm2, and store it as a global script in the OML4Py script repository. Run the user-defined Python function in Embedded Python Execution.
%python
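A minimal sketch, assuming build_lm2 reuses the build_lm body above:

%python

# Store the function as a global script named build_lm2.
oml.script.create("build_lm2", func=build_lm, is_global=True, overwrite=True)

# Run it in Embedded Python Execution; printing the result displays
# the fitted model, as shown below.
res2 = oml.table_apply(oml_iris, func="build_lm2",
                       oml_input_type="pandas.DataFrame")
res2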
LinearRegression()
List the user-defined Python functions in the script repository available to the current user only.
%python
oml.script.dir()
[4 rows x 4 columns]
List all of the user-defined Python functions available to the current user.
%python
oml.script.dir(sctype='all')
List all of the global user-defined Python functions created by the current user.
%python
oml.script.dir(sctype='global')
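The syntax of the function is as follows (a signature sketch inferred from the arguments described below):

oml.script.dir(name=None, regex_match=False, sctype='user')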
The name argument is a string that specifies the name of a user-defined Python function or a
regular expression to match to the names of user-defined Python functions in the script
repository. When name is None, this function returns the type of user-defined Python functions
specified by argument sctype.
The regex_match argument is a boolean that indicates whether argument name is a regular
expression to match. The default value is False.
The sctype argument is a string that specifies the type of user-defined Python function to list.
The value may be one of the following.
• user, to specify the user-defined Python functions available to the current user only.
• grant, to specify the user-defined Python functions to which the read and execute privilege
have been granted by the current user to other users.
• granted, to specify the user-defined Python functions to which the read and execute
privilege have been granted by other users to the current user.
• global, to specify all of the global user-defined Python functions created by the current
user.
• all, to specify all of the user-defined Python functions available to the current user.
The oml.script.dir function returns a pandas.DataFrame that contains the columns NAME
and SCRIPT and, optionally, the columns OWNER and GRANTEE.
Example 10-12 Using the oml.script.dir Function
This example lists the contents of the script repository using different arguments to the
oml.script.dir function. For the creation of the user-defined Python functions, see
Example 10-11.
import oml
oml.script.dir()
oml.script.load(name, owner=None)
The name argument is a string that specifies the name of the user-defined Python function to
load from the OML4Py script repository.
The optional owner argument is a string that specifies the owner of the user-defined Python function or None (the default). If owner=None, then this function finds and loads the user-defined Python function that matches name in the following order:
1. A user-defined Python function owned by the current user.
2. A global user-defined Python function with that name.
import oml
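The syntax of oml.script.drop is as follows (a signature sketch inferred from the arguments described below):

oml.script.drop(name, is_global=False, silent=False)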
The name argument is a string that specifies the name of the user-defined Python function in
the script repository.
The is_global argument is a boolean that specifies whether the user-defined Python function
to drop is a global or a private user-defined Python function. The default value is False, which
indicates a private user-defined Python function.
The silent argument is a boolean that specifies whether to display an error message when
oml.script.drop encounters an error in dropping the specified user-defined Python function.
The default value is False.
import oml
SQL API for Embedded Python Execution with On-premises Database
Topics:
• About the SQL API for Embedded Python Execution with On-Premises Database
With the SQL API, you can run user-defined Python functions in one or more separate
Python engines in an Oracle database environment, manage user-defined Python
functions in the OML4Py script repository, and control access to and get information about
datastores and about user-defined Python functions in the script repository.
• pyqEval Function (On-Premises Database)
This topic describes the pyqEval function when used in an on-premises Oracle Database.
The pyqEval function runs a user-defined Python function that explicitly retrieves data or for which external data is not required.
• pyqTableEval Function (On-Premises Database)
This topic describes the pyqTableEval function when used in an on-premises Oracle
Database. The pyqTableEval function runs a user-defined Python function on data from an
Oracle Database table.
• pyqRowEval Function (On-Premises Database)
This topic describes the pyqRowEval function when used in an on-premises Oracle
Database. The pyqRowEval function chunks data into sets of rows and then runs a user-
defined Python function on each chunk.
• pyqGroupEval Function (On-Premises Database)
This topic describes the pyqGroupEval function when used in an on-premises Oracle
Database. The pyqGroupEval function groups data by one or more columns and runs a
user-defined Python function on each group.
• pyqGrant Function (On-Premises Database)
This topic describes the pyqGrant function when used in an on-premises Oracle Database.
• pyqRevoke Function (On-Premises Database)
This topic describes the pyqRevoke function when used in an on-premises Oracle
Database.
• pyqScriptCreate Procedure (On-Premises Database)
This topic describes the pyqScriptCreate procedure in an on-premises Oracle Database.
The pyqScriptCreate procedure creates a user-defined Python function and adds it to the
OML4Py script repository.
• pyqScriptDrop Procedure (On-Premises Database)
This topic describes the pyqScriptDrop procedure in an on-premises Oracle Database.
The pyqScriptDrop procedure removes a user-defined Python function from the OML4Py
script repository.
10.5.1 About the SQL API for Embedded Python Execution with On-Premises Database
With the SQL API, you can run user-defined Python functions in one or more separate Python
engines in an Oracle database environment, manage user-defined Python functions in the
OML4Py script repository, and control access to and get information about datastores and
about user-defined Python functions in the script repository.
You can use the SQL interface for Embedded Python Execution with an on-premises Oracle
Database instance.
OML4Py provides the following types of SQL functions and procedures.
• SQL table functions for running user-defined Python functions in one or more database-spawned and managed Python engines; the user-defined Python functions may reference Python objects in OML4Py datastores and use third-party packages installed in the Python engines on the database server machine.
• PL/SQL procedures for creating and dropping user-defined Python functions in the
OML4Py script repository.
• PL/SQL procedures for granting and revoking the read privilege to datastores and the
datastore objects in them, and to user-defined Python functions in the OML4Py script
repository.
The following are the SQL functions for Embedded Python Execution and the PL/SQL procedures for managing datastores and user-defined Python functions:
• pyqEval Function: Runs a user-defined Python function.
• pyqTableEval Function: Runs a user-defined Python function on data from an Oracle Database table.
• pyqRowEval Function: Chunks data into sets of rows and then runs a user-defined Python function on each chunk.
• pyqGroupEval Function: Groups data by one or more columns and runs a user-defined Python function on each group.
• pyqGrant Procedure: Grants the read privilege to a datastore or to a user-defined Python function.
• pyqRevoke Procedure: Revokes the read privilege to a datastore or to a user-defined Python function.
• pyqScriptCreate Procedure: Creates a user-defined Python function and adds it to the OML4Py script repository.
• pyqScriptDrop Procedure: Removes a user-defined Python function from the OML4Py script repository.

10.5.2 pyqEval Function (On-Premises Database)
The pyqEval function runs a user-defined Python function that explicitly retrieves data or for which external data is not required.
You can pass arguments to the Python function with the PAR_QRY parameter.
The pyqEval function does not automatically receive any data from the database. The Python
function generates the data that it uses or it explicitly retrieves it from a data source such as
Oracle Database, other databases, or flat files.
The Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple, or a pandas.DataFrame object. You define the form of the returned value with the OUT_QRY parameter.
Syntax
pyqEval (
PAR_QRY VARCHAR2 IN
OUT_QRY VARCHAR2 IN
EXP_NAM VARCHAR2 IN)
Parameters
Parameter Description
PAR_QRY A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the EXP_NAM parameter. Special control
arguments, which start with oml_, are not passed to the function specified by
EXP_NAM, but instead control what happens before or after the invocation of the
function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
OUT_QRY The format of the output returned by the function. It can be one of the
following:
• A JSON string that specifies the column names and data types of the table
returned by the function. Any image data is discarded.
• The name of a table or view to use as a prototype. If using a table or view
owned by another user, use the format <owner name>.<table/view
name>. You must have read access to the specified table or view.
• The string 'XML', which specifies that the table returned contains a CLOB
that is an XML string. The XML can contain both structured data and
images, with structured or semi-structured Python objects first, followed by
the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a BLOB
that has the image or images generated by the Python function. Images
are returned as a base 64 encoding of the PNG representation.
EXP_NAM The name of a user-defined Python function in the OML4Py script repository.
Returns
Function pyqEval returns a table that has the structure specified by the OUT_QRY parameter
value.
Example 10-15 Using the pyqEval Function
This example defines Python functions and stores them in the OML4Py script repository. It
invokes the pyqEval function on the user-defined Python functions.
In a PL/SQL block, create an unnamed Python function that is stored in the script repository with the name pyqFun1.
BEGIN
  sys.pyqScriptCreate('pyqFun1',
    'func = lambda: "Hello World from a lambda!"',
    FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
Invoke the pyqEval function, which runs the user-defined Python function and returns the
results as XML.
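-- A sketch of the invocation; the 'XML' output format matches the
-- result shown below.
SELECT name, value
  FROM table(pyqEval(NULL, 'XML', 'pyqFun1'));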
NAME VALUE
---- --------------------------------------------------
<root><str>Hello World from a lambda!</str></root>
Drop the user-defined Python function pyqFun1 from the script repository.
BEGIN
sys.pyqScriptDrop('pyqFun1');
END;
/
Define a Python function that returns a numpy.ndarray, and store it in the script repository with the name pyqFun2.
BEGIN
sys.pyqScriptCreate('pyqFun2',
'def return_frame():
import numpy as np
import pickle
z = np.array([y for y in zip([str(x)+"demo" for x in range(10)],
[float(x)/10 for x in range(10)],
[x for x in range(10)],
[bool(x%2) for x in range(10)],
[pickle.dumps(x) for x in range(10)],
["test"+str(x**2) for x in range(10)])],
dtype=[("a", "U10"), ("b", "f8"), ("c", "i4"),
("d", "?"), ("e", "S20"), ("f", "O")])
return z');
END;
/
Invoke the pyqEval function, which runs the pyqFun2 user-defined Python function.
SELECT *
FROM table(pyqEval(
NULL,
'{"A":"varchar2(10)", "B":"number",
"C":"number", "D":"number",
"E":"raw(10)", "F": "varchar2(10)" }',
'pyqFun2'));
A B C D E F
---------- ---------- ---------- ---------- -------------------- ----------
0demo 0 0 0 80034B002E test0
1demo 1.0E-001 1 1 80034B012E test1
2demo 2.0E-001 2 0 80034B022E test4
3demo 3.0E-001 3 1 80034B032E test9
4demo 4.0E-001 4 0 80034B042E test16
5demo 5.0E-001 5 1 80034B052E test25
6demo 6.0E-001 6 0 80034B062E test36
7demo 7.0E-001 7 1 80034B072E test49
8demo 8.0E-001 8 0 80034B082E test64
9demo 9.0E-001 9 1 80034B092E test81
10 rows selected.
Drop the user-defined Python function pyqFun2 from the script repository.
BEGIN
sys.pyqScriptDrop('pyqFun2');
END;
/
10.5.3 pyqTableEval Function (On-Premises Database)
The pyqTableEval function runs a user-defined Python function on data from an Oracle Database table. You pass data to the Python function with the INP_NAM parameter and pass arguments with the PAR_QRY parameter. The Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple, or a pandas.DataFrame object. You define the form of the returned value with the OUT_QRY parameter.
Syntax
pyqTableEval (
INP_NAM VARCHAR2 IN
PAR_QRY VARCHAR2 IN
OUT_QRY VARCHAR2 IN
EXP_NAM VARCHAR2 IN)
Parameters
Parameter Description
INP_NAM The name of a table or view that specifies the data to pass to the Python
function specified by the EXP_NAM parameter. If using a table or view owned
by another user, use the format <owner name>.<table/view name>. You
must have read access to the specified table or view.
PAR_QRY A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the EXP_NAM parameter. Special control
arguments, which start with oml_, are not passed to the function specified by
EXP_NAM, but instead control what happens before or after the invocation of
the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
OUT_QRY The format of the output returned by the function. It can be one of the
following:
• A JSON string that specifies the column names and data types of the
table returned by the function. Any image data is discarded.
• The name of a table or view to use as a prototype. If using a table or view
owned by another user, use the format <owner name>.<table/view
name>. You must have read access to the specified table or view.
• The string 'XML', which specifies that the table returned contains a
CLOB that is an XML string. The XML can contain both structured data
and images, with structured or semi-structured Python objects first,
followed by the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a
BLOB that has the image or images generated by the Python function.
Images are returned as a base 64 encoding of the PNG representation.
EXP_NAM The name of a user-defined Python function in the OML4Py script repository.
Returns
Function pyqTableEval returns a table that has the structure specified by the OUT_QRY
parameter value.
Example 10-16 Using the pyqTableEval Function
This example stores a user-defined Python function in the OML4Py script repository with the
name create_iris_table. It uses the function to create a database table as the result of a
pyqEval function invocation. It creates another user-defined Python function that fits a linear
regression model to the input data and saves the model in the OML4Py datastore. The
example runs a SQL SELECT statement that invokes the pyqTableEval function, which invokes
the function stored in the script repository with the name myLinearRegressionModel.
In a PL/SQL block, define the Python function create_iris_table and store in the script
repository with the name create_iris_table, overwriting any existing user-defined Python
function stored in the script repository with the same name.
The create_iris_table function imports and loads the iris data set, creates two
pandas.DataFrame objects, and then returns the concatenation of those objects.
BEGIN
sys.pyqScriptCreate('create_iris_table',
'def create_iris_table():
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width", "Petal_Length", "Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
return pd.concat([y, x], axis=1)',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
CREATE TABLE IRIS AS
(SELECT * FROM pyqEval(
NULL,
'{"Species":"VARCHAR2(10)","Sepal_Length":"number",
"Sepal_Width":"number","Petal_Length":"number",
"Petal_Width":"number"}',
'create_iris_table'
));
Define the Python function fit_model and store it with the name myLinearRegressionModel as
a private function in the script repository, overwriting any existing user-defined Python function
stored with that name.
The fit_model function fits a regression model to the input data dat and then saves the fitted
model as an object specified by the modelName argument to the datastore specified by the
datastoreName argument. The fit_model function returns the fitted model in a string format.
By default, Python objects are saved to a new datastore with the specified datastoreName. To
save an object to an existing datastore, either set the overwrite or append argument to True in
the oml.ds.save invocation.
BEGIN
sys.pyqScriptCreate('myLinearRegressionModel',
'def fit_model(dat, modelName, datastoreName):
import oml
from sklearn import linear_model
regr = linear_model.LinearRegression()
regr.fit(dat.loc[:, ["Sepal_Length", "Sepal_Width", \
"Petal_Length"]], dat.loc[:,["Petal_Width"]])
oml.ds.save(objs={modelName:regr}, name=datastoreName,
overwrite=True)
return str(regr)',
FALSE, TRUE);
END;
/
Run a SELECT statement that invokes the pyqTableEval function. The INP_NAM parameter of
the pyqTableEval function specifies the IRIS table as the data to pass to the Python function.
The PAR_QRY parameter specifies the names of the model and datastore to pass to the Python
function, and specifies the oml_connect control argument to establish an OML4Py connection
to the database during the invocation of the user-defined Python function. The OUT_QRY
parameter specifies returning the value in XML format and the EXP_NAM parameter specifies
the myLinearRegressionModel function in the script repository as the Python function to
invoke. The XML output is a CLOB; you can call set long [length] to get more output.
SELECT *
FROM table(pyqTableEval(
'IRIS',
'{"modelName":"linregr",
"datastoreName":"pymodel",
"oml_connect":1}',
'XML',
'myLinearRegressionModel'));
NAME VALUE
----- ------------------------------------------------------------
<root><str>LinearRegression()</str></root>
pyqRowEval Function

The function pyqRowEval chunks data into sets of rows and then runs a user-defined Python function on each chunk. The ROW_NUM parameter specifies the maximum number of rows to pass to each invocation of the Python function. The last set of rows may have fewer rows than the number specified.
The Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple or a pandas.DataFrame object. You define the form of the returned value with the OUT_QRY parameter.
Syntax
pyqRowEval (
INP_NAM VARCHAR2 IN
PAR_QRY VARCHAR2 IN
OUT_QRY VARCHAR2 IN
ROW_NUM NUMBER IN
EXP_NAM VARCHAR2 IN)
Parameters
Parameter Description
INP_NAM The name of a table or view that specifies the data to pass to the Python
function specified by the EXP_NAM parameter. If using a table or view owned
by another user, use the format <owner name>.<table/view name>. You
must have read access to the specified table or view.
PAR_QRY A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the EXP_NAM parameter. Special control
arguments, which start with oml_, are not passed to the function specified by
EXP_NAM, but instead control what happens before or after the invocation of
the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
OUT_QRY The format of the output returned by the function. It can be one of the
following:
• A JSON string that specifies the column names and data types of the
table returned by the function. Any image data is discarded.
• The name of a table or view to use as a prototype. If using a table or view
owned by another user, use the format <owner name>.<table/view
name>. You must have read access to the specified table or view.
• The string 'XML', which specifies that the table returned contains a
CLOB that is an XML string. The XML can contain both structured data
and images, with structured or semi-structured Python objects first,
followed by the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a
BLOB that has the image or images generated by the Python function.
Images are returned as a base 64 encoding of the PNG representation.
ROW_NUM The number of rows to include in each invocation of the Python function.
EXP_NAM The name of a user-defined Python function in the OML4Py script repository.
Returns
Function pyqRowEval returns a table that has the structure specified by the OUT_QRY parameter
value.
Example 10-17 Using the pyqRowEval Function
This example loads the Python model linregr to predict row chunks of sample iris data. The
model is created and saved in the datastore pymodel in Example 10-16.
The example defines a Python function and stores it in the OML4Py script repository. It uses the user-defined Python function to create a database table as the result of the pyqEval function. It defines a Python function that runs a prediction function on a model loaded from the OML4Py datastore. It then invokes the pyqRowEval function to run the prediction function on chunks of rows from the database table.
In a PL/SQL block, define the function sample_iris_table and store it in the script repository.
The function loads the iris data set, creates two pandas.DataFrame objects, and then returns a
sample of the concatenation of those objects.
BEGIN
sys.pyqScriptCreate('sample_iris_table',
'def sample_iris_table(size):
    from sklearn.datasets import load_iris
    import pandas as pd
    iris = load_iris()
    x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width","Petal_Length","Petal_Width"])
    y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
    return pd.concat([y, x], axis=1).sample(int(size))',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
Create the SAMPLE_IRIS table in the database as the result of a SELECT statement, which
invokes the pyqEval function on the sample_iris_table user-defined Python function saved in
the script repository with the same name. The sample_iris_table function returns an iris data
sample of size size.
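A sketch of that statement; the sample size of 20 passed through the PAR_QRY parameter is an assumption.
CREATE TABLE SAMPLE_IRIS AS
(SELECT * FROM pyqEval(
    '{"size":20}',
    '{"Species":"varchar2(10)","Sepal_Length":"number",
      "Sepal_Width":"number","Petal_Length":"number",
      "Petal_Width":"number"}',
    'sample_iris_table'));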
Define the Python function predict_model and store it with the name linregrPredict in the
script repository. The function predicts the data in dat with the Python model specified by the
modelName argument, which is loaded from the datastore specified by the datastoreName
argument. The predictions are finally concatenated and returned with dat as the object that the
function returns.
BEGIN
sys.pyqScriptCreate('linregrPredict',
'def predict_model(dat, modelName, datastoreName):
import oml
import pandas as pd
objs = oml.ds.load(name=datastoreName, to_globals=False)
pred = objs[modelName].predict(dat[["Sepal_Length","Sepal_Width",\
"Petal_Length"]])
return pd.concat([dat, pd.DataFrame(pred, \
columns=["Pred_Petal_Width"])], axis=1)',
FALSE, TRUE);
END;
/
Run a SELECT statement that invokes the pyqRowEval function, which runs the specified Python
function on each chunk of rows in the specified data set.
The INP_NAM argument specifies the data in the SAMPLE_IRIS table to pass to the Python
function.
The PAR_QRY argument specifies connecting to the OML4Py server with the special control
argument oml_connect, passing the input data as a pandas.DataFrame with the special control
argument oml_input_type, along with values for the function arguments modelName and
datastoreName.
In the OUT_QRY argument, the JSON string specifies the column names and data types of the
table returned by pyqRowEval.
The ROW_NUM argument specifies that five rows are included in each invocation of the function
specified by EXP_NAM.
The EXP_NAM parameter specifies linregrPredict, which is the name in the script repository
of the user-defined Python function to invoke.
SELECT *
FROM table(pyqRowEval(
'SAMPLE_IRIS',
'{"oml_connect":1,"oml_input_type":"pandas.DataFrame",
"modelName":"linregr", "datastoreName":"pymodel"}',
'{"Species":"varchar2(10)", "Sepal_Length":"number",
"Sepal_Width":"number", "Petal_Length":"number",
"Petal_Width":"number","Pred_Petal_Width":"number"}',
5,
'linregrPredict'));
pyqGroupEval Function

The function pyqGroupEval groups data by one or more columns and runs a user-defined Python function on each group. The Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple or a pandas.DataFrame object. Define the form of the returned value with the OUT_QRY parameter.
Syntax
pyqGroupEval (
INP_NAM VARCHAR2 IN
PAR_QRY VARCHAR2 IN
OUT_QRY VARCHAR2 IN
GRP_COL VARCHAR2 IN
EXP_NAM VARCHAR2 IN)
Parameters
Parameter Description
INP_NAM The name of a table or view that specifies the data to pass to the Python
function specified by the EXP_NAM parameter. If using a table or view owned
by another user, use the format <owner name>.<table/view name>. You
must have read access to the specified table or view.
PAR_QRY A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the EXP_NAM parameter. Special control
arguments, which start with oml_, are not passed to the function specified by
EXP_NAM, but instead control what happens before or after the invocation of
the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
OUT_QRY The format of the output returned by the function. It can be one of the
following:
• A JSON string that specifies the column names and data types of the
table returned by the function. Any image data is discarded.
• The name of a table or view to use as a prototype. If using a table or view
owned by another user, use the format <owner name>.<table/view
name>. You must have read access to the specified table or view.
• The string 'XML', which specifies that the table returned contains a
CLOB that is an XML string. The XML can contain both structured data
and images, with structured or semi-structured Python objects first,
followed by the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a
BLOB that has the image or images generated by the Python function.
Images are returned as a base 64 encoding of the PNG representation.
GRP_COL The names of the grouping columns by which to partition the data. Use
commas to separate multiple columns. For example, to group by GENDER and
YEAR:
"GENDER,YEAR"
EXP_NAM The name of a user-defined Python function in the OML4Py script repository.
Returns
Function pyqGroupEval returns a table that has the structure specified by the OUT_QRY
parameter value.
Example 10-18 Using the pyqGroupEval Function
This example defines the Python function create_iris_table and stores it with the name create_iris_table in the OML4Py script repository. It then invokes pyqEval, which invokes the user-defined Python function and creates the IRIS database table. The example defines another Python function, group_count, and stores it in the script repository with the name mygroupcount. It then invokes the pyqGroupEval function, grouping the IRIS data by species and passing each group to the Python function saved with the name mygroupcount.
In a PL/SQL block, define the Python function create_iris_table and store it in the script repository with the name create_iris_table.
BEGIN
sys.pyqScriptCreate('create_iris_table',
'def create_iris_table():
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width","Petal_Length","Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
return pd.concat([y, x], axis=1)');
END;
/
Invoke the pyqEval function to create the database table IRIS, using the Python function
stored with the name create_iris_table in the script repository.
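The statement is the same one used in Example 10-16:
CREATE TABLE IRIS AS
(SELECT * FROM pyqEval(
    NULL,
    '{"Species":"VARCHAR2(10)","Sepal_Length":"number",
      "Sepal_Width":"number","Petal_Length":"number",
      "Petal_Width":"number"}',
    'create_iris_table'
));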
Define the Python function group_count and store it with the name mygroupcount in the script
repository. The function returns a pandas.DataFrame generated on each group of data dat.
BEGIN
sys.pyqScriptCreate('mygroupcount',
'def group_count(dat):
import pandas as pd
return pd.DataFrame([(dat["Species"][0], dat.shape[0])],\
columns = ["Species", "CNT"]) ');
END;
/
Issue a query that invokes the pyqGroupEval function. In the function, the INP_NAM argument
specifies the data in the IRIS table to pass to the function.
The PAR_QRY argument specifies the special control argument oml_input_type.
The OUT_QRY argument specifies a JSON string that contains the column names and data types
of the table returned by pyqGroupEval.
The EXP_NAM parameter specifies the user-defined Python function stored with the name
mygroupcount in the script repository.
SELECT *
FROM table(
pyqGroupEval(
'IRIS',
'{"oml_input_type":"pandas.DataFrame"}',
'{"Species":"varchar2(10)", "CNT":"number"}',
'Species',
'mygroupcount'));
Species CNT
---------- ----------
setosa 50
versicolor 50
virginica 50
pyqGrant Function

The pyqGrant function grants read privilege access to an OML4Py datastore or to a script in the OML4Py script repository.
Syntax
pyqGrant (
V_NAME VARCHAR2 IN
V_TYPE VARCHAR2 IN
V_USER VARCHAR2 IN DEFAULT)
Parameters
Parameter Description
V_NAME The name of an OML4Py datastore or a script in the OML4Py script repository.
V_TYPE For a datastore, the type is datastore; for a script, the type is pyqScript.
V_USER The name of the user to whom to grant access.
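Example
A minimal sketch that grants read access on a script; the script and user names are hypothetical, and the call style follows the other PL/SQL examples in this chapter.
BEGIN
    -- Grant user OMLUSER2 read access to the script pyqFun2 (hypothetical names).
    pyqGrant('pyqFun2', 'pyqScript', 'OMLUSER2');
END;
/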
pyqRevoke Function

The pyqRevoke function revokes read privilege access to an OML4Py datastore or to a script in the OML4Py script repository.
Syntax
pyqRevoke (
V_NAME VARCHAR2 IN
V_TYPE VARCHAR2 IN
V_USER VARCHAR2 IN DEFAULT)
Parameters
Parameter Description
V_NAME The name of an OML4Py datastore or a script in the OML4Py script repository.
V_TYPE For a datastore, the type is datastore; for a script, the type is pyqScript.
V_USER The name of the user from whom to revoke access.
pyqScriptCreate Procedure

Use the pyqScriptCreate procedure to create a user-defined Python function and add it to the OML4Py script repository.
Syntax
sys.pyqScriptCreate (
V_NAME VARCHAR2 IN
V_SCRIPT CLOB IN
V_GLOBAL BOOLEAN IN DEFAULT
V_OVERWRITE BOOLEAN IN DEFAULT)
Parameter Description
V_NAME A name for the user-defined Python function in the OML4Py script repository.
V_SCRIPT The definition of the Python function.
V_GLOBAL TRUE specifies that the user-defined Python function is public; FALSE specifies that
the user-defined Python function is private.
V_OVERWRITE If the script repository already has a user-defined Python function with the same
name as V_NAME, then TRUE replaces the content of that user-defined Python
function with V_SCRIPT and FALSE does not replace it.
Examples
This example creates a private user-defined Python function named pyqFun2 in the script repository.
BEGIN
sys.pyqScriptCreate('pyqFun2',
'def return_frame():
import numpy as np
import pickle
z = np.array([y for y in zip([str(x)+"demo" for x in range(10)],
[float(x)/10 for x in range(10)],
[x for x in range(10)],
[bool(x%2) for x in range(10)],
[pickle.dumps(x) for x in range(10)],
["test"+str(x**2) for x in range(10)])],
dtype=[("a", "U10"), ("b", "f8"), ("c", "i4"), ("d", "?"),
("e", "S20"), ("f", "O")])
return z');
END;
/
This example creates a global user-defined Python function named pyqFun2 in the script
repository and overwrites any existing user-defined Python function of the same name.
BEGIN
sys.pyqScriptCreate('pyqFun2',
'def return_frame():
import numpy as np
import pickle
z = np.array([y for y in zip([str(x)+"demo" for x in range(10)],
[float(x)/10 for x in range(10)],
[x for x in range(10)],
[bool(x%2) for x in range(10)],
[pickle.dumps(x) for x in range(10)],
["test"+str(x**2) for x in range(10)])],
dtype=[("a", "U10"), ("b", "f8"), ("c", "i4"), ("d", "?"),
("e", "S20"), ("f", "O")])
return z',
TRUE, -- Make the user-defined Python function global.
TRUE); -- Overwrite any global user-defined Python function
-- with the same name.
END;
/
This example creates a private user-defined Python function named create_iris_table in the
script repository.
BEGIN
sys.pyqScriptCreate('create_iris_table',
'def create_iris_table():
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width","Petal_Length","Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
return pd.concat([y, x], axis=1)');
END;
/
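You can list the scripts in the repository, for example by querying the USER_PYQ_SCRIPTS view (a sketch; the output below appears to come from such a query):
SELECT * FROM USER_PYQ_SCRIPTS;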
NAME              SCRIPT
----------------- ---------------------------------------------------------------------
create_iris_table def create_iris_table(): from sklearn.datasets
pyqScriptDrop Procedure

Use the pyqScriptDrop procedure to remove a user-defined Python function from the OML4Py script repository.
Syntax
sys.pyqScriptDrop (
V_NAME VARCHAR2 IN
V_GLOBAL BOOLEAN IN DEFAULT
V_SILENT BOOLEAN IN DEFAULT)
Parameter Description
V_NAME A name for the user-defined Python function in the OML4Py script repository.
V_GLOBAL A BOOLEAN that specifies whether the user-defined Python function to drop is a
global or a private user-defined Python function. The default value is FALSE, which
indicates a private user-defined Python function. TRUE specifies that the user-
defined Python function is public.
V_SILENT A BOOLEAN that specifies whether to display an error message when
sys.pyqScriptDrop encounters an error in dropping the specified user-defined
Python function. The default value is FALSE.
This example drops the private user-defined Python function pyqFun2 from the script repository.
BEGIN
sys.pyqScriptDrop('pyqFun2');
END;
/
This example drops the global user-defined Python function pyqFun2 from the script repository.
BEGIN
sys.pyqScriptDrop('pyqFun2', TRUE);
END;
/
SQL API for Embedded Python Execution with Autonomous Database

Topics:
• Access and Authorization Procedures and Functions
Use the network access control lists (ACL) API to control access by users to external
network services and resources from the database. Use the token store API to persist the
authorization token issued by a cloud host so it can be used with subsequent SQL calls.
• Embedded Python Execution Functions (Autonomous Database)
The SQL API for Embedded Python Execution with Autonomous Database functions are
described in the following topics.
• Asynchronous Jobs (Autonomous Database)
When a function is run asynchronously, it's run as a job which can be tracked by using the
pyqJobStatus and pyqJobResult functions.
• Special Control Arguments (Autonomous Database)
Use the PAR_LST parameter to specify special control arguments and additional arguments
to be passed into the Python script.
• Output Formats (Autonomous Database)
The OUT_FMT parameter controls the format of output returned by the table functions
pyqEval, pyqGroupEval, pyqIndexEval, pyqRowEval, pyqTableEval, and pyqJobResult.
• pyqAppendHostACE Procedure
• pyqGetHostACE Function
• pyqRemoveHostACE Procedure
Use the following to manage authorization tokens:
• pyqSetAuthToken Procedure
• pyqIsTokenSet Function
Workflow
The typical workflow for using the SQL API for Embedded Python Execution with Autonomous
Database is:
1. Connect to the PDB as the ADMIN user, and add a normal user OMLUSER to the ACL of the cloud host whose root domain is adb.us-region-1.oraclecloudapps.com:
exec pyqAppendHostAce('OMLUSER','adb.us-region-1.oraclecloudapps.com');
2. Obtain the OML REST URLs from the Oracle Autonomous Database that is provisioned.
a. Sign in to your Oracle Cloud Infrastructure account. You will need your OCI user name and password.
b. Click the hamburger menu and select the Autonomous Database instance that is provisioned. For more information on provisioning an Autonomous Database, see: Provision an Oracle Autonomous Database.
c. Click Service Console and then click Development.
d. Scroll down to the Oracle Machine Learning RESTful Services tile and click Copy to obtain the following URLs for:
• Obtaining the REST authentication token for REST APIs provided by OML:
<oml-cloud-service-location-url>/omlusers/
3. Obtain a REST authentication token, for example with cURL (a sketch of the typical token request; substitute your OML user name and password):
curl -X POST --header 'Content-Type: application/json' \
     --header 'Accept: application/json' \
     -d '{"grant_type":"password", "username":"'${username}'", "password":"'${password}'"}' \
     "<oml-cloud-service-location-url>/omlusers/api/oauth2/v1/token"
Note:
When a token expires, all calls to the OML Services REST endpoints will return
a message stating that the token has expired, along with the HTTP error:
HTTP/1.1 401 Unauthorized
4. Connect to the PDB as OMLUSER, set the access token, and run pyqIndexEval:
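A minimal sketch of this step; the script name idx_ret_letter and its body are hypothetical, chosen to match the output shown below.
BEGIN
sys.pyqScriptCreate('idx_ret_letter',
'def idx_ret_letter(i):
    import pandas as pd
    # Return one row per invocation: the index and a letter picked by the index.
    return pd.DataFrame([[i, "abc"[i-1]]], columns=["ID", "RES"])',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/

EXEC pyqSetAuthToken('<access token>');

SELECT *
FROM table(pyqIndexEval(
    par_lst => NULL,
    out_fmt => '{"ID":"number", "RES":"varchar2(3)"}',
    times_num => 3,
    scr_name => 'idx_ret_letter'));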
ID RES
---------- ---
1 a
2 b
3 c
3 rows selected.
The following topics describe the steps to manage ACL privileges and authorization tokens.
Topics:
• pyqAppendHostACE Procedure
The pyqAppendHostACE procedure appends an access control entry (ACE) to the access
control list (ACL) of the cloud host. The ACL controls access to the cloud host from the
database, and the ACE specifies the connect privilege granted to the specified user name.
• pyqGetHostACE Function
The pyqGetHostACE function gets the existing host access control entry (ACE) for the
specified user. An exception is raised if the host ACE doesn't exist for the specified user.
• pyqRemoveHostACE Procedure
• pyqSetAuthToken Procedure
The pyqSetAuthToken procedure sets the access token in the token store.
• pyqIsTokenSet Function
The pyqIsTokenSet function returns whether the authorization token is set or not.
pyqAppendHostACE Procedure

The pyqAppendHostACE procedure appends an access control entry (ACE) to the access control list (ACL) of the cloud host. The ACE specifies the connect privilege granted to the specified user name.
Syntax
PROCEDURE SYS.pyqAppendHostACE(
username IN VARCHAR2,
host_root_domain IN VARCHAR2
)
Parameters
username - Database user to whom the connect privilege to the cloud host is granted.
host_root_domain - Root domain of the cloud host. For example, if the URL is https://qtraya2braestch-omldb.adb.us-sanjose-1.oraclecloudapps.com, the root domain of the cloud host is adb.us-sanjose-1.oraclecloudapps.com.
Example
exec pyqAppendHostAce('OMLUSER','adb.us-region-1.oraclecloudapps.com');
Note:
The OML user name is case sensitive.
pyqGetHostACE Function

The pyqGetHostACE function gets the existing host access control entry (ACE) for the specified user. An exception is raised if the host ACE doesn't exist for the specified user.
Syntax
FUNCTION sys.pyqGetHostACE(
p_username IN VARCHAR2
)
Parameter
p_username - Database user to look for the host ACE.
Example
If user OMLUSER has access to the cloud host, for example, ibuwlq4mjqkeils-omlrgpy1.adb.us-region-1.oraclecloudapps.com, the ADMIN user can run the following to check the user's privileges:
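A sketch of such a check against the DBA_HOST_ACES data dictionary view; the exact columns selected are illustrative.
SELECT HOST, ACE_ORDER, PRINCIPAL, PRINCIPAL_TYPE, PRIVILEGE
FROM DBA_HOST_ACES
WHERE PRINCIPAL = 'OMLUSER';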
pyqRemoveHostACE Procedure

The pyqRemoveHostACE procedure revokes the specified user's connect privilege to the cloud host by removing the corresponding host access control entry (ACE).
Syntax
PROCEDURE SYS.pyqRemoveHostACE(
username IN VARCHAR2
)
Parameter
username - Database user from whom the connect privilege to the cloud host is revoked.
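Example
For example, to revoke the connect privilege granted in the pyqAppendHostACE example (the user name is an assumption carried over from that example):
EXEC pyqRemoveHostACE('OMLUSER');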
pyqSetAuthToken Procedure

The pyqSetAuthToken procedure sets the access token in the token store.
Syntax
PROCEDURE SYS.pyqSetAuthToken(
access_token IN VARCHAR2
)
pyqIsTokenSet Function

The pyqIsTokenSet function returns whether the authorization token is set or not.
Syntax
FUNCTION SYS.pyqIsTokenSet
RETURN BOOLEAN
Example
The following example shows how to use the pyqSetAuthToken procedure and the
pyqIsTokenSet function.
DECLARE
is_set BOOLEAN;
BEGIN
pyqSetAuthToken('<access token>');
is_set := pyqIsTokenSet();
IF (is_set) THEN
DBMS_OUTPUT.put_line ('token is set');
END IF;
END;
/
Topics
• pyqListEnvs Function (Autonomous Database)
The function pyqListEnvs, when used in Oracle Autonomous Database, lists the environments saved in Object Storage.
• pyqEval Function (Autonomous Database)
The function pyqEval, when used in Oracle Autonomous Database, calls a user-defined
Python function. Users can pass arguments to the user-defined Python function.
• pyqTableEval Function (Autonomous Database)
The function pyqTableEval, when used in Oracle Autonomous Database, runs a user-defined Python function on data from an Oracle Database table.
• pyqRowEval Function (Autonomous Database)
The function pyqRowEval, when used in Oracle Autonomous Database, chunks data into sets of rows and then runs a user-defined Python function on each chunk.
• pyqGroupEval Function (Autonomous Database)
The function pyqGroupEval, when used in Oracle Autonomous Database, groups data by one or more columns and runs a user-defined Python function on each group.
• pyqIndexEval Function (Autonomous Database)
The function pyqIndexEval, when used in Oracle Autonomous Database, runs a user-defined Python function multiple times as required in the Python engines spawned by the database environment.
• pyqGrant Function (Autonomous Database)
This topic describes the pyqGrant function when used in Oracle Autonomous Database.
• pyqRevoke Function (Autonomous Database)
This topic describes the pyqRevoke function when used in Oracle Autonomous Database.
• pyqScriptCreate Procedure (Autonomous Database)
This topic describes the pyqScriptCreate procedure in Oracle Autonomous Database.
Use the pyqScriptCreate procedure to create a user-defined Python function and add it to
the OML4Py script repository.
pyqListEnvs Function (Autonomous Database)

The function pyqListEnvs lists the environments saved in Object Storage.
Syntax
FUNCTION PYQSYS.pyqListEnvs
RETURN SYS.AnyDataSet
Example
Issue a query that calls the pyqListEnvs function and lists the environments present.
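A sketch of that query; pyqListEnvs takes no arguments.
SELECT * FROM table(pyqListEnvs());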
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
{"envs":[{"size":" 1.7 GiB","name":"sbenv","description":"Conda environment with seaborn","number_of_installed_packages":78,"tags":"application":"OML4PY"}]}
pyqEval Function (Autonomous Database)

The function pyqEval, when used in Oracle Autonomous Database, calls a user-defined Python function. Users can pass arguments to the user-defined Python function.
Syntax
FUNCTION PYQSYS.pyqEval(
PAR_LST VARCHAR2,
OUT_FMT VARCHAR2,
SCR_NAME VARCHAR2,
SCR_OWNER VARCHAR2 DEFAULT NULL,
ENV_NAME VARCHAR2 DEFAULT NULL
)
RETURN SYS.AnyDataSet
Parameters
Parameter Description
PAR_LST A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the SCR_NAME parameter. Special control arguments,
which start with oml_, are not passed to the function specified by SCR_NAME, but
instead control what happens before or after the invocation of the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
See also: Special Control Arguments (Autonomous Database).
OUT_FMT The format of the output returned by the function. It can be one of the following:
• A JSON string that specifies the column names and data types of the table
returned by the function. Any image data is discarded. The Python function
must return a pandas.DataFrame, a numpy.ndarray, a tuple, or a list of
tuples.
• The string 'JSON', which specifies that the table returned contains a CLOB
that is a JSON string.
• The string 'XML', which specifies that the table returned contains a CLOB that
is an XML string. The XML can contain both structured data and images, with
structured or semi-structured Python objects first, followed by the image or
images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a BLOB that
has the image or images generated by the Python function. Images are
returned as a base 64 encoding of the PNG representation.
See also: Output Formats (Autonomous Database).
SCR_NAME The name of a user-defined Python function in the OML4Py script repository.
SCR_OWNER The owner of the registered Python script. The default value is NULL. If NULL, will
search for the Python script in the user’s script repository.
ENV_NAME The name of the conda environment that should be used when running the named
user-defined Python function.
Example
This example defines a Python function and stores it in the OML4Py script repository. It calls the pyqEval function on the user-defined Python function.
In a PL/SQL block, create a Python function that is stored in the script repository with the name pyqFun1.
begin
sys.pyqScriptCreate('pyqFun1',
'def fun_tab():
import pandas as pd
names = ["demo_"+str(i) for i in range(10)]
ids = [x for x in range(10)]
floats = [float(x)/10 for x in range(10)]
d = {''ID'': ids, ''NAME'': names, ''FLOAT'': floats}
scores_table = pd.DataFrame(d)
return scores_table
',FALSE,TRUE); -- V_GLOBAL, V_OVERWRITE
end;
/
Next, call the pyqEval function, which runs the user-defined Python function.
The PAR_LST argument specifies using LOW service level with the special control argument
oml_service_level.
In the OUT_FMT argument, the string 'JSON' specifies that the table returned contains a CLOB that is a JSON string.
The SCR_NAME parameter specifies the pyqFun1 function in the script repository as the Python
function to call.
The JSON output is a CLOB. You can call set long [length] to get more output.
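A sketch of that call, matching the description above and the named-parameter style of the other Autonomous Database examples:
SELECT *
FROM table(pyqEval(
    par_lst => '{"oml_service_level":"LOW"}',
    out_fmt => 'JSON',
    scr_name => 'pyqFun1'));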
NAME
----------------------------------------------------------------------
VALUE
----------------------------------------------------------------------
[{"FLOAT":0,"ID":0,"NAME":"demo_0"},{"FLOAT":0.1,"ID":1,"NAME":"demo_1
"},{"FLOAT":0.2,"ID":2,"NAME":"demo_2"},{"FLOAT":0.3,"ID":3,"NAME":"de
mo_3"},{"FLOAT":0.4,"ID":4,"NAME":"demo_4"},{"FLOAT":0.5,"ID":5,"NAME"
:"demo_5"},{"FLOAT":0.6,"ID":6,"NAME":"demo_6"},{"FLOAT":0.7,"ID":7,"N
AME":"demo_7"},{"FLOAT":0.8,"ID":8,"NAME":"demo_8"},{"FLOAT":0.9,"ID":
9,"NAME":"demo_9"}]
1 row selected.
Issue another query that invokes the same pyqFun1 script. The OUT_FMT argument specifies a
JSON string that contains the column names and data types of the structured table output.
select *
from table(pyqEval(
par_lst => '{"oml_service_level":"LOW"}',
out_fmt => '{"ID":"number", "NAME":"VARCHAR2(8)",
"FLOAT":"binary_double"}',
scr_name => 'pyqFun1'));
ID NAME FLOAT
0 demo_0 0.0
1 demo_1 0.1
2 demo_2 0.2
3 demo_3 0.3
4 demo_4 0.4
5 demo_5 0.5
6 demo_6 0.6
7 demo_7 0.7
8 demo_8 0.8
9 demo_9 0.9
10 rows selected.
In a PL/SQL block, define the Python function create_iris_table and store in the script
repository with the name create_iris_table, overwriting any existing user-defined Python
function stored in the script repository with the same name.
The create_iris_table function imports and loads the iris data set, creates two
pandas.DataFrame objects, and then returns the concatenation of those objects.
BEGIN
sys.pyqScriptCreate('create_iris_table',
'def create_iris_table():
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width", "Petal_Length", "Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
return pd.concat([y, x], axis=1)',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
CREATE TABLE IRIS AS
(SELECT * FROM pyqEval(
NULL,
'{"Species":"VARCHAR2(10)","Sepal_Length":"number",
"Sepal_Width":"number","Petal_Length":"number",
"Petal_Width":"number"}',
'create_iris_table'
));
pyqTableEval Function (Autonomous Database)

The function pyqTableEval runs a user-defined Python function on data from an Oracle Database table. The user-defined Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple or a pandas.DataFrame object. You define the form of the returned value with the OUT_FMT parameter.
Syntax
FUNCTION PYQSYS.pyqTableEval(
INP_NAM VARCHAR2,
PAR_LST VARCHAR2,
OUT_FMT VARCHAR2,
SCR_NAME VARCHAR2,
SCR_OWNER VARCHAR2 DEFAULT NULL,
ENV_NAME VARCHAR2 DEFAULT NULL
)
RETURN SYS.AnyDataSet
Parameters
Parameter Description
INP_NAM The name of a table or view that specifies the data to pass to the Python
function specified by the SCR_NAME parameter. If using a table or view owned
by another user, use the format <owner name>.<table/view name>. You
must have read access to the specified table or view.
PAR_LST A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the SCR_NAME parameter. Special control
arguments, which start with oml_, are not passed to the function specified by
SCR_NAME, but instead control what happens before or after the invocation of
the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
See also: Special Control Arguments (Autonomous Database).
OUT_FMT The format of the output returned by the function. It can be one of the
following:
• A JSON string that specifies the column names and data types of the table
returned by the function. Any image data is discarded. The Python
function must return a pandas.DataFrame, a numpy.ndarray, a tuple,
or a list of tuples.
• The string 'JSON', which specifies that the table returned contains a
CLOB that is a JSON string.
• The string 'XML', which specifies that the table returned contains a CLOB
that is an XML string. The XML can contain both structured data and
images, with structured or semi-structured Python objects first, followed by
the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a BLOB
that has the image or images generated by the Python function. Images
are returned as a base 64 encoding of the PNG representation.
See also: Output Formats (Autonomous Database).
SCR_NAME The name of a user-defined Python function in the OML4Py script repository.
SCR_OWNER The owner of the registered Python script. The default value is NULL. If NULL,
will search for the Python script in the user’s script repository.
ENV_NAME The name of the conda environment that should be used when running the
named user-defined Python function.
Example
Define the Python function fit_model and store it with the name myLinearRegressionModel as
a private function in the script repository, overwriting any existing user-defined Python function
stored with that name.
The fit_model function fits a regression model to the input data dat and then saves the fitted
model as an object specified by the modelName argument to the datastore specified by the
datastoreName argument. The fit_model function returns the fitted model in a string format.
By default, Python objects are saved to a new datastore with the specified datastoreName. To
save an object to an existing datastore, either set the overwrite or append argument to True in
the oml.ds.save invocation.
BEGIN
sys.pyqScriptCreate('myLinearRegressionModel',
'def fit_model(dat, modelName, datastoreName):
import oml
from sklearn import linear_model
regr = linear_model.LinearRegression()
regr.fit(dat.loc[:, ["Sepal_Length", "Sepal_Width", \
"Petal_Length"]],
dat.loc[:,["Petal_Width"]])
oml.ds.save(objs={modelName:regr}, name=datastoreName,
overwrite=True)
return str(regr)',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
This example uses the IRIS table created in the example shown in pyqEval Function
(Autonomous Database). Run a SELECT statement that invokes the pyqTableEval function. The
INP_NAM parameter of the pyqTableEval function specifies the IRIS table as the data to pass to
the Python function. The PAR_LST parameter specifies the names of the model and datastore to
pass to the Python function. The OUT_FMT parameter specifies returning the value in XML
format and the SCR_NAME parameter specifies the myLinearRegressionModel function in the
script repository as the Python function to invoke. The XML output is a CLOB; you can call set
long [length] to get more output.
SELECT *
FROM table(pyqTableEval(
inp_nam => 'IRIS',
par_lst => '{"modelName":"linregr",
"datastoreName":"pymodel"}',
out_fmt => 'XML',
scr_name => 'myLinearRegressionModel'));
NAME
--------------------------------------------------------------------------------
VALUE
--------------------------------------------------------------------------------
<root><str>LinearRegression()</str></root>
1 row selected.
pyqRowEval Function (Autonomous Database)

The function pyqRowEval chunks data into sets of rows and then runs a user-defined Python function on each chunk.
Syntax
FUNCTION PYQSYS.pyqRowEval(
INP_NAM VARCHAR2,
PAR_LST VARCHAR2,
OUT_FMT VARCHAR2,
ROW_NUM NUMBER,
SCR_NAME VARCHAR2,
SCR_OWNER VARCHAR2 DEFAULT NULL,
ENV_NAME VARCHAR2 DEFAULT NULL
)
RETURN SYS.AnyDataSet
Parameters
Parameter Description
INP_NAM The name of a table or view that specifies the data to pass to the Python
function specified by the SCR_NAME parameter. If using a table or view owned
by another user, use the format <owner name>.<table/view name>. You
must have read access to the specified table or view.
PAR_LST A JSON string that contains additional parameters to pass to the user-defined
Python function specified by the SCR_NAME parameter. Special control
arguments, which start with oml_, are not passed to the function specified by
SCR_NAME, but instead control what happens before or after the invocation of
the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
See also: Special Control Arguments (Autonomous Database).
OUT_FMT The format of the output returned by the function. It can be one of the
following:
• A JSON string that specifies the column names and data types of the table
returned by the function. Any image data is discarded. The Python
function must return a pandas.DataFrame, a numpy.ndarray, a tuple,
or a list of tuples.
• The string 'JSON', which specifies that the table returned contains a
CLOB that is a JSON string.
• The string 'XML', which specifies that the table returned contains a CLOB
that is an XML string. The XML can contain both structured data and
images, with structured or semi-structured Python objects first, followed by
the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a BLOB
that has the image or images generated by the Python function. Images
are returned as a base 64 encoding of the PNG representation.
See also: Output Formats (Autonomous Database).
ROW_NUM The number of rows in a chunk. The Python script is run on each chunk.
SCR_NAME The name of a user-defined Python function in the OML4Py script repository.
SCR_OWNER The owner of the registered Python script. The default value is NULL. If NULL,
will search for the Python script in the user’s script repository.
ENV_NAME The name of the conda environment that should be used when running the
named user-defined Python function.
Example
This example loads the Python model linregr to predict row chunks of sample iris data. The
model is created and saved in the datastore pymodel, which is shown in the example for
pyqTableEval Function (Autonomous Database).
The example defines a Python function and stores it in the OML4Py script repository. It uses the user-defined Python function to create a database table as the result of the pyqEval function. It defines a Python function that runs a prediction function on a model loaded from the OML4Py datastore. It then invokes the pyqRowEval function to run the prediction function on chunks of rows from the database table.
In a PL/SQL block, define the function sample_iris_table and store it in the script repository.
The function loads the iris data set, creates two pandas.DataFrame objects, and then returns a
sample of the concatenation of those objects.
BEGIN
sys.pyqScriptCreate('sample_iris_table',
'def sample_iris_table(size):
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width","Petal_Length","Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
return pd.concat([y, x], axis=1).sample(int(size))',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
Create the SAMPLE_IRIS table in the database as the result of a SELECT statement, which
invokes the pyqEval function on the sample_iris_table user-defined Python function saved in
the script repository with the same name. The sample_iris_table function returns an iris data
sample of size size.
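A sketch of that statement; the sample size of 20 is an assumption, consistent with the 20 rows selected in the output further below.
CREATE TABLE SAMPLE_IRIS AS
(SELECT * FROM table(pyqEval(
    par_lst => '{"size":20}',
    out_fmt => '{"Species":"varchar2(10)","Sepal_Length":"number",
                 "Sepal_Width":"number","Petal_Length":"number",
                 "Petal_Width":"number"}',
    scr_name => 'sample_iris_table')));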
Define the Python function predict_model and store it with the name linregrPredict in the
script repository. The function predicts the data in dat with the Python model specified by the
modelName argument, which is loaded from the datastore specified by the datastoreName
argument. The function also plots the actual petal width values with the predicted values. The
predictions are finally concatenated and returned with dat as the object that the function
returns.
BEGIN
sys.pyqScriptCreate('linregrPredict',
'def predict_model(dat, modelName, datastoreName):
import oml
import pandas as pd
objs = oml.ds.load(name=datastoreName, to_globals=False)
pred = objs[modelName].predict(dat[["Sepal_Length",\
"Sepal_Width","Petal_Length"]])
return pd.concat([dat, pd.DataFrame(pred, \
columns=["Pred_Petal_Width"])], axis=1)',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
Run a SELECT statement that invokes the pyqRowEval function, which runs the specified Python
function on each chunk of rows in the specified data set.
The INP_NAM argument specifies the data in the SAMPLE_IRIS table to pass to the Python
function.
The PAR_LST argument specifies passing the input data as a pandas.DataFrame with the special control argument oml_input_type, along with values for the function arguments modelName and datastoreName.
In the OUT_FMT argument, the JSON string specifies the column names and data types of the
structured table output.
The ROW_NUM argument specifies that five rows are included in each invocation of the function
specified by SCR_NAME.
The SCR_NAME parameter specifies linregrPredict, which is the name in the script repository
of the user-defined Python function to invoke.
SELECT *
FROM table(pyqRowEval(
inp_nam => 'SAMPLE_IRIS',
par_lst => '{"oml_input_type":"pandas.DataFrame",
"modelName":"linregr", "datastoreName":"pymodel"}',
out_fmt => '{"Species":"varchar2(12)", "Petal_Length":"number",
"Pred_Petal_Width":"number"}',
row_num => 5,
scr_name => 'linregrPredict'));
20 rows selected.
Run a SELECT statement that invokes the pyqRowEval function and return the XML output. Each
invocation of script linregrPredict is applied to 10 rows of data in the SAMPLE_IRIS table. The
XML output is a CLOB; you can call set long [length] to get more output.
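A sketch of that statement, reusing the earlier arguments with OUT_FMT set to 'XML' and ROW_NUM set to 10:
SELECT *
FROM table(pyqRowEval(
    inp_nam => 'SAMPLE_IRIS',
    par_lst => '{"oml_input_type":"pandas.DataFrame",
                 "modelName":"linregr", "datastoreName":"pymodel"}',
    out_fmt => 'XML',
    row_num => 10,
    scr_name => 'linregrPredict'));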
NAME VALUE
<root><pandas_dataFrame><ROW-pandas_dataFrame><Species>setosa</
Species><Sepal_Length>5</Sepal_Length><Sepal_Width>3.2</
Sepal_Width><Petal_Length>1.2</Petal_Length><Petal_Width>0.2</
Petal_Width><Pred_Petal_Width>0.0653133201897007</Pred_Petal_Width></ROW-
pandas_dataFrame><ROW-pandas_dataFrame><Species>
pyqGroupEval Function (Autonomous Database)

The function pyqGroupEval groups data by one or more columns and runs a user-defined Python function on each group. The user-defined Python function can return a boolean, a dict, a float, an int, a list, a str, a tuple or a pandas.DataFrame object. Define the form of the returned value with the OUT_FMT parameter.
Syntax
FUNCTION PYQSYS.pyqGroupEval(
INP_NAM VARCHAR2,
PAR_LST VARCHAR2,
OUT_FMT VARCHAR2,
GRP_COL VARCHAR2,
ORD_COL VARCHAR2,
SCR_NAME VARCHAR2,
SCR_OWNER VARCHAR2 DEFAULT NULL,
ENV_NAME VARCHAR2 DEFAULT NULL
)
RETURN SYS.AnyDataSet
Parameters
Parameter Description
INP_NAM The name of a table or view that specifies the data to pass to the Python function
specified by the SCR_NAME parameter. If using a table or view owned by another user,
use the format <owner name>.<table/view name>. You must have read access to
the specified table or view.
PAR_LST A JSON string that contains additional parameters to pass to the user-defined Python
function specified by the SCR_NAME parameter. Special control arguments, which start
with oml_, are not passed to the function specified by SCR_NAME, but instead control
what happens before or after the invocation of the function.
For example, to specify the input data type as pandas.DataFrame, use:
'{"oml_input_type":"pandas.DataFrame"}'
See also: Special Control Arguments (Autonomous Database).
OUT_FMT The format of the output returned by the function. It can be one of the following:
• A JSON string that specifies the column names and data types of the table returned
by the function. Any image data is discarded. The Python function must return a
pandas.DataFrame, a numpy.ndarray, a tuple, or a list of tuples.
• The string 'JSON', which specifies that the table returned contains a CLOB that is a
JSON string.
• The string 'XML', which specifies that the table returned contains a CLOB that is an
XML string. The XML can contain both structured data and images, with structured
or semi-structured Python objects first, followed by the image or images generated
by the Python function.
• The string 'PNG', which specifies that the table returned contains a BLOB that has
the image or images generated by the Python function. Images are returned as a
base 64 encoding of the PNG representation.
See also: Output Formats (Autonomous Database).
GRP_COL The names of the grouping columns by which to partition the data. Use commas to
separate multiple columns. For example, to group by GENDER and YEAR:
"GENDER,YEAR"
ORD_COL Comma-separated column names to order the input data. For example, to order by GENDER:
"GENDER"
If specified, the input data will first be ordered by the ORD_COL columns and then
grouped by the GRP_COL columns.
SCR_NAME The name of a user-defined Python function in the OML4Py script repository.
SCR_OWNER The owner of the registered Python script. The default value is NULL. If NULL, will search
for the Python script in the user’s script repository.
ENV_NAME The name of the conda environment that should be used when running the named user-
defined Python function.
Example
This example uses the IRIS table created in the example shown in pyqEval Function
(Autonomous Database).
Define the Python function group_count and store it with the name mygroupcount in the script
repository. The function returns a pandas.DataFrame generated on each group of data dat. The
function also plots the sepal length with the petal length values on each group.
BEGIN
sys.pyqScriptCreate('mygroupcount',
'def group_count(dat):
import pandas as pd
import matplotlib.pyplot as plt
plt.plot(dat[["Sepal_Length"]], dat[["Petal_Length"]], ".")
plt.xlabel("Sepal Length")
plt.ylabel("Petal Length")
plt.title("{}".format(dat["Species"][0]))
return pd.DataFrame([(dat["Species"][0], dat.shape[0])],\
columns = ["Species", "CNT"]) ',
FALSE, TRUE); -- V_GLOBAL, V_OVERWRITE
END;
/
Issue a query that invokes the pyqGroupEval function. In the function, the INP_NAM argument
specifies the data in the IRIS table to pass to the function.
The PAR_LST argument specifies the special control argument oml_input_type.
The OUT_FMT argument specifies a JSON string that contains the column names and data types
of the table returned by pyqGroupEval.
The SCR_NAME parameter specifies the user-defined Python function stored with the name
mygroupcount in the script repository.
SELECT *
FROM table(
pyqGroupEval(
inp_nam => 'IRIS',
par_lst => '{"oml_input_type":"pandas.DataFrame"}',
out_fmt => '{"Species":"varchar2(10)", "CNT":"number"}',
grp_col => 'Species',
ord_col => NULL,
scr_name => 'mygroupcount'));
Species CNT
---------- ----------
virginica 50
setosa 50
versicolor 50
3 rows selected.
Run the same script with IRIS data and return the XML output. The PAR_LST argument
specifies the special control argument oml_graphics_flag to capture images rendered in the
script. Both structured data and images are included in the XML output. The XML output is a
CLOB; you can call set long [length] to get more output.
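A sketch of that invocation, adding the oml_graphics_flag control argument and returning XML:
SELECT *
FROM table(pyqGroupEval(
    inp_nam => 'IRIS',
    par_lst => '{"oml_input_type":"pandas.DataFrame",
                 "oml_graphics_flag":true}',
    out_fmt => 'XML',
    grp_col => 'Species',
    ord_col => NULL,
    scr_name => 'mygroupcount'));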
NAME VALUE
virginica <root><Py-data><pandas_dataFrame><ROW-
pandas_dataFrame><Species>virginica</Species><CNT>50</CNT></ROW-
pandas_dataFrame></pandas_dataFrame></Py-data><images><image><img
src="data:image/pngbase64"><!
[CDATA[iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXR
wbG90bGliIHZlcnNpb24zLjMu
setosa <root><Py-data><pandas_dataFrame><ROW-
pandas_dataFrame><Species>setosa</Species><CNT>50</CNT></ROW-
pandas_dataFrame></pandas_dataFrame></Py-data><images><image><img
src="data:image/pngbase64"><!
[CDATA[iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXR
wbG90bGliIHZlcnNpb24zLjMuMyw
versicolor <root><Py-data><pandas_dataFrame><ROW-
pandas_dataFrame><Species>versicolor</Species><CNT>50</CNT></ROW-
pandas_dataFrame></pandas_dataFrame></Py-data><images><image><img
src="data:image/pngbase64"><!
[CDATA[iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAAAOXRFWHRTb2Z0d2FyZQBNYXR
wbG90bGliIHZlcnNpb24zLjM
Run the same script with IRIS data and get the PNG output. The PAR_LST argument specifies
the special control argument oml_graphics_flag to capture images.
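A sketch of that invocation with OUT_FMT set to 'PNG':
SELECT *
FROM table(pyqGroupEval(
    inp_nam => 'IRIS',
    par_lst => '{"oml_input_type":"pandas.DataFrame",
                 "oml_graphics_flag":true}',
    out_fmt => 'PNG',
    grp_col => 'Species',
    ord_col => NULL,
    scr_name => 'mygroupcount'));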
pyqIndexEval Function (Autonomous Database)

The function pyqIndexEval runs a user-defined Python function multiple times in Python engines spawned by the database environment.
Syntax
FUNCTION PYQSYS.pyqIndexEval(
PAR_LST VARCHAR2,
OUT_FMT VARCHAR2,
TIMES_NUM NUMBER,
SCR_NAME VARCHAR2,
SCR_OWNER VARCHAR2 DEFAULT NULL,
ENV_NAME VARCHAR2 DEFAULT NULL
)
RETURN SYS.AnyDataSet
Parameters
Parameter Des
cript
ion
PAR_LST A
JSO
N
strin
g
that
cont
ains
addit
ional
para
mete
rs to
pass
to
the
user-
defin
ed
Pyth
on
funct
ion
speci
fied
by
the
SCR_
NAME
para
mete
r.
Spec
ial
contr
ol
argu
ment
s,
whic
h
start
with
oml_
, are
not
pass
ed to
the
funct
ion
speci
fied
by
10-77
Chapter 10
SQL API for Embedded Python Execution with Autonomous Database
Parameter Des
cript
ion
SCR_
NAME
, but
inste
ad
contr
ol
what
happ
ens
befor
e or
after
the
invoc
ation
of
the
funct
ion.
For
exa
mple
, to
speci
fy
the
input
data
type
as
pand
as.D
ataF
rame
,
use:
'{"o
ml_i
nput
_typ
e":"
pand
as.D
ataF
rame
"}'
See
also:
Spec
ial
Cont
rol
Argu
ment
s
10-78
Chapter 10
SQL API for Embedded Python Execution with Autonomous Database
Parameter Des
cript
ion
(Aut
ono
mou
s
Data
base
).
Parameter Description
OUT_FMT The format of the output returned by the function. It can be one of the following:
• A JSON string that specifies the column names and data types of the table returned by the function. Any image data is discarded. The Python function must return a pandas.DataFrame, a numpy.ndarray, a tuple, or a list of tuples.
• The string 'JSON', which specifies that the table returned contains a CLOB that is a JSON string.
• The string 'XML', which specifies that the table returned contains a CLOB that is an XML string. The XML can contain both structured data and images, with structured or semi-structured Python objects first, followed by the image or images generated by the Python function.
• The string 'PNG', which specifies that the table returned contains a BLOB that has the image or images generated by the Python function. Images are returned as a base 64 encoding of the PNG representation.
See also: Output Formats (Autonomous Database).
TIMES_NUM The number of times to execute the Python script.
SCR_NAME The name of a user-defined Python function in the OML4Py script repository.
SCR_OWNER The owner of the registered Python script. The default value is NULL. If NULL, the user's script repository is searched for the Python script.
ENV_NAME The name of the conda environment that should be used when running the named user-defined Python function.
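For example, a sketch of a pyqIndexEval call that runs the myFitMultiple script defined in the following example inside a named conda environment (the environment name sklearn_env is hypothetical and such an environment must already exist):
select *
from table(pyqIndexEval(
    par_lst => '{"sample_size":80}',
    out_fmt => '{"id":"number","score":"number"}',
    times_num => 3,
    scr_name => 'myFitMultiple',
    env_name => 'sklearn_env'));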
Example
Define the Python function fit_lm and store it with the name myFitMultiple in the script
repository. The function returns a pandas.DataFrame containing the index and prediction score
of the fitted model on the data sampled from scikit-learn’s IRIS dataset.
begin
sys.pyqScriptCreate('myFitMultiple',
'def fit_lm(i, sample_size):
from sklearn import linear_model
from sklearn.datasets import load_iris
import pandas as pd
import random
random.seed(10)
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width","Petal_Length","Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
dat = pd.concat([y, x], axis=1).sample(sample_size)
regr = linear_model.LinearRegression()
regr.fit(x.loc[:, ["Sepal_Length", "Sepal_Width", \
"Petal_Length"]],
x.loc[:,["Petal_Width"]])
sc = regr.score(dat.loc[:, ["Sepal_Length", "Sepal_Width", \
"Petal_Length"]],
dat.loc[:,["Petal_Width"]])
return pd.DataFrame([[i,sc]],columns=["id","score"])
',FALSE,TRUE); -- V_GLOBAL, V_OVERWRITE
end;
/
Issue a query that invokes the pyqIndexEval function. In the function, the PAR_LST argument
specifies the function argument sample_size. The OUT_FMT argument specifies a JSON string
that contains the column names and data types of the table returned by pyqIndexEval. The
TIMES_NUM parameter specifies the number of times to execute the script. The SCR_NAME
parameter specifies the user-defined Python function stored with the name myFitMultiple in
the script repository.
select *
from table(pyqIndexEval(
par_lst => '{"sample_size":80,
"oml_parallel_flag":true",
"oml_service_level":"MEDIUM"}',
out_fmt => '{"id":"number","score":"number"}',
times_num => 3,
scr_name => 'myFitMultiple'));
id score
---------- ----------
1 .943550631
2 .927836941
3 .937196049
3 rows selected.
The pyqGrant function grants read privilege access to an OML4Py datastore or to a script in
the OML4Py script repository.
Syntax
pyqGrant (
V_NAME VARCHAR2 IN
V_TYPE VARCHAR2 IN
V_USER VARCHAR2 IN DEFAULT)
Parameters
Parameter Description
V_NAME The name of an OML4Py datastore or a script in the OML4Py script repository.
V_TYPE For a datastore, the type is datastore; for a script, the type is pyqScript.
V_USER The name of the user to whom to grant access.
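For example, a sketch of a call granting another user read access to the myFitMultiple script created earlier (the user name OMLUSER2 is hypothetical):
BEGIN
    pyqGrant('myFitMultiple', 'pyqScript', 'OMLUSER2');
END;
/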
The pyqRevoke function revokes read privilege access to an OML4Py datastore or to a script in
the OML4Py script repository.
Syntax
pyqRevoke (
V_NAME VARCHAR2 IN
V_TYPE VARCHAR2 IN
V_USER VARCHAR2 IN DEFAULT)
Parameters
Parameter Description
V_NAME The name of an OML4Py datastore or a script in the OML4Py script repository.
V_TYPE For a datastore, the type is datastore; for a script, the type is pyqScript.
V_USER The name of the user from whom to revoke access.
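Similarly, a sketch of a call revoking that access (again with the hypothetical user name OMLUSER2):
BEGIN
    pyqRevoke('myFitMultiple', 'pyqScript', 'OMLUSER2');
END;
/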
The sys.pyqScriptCreate procedure creates a user-defined Python function in the OML4Py script repository.
Syntax
sys.pyqScriptCreate (
V_NAME VARCHAR2 IN
V_SCRIPT CLOB IN
V_GLOBAL BOOLEAN IN DEFAULT
V_OVERWRITE BOOLEAN IN DEFAULT)
Parameters
Parameter Description
V_NAME A name for the user-defined Python function in the OML4Py script repository.
V_SCRIPT The definition of the Python function.
V_GLOBAL TRUE specifies that the user-defined Python function is public; FALSE specifies that
the user-defined Python function is private.
V_OVERWRITE If the script repository already has a user-defined Python function with the same
name as V_NAME, then TRUE replaces the content of that user-defined Python
function with V_SCRIPT and FALSE does not replace it.
This example creates a private user-defined Python function named pyqFun2 in the script repository.
BEGIN
sys.pyqScriptCreate('pyqFun2',
'def return_frame():
import numpy as np
import pickle
z = np.array([y for y in zip([str(x)+"demo" for x in range(10)],
[float(x)/10 for x in range(10)],
[x for x in range(10)],
[bool(x%2) for x in range(10)],
[pickle.dumps(x) for x in range(10)],
["test"+str(x**2) for x in range(10)])],
dtype=[("a", "U10"), ("b", "f8"), ("c", "i4"), ("d", "?"),
("e", "S20"), ("f", "O")])
return z');
END;
/
This example creates a global user-defined Python function named pyqFun2 in the script
repository and overwrites any existing user-defined Python function of the same name.
BEGIN
sys.pyqScriptCreate('pyqFun2',
'def return_frame():
import numpy as np
import pickle
z = np.array([y for y in zip([str(x)+"demo" for x in range(10)],
[float(x)/10 for x in range(10)],
[x for x in range(10)],
[bool(x%2) for x in range(10)],
[pickle.dumps(x) for x in range(10)],
["test"+str(x**2) for x in range(10)])],
dtype=[("a", "U10"), ("b", "f8"), ("c", "i4"), ("d", "?"),
("e", "S20"), ("f", "O")])
return z',
TRUE, -- Make the user-defined Python function global.
TRUE); -- Overwrite any global user-defined Python function
-- with the same name.
END;
/
This example creates a private user-defined Python function named create_iris_table in the
script repository.
BEGIN
sys.pyqScriptCreate('create_iris_table',
'def create_iris_table():
from sklearn.datasets import load_iris
import pandas as pd
iris = load_iris()
x = pd.DataFrame(iris.data, columns = ["Sepal_Length",\
"Sepal_Width","Petal_Length","Petal_Width"])
y = pd.DataFrame(list(map(lambda x: {0:"setosa", 1: "versicolor",\
2: "virginica"}[x], iris.target)),\
columns = ["Species"])
return pd.concat([y, x], axis=1)');
END;
/
Querying the USER_PYQ_SCRIPTS view lists the user-defined Python functions in the script repository:
NAME SCRIPT
-----------------
---------------------------------------------------------------------
create_iris_table def create_iris_table(): from sklearn.datasets
import load_iris ...
pyqFun2 def return_frame(): import numpy as np import
pickle ...
The sys.pyqScriptDrop procedure removes a user-defined Python function from the OML4Py script repository.
Syntax
sys.pyqScriptDrop (
V_NAME VARCHAR2 IN
V_GLOBAL BOOLEAN IN DEFAULT
V_SILENT BOOLEAN IN DEFAULT)
Parameters
Parameter Description
V_NAME A name for the user-defined Python function in the OML4Py script repository.
V_GLOBAL A BOOLEAN that specifies whether the user-defined Python function to drop is a
global or a private user-defined Python function. The default value is FALSE, which
indicates a private user-defined Python function. TRUE specifies that the user-
defined Python function is public.
V_SILENT A BOOLEAN that specifies whether to display an error message when
sys.pyqScriptDrop encounters an error in dropping the specified user-defined
Python function. The default value is FALSE.
This example drops the private user-defined Python function pyqFun2 from the script repository.
BEGIN
sys.pyqScriptDrop('pyqFun2');
END;
/
This example drops the global user-defined Python function pyqFun2 from the script repository.
BEGIN
sys.pyqScriptDrop('pyqFun2', TRUE);
END;
/
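To drop a user-defined Python function without raising an error if it does not exist, a sketch might pass TRUE for the V_SILENT argument:
BEGIN
    sys.pyqScriptDrop('pyqFun2', FALSE, TRUE);
END;
/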
Asynchronous Jobs (Autonomous Database)
Topics:
• oml_async_flag Argument
The special control argument oml_async_flag determines if a job is run synchronously or
asynchronously. The default value is false.
• pyqJobStatus Function
Use the pyqJobStatus function to look up the status of an asynchronous job. If the job is
pending, it returns 'job is still running'. If the job is completed, the function returns a
URL.
• pyqJobResult Function
Use the pyqJobResult function to return the job result.
• Asynchronous Job Example
The following examples show how to submit asynchronous jobs with non-XML output and
with XML output.
In the following code, the Python function score_diff is defined and stored with the name
computeGradeDiff as a private function in the script repository. The function returns a
pandas.DataFrame after assigning a new DIFF column by computing the difference between
the SCORE and FINALGRADE columns of the input data.
begin
sys.pyqScriptCreate('computeGradeDiff','def score_diff(dat):
import numpy as np
import pandas as pd
df = dat.assign(DIFF=dat.SCORE-dat.FINALGRADE)
return df
');
end;
/
select *
from table(pyqTableEval(
inp_nam => 'GRADE',
par_lst => '{"oml_async_flag":true}',
out_fmt => NULL,
scr_name => 'computeGradeDiff',
scr_owner => NULL
));
The VALUE column of the result contains a URL containing the job ID of the asynchronous job:
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
https://<host_name>/oml/tenants/<tenant_name>/databases/<database_name>/api/py-scripts/v1/jobs/<job_id>
1 row selected.
Syntax
FUNCTION PYQSYS.pyqJobStatus(
job_id VARCHAR2
)
RETURN PYQSYS.pyqClobSet
Parameters
Parameter Description
job_id The ID of the asynchronous job.
Example
The following example shows a pyqJobStatus call and its output.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
1 row selected.
Syntax
FUNCTION PYQSYS.pyqJobResult(
job_id VARCHAR2,
out_fmt VARCHAR2 DEFAULT 'JSON'
)
RETURN SYS.AnyDataSet
Parameters
Parameter Description
job_id The ID of the asynchronous job.
Example
The following example shows a pyqJobResult call and its output.
10 rows selected.
Non-XML Output
When submitting asynchronous jobs that produce JSON, PNG, or relational output, set the
OUT_FMT argument to NULL when submitting the job. When fetching the job result, specify
OUT_FMT in the pyqJobResult call.
This example uses the IRIS table created in the example shown in the pyqTableEval Function
(Autonomous Database) topic and the linregrPredict script created in the example shown in
the pyqRowEval Function (Autonomous Database) topic.
Issue a pyqGroupEval function call to submit an asynchronous job. In the function, the INP_NAM
argument specifies the data in the IRIS table to pass to the function.
The PAR_LST argument specifies submitting the job asynchronously with the special control
argument oml_async_flag, capturing the images rendered in the script with the special control
argument oml_graphics_flag, passing the input data as a pandas.DataFrame with the special
control argument oml_input_type, along with values for the function arguments modelName and
datastoreName.
The SCR_NAME parameter specifies the user-defined Python function stored with the name
linregrPredict in the script repository.
The asynchronous call returns a job status URL in a CLOB. You can use SET LONG [length]
to display the full URL.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
https://<host name>/oml/tenants/<tenant name>/databases/<database name>/api/py-scripts/v1/jobs/<job id>
1 row selected.
Run a SELECT statement that invokes the pyqJobStatus function, which returns a resource URL
containing the job ID when the job result is ready.
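A minimal sketch of such a call, with a placeholder for the job ID returned by the submission call:
select *
from table(pyqJobStatus(
    job_id => '<job id>'));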
NAME
----------------------------------------------------------------------
VALUE
----------------------------------------------------------------------
job is still running
1 row selected.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
https://<host name>/oml/tenants/<tenant name>/databases/<database name>/api/py-scripts/v1/jobs/<job id>/result
1 row selected.
In the OUT_FMT argument, the string 'PNG' specifies that both the return value and the
images (titles and image bytes) are included in the result.
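A sketch of the corresponding pyqJobResult fetch, again with a placeholder for the job ID:
select *
from table(pyqJobResult(
    job_id => '<job id>',
    out_fmt => 'PNG'));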
8319563387},{"S 6641686B6941414
pecies":"versic 141416C7753466C
7A4141415059514
141443245427144
2B6E61514141414
468305256683055
32396D644864686
36D554162574630
634778766447787
0596942325A584A
7A615739754D793
4784C6A49734947
3 rows selected.
XML Output
If XML output is expected from the asynchronous job, set the OUT_FMT argument to 'XML' when
submitting the job and fetching the job result.
This example uses the script myFitMultiple created in the example shown in the
pyqIndexEval Function (Autonomous Database) topic.
Issue a pyqIndexEval function call to submit an asynchronous job. In the function, the PAR_LST
argument specifies submitting the job asynchronously with the special control argument
oml_async_flag, along with a value for the function argument sample_size.
The asynchronous call returns a job status URL in a CLOB. You can use SET LONG [length]
to display the full URL.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
https://<host name>/oml/tenants/<tenant name>/databases/<database name>/api/py-scripts/v1/jobs/<job id>
1 row selected.
Run a SELECT statement that invokes the pyqJobStatus function, which returns a resource URL
containing the job ID when the job result is ready.
NAME
----------------------------------------------------------------------
VALUE
----------------------------------------------------------------------
job is still running
1 row selected.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
https://<host name>/oml/tenants/<tenant name>/databases/<database name>/api/py-scripts/v1/jobs/<job id>/result
1 row selected.
In the OUT_FMT argument, the string 'XML' specifies that the table returned contains a CLOB
that is an XML string.
NAME
----------------------------------------------------------------------
VALUE
----------------------------------------------------------------------
1
<root><pandas_dataFrame><ROW-pandas_dataFrame><id>1</id><score>0.94355
0631313753</score></ROW-pandas_dataFrame></pandas_dataFrame></root>
2
<root><pandas_dataFrame><ROW-pandas_dataFrame><id>2</id><score>0.92783
6941437123</score></ROW-pandas_dataFrame></pandas_dataFrame></root>
3
<root><pandas_dataFrame><ROW-pandas_dataFrame><id>3</id><score>0.93719
6049031545</score></ROW-pandas_dataFrame></pandas_dataFrame></root>
3 rows selected.
Examples:
• JSON
• Relational
• XML
• PNG
• Asynchronous Mode Output
JSON
When OUT_FMT is set to JSON, the table functions return a table containing a CLOB that is a
JSON string.
The following example invokes the pyqEval function on the script pyqFun1 created in the
pyqEval Function (Autonomous Database) topic.
SQL> select *
from table(pyqEval(
par_lst => '{"oml_service_level":"MEDIUM"}',
out_fmt => 'JSON',
scr_name => 'pyqFun1'));
NAME
----------------------------------------------------------------------
VALUE
----------------------------------------------------------------------
[{"FLOAT":0,"ID":0,"NAME":"demo_0"},{"FLOAT":0.1,"ID":1,"NAME":"demo_1
"},{"FLOAT":0.2,"ID":2,"NAME":"demo_2"},{"FLOAT":0.3,"ID":3,"NAME":"de
mo_3"},{"FLOAT":0.4,"ID":4,"NAME":"demo_4"},{"FLOAT":0.5,"ID":5,"NAME"
:"demo_5"},{"FLOAT":0.6,"ID":6,"NAME":"demo_6"},{"FLOAT":0.7,"ID":7,"N
AME":"demo_7"},{"FLOAT":0.8,"ID":8,"NAME":"demo_8"},{"FLOAT":0.9,"ID":
9,"NAME":"demo_9"}]
1 row selected.
Relational
When OUT_FMT is specified with a JSON string where column names are mapped to column
types, the table functions return the response by reshaping it into table columns.
For example, if OUT_FMT is specified with {"NAME":"varchar2(7)", "DIFF":"number"}, the
output should contain a NAME column of type VARCHAR2(7) and a DIFF column of type NUMBER.
The following example uses the table GRADE and the script computeGradeDiff (created in
Asynchronous Jobs (Autonomous Database)) and invokes the computeGradeDiff function:
SQL> select *
from table(pyqTableEval(
inp_nam => 'GRADE',
par_lst => '{"oml_input_type":"pandas.DataFrame"}',
out_fmt => '{"NAME":"varchar2(7)","DIFF":"number"}',
scr_name => 'computeGradeDiff'));
NAME DIFF
------- ----------
Abbott 3
Branfor -5
Crandel 10
Denniso 13
Edgar 9
Faust 5
Greeley -9
Hart 4
Isley 2
Jasper 8
10 rows selected.
XML
When OUT_FMT is specified with XML, the table functions return the response in a table with fixed
columns. The output consists of two columns. The NAME column contains the name of the row.
The NAME column value is NULL for pyqEval, pyqTableEval, and pyqRowEval function returns. For
pyqGroupEval and pyqIndexEval, the NAME column value is the group or index name. The VALUE
column contains the XML string.
The XML can contain both structured data and images, with structured or semi-structured
Python objects first, followed by the image or images generated by the Python function.
Images are returned as a base 64 encoding of the PNG representation. To include images in
the XML string, the special control argument oml_graphics_flag must be set to true.
In the following code, the Python function gen_two_images is defined and stored with the name
plotTwoImages in the script repository. The function renders two subplots with random dots in
red and blue and returns the number of columns of the input data.
begin
sys.pyqScriptCreate('plotTwoImages','def gen_two_images (dat):
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(22)
fig = plt.figure(1);
fig2 = plt.figure(2);
ax = fig.add_subplot(111);
ax.set_title("Random red dots")
ax2 = fig2.add_subplot(111);
ax2.set_title("Random blue dots")
ax.plot(range(100), np.random.normal(size=100), marker = "o",
color = "red", markersize = 2)
ax2.plot(range(100,0,-1), marker = "o", color = "blue",
markersize = 2)
return dat.shape[1]
',FALSE,TRUE);
end;
/
The following example shows the XML output of a pyqRowEval function call where both
structured data and images are included in the result:
SQL> select *
from table(pyqRowEval(
inp_nam => 'GRADE',
par_lst => '{"oml_graphics_flag":true}',
out_fmt => 'XML',
row_num => 5,
scr_name => 'plotTwoImages'));
NAME
----------------------------------------------------------------------
VALUE
----------------------------------------------------------------------
1
<root><Py-data><int>7</int></Py-data><images><image><img src="data:ima
ge/pngbase64"><![CDATA[iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAA
ABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADh0RVh0U29mdHdhcmUAb
WF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAA
gAElEQVR4nOydeZwcVb32n549k0xCSMhGEohhEZFNUAEBE0UUIYOACG4gFxWvgGzqldf3s
lz1xYuKLBe3i7LcNyhctoxsviCJoAQFNAKCCLITQyCQbZJMZqb
2
<root><Py-data><int>7</int></Py-data><images><image><img src="data:ima
ge/pngbase64"><![CDATA[iVBORw0KGgoAAAANSUhEUgAAAoAAAAHgCAYAAAA10dzkAAA
ABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADh0RVh0U29mdHdhcmUAb
WF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAA
gAElEQVR4nOydeZwcVb32n549k0xCSMhGEohhEZFNUAEBE0UUIYOACG4gFxWvgGzqldf3s
lz1xYuKLBe3i7LcNyhctoxsviCJoAQFNAKCCLITQyCQbZJMZqb
2 rows selected.
PNG
When OUT_FMT is specified with PNG, the table functions return the response in a table with fixed
columns (including an image bytes column). When calling the SQL API, you must set the
special control argument oml_graphics_flag to true so that the web server can capture
images rendered in the executed script.
The PNG output consists of four columns. The NAME column contains the name of the row. The
NAME column value is NULL for pyqEval and pyqTableEval function returns. For pyqRowEval,
pyqGroupEval, and pyqIndexEval, the NAME column value is the chunk, group, or index name. The ID
column indicates the ID of the image. The VALUE column contains the return value of the
executed script. The TITLE column contains the titles of the rendered PNG images. The IMAGE
column is a BLOB column containing the bytes of the PNG images rendered by the executed
script.
The following example shows the PNG output of a pyqRowEval function call.
SQL> select *
from table(pyqRowEval(
inp_nam => 'GRADE',
par_lst => '{"oml_graphics_flag":true}',
out_fmt => 'PNG',
row_num => 5,
scr_name => 'plotTwoImages',
scr_owner => NULL
));
4 rows selected.
The following example shows how to get the JSON output from an asynchronous
pyqGroupEval function call:
SQL> select *
from table(pyqGroupEval(
inp_nam => 'GRADE',
par_lst => '{"oml_async_flag":true, "oml_graphics_flag":true}',
out_fmt => NULL,
grp_col => 'GENDER',
ord_col => NULL,
scr_name => 'inp_twoimgs',
scr_owner => NULL
));
NAME
--------------------------------------------------------------------
VALUE
--------------------------------------------------------------------
1 row selected.
NAME
--------------------------------------------------------------------
VALUE
--------------------------------------------------------------------
1 row selected.
... (output truncated)
4 rows selected.
SQL> select *
from table(pyqIndexEval(
...));
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
https://<host name>/oml/tenants/<tenant name>/databases/<database name>/api/py-scripts/v1/jobs/<job id>
1 row selected.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
1 row selected.
NAME
------------------------------------------------------------------------------
VALUE
------------------------------------------------------------------------------
1
<root><pandas_dataFrame><ROW-
pandas_dataFrame><ID>1</ID><RES>a</RES></ROW-pandas
_dataFrame></pandas_dataFrame></root>
2
<root><pandas_dataFrame><ROW-
pandas_dataFrame><ID>2</ID><RES>b</RES></ROW-pandas
_dataFrame></pandas_dataFrame></root>
3
<root><pandas_dataFrame><ROW-
pandas_dataFrame><ID>3</ID><RES>c</RES></ROW-pandas
_dataFrame></pandas_dataFrame></root>
3 rows selected.
11
Administrative Tasks for Oracle Machine
Learning for Python
If you find that your Python process is consuming too many of your machine's resources, or
causing your machine to crash, you can get information about, or set limits for, the resources
Python is using.
The Python system and process utilities library psutil is a cross-platform library for retrieving
information on running processes and system utilization, such as CPU, memory, disks,
network, and sensors, in Python. It is useful for system monitoring, profiling, limiting process
resources, and the management of running processes.
The function psutil.Process.rlimit gets or sets process resource limits. In psutil, process
resource limits are constants with names beginning with psutil.RLIMIT_. Each resource is
controlled by a soft limit and hard limit tuple.
For example, psutil.RLIMIT_AS represents the maximum size (in bytes) of the virtual memory
(address space) used by the process. The default limit of psutil.RLIMIT_AS can be -1
(psutil.RLIM_INFINITY). You can lower the resource limit of psutil.RLIMIT_AS to prevent
your Python program from loading too much data into memory, as shown in the following
example.
Example 11-1 Resource Control with psutil.RLIMIT_AS
import psutil
import numpy as np
# Check the constant used to represent the limit for an unlimited resource.
psutil.RLIM_INFINITY
# Get the current process.
p = psutil.Process()
>>>
>>> # Display the Virtual Memory Size of the current process.
... p.memory_info().vms
413175808
>>>
>>> # Get the process resource limit RLIMIT_AS.
... soft, hard = p.rlimit(psutil.RLIMIT_AS)
>>> print('Original resource limits of RLIMIT_AS (soft/hard): {}/{}'.format(soft, hard))
Original resource limits of RLIMIT_AS (soft/hard): -1/-1
>>>
>>> # Check the constant used to represent the limit for an unlimited resource.
... psutil.RLIM_INFINITY
-1
>>>
>>> # Set the resource RLIMIT_AS (soft, hard) limit to (1GB, 2GB).
... p.rlimit(psutil.RLIMIT_AS, (pow(1024,3)*1, pow(1024,3)*2))
>>>
>>> # Get the current resource limit of RLIMIT_AS.
... cur_soft, cur_hard = p.rlimit(psutil.RLIMIT_AS)
>>> print('Current resource limits of RLIMIT_AS (soft/hard): {}/{}'.format(cur_soft, cur_hard))
Current resource limits of RLIMIT_AS (soft/hard): 1073741824/2147483648
>>>
>>> # Define a list of sizes to be allocated in MB (megabytes).
... sz = [100, 200, 500, 1000]
>>>
>>> # Define a megabyte variable in bytes.
... MB = 1024*1024
>>>
>>> # Allocate an increasing amount of data.
... for val in sz:
... stmt = "Allocate %s MB " % val
... try:
... print("virtual memory: %d MB" % int(p.memory_info().vms/MB))
... m = np.arange(val*MB/8, dtype="u8")
... print(stmt + " Success.")
... except:
... print(stmt + " Fail.")
... raise
...
virtual memory: 394 MB
Allocate 100 MB Success.
virtual memory: 494 MB
Allocate 200 MB Success.
virtual memory: 594 MB
Allocate 500 MB Fail.
Traceback (most recent call last):
File "<stdin>", line 6, in <module>
MemoryError
>>>
>>> # Delete the allocated variable.
... del m
>>>
>>> # Raise the soft limit of RLIMIT_AS to 2GB.
... p.rlimit(psutil.RLIMIT_AS, (pow(1024,3)*2, pow(1024,3)*2))
Index

Numerics
3rd party package, 5-13
3rd party packages, 5-9

A
ADMIN, 5-9
algorithm selection class, 9-6
algorithms
  Apriori, 8-21
  attribute importance, 8-18
  Automated Machine Learning, 9-1
  Automatic Data Preparation, 8-11
  automatically selecting, 9-15
  Decision Tree, 8-27
  Expectation Maximization, 8-34
  Explicit Semantic Analysis, 8-48
  Generalized Linear Model, 8-53
  k-Means, 8-63
  machine learning, 8-2
  Minimum Description Length, 8-18
  Naive Bayes, 8-69
  Neural Network, 8-77
  Random Forest, 8-86
  settings common to all, 8-4
  Singular Value Decomposition, 8-94
  Support Vector Machine, 8-100
ALL_PYQ_DATASTORE_CONTENTS view, 10-7
ALL_PYQ_DATASTORES view, 10-8
ALL_PYQ_SCRIPTS view, 10-9
anomaly detection models, 8-100
Apriori algorithm, 8-21
attribute importance, 8-18
Automated Machine Learning
  about, 9-1
Automatic Data Preparation algorithm, 8-11
Automatic Machine Learning
  connection parameter, 6-2
Autonomous Database, 6-1

C
classes
  Automated Machine Learning, 9-1
  GlobalFeatureImportance, 8-12
  machine learning, 8-2
  oml.ai, 8-18
  oml.ar, 8-21
  oml.automl.AlgorithmSelection, 9-6
  oml.automl.FeatureSelection, 9-8
  oml.automl.ModelSelection, 9-15
  oml.automl.ModelTuning, 9-11
  oml.dt, 8-11, 8-27
  oml.em, 8-34
  oml.esa, 8-48
  oml.glm, 8-53
  oml.graphics, 7-31
  oml.km, 8-63
  oml.nb, 8-69
  oml.nn, 8-77
  oml.rf, 8-86
  oml.svd, 8-94
  oml.svm, 8-100
classification algorithm, 8-86
classification models, 8-11, 8-27, 8-53, 8-69, 8-77, 8-86, 8-100
client
  installing for Linux for Autonomous Database, 2-1
  installing for Linux on-premises, 3-18
clustering models, 8-34, 8-48, 8-63
conda environment, 5-9
connection
  creating an on-premises database, 6-4
  functions, 6-2
control arguments, 10-12
convert Python to SQL, 1-4
creating
  proxy objects, 6-13, 6-16
cx_Oracle package, 6-2
cx_Oracle.connect function, 6-2

D
data
  about moving, 6-9
  exploring, 7-17
  filtering, 7-13
  preparing, 7-1
  selecting, 7-3

L
Linux (continued)
  requirements, 3-1
  uninstalling on-premises client for, 3-22
  uninstalling on-premises server for, 3-16
Linux for Autonomous Database
  installing client for, 2-1
Linux on-premises
  installing client for, 3-18
  installing Oracle Instant Client for, 3-17
  installing server for, 3-6, 3-11
  supporting packages for, 3-4

M
machine learning
  classes, 8-2
methods
  drop, 7-13
  drop_duplicates, 7-13
  dropna, 7-13
  for exploring data, 7-17
  for preparing data, 7-1
  pull, 6-11
Minimum Description Length algorithm, 8-18
model selection, 9-15
model tuning, 9-11
models
  association rules, 8-21
  attribute importance, 8-18
  Decision Tree, 8-11, 8-27
  Expectation Maximization, 8-34
  explainability, 8-12
  Explicit Semantic Analysis, 8-48
  exporting and importing, 8-7
  for anomaly detection, 8-100
  for classification, 8-11, 8-27, 8-53, 8-69, 8-77, 8-86, 8-100
  for clustering, 8-34, 8-63
  for feature extraction, 8-48, 8-94
  for regression, 8-53, 8-77, 8-100
  Generalized Linear Model, 8-53
  k-Means, 8-63
  Naive Bayes, 8-69
  Neural Network, 8-77
  parametric, 8-53
  persisting, 8-2
  Random Forest, 8-86
  Singular Value Decomposition, 8-94
  Support Vector Machine, 8-100
moving data
  about, 6-9
  to a local Python session, 6-11
  to the database, 6-9

N
Naive Bayes model, 8-69
Neural Network model, 8-77

O
oml_input_type argument, 10-12
oml_na_omit argument, 10-12
oml.ai class, 8-18
oml.ar class, 8-21
oml.automl.AlgorithmSelection class, 9-6
oml.automl.FeatureSelection class, 9-8
oml.automl.ModelSelection class, 9-15
oml.automl.ModelTuning class, 9-11
oml.boxplot function, 7-31
oml.check_embed function, 6-2, 6-4
oml.connect function, 6-2, 6-4
oml.create function, 6-16
oml.cursor function, 6-9, 6-16
oml.dir function, 6-9, 6-13
oml.disconnect function, 6-2, 6-4
oml.do_eval function, 10-14
oml.drop function, 6-16
oml.ds.delete function, 6-28
oml.ds.describe function, 6-27
oml.ds.dir function, 6-25
oml.ds.load function, 6-24
oml.ds.save function, 6-21
oml.dt class, 8-11, 8-27
oml.em class, 8-34
oml.esa class, 8-48
oml.glm class, 8-53
oml.grant function, 6-30
oml.graphics class, 7-31
oml.group_apply function, 10-18
oml.hist function, 7-31
oml.index_apply function, 10-26
oml.isconnected function, 6-2, 6-4
oml.km class, 8-63
oml.nb class, 8-69
oml.nn class, 8-77
oml.push function, 6-9
oml.revoke function, 6-30
oml.rf class, 8-86
oml.row_apply function, 10-22
oml.script.create function, 10-28
oml.script.dir function, 10-32
oml.script.drop function, 10-34
oml.script.load function, 10-33
oml.set_connection function, 6-2, 6-4
oml.svd class, 8-94
oml.svm class, 8-100
oml.sync function, 6-13
oml.table_apply function, 10-15