Db2 For Linux, Unix, and Windows - Version 11+ Highlights
Actionable Insights
Continuous Availability
Massive Scalability
Outthink the Possible
Db2 for Linux, Unix, and Windows: Version 11 Highlights
Copyright © 2017 by International Business Machines Corporation (IBM).
All rights reserved. Printed in Canada. Except as permitted under the Copyright Act of
1976, no part of this publication may be reproduced or distributed in any form or by any
means, or stored in a database or retrieval system, without the prior written permission
of IBM, with the exception that the program listings may be entered, stored, and
executed in a computer system, but they may not be reproduced for publication.
The contents of this book represent those features that may or may not be available in
the current release of any products mentioned within this book despite what the book
may say. IBM reserves the right to include or exclude any functionality mentioned in this
book for the current release of Db2 11.1, or a subsequent release. In addition, any
claims made in this book are not official communications by IBM; rather, they are
observed by the authors in unaudited testing and research. The views expressed in
this book are those of the authors and not necessarily those of the IBM Corporation;
neither is liable for any of the claims, assertions, or contents in this book.
IBM's statements regarding its plans, directions, and intent are subject to change
or withdrawal without notice and at IBM's sole discretion.
Information regarding potential future products is intended to outline our general
product direction and it should not be relied on in making a purchasing decision. The
information mentioned regarding potential future products is not a commitment,
promise, or legal obligation to deliver any material, code or functionality. Information
about potential future products may not be incorporated into any contract.
The development, release, and timing of any future feature or functionality described
for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM
benchmarks in a controlled environment. The actual throughput or performance that
any user will experience will vary depending upon many factors, including
considerations such as the amount of multiprogramming in the user's job stream,
the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar
to those stated here.
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by
GSA ADP Schedule Contract with IBM. Information in this eBook (including information
relating to products that have not yet been announced by IBM) has been reviewed for
accuracy as of the date of initial publication and could include unintentional technical or
typographical errors. IBM shall have no responsibility to update this information. THIS
DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR
IMPLIED. IN NO EVENT SHALL IBM BE LIABLE FOR ANY DAMAGE ARISING FROM THE USE
OF THIS INFORMATION, INCLUDING BUT NOT LIMITED TO, LOSS OF DATA, BUSINESS
INTERRUPTION, LOSS OF PROFIT OR LOSS OF OPPORTUNITY.
IBM products and services are warranted according to the terms and conditions of the
agreements under which they are provided.
References in this document to IBM products, programs, or services do not imply that
IBM intends to make such products, programs, or services available in all countries in
which IBM operates or does business.
Information concerning non-IBM products was obtained from the suppliers of those
products, their published announcements or other publicly available sources. IBM has
not tested those products in connection with this publication and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the suppliers
of those products. IBM does not warrant the quality of any third-party products, or the
ability of any such third-party products to interoperate with IBM's products. IBM
EXPRESSLY DISCLAIMS ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.
The provision of the information contained herein is not intended to, and does not,
grant any right or license under any IBM patents, copyrights, trademarks or other
intellectual property right.
IBM, the IBM logo, ibm.com, Aspera®, Bluemix, Blueworks Live, CICS, Clearcase,
Cognos®, DOORS®, Emptoris®, Enterprise Document Management System™, FASP®,
FileNet®, Global Business Services ®, Global Technology Services ®, IBM
ExperienceOne™, IBM SmartCloud®, IBM Social Business®, Information on Demand,
ILOG, Maximo®, MQIntegrator®, MQSeries®, Netcool®, OMEGAMON, OpenPower,
PureAnalytics™, PureApplication®, pureCluster™, PureCoverage®, PureData®,
PureExperience®, PureFlex®, pureQuery®, pureScale®, PureSystems®, QRadar®,
Rational®, Rhapsody®, Smarter Commerce®, SoDA, SPSS, Sterling Commerce®, StoredIQ,
Tealeaf®, Tivoli®, Trusteer®, Unica®, urban{code}®, Watson, WebSphere®, Worklight®,
X-Force®, System z®, and z/OS® are trademarks of International Business Machines
Corporation, registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies. A
current list of IBM trademarks is available on the Web at "Copyright and trademark
information" at: www.ibm.com/legal/copytrade.shtml.
All trademarks or copyrights mentioned herein are the possession of their respective
owners and IBM makes no claim of ownership by the mention of products that contain
these marks.
Initial Publication: October 23, 2016
Updated: May 16, 2017
Updated: June 20, 2017
Updated: October 2, 2017
Revisions
Foreword
The new economy is changing the way we gather information, manage
data, gain insights, reinvent our businesses, and do so quickly and
iteratively. The digital transformation of everything is changing the
information value chain not just from your own organization, but from
stakeholders and third-party data providers. With this new
information, and the context in which it is gathered, we gain a
deeper understanding of client behaviors and market conditions, and
can formulate faster responses to a changing competitive landscape.
The advent of cloud and mobile computing further encourages the
distribution and consumption of information at a rapid pace, which in
turn generates more information to analyze.
Therefore, this new economy requires robust database software that
sits at the heart of an enterprise to support its transformation in this new
era of digital, cloud, and cognitive computing.
Db2 is a versatile and scalable database that can support your
transactional and analytical workloads, up to petabytes in volume, for
different applications whether on premises or in the cloud. It is easy to
deploy, upgrade, and maintain, and at the same time it simplifies the move
from Oracle databases by leveraging the SQL Compatibility feature
along with your existing skills and assets, with minimal changes to the
application.
Db2's pureScale capability ensures that your business data is always
available and accessible without any interruptions to your business
processes, ensuring that your consumers are not impacted wherever they
may be.
With BLU MPP, you leverage massively parallel processing with in-memory
columnar technology to provide extreme performance enhancements for
your analytic workloads.
Finally, Db2 provides the flexibility of deploying on premises or in the
cloud with integrated security to ensure that your new business process
applications and data are where they make sense, near your consumers.
Several other distributed database offerings are available from IBM in a
managed environment, hosted environment, virtualized, using your own
data center, or on an appliance:
• Db2 Warehouse (Local Docker Container)
• Db2 Warehouse on Cloud (Managed Warehouse Service)
• Db2 on Cloud (Managed OLTP Service)
• Db2 Hosted
• Informix
• Informix on Cloud
• PureData System for Analytics
• IBM Integrated Analytics System
• Cloudant
• IBM Db2 Event Store
No matter what your requirements are, IBM has a database offering that
will meet your needs.
Packaging, Installation, and Administration .................................................. 2
End of Service and Marketing ............................................................................... 2
New Included Features ......................................................................................... 3
Federation ......................................................................................................... 3
Encryption ......................................................................................................... 3
pureScale Administration Feature ..................................................................... 4
Db2 Express and Db2 Express Community Editions .............................................. 6
Db2 Workgroup and Enterprise Editions ............................................................... 7
Advanced Workgroup and Advanced Enterprise Editions .................................... 7
Additional Products for Db2 .................................................................................. 8
Advanced Recovery Feature .............................................................................. 8
Db2 Performance Management Offering ......................................................... 8
Db2 Direct Editions ................................................................................................ 9
Db2 Developer-C (Community) Edition ........................................................... 10
Db2 Developer Community Edition ................................................................. 11
Operating System Support .................................................................................. 11
Virtualization Support ......................................................................................... 12
Upgrade Support ................................................................................................. 13
Administration Improvements ............................................................................ 14
Range Partition Reorganization ...................................................................... 14
ADMIN_MOVE_TABLE .................................................................................... 14
Remote Storage .............................................................................................. 15
Db2 History File Backup .................................................................................. 15
Hardware Backup Compression ...................................................................... 15
Db2 BLU Enhancements .............................................................................. 18
Massively Parallel Processing Architecture ......................................................... 19
Fast Communication Manager ............................................................................ 20
Faster SQL MERGE processing ............................................................................. 21
Nested Loop Join Support ................................................................................... 21
Sort Processing Enhancements ........................................................................... 22
Query Rewrite Enhancements ............................................................................ 23
Push-down of OLAP functions ............................................................................. 24
SQL Functions Optimized for BLU ....................................................................... 24
Automatic Dictionary Creation ............................................................................ 25
Continuous Availability ................................................................................ 27
pureScale Architecture ........................................................................................ 27
Deployments ....................................................................................................... 28
Virtualization ....................................................................................................... 31
Installation ........................................................................................................... 31
Workload Balancing ............................................................................................ 32
Acknowledgments
We want to thank the following people who, in one way or another,
contributed to this book:
• Michael Springgay, Kelly Schlamb, Paul Bird, Matt Huras, Calisto
Zuzarte, John Hornibrook, Greg Stager, and Alan Lee
We would also like to thank the entire development team for helping to
deliver this release given the tremendous deadlines and constraints that
they have been under.
The contents of this eBook are the result of a lot of research and testing
based on the contents of our Db2 11.1 Knowledge Center. The authors of
our online documentation deserve special thanks for getting the details
to us early in the product development cycle so we could build much of
our material.
For the most up-to-date information on Db2 11.1 features, please
refer to the IBM Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSEPGG
There you can select the Version 11.1.0 release from the drop-down
menu and get more details on the features we explore in this eBook.
Introduction
About This Book
The Db2 11.1 release delivers several significant enhancements including
Database Partitioning Feature (DPF) for BLU columnar technology,
improved pureScale performance and further High Availability Disaster
Recovery (HADR) support, and numerous
SQL features.
This book was written to highlight many of the new features and
functions that are now available in this release, without you having to
search through various forums, blogs, and online manuals. We hope that
this book gives you more insight into what you can now accomplish with
Db2 11.1, and that you include it on your shortlist of databases to deploy,
whether on premises, in the cloud, or in a hybrid approach.
George and Enzo
How This Book Is Organized
We organized this book into 11 chapters that cover many of the
highlights and key features found in the Db2 11.1 release.
• Chapter 1 gives you an overview of the new packaging changes in
Db2 11.1, along with pre-requisites for installing the product
• Chapter 2 discusses the enhancements to Db2 BLU, including
support for BLU columnar tables in a DPF (Database Partitioning
Feature) environment
• Chapter 3 examines what pureScale capabilities provide you in a
production environment, along with all the enhancements that
have been made to simplify installation and management of
a cluster
• Chapters 4 through 12 focus on SQL and compatibility
enhancements
1
Packaging,
Installation, and
Administration
Federation
Db2 11.1 now includes data federation and virtualization through a
robust SQL interface and a full relational database engine designed for
global optimization of data access. This capability offers homogeneous or
heterogeneous federation (depending on the Db2 Edition) between
select IBM and non-IBM data sources. You can quickly prototype and
implement solutions involving disparate data by virtualizing access and
optimizing performance. Data can remain in place for both transactional
as well as analytical workloads.
Figure 1: Federation Support
Db2 Advanced Server Editions include all of the wrappers that are part of
the federation support. The Db2 Workgroup and Enterprise Edition
products only include support for connecting to Db2 and Informix data
sources.
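As a hedged sketch of what virtualizing a remote Db2 source looks like with the federation support described above (server names, credentials, and table names here are hypothetical placeholders):

```sql
-- Register the DRDA wrapper and define the remote Db2 data source
CREATE WRAPPER DRDA;
CREATE SERVER REMOTE_DB TYPE DB2/UDB VERSION '11.1' WRAPPER DRDA
  AUTHORIZATION "feduser" PASSWORD "fedpass"
  OPTIONS (DBNAME 'SALESDB');

-- Map local users to credentials on the remote source
CREATE USER MAPPING FOR PUBLIC SERVER REMOTE_DB
  OPTIONS (REMOTE_AUTHID 'feduser', REMOTE_PASSWORD 'fedpass');

-- A nickname makes the remote table queryable as if it were local
CREATE NICKNAME SALES_REMOTE FOR REMOTE_DB.DB2INST1.SALES;
```

Once the nickname exists, ordinary SQL against SALES_REMOTE is globally optimized and pushed down to the remote source where possible.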
Encryption
Db2 included encryption at the database level in Version 10.5. However,
the encryption feature required the use of a local keystore to contain the
master key for the database. This required that the keystore itself be
backed up and managed independently from the database backup.
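A minimal sketch of the local-keystore flow this describes (paths, names, and passwords are hypothetical; gskit ships with Db2):

```shell
# Create a local PKCS#12 keystore to hold the database master key
gsk8capicmd_64 -keydb -create -db /db2/keystore/db2keys.p12 \
    -pw "Str0ngPassw0rd" -type pkcs12 -stash

# Point the instance at the keystore, then create an encrypted database
db2 update dbm cfg using keystore_type pkcs12 \
    keystore_location /db2/keystore/db2keys.p12
db2 create database mydb encrypt
```

Because the master key lives in this file, the keystore must be backed up and managed independently of the database backups, which is exactly the limitation noted above.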
Figure 2: Encryption Support
pureScale Administration Feature
The limited two-member pureScale cluster configuration enables one
member to process application workloads, and a second member to
provide continuous availability when the first member is undergoing
planned or unplanned outages. In addition, the second member can also
be used to perform administrative tasks and utilities, thus off-loading
these tasks from the primary member.
The first member is fully licensed, while the second member only
requires minimal licensing similar to an HADR environment (licensed as
warm standby, e.g. 100 PVUs or 1 VPC).
Figure 3: pureScale 2-node configuration
Both licenses are sold as a monthly license charge using the Virtual
Processor Core (VPC) metric. Customers can
choose to maintain these databases with either:
• Long Term Support Release (LTSR) support
• Continuous Delivery (CD) support
LTSR support will give the customer fix packs only with no additional
functionality added to the product. CD support will contain the fixes
found in the LTSR stream and may also contain new functionality and
enhancements.
Virtualization Support
Db2 11.1 supports a number of virtualization environments. Virtualization
can improve the utilization of processor/machine resources and also give
the customer an environment to test systems without having to invest in
entire systems. The following is a list of virtualized environments that
Db2 supports:
• IBM System z
o IBM Processor Resource/System Manager
o z/VM and z/KVM on IBM System z
• IBM Power
o IBM PowerVM and PowerKVM and IBM Workload
Partitions on IBM Power Systems
• Linux X86-64 Platforms
o Red Hat KVM
o SUSE KVM
• VMWare ESXi
• Docker container support – Linux only
• Microsoft
o Hyper-V
o Microsoft Windows Azure on x86-64 Windows
Platforms only
• pureScale support on PowerVM/KVM, VMWare, and KVM
You will note that pureScale is supported in a virtualized environment
and that RoCE adapters can now be virtualized in a VMWare
environment (initially) so that multiple members can share the same
hardware adapter.
Upgrade Support
Upgrading to Db2 11.1 is supported from Db2 10.5, Db2 10.1, or Db2 9.7.
If you have an earlier version of Db2, you must first upgrade to Db2 10.5,
Db2 10.1, or Db2 9.7 before upgrading to Db2 11.1.
• Upgrading to a Db2 11.1 non-root installation is supported from a
Db2 10.5, Db2 10.1, or Db2 9.7 non-root installation. Upgrading to
a Db2 11.1 non-root installation from a pre-Db2 11.1 root
installation is not supported.
• Instance bit size is determined by the operating system where
Db2 11.1 is installed, and support for 32-bit kernels and 64-bit
kernels has changed.
• Upgrading from a system with multiple copies of Db2 10.5, Db2
10.1, or Db2 9.7 is supported. On Windows operating systems,
you must be aware of the restrictions on coexistence of previous
versions of the Db2 database products.
• Upgrading from a partitioned database environment with multiple
database partitions is supported.
Restoring full database offline backups from pre-Db2 11.1 copies is
supported. However, rolling forward of logs from a previous level is not
possible. Review "Backup and restore operations between different
operating systems and hardware platforms" in the documentation for
complete details about upgrade support using the RESTORE DATABASE command.
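A hedged sketch of the upgrade flow described above (instance, database, and path names are hypothetical; check the documentation for the exact steps for your platform):

```shell
# Verify each database is ready to upgrade, logging any issues
db2ckupgrade sample -l db2ckupgrade.log

# As root, upgrade the instance to the Db2 11.1 copy
/opt/ibm/db2/V11.1/instance/db2iupgrade db2inst1

# As the instance owner, upgrade each database under the instance
db2 upgrade database sample
```

Running db2ckupgrade first is the usual safeguard, since it flags conditions that would make UPGRADE DATABASE fail.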
Administration Improvements
Db2 11.1 includes a number of administrative enhancements that
customers will find useful. Five highlighted here are:
• Range Partition Table Reorganization
• ADMIN_MOVE_TABLE improvements
• Remote Storage Option
• Db2 History File Backup
• Backup Compression Hardware Acceleration
ADMIN_MOVE_TABLE
There are two new options in the ADMIN_MOVE_TABLE stored procedure:
• REPORT
• TERM
The REPORT option can be used to monitor the progress of table moves.
The command calculates a set of values to monitor the progress of single
or multiple table moves. The focus is on the COPY and REPLAY phase of a
running table move.
The REPORT option requires a table schema and name to get information
on a specific table move. If these values are left blank or NULL,
information on all outstanding table moves is retrieved.
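For example, a hedged sketch of a REPORT call (the schema and table names are hypothetical, and the parameter list follows the common ten-argument form of the procedure):

```sql
-- Report progress for a specific in-flight move of DB2INST1.SALES
CALL SYSPROC.ADMIN_MOVE_TABLE(
  'DB2INST1', 'SALES', '', '', '', '', '', '', '', 'REPORT');

-- With NULL schema and name, report on all outstanding table moves
CALL SYSPROC.ADMIN_MOVE_TABLE(
  NULL, NULL, '', '', '', '', '', '', '', 'REPORT');
```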
The TERM option can be used to terminate a table move in progress.
TERM will force off the application running the table move and roll back
all changes, leaving the table in a consistent state.
Remote Storage
Db2 11.1 delivers more flexibility and options for acquiring, sharing, and
storing data files and backup images by allowing customers to use remote
storage for several Db2 utilities:
• INGEST, LOAD, BACKUP, and RESTORE
Db2 supports remote storage using storage aliases for:
• IBM® SoftLayer® Object Storage
• Amazon Simple Storage Service (S3)
Hardware Backup Compression
Db2 11.1 also adds support for hardware-accelerated backup compression.
This allows existing customers that are using the built-in Db2 backup
compression (i.e., software compression) to benefit without any changes
to scripts or jobs.
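A minimal sketch of enabling the accelerator, assuming the DB2_BCKP_COMPRESSION registry variable from the Db2 11.1 documentation (the instance must be recycled for the change to take effect):

```shell
# Enable NX842 hardware-accelerated backup compression for the instance
db2set DB2_BCKP_COMPRESSION=NX842
db2stop
db2start
```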
Once the registry variable is set, using the following backup command
format will benefit from the hardware compression accelerator:
backup database <dbname> compress
Log archive compression is also supported and can be configured by
updating the database configuration parameter LOGARCHCOMPR1 or
LOGARCHCOMPR2 to NX842:
update database configuration for <dbname>
using LOGARCHCOMPR1 NX842
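To confirm the setting, the configuration can be read back afterwards (the database name here is hypothetical and the output layout varies by platform):

```shell
db2 get db cfg for mydb | grep -i LOGARCHCOMPR
```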
2
Db2 BLU
Enhancements
Figure 4: DB Insight Query Performance - BD Insights (800GB) query throughput
improved from 703.85 queries per hour on DB2 V10.5 FP5 to 955.82 on Db2 V11.1,
a 1.36x gain
Figure 5: Columnar Engine Native SORT Example
Figure 6: Correlated Subquery Example
Decorrelation allows for bulk join processing, via HASH join exploitation,
instead of one row at a time. Several scenarios are decorrelated,
such as:
• UPDATE statements with correlated subqueries in the
SET statement
• CASE expressions containing correlated subqueries
• OR predicates containing correlated subqueries
• Subqueries containing GROUP BY
The corresponding Db2 Query Rewrite generated decorrelated query for
the correlated subquery example in Figure 6 is as follows:
SELECT F.key
FROM fact F LEFT OUTER JOIN lookup L
ON F.key = L.key
WHERE F.flag = 1 OR
L.key IS NOT NULL;
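The original Figure 6 is not reproduced here, but a correlated subquery of roughly this shape (table and column names taken from the rewritten query; the exact original may differ) is the kind of candidate this decorrelation targets:

```sql
-- An OR predicate containing a correlated subquery, which Query Rewrite
-- can decorrelate into the LEFT OUTER JOIN form shown above (illustrative)
SELECT F.key
FROM fact F
WHERE F.flag = 1 OR
      EXISTS (SELECT 1 FROM lookup L WHERE L.key = F.key);
```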
A nested-loop join is the join method that best deals with true correlated
subqueries, and as of Db2 11.1 it can also be chosen for queries
where decorrelation is not possible.
3
pureScale Continuous
Availability
Continuous Availability
Db2 11.1 introduces some additional capabilities to Db2 pureScale that
provide simplification in installation and configuration, virtualization,
workload management, and manageability, along with additional Disaster
Recovery (DR) configurations.
Db2 pureScale has been providing scalability and continuous availability
through planned and unplanned events since version 9.8. Transparent
scalability means that as Members are added to the Db2 pureScale
Instance, applications can scale without any changes or awareness at the
application level. Both scalability and availability have been designed
into the architecture and are made possible by the highly reliable cluster
caching facility (CF), a capability ported from the gold standard of
availability, Db2 for z/OS Sysplex. Before we get into the new capabilities
delivered in Db2 11.1, let's take a quick look at the pureScale architecture.
pureScale Architecture
The architecture of a Db2 pureScale cluster (also referred to as a
pureScale instance), as shown in Figure 7, is an active/active data-sharing
environment in which multiple nodes (called Db2 members) handle the
transactional workload of the system, each with equal and shared
access to a single copy of the database on disk.
A member is a logical component and essentially a single instance of Db2
Server with its own bufferpool, memory region, log files, etc. The single
copy of the database as well as the log files for each member are stored
within the same shared storage using the highly scalable IBM Spectrum
Scale (GPFS) filesystem.
Clients can connect to any of the members and need only know of one,
as default workload balancing capabilities will
automatically distribute the workload across all the active members. If a
member fails, client connections are automatically rerouted to healthy
members. A critical element of the architecture is that there is no
member-to-member communication.
The members communicate with the cluster Caching Facility (CF) for
centralized locking and cache management. For availability, there is
support for two CFs, a primary and standby. This communication takes
place over a high speed, low latency interconnect via the RDMA (Remote
Direct Memory Access) protocol. Alternatively, the use of TCP/IP sockets
for the interconnect instead of an RDMA-based interconnect is also
supported. The Db2 Cluster Services (CS) provides integrated failure
detection, recovery automation and the clustered file system.
All the components are fully integrated into Db2 pureScale, in that they
are all installed together as part of a single installation process, they are
configured as part of installation, and they are managed and maintained
together all within pureScale.
Figure 7: pureScale Architecture - clients connect over the cluster
interconnect to the members (each running Cluster Services), which share
a single database on shared storage
Deployments
Db2 pureScale architecture allows for highly flexible deployment
topologies due to the logical aspects of the members and CFs. This allows
for the members and CFs to be deployed in any number of combinations,
be it virtual (LPAR, VM, KVM) or physical servers. Following are some
example deployment options for members and CFs. In these examples,
'server' refers to either a virtual or physical
implementation.
The minimal requirement for pureScale is one member and one CF.
However, the recommended deployment configuration, as shown in
Figure 8, is two members and two CFs running across two servers, as this
provides the continuously available configuration for which pureScale is
architected.
Figure 8: Two Servers - Two Members Two CFs
We can take the same two members and two CF configuration and
deploy it over three servers, as in Figure 9, or four servers as in
Figure 10.
Figure 9: Three Servers - Two Members Two CFs
Figure 10: Four Servers - Two Members Two CFs
With three servers, you can also deploy three members and two CFs, as
shown in Figure 11 and with four servers, you can have four members
and two CFs (Figure 12):
Figure 11: Three Servers - Three Members Two CFs
Figure 12: Four Servers - Four Members Two CFs
Virtualization
Db2 pureScale has supported a variety of virtualization technologies, such
as IBM PowerVM, PowerKVM, Red Hat KVM, and VMWare ESXi. Db2 11.1
enhances the virtualization support for VMWare by delivering RDMA
over Converged Ethernet (RoCE) support through RoCE
Single-Root I/O Virtualization (SR-IOV) on RHEL 7.2. The SR-IOV standard
enables one PCI Express (PCIe) adapter to be presented as multiple
separate logical devices (virtual functions) to virtual machines. This
allows the virtual machines to run native RoCE and achieve near wire-
speed performance. This capability can be enabled with Db2 pureScale
when using Mellanox ConnectX-3/ConnectX-3 Pro/ConnectX-3 VPI
adapters for Ethernet.
Installation
The Db2 pureScale installation process in Db2 11.1 has been simplified
with smarter defaults, intuitive options, and parallel, quick pre-
deployment validation across hosts, allowing users to be up and running
within hours of starting the installation process. Re-engineering
efforts in the installer reduce complexity by at least 40% for
sockets-based deployments and take a 30-step native GPFS setup down
to a simple 4-step Db2 install process.
Db2 11.1 provides simplification in the setup and configuration of GPFS
replication. GPFS replication is intended for customers who want to
provide protection against a complete SAN storage subsystem failure. The
GPFS replication (Figure 13) is completed with a few db2cluster commands
and is defined across two separate storage subsystems.
Figure 13: Four Servers - Four Members Two CFs and GPFS Replication
Workload Balancing
There has been additional capability included in Db2 pureScale with
workload balancing when using member subsets. Member subsets allow
clients to connect and be balanced across a subset of the members
comprising the Db2 pureScale instance.
Db2 11.1 allows you to explicitly define the members to be used as
alternates in the event of primary member failures. An alternate
member can be any other member of the pureScale Instance. An
alternate member of a subset behaves as any other member and
processes transactions, but is considered dormant for that subset until a
failure occurs on a primary member of the subset. Upon a failure, the
alternate member will be assigned workloads for the respective subset in
addition to any other workloads that have been assigned to, or are
running on, that member. Db2 will attempt to maintain the same number
of primary members in the respective subset, if there are enough
alternates available.
Failover priority has values from 0-254. Members with failover priority of
0, the default if not specified, are considered primary members of the
subset. Members with failover priority of 1-254 are considered alternate
members of the subset. If a primary member fails, an alternate member
is automatically selected to service the member subset workload. The
determination of which alternate member to assign to the subset is made
by choosing the alternate member with the lowest failover priority. If two
alternate members have the same priority defined, then the alternate
member with the lowest member number is chosen. Creating and
altering member subsets is a dynamically managed, online, server-side
operation.
Assuming we already have a member subset over members 0 and 1
called BATCH, the following example assigns member 2 and 4 as an
alternate member with failover priority of 1 and 2, respectively, to the
subset with name 'BATCH':
CALL SYSPROC.WLM_ALTER_MEMBER_SUBSET
('BATCH', NULL,
'(ADD 2 FAILOVER_PRIORITY 1,
ADD 4 FAILOVER_PRIORITY 2)');
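The resulting subset definitions and member assignments can then be inspected through the system catalog; a sketch assuming the member subset catalog views described in the Db2 11.1 documentation (view names are assumptions here):

```sql
-- Inspect defined member subsets and their member assignments
SELECT * FROM SYSCAT.MEMBERSUBSETS;
SELECT * FROM SYSCAT.MEMBERSUBSETMEMBERS;
```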
In addition to the catalog view, the db2pd command can be used to see
the alternate member information:
db2pd -db <dbname> -membersubsetstatus
To preserve the pre-Db2 11.1 behavior of member subsets during the
upgrade, the value of the FAILOVER_PRIORITY attribute is set to 0 for all
members. Prior member subset behavior can also be maintained for
newly created subsets in Db2 11.1 pureScale by setting a
FAILOVER_PRIORITY of 0 for all the members in the respective subset.
The Db2 11.1 enhancement to member subsets allows customers
currently using client affinity to move their configuration to
member subsets and failover priority so that they can exploit the new
benefits such as dynamic server-side reconfiguration. It also simplifies
setting up client affinity by placing control at the server rather than the
client, with no need to maintain db2dsdriver.cfg on each client.
Failover behavior
During a planned or unplanned event on a member of a subset,
applications will automatically be routed to the alternate member with
the lowest failover priority (and, when multiple alternates are defined
with the same failover priority, the lowest member number). Should a
subsequent event occur, applications will be routed to the next alternate
member.
If the member subset is defined as an inclusive subset and there are
not enough alternate members to fail over to, then the application uses
the other available members in the cluster that are not part of the
subset.
Failback behavior
Once a primary member or a member with a lower failover priority
becomes available, applications fail back from the alternate member
with the higher failover priority. This means that all new applications
connect to the lower-priority member that became available, while
existing applications connected to the higher-priority alternate member
can complete their processing before failing back to the lower-priority
member.
Manageability
Pre-Installation
As part of the improvements to installation, Db2 11.1 adds a new
option, -adapter_list, to the db2prereqcheck command. This option
verifies network connectivity by checking that all the hosts in the Db2
pureScale configuration are pingable using RDMA. The option requires an
input file that lists the hostname, netname, and adapter name for each
host to be verified.
db2prereqcheck –adapter_list <adapter_list_filename>
Where <adapter_list_filename> specifies the file that contains the list
of hostname, netname, and adapter names for each of the hosts to be
verified.
The input file must have the following format:
Table 2: Adapter list filename example
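The contents of Table 2 are not reproduced here. As a purely
hypothetical illustration (the hostnames, netnames, and adapter names
below are invented), the file lists one host per line with its
hostname, netname, and adapter name:

```
host1.example.com  host1-ib0  ib0
host2.example.com  host2-ib0  ib0
```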
Post-Installation
In addition to the enhancements to pre-installation above, Db2 11.1
delivers a post-installation unified health check option to the db2cluster
command. The db2cluster –verify command performs a comprehensive set
of checks to validate the health of the Db2 pureScale cluster. An alert
is raised for each failed check and is displayed in the output of the
instance monitoring command db2instance -list. The validations
performed include, but are not limited to, the following:
• Configuration settings in peer domain and GPFS cluster
• Communications between members and CFs (including RDMA)
• Replication setting for each file system
• Status of each disk in the file system
Disaster Recovery
There have been enhancements to Db2 pureScale in HADR (High
Availability Disaster Recovery) and GDPC (Geographically Dispersed Db2
pureScale Cluster). Both provide zero data loss Disaster Recovery
solutions integrated and delivered as part of Db2 pureScale.
HADR
Db2 11.1 adds HADR synchronous (SYNC) and near-synchronous (NEARSYNC)
support for pureScale deployments. This enhancement combines the
continuous availability of the Db2 pureScale feature with the robust
disaster recovery capabilities of HADR, providing an integrated zero
data loss (i.e., RPO=0) disaster recovery solution.
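As a sketch of how the synchronization mode is chosen, the relevant
database configuration parameters are HADR_SYNCMODE and
HADR_TARGET_LIST. The database name HADRDB and the standby host and
port below are hypothetical:

```
UPDATE DB CFG FOR HADRDB USING HADR_SYNCMODE SYNC
UPDATE DB CFG FOR HADRDB USING HADR_TARGET_LIST standbyhost:4000
```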
Figure 14: HADR and pureScale
Figure 15: Geographically Dispersed Db2 pureScale Cluster
Compatibility Features
Moving from one database vendor to another can sometimes be difficult
due to syntax differences between data types, functions, and language
elements. Db2 already has a high degree of compatibility with Oracle
PL/SQL along with some of the Oracle data types.
Db2 11.1 introduces additional data type and function compatibility
that reduces the migration effort required when porting from other
systems. There are some specific features within Db2 that are targeted
at Netezza SQL; those are discussed in a separate section.
The new BINARY and VARBINARY data types will reduce the amount of
conversion required from other databases. Although binary data was
previously supported with the FOR BIT DATA clause on a character
column, it required manual DDL changes when migrating a table
definition.
This example shows the creation of the three types of binary columns.
CREATE TABLE HEXEY
(
AUDIO_SHORT BINARY(255),
AUDIO_LONG VARBINARY(1024),
AUDIO_CHAR VARCHAR(255) FOR BIT DATA
);
Inserting data into a binary column can be done using BINARY functions,
or by using X'xxxx' literals in the VALUES clause. For fixed-length
strings, you use the X'00' format to specify a binary value, and BX'00'
for variable-length binary strings. For instance, the following SQL
will insert data into the table that was just created.
INSERT INTO HEXEY VALUES
(BINARY('Hello there'),
BX'2433A5D5C1',
VARCHAR_BIT_FORMAT(HEX('Hello there')));
Handling binary data with a FOR BIT DATA column was sometimes
tedious, so the BINARY columns will make coding a little simpler. You can
compare and assign values between any of these types of columns. The
next SQL statement will update the AUDIO_CHAR column with the
contents of the AUDIO_SHORT column. Then the SQL will test to make
sure they are the same value.
UPDATE HEXEY
SET AUDIO_CHAR = AUDIO_SHORT;
SELECT COUNT(*) FROM HEXEY
WHERE AUDIO_CHAR = AUDIO_SHORT;
1
-----------
1
Previously, the BOOLEAN data type could not be used in a table
definition. Db2 11.1.1.1 now allows you to use this data type in a
table definition and use TRUE/FALSE clauses to compare values.
This simple table will be used to demonstrate how BOOLEAN types can be
used.
CREATE TABLE TRUEFALSE (
EXAMPLE INT,
STATE BOOLEAN
);
The keywords for a true value are TRUE, 'true', 't', 'yes', 'y', 'on', and '1'.
For false the values are FALSE, 'false', 'f', 'no', 'n', and '0'.
INSERT INTO TRUEFALSE VALUES
(1, TRUE),
(2, FALSE),
(3, 0),
(4, 't'),
(5, 'no');
Now we can check to see what has been inserted into the table.
SELECT * FROM TRUEFALSE;
EXAMPLE STATE
----------- ------
1 1
2 0
3 0
4 1
5 0
Retrieving the data in a SELECT statement will return an integer value for
display purposes. 1 is true and 0 is false (binary 1 and 0).
Comparisons with BOOLEAN data types can use TRUE, FALSE, 1 or 0, or any
of the other supported boolean representations. You have the choice of
using the equal (=) operator or the IS or IS NOT syntax, as shown in
the following SQL.
SELECT * FROM TRUEFALSE
WHERE STATE = TRUE OR STATE = 1 OR STATE = 'on'
OR STATE IS TRUE;
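The IS NOT syntax works the same way for negative tests. As a sketch
against the TRUEFALSE table above, the following should return the rows
whose STATE is false:

```sql
-- Rows 2, 3, and 5 were inserted with false values
SELECT * FROM TRUEFALSE
  WHERE STATE IS FALSE;
```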
Other database systems use different names for the standard numeric
data types, so Db2 11.1 now allows these data types as synonyms for the
base types.
These new data types are:
Table 3: Data Type Synonyms
Type Db2 Equivalent
INT2 SMALLINT
INT4 INTEGER
INT8 BIGINT
FLOAT4 REAL
FLOAT8 FLOAT
The following SQL will create a table with these data types.
CREATE TABLE SYNONYM_EMPLOYEE
(
NAME VARCHAR(20),
SALARY INT4,
BONUS INT2,
COMMISSION INT8,
COMMISSION_RATE FLOAT4,
BONUS_RATE FLOAT8
);
When you create a table with these data types, Db2 does not record the
synonym names in the system catalog. Instead, Db2 uses the equivalent
base type. If you describe the contents of a table, you will see the
Db2 base types displayed, not the synonym types.
DESCRIBE TABLE SYNONYM_EMPLOYEE;
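You can also check this directly in the system catalog. The following
sketch (it assumes the table was created in the current schema) lists
the column types that Db2 actually recorded:

```sql
-- TYPENAME shows the Db2 base type, not the synonym used in the DDL
SELECT COLNAME, TYPENAME
  FROM SYSCAT.COLUMNS
  WHERE TABNAME = 'SYNONYM_EMPLOYEE';
```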
The following table lists the function names and the equivalent Db2
function that they are mapped to.
Table 4: Function Synonyms
Function Db2 Equivalent
COVAR_POP COVARIANCE
STDDEV_POP STDDEV
VAR_POP VARIANCE
VAR_SAMP VARIANCE_SAMP
ISNULL IS NULL
NOTNULL IS NOT NULL
LOG LN
RANDOM RAND
STRPOS POSSTR
STRLEFT LEFT
STRRIGHT RIGHT
BPCHAR VARCHAR (Casting function)
DISTRIBUTE ON DISTRIBUTE BY
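For instance, the string synonyms can be used directly in place of the
Db2 names. The following sketch should behave exactly like the LEFT and
POSSTR functions they map to:

```sql
-- STRLEFT maps to LEFT; STRPOS maps to POSSTR
VALUES STRLEFT('Db2 Compatibility', 3);
VALUES STRPOS('Db2 Compatibility', 'Comp');
```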
Netezza Compatibility
Db2 provides features that enable applications that were written for a
Netezza Performance Server (NPS) database to be used against a Db2
database.
The SQL_COMPAT global variable is used to activate the following
optional NPS compatibility features:
• Double-dot notation - When operating in NPS compatibility mode,
you can use double-dot notation to specify a database object.
• TRANSLATE parameter syntax - The syntax of the TRANSLATE
parameter depends on whether NPS compatibility mode is
being used.
• Operators - Which symbols are used to represent operators in
expressions depends on whether NPS compatibility mode is being
used.
• Grouping by SELECT clause columns - When operating in NPS
compatibility mode, you can specify the ordinal position or
exposed name of a SELECT clause column when grouping the
results of a query.
• Routines written in NZPLSQL - When operating in NPS
compatibility mode, the NZPLSQL language can be used in
addition to the SQL PL language.
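For example, once NPS compatibility is enabled, the double-dot notation
can be used to qualify an object with a database name. The database
name MYDB in this sketch is hypothetical:

```sql
-- Double-dot notation is accepted only in NPS compatibility mode
SET SQL_COMPAT = 'NPS';
SELECT * FROM MYDB..EMPLOYEE;
```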
The following SQL will display the Db2 interpretation of the special
characters.
SET SQL_COMPAT = 'DB2';
WITH SPECIAL(OP, DESCRIPTION, EXAMPLE, RESULT) AS
(
VALUES
(' ^ ','XOR ', '2 ^ 3 ', 2 ^ 3),
(' # ','NONE ', ' ',0)
)
SELECT * FROM SPECIAL;
SELECT WORKDEPT,INT(AVG(SALARY))
FROM EMPLOYEE
GROUP BY WORKDEPT;
WORKDEPT 2
-------- -----------
A00 70850
B01 94250
C01 77222
D11 58784
D21 51240
E01 80175
E11 45306
E21 47087
Chapter 5: Netezza Compatibility 49
If you try using the ordinal location (similar to an ORDER BY clause), you
will get an error message.
SELECT WORKDEPT, INT(AVG(SALARY))
FROM EMPLOYEE
GROUP BY 1;
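In NPS compatibility mode, however, grouping by ordinal position is
accepted, as described in the feature list above:

```sql
-- Grouping by the ordinal position of a SELECT clause column
SET SQL_COMPAT = 'NPS';
SELECT WORKDEPT, INT(AVG(SALARY))
  FROM EMPLOYEE
  GROUP BY 1;
```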
TRANSLATE Function
The translate function syntax in Db2 is:
TRANSLATE(expression, to_string, from_string, padding)
The TRANSLATE function returns a value in which one or more characters
in a string expression might have been converted to other characters. The
function converts all the characters in char-string-exp in from-string-exp
to the corresponding characters in to-string-exp or, if no corresponding
characters exist, to the pad character specified by padding.
If no to-string or from-string is given, the original string is simply
converted to uppercase.
In NPS mode, the translate syntax is:
TRANSLATE(expression, from_string, to_string)
If a character is found in the from string and there is no
corresponding character in the to string, it is removed. With the Db2
syntax, the padding character would have been used instead.
Note: If ORACLE compatibility is ON then the behavior of TRANSLATE is
identical to NPS mode.
With only a single argument, the string is folded to uppercase (these
examples assume NPS mode has been enabled with SET SQL_COMPAT = 'NPS').
VALUES TRANSLATE('Hello');
1
-----
HELLO
This example replaces the letter "o" with the number "1".
VALUES TRANSLATE('Hello','o','1');
1
-----
Hell1
Note that you could replace more than one character by expanding both
the "to" and "from" strings. This example will replace the letter "e" with a
"2" as well as "o" with "1".
VALUES TRANSLATE('Hello','oe','12');
1
-----
H2ll1
When a character in the from string has no counterpart in the to
string, it is removed entirely. Here "o" and "e" are replaced while the
"l" characters are dropped.
VALUES TRANSLATE('Hello','oel','12');
1
-----
H21
6
SQL Extensions
SQL Extensions
Db2 has the ability to limit the amount of data retrieved on a SELECT
statement through the use of the FETCH FIRST n ROWS ONLY clause. In
Db2 11.1, the ability to offset the rows before fetching was added to the
FETCH FIRST clause.
SELECT LASTNAME FROM EMPLOYEE
FETCH FIRST 5 ROWS ONLY;
LASTNAME
---------------
HAAS
THOMPSON
KWAN
GEYER
STERN
You can also add ORDER BY and GROUP BY clauses in the SELECT
statement. Note that Db2 still needs to process all the records and do the
ORDER/GROUP BY work before limiting the answer set. So, you are not
getting the first 5 rows "sorted". You are getting the entire answer set
sorted before retrieving just 5 rows.
SELECT LASTNAME FROM EMPLOYEE
ORDER BY LASTNAME
FETCH FIRST 5 ROWS ONLY;
LASTNAME
---------------
ADAMSON
ALONZO
BROWN
GEYER
GOUNOT
Here is an example with the GROUP BY statement. This first SQL
statement gives us the total answer set - the count of employees by
WORKDEPT.
Chapter 6: SQL Extensions 53
SELECT WORKDEPT, COUNT(*) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY WORKDEPT;
WORKDEPT 2
-------- -----------
A00 5
B01 1
C01 4
D11 11
D21 7
E01 1
E11 7
E21 6
Adding the FETCH FIRST clause only reduces the rows returned, not the
rows that are used to compute the GROUPing result.
SELECT WORKDEPT, COUNT(*) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY WORKDEPT
FETCH FIRST 5 ROWS ONLY;
WORKDEPT 2
-------- -----------
A00 5
B01 1
C01 4
D11 11
D21 7
OFFSET Extension
The FETCH FIRST n ROWS ONLY clause can also include an OFFSET
keyword. The OFFSET keyword allows you to retrieve the answer set after
skipping "n" number of rows. The syntax of the OFFSET keyword is:
OFFSET n ROWS FETCH FIRST x ROWS ONLY
The OFFSET n ROWS must precede the FETCH FIRST x ROWS ONLY clause.
The OFFSET clause can be used to scroll down an answer set without
having to hold a cursor. For instance, you could have the first SELECT call
request 10 rows by just using the FETCH FIRST clause. After that you
could request the first 10 rows be skipped before retrieving the next 10
rows.
The one thing you must be aware of is that the answer set could change
between calls if you use this technique of a "moving" window. If rows
are updated or added after your initial query, you may get different
results.
This is due to the way that Db2 adds rows to a table. If there is a DELETE
and then an INSERT, the INSERTed row may end up in the empty slot.
There is no guarantee of the order of retrieval. For this reason, you are
better off using an ORDER BY to force the ordering although this too
won't always prevent rows changing positions.
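Combining ORDER BY with OFFSET gives a more predictable paging pattern.
A sketch of fetching the second page of five rows from the EMPLOYEE
table used throughout this chapter:

```sql
-- Skip the first page (5 rows) and return the next 5, in name order
SELECT LASTNAME FROM EMPLOYEE
  ORDER BY LASTNAME
  OFFSET 5 ROWS
  FETCH FIRST 5 ROWS ONLY;
```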
Here are the first 10 rows of the employee table (not ordered).
SELECT LASTNAME FROM EMPLOYEE
FETCH FIRST 10 ROWS ONLY;
LASTNAME
---------------
HAAS
THOMPSON
KWAN
GEYER
STERN
PULASKI
HENDERSON
SPENSER
LUCCHESSI
O'CONNELL
You can specify a zero offset to begin from the beginning.
SELECT LASTNAME FROM EMPLOYEE
OFFSET 0 ROWS
FETCH FIRST 10 ROWS ONLY;
Now we can move the answer set ahead by 5 rows and get the remaining
5 rows in the answer set.
SELECT LASTNAME FROM EMPLOYEE
OFFSET 5 ROWS
FETCH FIRST 5 ROWS ONLY;
LASTNAME
---------------
PULASKI
HENDERSON
SPENSER
LUCCHESSI
O'CONNELL
The OFFSET clause can also be used within a subselect. In this case,
you are limiting the amount of data that Db2 will scan when determining
the answer set.
For instance, say you wanted to find the names of the employees who
make more than the average salary of the 3rd highest paid department.
(By the way, there are multiple ways to do this, but this is one approach).
The first step is to determine what the average salary is of all
departments.
SELECT WORKDEPT, AVG(SALARY) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY AVG(SALARY) DESC;
WORKDEPT 2
-------- ------------------------------------------
B01 94250
E01 80175
C01 77222.5
A00 70850
D11 58783.63636363636363636363636363636
D21 51240
E21 47086.66666666666666666666666666667
E11 45305.71428571428571428571428571429
We only want one record from this list (the third one), so we can use the
FETCH FIRST clause with an OFFSET to get the value we want (Note: we
need to skip 2 rows to get to the 3rd one).
SELECT WORKDEPT, AVG(SALARY) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY AVG(SALARY) DESC
OFFSET 2 ROWS FETCH FIRST 1 ROWS ONLY;
WORKDEPT 2
-------- ------------------------------------------
C01 77222.5
And here is the list of employees that make more than the average salary
of the 3rd highest department in the company.
SELECT LASTNAME, SALARY FROM EMPLOYEE
WHERE
SALARY > (
SELECT AVG(SALARY) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY AVG(SALARY) DESC
OFFSET 2 ROWS FETCH FIRST 1 ROW ONLY
)
ORDER BY SALARY;
LASTNAME SALARY
--------------- -----------
GEYER 80175.00
SPENSER 86150.00
HENDERSON 89750.00
THOMPSON 94250.00
PULASKI 96170.00
KWAN 98250.00
HAAS 152750.00
Db2 11.1 also accepts the alternate LIMIT syntax for restricting an
answer set. The LIMIT clause specifies either the number of rows alone
(LIMIT x), or the offset and number of rows (LIMIT y,x) when you do not
use the OFFSET keyword. One would think that LIMIT x OFFSET y would
translate into LIMIT x,y but that is not the case. Don't try to figure
out the SQL standards reasoning behind the syntax!
The intermediate query can be rewritten with LIMIT as:
SELECT WORKDEPT, AVG(SALARY) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY AVG(SALARY) DESC
LIMIT 2,1;
WORKDEPT 2
-------- ------------------------------------------
C01 77222.5
And here is the list of employees that make more than the average
salary of the 3rd highest department in the company, this time using
LIMIT with OFFSET.
SELECT LASTNAME, SALARY FROM EMPLOYEE
WHERE
SALARY > (
SELECT AVG(SALARY) FROM EMPLOYEE
GROUP BY WORKDEPT
ORDER BY AVG(SALARY) DESC
LIMIT 1 OFFSET 2
)
ORDER BY SALARY;
LASTNAME SALARY
--------------- -----------
GEYER 80175.00
SPENSER 86150.00
HENDERSON 89750.00
THOMPSON 94250.00
PULASKI 96170.00
KWAN 98250.00
HAAS 152750.00
You can also combine the OFFSET and FETCH FIRST clauses in a CREATE
TABLE ... AS statement to extract chunks of data from the original
table.
CREATE TABLE AS_EMP(DEPARTMENT, LASTNAME) AS
(SELECT WORKDEPT, LASTNAME FROM EMPLOYEE
OFFSET 5 ROWS
FETCH FIRST 10 ROWS ONLY
) WITH DATA;
Date Functions
There are plenty of new date and time functions found in Db2 11.1.
These functions allow you to extract portions from a date and format the
date in a variety of different ways. While Db2 already has a number of
date and time functions, these new functions allow for greater
compatibility with other database implementations, making it easier to
port to Db2.
Extract Function
The EXTRACT function extracts an element from a date/time value. The
syntax of the EXTRACT command is:
EXTRACT( element FROM expression )
This is a slightly different format from most functions that you see in Db2.
Element must be one of the following values:
Table 7: Date Elements
Element Name Description
EPOCH Number of seconds since 1970-01-01 00:00:00.00. The value can be
positive or negative.
MILLENNIUM(S) The millennium is to be returned.
CENTURY(CENTURIES) The number of full 100-year periods represented by the year.
DECADE(S) The number of full 10-year periods represented by the year.
YEAR(S) The year portion is to be returned.
QUARTER The quarter of the year (1 - 4) is to be returned.
MONTH The month portion is to be returned.
WEEK The week of the year (1 - 53) in which the specified day falls.
DAY(S) The day portion is to be returned.
DOW The day of the week that is to be returned. Note that "1" represents
Sunday.
DOY The day (1 - 366) of the year that is to be returned.
HOUR(S) The hour portion is to be returned.
MINUTE(S) The minute portion is to be returned.
SECOND(S) The second portion is to be returned.
MILLISECOND(S) The second of the minute, including fractional parts to one
thousandth of a second
MICROSECOND(S) The second of the minute, including fractional parts to one millionth
of a second
Chapter 7: Date Functions 62
This SQL will return every possible extract value from the current date.
The NOW keyword is a synonym for CURRENT TIMESTAMP.
WITH DATES(FUNCTION, RESULT) AS (
VALUES
('EPOCH', EXTRACT( EPOCH FROM NOW )),
('MILLENNIUM(S)', EXTRACT( MILLENNIUM FROM NOW )),
('CENTURY(CENTURIES)', EXTRACT( CENTURY FROM NOW )),
('DECADE(S)', EXTRACT( DECADE FROM NOW )),
('YEAR(S)', EXTRACT( YEAR FROM NOW )),
('QUARTER', EXTRACT( QUARTER FROM NOW )),
('MONTH', EXTRACT( MONTH FROM NOW )),
('WEEK', EXTRACT( WEEK FROM NOW )),
('DAY(S)', EXTRACT( DAY FROM NOW )),
('DOW', EXTRACT( DOW FROM NOW )),
('DOY', EXTRACT( DOY FROM NOW )),
('HOUR(S)', EXTRACT( HOURS FROM NOW )),
('MINUTE(S)', EXTRACT( MINUTES FROM NOW )),
('SECOND(S)', EXTRACT( SECONDS FROM NOW )),
('MILLISECOND(S)', EXTRACT( MILLISECONDS FROM NOW )),
('MICROSECOND(S)', EXTRACT( MICROSECONDS FROM NOW ))
)
SELECT * FROM DATES;
FUNCTION RESULT
------------------ ---------------------------
EPOCH 1474090894.000000
MILLENNIUM(S) 2.000000
CENTURY(CENTURIES) 20.000000
DECADE(S) 201.000000
YEAR(S) 2016.000000
QUARTER 3.000000
MONTH 9.000000
WEEK 38.000000
DAY(S) 17.000000
DOW 7.000000
DOY 261.000000
HOUR(S) 5.000000
MINUTE(S) 41.000000
SECOND(S) 34.578000
MILLISECOND(S) 34578.000000
MICROSECOND(S) 34578000.000000
DATE_PART Function
DATE_PART is like the EXTRACT function but it uses the more familiar
function syntax:
DATE_PART(element, expression)
In the case of the DATE_PART function, the element must be placed in
quotes rather than specified as a keyword as in the EXTRACT function.
In addition, DATE_PART always returns a BIGINT, while the EXTRACT
function can return a DECIMAL value for some elements. The following
SQL mirrors the earlier EXTRACT example.
WITH DATES(FUNCTION, RESULT) AS (
VALUES
('EPOCH', DATE_PART('EPOCH', NOW)),
('MILLENNIUM(S)', DATE_PART('MILLENNIUM', NOW)),
('CENTURY(CENTURIES)', DATE_PART('CENTURY', NOW)),
('DECADE(S)', DATE_PART('DECADE', NOW)),
('YEAR(S)', DATE_PART('YEAR', NOW)),
('QUARTER', DATE_PART('QUARTER', NOW)),
('MONTH', DATE_PART('MONTH', NOW)),
('WEEK', DATE_PART('WEEK', NOW)),
('DAY(S)', DATE_PART('DAY', NOW)),
('DOW', DATE_PART('DOW', NOW)),
('DOY', DATE_PART('DOY', NOW)),
('HOUR(S)', DATE_PART('HOUR', NOW)),
('MINUTE(S)', DATE_PART('MINUTE', NOW)),
('SECOND(S)', DATE_PART('SECOND', NOW)),
('MILLISECOND(S)', DATE_PART('MILLISECOND', NOW)),
('MICROSECOND(S)', DATE_PART('MICROSECOND', NOW))
)
SELECT * FROM DATES;
FUNCTION RESULT
------------------ --------------------
EPOCH 1474090894
MILLENNIUM(S) 2
CENTURY(CENTURIES) 20
DECADE(S) 201
YEAR(S) 2016
QUARTER 3
MONTH 9
WEEK 38
DAY(S) 17
DOW 7
DOY 261
HOUR(S) 5
MINUTE(S) 41
SECOND(S) 34
MILLISECOND(S) 34809
MICROSECOND(S) 34809000
DATE_TRUNC Function
DATE_TRUNC truncates a date/time value down to the beginning of the
specified element. Note that not all element values can be used for
truncation.
The function syntax is:
DATE_TRUNC(element, expression)
The following SQL shows the truncated value for each element.
WITH DATES(FUNCTION, RESULT) AS (
VALUES
('MILLENNIUM(S)', DATE_TRUNC('MILLENNIUM', NOW)),
('CENTURY(CENTURIES)', DATE_TRUNC('CENTURY', NOW)),
('DECADE(S)', DATE_TRUNC('DECADE', NOW)),
('YEAR(S)', DATE_TRUNC('YEAR', NOW)),
('QUARTER', DATE_TRUNC('QUARTER', NOW)),
('MONTH', DATE_TRUNC('MONTH', NOW)),
('WEEK', DATE_TRUNC('WEEK', NOW)),
('DAY(S)', DATE_TRUNC('DAY', NOW)),
('HOUR(S)', DATE_TRUNC('HOUR', NOW)),
('MINUTE(S)', DATE_TRUNC('MINUTE', NOW)),
('SECOND(S)', DATE_TRUNC('SECOND', NOW)),
('MILLISEC(S)', DATE_TRUNC('MILLISECONDS', NOW)),
('MICROSEC(S)', DATE_TRUNC('MICROSECONDS', NOW))
)
SELECT * FROM DATES;
FUNCTION RESULT
------------------ --------------------------
MILLENNIUM(S) 2000-01-01 00:00:00.000000
CENTURY(CENTURIES) 2000-01-01 00:00:00.000000
DECADE(S) 2010-01-01 00:00:00.000000
YEAR(S) 2016-01-01 00:00:00.000000
QUARTER 2016-07-01 00:00:00.000000
MONTH 2016-09-01 00:00:00.000000
WEEK 2016-09-12 00:00:00.000000
DAY(S) 2016-09-17 00:00:00.000000
HOUR(S) 2016-09-17 05:00:00.000000
MINUTE(S) 2016-09-17 05:41:00.000000
SECOND(S) 2016-09-17 05:41:35.000000
MILLISEC(S) 2016-09-17 05:41:35.049000
MICROSEC(S) 2016-09-17 05:41:35.049000
FIRST_DAY will return the first day of the month. You could probably
compute this with standard SQL date functions, but it is a lot easier just
to use this built-in function.
VALUES FIRST_DAY(NOW);
1
--------------------------
2016-09-01 05:41:35.399000
Finally, DAYS_TO_END_OF_MONTH will return the number of days to the
end of the month. A zero is returned if you are on the last day of the
month.
VALUES DAYS_TO_END_OF_MONTH(NOW);
1
-----------
13
A set of ADD functions increments a date/time value by a given number
of units. The following SQL adds one unit of each type to the current
date.
WITH DATES(FUNCTION, RESULT) AS
(
VALUES
('CURRENT DATE ',NOW),
('ADD_YEARS ',ADD_YEARS(NOW,1)),
('ADD_MONTHS ',ADD_MONTHS(NOW,1)),
('ADD_DAYS ',ADD_DAYS(NOW,1)),
('ADD_HOURS ',ADD_HOURS(NOW,1)),
('ADD_MINUTES ',ADD_MINUTES(NOW,1)),
('ADD_SECONDS ',ADD_SECONDS(NOW,1))
)
SELECT * FROM DATES;
FUNCTION RESULT
--------------- --------------------------
CURRENT DATE 2016-09-17 05:41:35.669000
ADD_YEARS 2017-09-17 05:41:35.669000
ADD_MONTHS 2016-10-17 05:41:35.669000
ADD_DAYS 2016-09-18 05:41:35.669000
ADD_HOURS 2016-09-17 06:41:35.669000
ADD_MINUTES 2016-09-17 05:42:35.669000
ADD_SECONDS 2016-09-17 05:41:36.669000
A negative number can be used to subtract values from the current date.
WITH DATES(FUNCTION, RESULT) AS
(
VALUES
('CURRENT DATE ',NOW),
('ADD_YEARS ',ADD_YEARS(NOW,-1)),
('ADD_MONTHS ',ADD_MONTHS(NOW,-1)),
('ADD_DAYS ',ADD_DAYS(NOW,-1)),
('ADD_HOURS ',ADD_HOURS(NOW,-1)),
('ADD_MINUTES ',ADD_MINUTES(NOW,-1)),
('ADD_SECONDS ',ADD_SECONDS(NOW,-1))
)
SELECT * FROM DATES;
FUNCTION RESULT
--------------- --------------------------
CURRENT DATE 2016-09-17 05:41:35.749000
ADD_YEARS 2015-09-17 05:41:35.749000
ADD_MONTHS 2016-08-17 05:41:35.749000
ADD_DAYS 2016-09-16 05:41:35.749000
ADD_HOURS 2016-09-17 04:41:35.749000
ADD_MINUTES 2016-09-17 05:40:35.749000
ADD_SECONDS 2016-09-17 05:41:34.749000
The THIS functions return the start of the current week, month,
quarter, or year for a given date.
WITH DATES(FUNCTION, RESULT) AS
(
VALUES
('CURRENT DATE ',NOW),
('THIS_WEEK ',THIS_WEEK(NOW)),
('THIS_MONTH ',THIS_MONTH(NOW)),
('THIS_QUARTER ',THIS_QUARTER(NOW)),
('THIS_YEAR ',THIS_YEAR(NOW))
)
SELECT * FROM DATES;
FUNCTION RESULT
--------------- --------------------------
CURRENT DATE 2016-09-17 05:41:35.879000
THIS_WEEK 2016-09-11 00:00:00.000000
THIS_MONTH 2016-09-01 00:00:00.000000
THIS_QUARTER 2016-07-01 00:00:00.000000
THIS_YEAR 2016-01-01 00:00:00.000000
There is also a NEXT function for each of these. The NEXT function will
return the next week, month, quarter, or year given a current date.
WITH DATES(FUNCTION, RESULT) AS
(
VALUES
('CURRENT DATE ',NOW),
('NEXT_WEEK ',NEXT_WEEK(NOW)),
('NEXT_MONTH ',NEXT_MONTH(NOW)),
('NEXT_QUARTER ',NEXT_QUARTER(NOW)),
('NEXT_YEAR ',NEXT_YEAR(NOW))
)
SELECT * FROM DATES;
FUNCTION RESULT
--------------- --------------------------
CURRENT DATE 2016-09-17 05:41:35.979000
NEXT_WEEK 2016-09-18 00:00:00.000000
NEXT_MONTH 2016-10-01 00:00:00.000000
NEXT_QUARTER 2016-10-01 00:00:00.000000
NEXT_YEAR 2017-01-01 00:00:00.000000
The NEXT_DAY function returns the first date after the given date that
falls on the named day of the week. The following SQL shows the next
Monday through Sunday after the current date.
WITH DATES(FUNCTION, RESULT) AS
(
VALUES
('CURRENT DATE ',NOW),
('Monday ',NEXT_DAY(NOW,'Monday')),
('Tuesday ',NEXT_DAY(NOW,'TUE')),
('Wednesday ',NEXT_DAY(NOW,'Wednesday')),
('Thursday ',NEXT_DAY(NOW,'Thursday')),
('Friday ',NEXT_DAY(NOW,'FRI')),
('Saturday ',NEXT_DAY(NOW,'Saturday')),
('Sunday ',NEXT_DAY(NOW,'Sunday'))
)
SELECT * FROM DATES;
FUNCTION RESULT
--------------- --------------------------
CURRENT DATE 2016-09-17 05:41:36.139000
Monday 2016-09-19 05:41:36.139000
Tuesday 2016-09-20 05:41:36.139000
Wednesday 2016-09-21 05:41:36.139000
Thursday 2016-09-22 05:41:36.139000
Friday 2016-09-23 05:41:36.139000
Saturday 2016-09-24 05:41:36.139000
Sunday 2016-09-18 05:41:36.139000
The *_BETWEEN functions return the number of complete units between
two date/time values.
FUNCTION RESULT
--------------- --------------------
SECONDS_BETWEEN 32230861
MINUTES_BETWEEN 537181
HOURS_BETWEEN 8953
DAYS_BETWEEN 373
WEEKS_BETWEEN 53
YEARS_BETWEEN 1
MONTHS_BETWEEN Function
You may have noticed that the MONTHS_BETWEEN function was not in
the previous list of functions. The reason for this is that the value
returned for MONTHS_BETWEEN is different from the other functions.
The MONTHS_BETWEEN function returns a DECIMAL value rather than an
integer value. The reason for this is that the duration of a month is not as
precise as a day, week or year. The following example will show how the
duration is a decimal value rather than an integer. You could always
truncate the value if you want an integer.
WITH DATES(FUNCTION, RESULT) AS (
VALUES
('0 MONTH ',MONTHS_BETWEEN(NOW, NOW)),
('1 MONTH ',MONTHS_BETWEEN(NOW + 1 MONTH, NOW)),
('1 MONTH + 1 DAY',MONTHS_BETWEEN(NOW + 1 MONTH + 1 DAY, NOW)),
('LEAP YEAR ',MONTHS_BETWEEN('2016-02-01','2016-03-01')),
('NON-LEAP YEAR ',MONTHS_BETWEEN('2015-02-01','2015-03-01'))
)
SELECT * FROM DATES;
FUNCTION RESULT
--------------- ---------------------------------
0 MONTH 0.000000000000000
1 MONTH 1.000000000000000
1 MONTH + 1 DAY 1.032258064516129
LEAP YEAR -1.000000000000000
NON-LEAP YEAR -1.000000000000000
The AGE function returns the difference between the current timestamp
and a supplied date as a single number that encodes years, months, and
days (years x 10000 + months x 100 + days).
FUNCTION RESULT
------------------------------ -----------
AGE + 1 DAY 1
AGE + 1 MONTH 100
AGE + 1 YEAR 10000
AGE + 1 DAY + 1 MONTH 101
AGE + 1 DAY + 1 YEAR 10001
AGE + 1 DAY + 1 MONTH + 1 YEAR 10101
The YMD_BETWEEN function is like the AGE function except that it takes
two date arguments. We can simulate the AGE function by supplying the
NOW function to the YMD_BETWEEN function.
WITH DATES(FUNCTION, RESULT) AS (
VALUES
('1 DAY ',YMD_BETWEEN(NOW,NOW - 1 DAY)),
('1 MON ',YMD_BETWEEN(NOW,NOW - 1 MONTH)),
('1 YR ',YMD_BETWEEN(NOW,NOW - 1 YEAR)),
('1 DAY + 1 MON ',YMD_BETWEEN(NOW,NOW - 1 DAY - 1 MONTH)),
('1 DAY +1 YR ',YMD_BETWEEN(NOW,NOW - 1 DAY - 1 YEAR)),
('1 DAY+1 MON +1 YR',YMD_BETWEEN(NOW,NOW - 1 DAY - 1 MONTH - 1 YEAR))
)
SELECT * FROM DATES;
FUNCTION RESULT
------------------------------ -----------
1 DAY 1
1 MONTH 100
1 YEAR 10000
1 DAY + 1 MONTH 101
1 DAY + 1 YEAR 10001
1 DAY + 1 MONTH + 1 YEAR 10101
OVERLAPS Predicate
The OVERLAPS predicate is used to determine whether two chronological
periods overlap. This is not a function within Db2, but rather a special SQL
syntax extension.
A chronological period is specified by a pair of date-time expressions. The
first expression specifies the start of a period; the second specifies its
end.
(start1, end1) OVERLAPS (start2, end2)
The beginning and end values are not included in the periods, so two
periods that merely touch do not overlap. For example, the periods
2016-10-19 to 2016-10-20 and 2016-10-20 to 2016-10-21 do not overlap.
The following intervals also do not overlap.
VALUES
CASE
WHEN (NOW, NOW + 1 DAY) OVERLAPS (NOW + 1 DAY, NOW + 2 DAYS)
THEN 'Overlaps'
ELSE 'No Overlap'
END;
1
----------
No Overlap
If the first date range is extended by one day, then the range will overlap.
VALUES
CASE
WHEN (NOW, NOW + 2 DAYS) OVERLAPS (NOW + 1 DAY, NOW + 2 DAYS)
THEN 'Overlaps'
ELSE 'No Overlap'
END;
1
----------
Overlaps
Identical date ranges will overlap.
VALUES
CASE
WHEN (NOW, NOW + 1 DAY) OVERLAPS (NOW, NOW + 1 DAY)
THEN 'Overlaps'
ELSE 'No Overlap'
END;
1
----------
Overlaps
At this point in time(!) we can start inserting records into our table. We
have already set the timezone to be Toronto, so the next insert
statement will take the current time (NOW) and insert it into the table.
For reference, here is the current time when the example was run.
VALUES NOW;
1
--------------------------
2016-09-17 05:53:18.956000
We will insert one record into the table and immediately retrieve the
result.
INSERT INTO TXS VALUES(1,1,NOW);
SELECT * FROM TXS;
ID CUSTID TXTIME
----------- ----------- --------------------------
1 1 2016-09-17 05:53:19.056000
Note that the timestamp appears to be the same as what we inserted
(give or take a fraction of a second). What actually sits in the base
table is the UTC time.
SELECT * FROM TXS_BASE;
ID CUSTID TXTIME_UTC
----------- ----------- --------------------------
1 1 2016-09-17 09:53:19.056000
We can modify the time that is returned to us by changing our local
timezone. The following statement makes the system think we are in
Vancouver.
SET TIME_ZONE = 'America/Vancouver';
Retrieving the results will show that the timestamp has shifted by 3 hours
(Vancouver is 3 hours behind Toronto).
SELECT * FROM TXS;
ID CUSTID TXTIME
----------- ----------- --------------------------
1 1 2016-09-17 02:53:19.056000
So, what happens if we insert a record into the table now that we are in
Vancouver?
INSERT INTO TXS VALUES(2,2,NOW);
SELECT * FROM TXS;
ID CUSTID TXTIME
----------- ----------- --------------------------
1 1 2016-09-17 02:53:19.056000
2 2 2016-09-17 05:53:19.436000
The data retrieved reflects the fact that we are now in Vancouver from
an application perspective. Looking at the base table, you will see
that everything has been converted to UTC time.
SELECT * FROM TXS_BASE;
ID CUSTID TXTIME_UTC
----------- ----------- --------------------------
1 1 2016-09-17 09:53:19.056000
2 2 2016-09-17 12:53:19.436000
Finally, we can switch back to Toronto time and see when the
transactions were done. You will see that from a Toronto perspective
that the transactions were done three hours later because of the
timezone differences.
SET TIME_ZONE = 'America/Toronto';
SELECT * FROM TXS;
ID CUSTID TXTIME
----------- ----------- --------------------------
1 1 2016-09-17 05:53:19.056000
2 2 2016-09-17 08:53:19.436000
Binary Manipulation
Hex Functions
Several new hex manipulation functions have been added in Db2 11.1.
There is a class of functions that manipulate different sizes of
integers (SMALLINT, INTEGER, BIGINT) using NOT, OR, AND, and XOR. In
addition to these functions, there are several functions that display
and convert values into hexadecimal format.
INTN Functions
The INTN functions are bitwise functions that operate on the "two's
complement" representation of the integer value of the input arguments
and return the result as a corresponding base 10 integer value. The
function names all include the size of the integers that are being
manipulated:
Table 10: Argument Sizes
Identifier (N) Integer Size
2 SMALLINT – 2 bytes
4 INTEGER – 4 bytes
8 BIGINT – 8 bytes
There are four functions:
• INTNAND - Performs a bitwise AND operation
• INTNOR - Performs a bitwise OR operation
• INTNXOR - Performs a bitwise exclusive OR operation
• INTNNOT - Performs a bitwise NOT operation
This example will show the four functions used against INT2 data types.
WITH LOGIC(EXAMPLE, X, Y, RESULT) AS
(
VALUES
('INT2AND(X,Y)',1,3,INT2AND(1,3)),
('INT2OR(X,Y) ',1,3,INT2OR(1,3)),
('INT2XOR(X,Y)',1,3,INT2XOR(1,3)),
('INT2NOT(X) ',1,3,INT2NOT(1)) )
SELECT * FROM LOGIC;
EXAMPLE X Y RESULT
------------ ------ ------ ------
INT2AND(X,Y) 1 3 1
INT2OR(X,Y) 1 3 3
INT2XOR(X,Y) 1 3 2
INT2NOT(X) 1 3 -2
Chapter 8: Binary Manipulation 81
You can mix and match the INT2, INT4, and INT8 values in these functions
but you may get truncation if the value is too big.
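The two's complement behavior is easy to see at the boundary of the
SMALLINT range, where NOT of the largest positive value flips every
bit:

```sql
-- x'7FFF' with every bit flipped is x'8000', which is -32768 as a SMALLINT
VALUES INT2NOT(32767);
```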
TO_HEX Function
The TO_HEX function converts a numeric expression into a character
hexadecimal representation. For example, the numeric value 255
represents x'FF'. The value returned from this function is a VARCHAR
value and its length depends on the size of the number you supply.
VALUES
TO_HEX(255);
1
--------
ff
RAWTOHEX Function
The RAWTOHEX function returns a hexadecimal representation of a value
as a character string. The result is a character string itself.
VALUES RAWTOHEX('00');
1
----
3030
The string "00" converts to a hex representation of x'3030', which is
12336 in decimal. The TO_HEX function converts this number back to its
hex representation.
VALUES
TO_HEX(12336);
1
--------
3030
Regular Expressions
Db2 11.1 introduced support for regular expressions. Regular expressions
allow you to do very complex pattern matching in character strings.
Normal SQL LIKE searches are limited to very specific patterns, but
regular expressions have a rich syntax that gives you much more
flexibility in searching.
Chapter 9: Regular Expressions 84
Codeunits can be specified as CODEUNITS16, CODEUNITS32, or OCTETS.
CODEUNITS16 specifies that start is expressed in 16-bit UTF-16 code
units. CODEUNITS32 specifies that start is expressed in 32-bit UTF-32
code units. OCTETS specifies that start is expressed in bytes.
Pattern and flag values are complex and so are discussed in the following
sections.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'Ruislip');
STATION
--------------------
West Ruislip
Ruislip Gardens
South Ruislip
The pattern 'Ruislip' will look for a match of Ruislip within the STATION
column. Note that this pattern will also match 'West Ruislip' or 'Ruislip
Gardens' since we placed no restriction on where the pattern can be
found in the string. The match is also case-sensitive. This type
of search would be equivalent to using the SQL LIKE predicate:
SELECT STATION FROM CENTRAL_LINE
WHERE STATION LIKE '%Ruislip%';
STATION
--------------------
West Ruislip
Ruislip Gardens
South Ruislip
If you didn't place the % at the beginning of the LIKE string, only the
stations that start with Ruislip would be found.
SELECT STATION FROM CENTRAL_LINE
WHERE STATION LIKE 'Ruislip%';
STATION
--------------------
Ruislip Gardens
If you want to match Ruislip with upper or lower case being ignored, you
would add the 'i' flag as part of the REGEXP_LIKE (or any REGEXP
function).
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'RUISLIP','i');
STATION
--------------------
West Ruislip
Ruislip Gardens
South Ruislip
To force a match to occur at the beginning of a string, the caret
symbol ^ can be used.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'^Ruislip');
STATION
--------------------
Ruislip Gardens
To match a pattern at the end of the string, the dollar sign $ can be used.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'Ruislip$');
STATION
--------------------
West Ruislip
South Ruislip
To force an exact match with a string you would use both the beginning
and end anchors.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'^Leyton$');
STATION
--------------------
Leyton
Note that if we didn't use the end anchor, we would get more than
one result.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'^Leyton');
STATION
--------------------
Leyton
Leytonstone
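The same anchor semantics apply in any regular-expression engine. A Python sketch over a subset of the station names (an illustration only):

```python
import re

stations = ["West Ruislip", "Ruislip Gardens", "South Ruislip",
            "Leyton", "Leytonstone"]

# '^Ruislip' : match only at the start of the string
assert [s for s in stations if re.search(r'^Ruislip', s)] == ["Ruislip Gardens"]

# 'Ruislip$' : match only at the end of the string
assert [s for s in stations if re.search(r'Ruislip$', s)] == \
    ["West Ruislip", "South Ruislip"]

# '^Leyton$' is an exact match; '^Leyton' alone also matches Leytonstone
assert [s for s in stations if re.search(r'^Leyton$', s)] == ["Leyton"]
assert [s for s in stations if re.search(r'^Leyton', s)] == ["Leyton", "Leytonstone"]
```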
The following SQL creates a table to hold a single string that contains
multiple NL/CRLF characters, simulating a multi-line text value. (The
INSERT statement itself is not shown in this excerpt; assume the inserted
string contains several names, one of them Katrina, separated by
line-end characters.)
CREATE TABLE LONGLINE (NAME VARCHAR(255));
Searching for Katrina at the beginning and end of string doesn't work.
SELECT COUNT(*) FROM LONGLINE
WHERE REGEXP_LIKE(NAME,'^Katrina$');
1
-----------
0
We can override the regular expression search by telling it to treat each
NL/CRLF as the end of a string within a string.
SELECT COUNT(*) FROM LONGLINE
WHERE REGEXP_LIKE(NAME,'^Katrina$','m');
1
-----------
1
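The 'm' flag maps directly onto the multiline mode found in most regex engines. A Python sketch (the sample names here are hypothetical):

```python
import re

# A single string containing embedded line separators
name = "Brad\nKatrina\nMike"

# Without the multiline flag, ^ and $ anchor to the whole string
assert re.search(r'^Katrina$', name) is None

# With re.MULTILINE (the analogue of Db2's 'm' flag),
# ^ and $ also match at each embedded line break
assert re.search(r'^Katrina$', name, re.MULTILINE) is not None
```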
Logical OR Operator
Regular expressions can match more than one pattern. The OR operator
(|) is used to define alternative patterns that can match in a string. The
following example searches for stations that have "way" in their name as
well as "ing".
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'way|ing');
STATION
--------------------
Ealing Broadway
Notting Hill Gate
Queensway
Barkingside
Roding Valley
Epping
There are some things to be aware of when creating the search pattern.
Spaces in the patterns themselves are significant. If the previous
search pattern had a space in one of the words, it would not find a
match (unless, of course, there was a space in the station name).
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'way| ing');
STATION
--------------------
Ealing Broadway
Queensway
Using the "x" flag will ignore blanks in your pattern, so it fixes the
issue we had in the previous example.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'way| ing','x');
STATION
--------------------
Ealing Broadway
Notting Hill Gate
Queensway
Barkingside
Roding Valley
Epping
Parentheses can be used to make it clear what pattern you are
searching for and to avoid the problem of having blanks in the expression.
Parentheses do have a specific use in regular expressions (grouping and
capturing), but here we are using them only to separate the two search strings.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(way)|(ing)');
STATION
--------------------
Ealing Broadway
Notting Hill Gate
Queensway
Barkingside
Roding Valley
Epping
Combining Patterns
As we found out in the previous section, there is an OR operator that you
can use to select between two patterns. How do you request that
multiple patterns be present? First, we must understand how matching
occurs when we have multiple strings that need to be matched that have
an unknown number of characters between them.
For instance, how do we create a pattern that looks for "ing" followed by
"way" somewhere in the string? Regular expressions recognize the "."
(period) character as matching anything. Following the pattern, you can
add a modifier that specifies how many times you want the pattern
matched:
Table 12: London Central Line (Subset)
Char  Meaning
*     Match zero or more times
?     Match zero or one time
+     Match one or more times
{m}   Match exactly m times
{m,}  Match at least m times
{m,n} Match at least m times and no more than n times
The following regular expression searches for a pattern with "ing"
followed by any characters and then "way".
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(ing)*.(way)');
STATION
--------------------
Ealing Broadway
Queensway
The previous answer gave you two results (Ealing Broadway and
Queensway). Why two? The reason is that we used the * in the wrong
place (a single character in the wrong place can produce very different
results!). What we really needed to do was place a .* after the (ing) to
match "ing" and then any characters, before matching "way". What our
query did above was match 0 or more occurrences of "ing", followed by
any character, and then match "way". Here is the correct query.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(ing).*(way)')
STATION
--------------------
Ealing Broadway
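The difference between the two patterns is worth pinning down. This Python sketch (an illustration only) shows why '(ing)*.(way)' also matches Queensway while '(ing).*(way)' does not:

```python
import re

# '(ing)*.(way)' matches zero or more "ing", then ANY one character, then
# "way" -- so "Queensway" matches via zero "ing" + 's' + "way"
assert re.search(r'(ing)*.(way)', 'Queensway') is not None
assert re.search(r'(ing)*.(way)', 'Ealing Broadway') is not None

# '(ing).*(way)' requires "ing", then any characters, then "way"
assert re.search(r'(ing).*(way)', 'Queensway') is None
assert re.search(r'(ing).*(way)', 'Ealing Broadway') is not None
```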
Finding at least one occurrence of a pattern requires the use of the +
operator, or the bracket operators. This example locates at least one
occurrence of the "an" string in station names.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(an)+');
STATION
--------------------
Hanger Lane
Holland Park
Lancaster Gate
Chancery Lane
Bank
Wanstead
Gants Hill
Grange Hill
If we want to find an exact number of occurrences, we need to use the {}
notation to tell the regular expression matcher how many we want to
find. The syntax of the {} match is:
• {m} – Match exactly m times
• {m,} – Match at least m times
• {m,n} – Match at least m times and no more than n times
So the "+" symbol is equivalent to the following regular expression using
the {} syntax.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(an){1,}');
STATION
--------------------
Hanger Lane
Holland Park
Lancaster Gate
Chancery Lane
Bank
Wanstead
Gants Hill
Grange Hill
If we want to match exactly 2 'an' patterns in a string, we would think
that changing the expression to {2} would work.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(an){2}');
No records found.
Sadly, we get no results! This would appear to be the wrong result, but
it's because we got lucky with our first search! The pattern (an){2}
looks for two adjacent occurrences of 'an', which none of the station
names contain. To allow other characters between the two occurrences,
the pattern needs to include .* within the repeated group:
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'((an).*){2}');
STATION
--------------------
Hanger Lane
Chancery Lane
You should find that two stations match the pattern. The following SQL
shows where the pattern is first matched and what text is extracted from
the STATION names.
SELECT STATION,
REGEXP_INSTR(STATION,'((an).*){2}') AS LOCATION,
REGEXP_EXTRACT(STATION,'((an).*){2}') AS EXTRACT
FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'((an).*){2}');
STATION
--------------------
Ruislip Gardens
Perivale
Queensway
Redbridge
Roding Valley
If you wanted to include all stations that have the letters P-R or p-r, you
could add both ranges within the brackets.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'[p-rP-R]');
STATION
--------------------
West Ruislip
Ruislip Gardens
South Ruislip
Northolt
Greenford
Perivale
Hanger Lane
Ealing Broadway
North Acton
Shepherd's Bush
The number pattern can be represented with the bracket expression [0-9].
To specify the number of characters that need to be found, we use the
braces {} to specify the exact number required.
For the three groups of numbers in the pattern we can use [0-9]{3},
[0-9]{2}, and [0-9]{4}. Adding in the dashes gives us the final pattern.
The SQL below checks to see if an SSN is correct.
VALUES
CASE
WHEN REGEXP_LIKE('123-34-1422','[0-9]{3}-[0-9]{2}-[0-9]{4}')
THEN 'Valid'
ELSE 'Invalid'
END;
1
-------
Valid
The SSN is valid in the example above. Here are some other examples to
show whether or not the regular expression picks up all of the errors.
WITH SSNS(SSN) AS (
VALUES
'123-34-1322',
'ABC-34-9999',
'X123-44-0001',
'123X-Y44-Z0001',
'111-222-111'
)
SELECT SSN,
CASE
WHEN REGEXP_LIKE(SSN,'[0-9]{3}-[0-9]{2}-[0-9]{4}')
THEN 'Valid'
ELSE 'Invalid'
END
FROM SSNS;
SSN 2
-------------- -------
123-34-1322 Valid
ABC-34-9999 Invalid
X123-44-0001 Valid
123X-Y44-Z0001 Invalid
111-222-111 Invalid
If you check closely, one of the strings was marked as valid although it is
not correct (X123-44-0001). The reason this occurred is that the pattern
was found after the "X", and that portion of the string was correct. To
prevent this from happening, we need to anchor the pattern at both ends.
SELECT SSN,
CASE
WHEN REGEXP_LIKE(SSN,'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')
THEN 'Valid'
ELSE 'Invalid'
END
FROM SSNS;
SSN 2
-------------- -------
123-34-1322 Valid
ABC-34-9999 Invalid
X123-44-0001 Invalid
123X-Y44-Z0001 Invalid
111-222-111 Invalid
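The effect of anchoring can be sketched outside of Db2. In this Python fragment (an illustration only), the unanchored pattern accepts the embedded match that fooled the earlier query, while the anchored one rejects it:

```python
import re

UNANCHORED = re.compile(r'[0-9]{3}-[0-9]{2}-[0-9]{4}')
ANCHORED   = re.compile(r'^[0-9]{3}-[0-9]{2}-[0-9]{4}$')

# The unanchored pattern finds a valid SSN embedded after the stray "X"
assert UNANCHORED.search('X123-44-0001') is not None

# Anchoring both ends rejects the stray prefix
assert ANCHORED.search('X123-44-0001') is None
assert ANCHORED.search('123-34-1322') is not None
```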
Special Patterns
The previous example used the [0-9] syntax to request that only numbers
be found in the pattern. There are some predefined patterns that define
these common patterns. In the table below, the first column is the POSIX
format (if it exists), the second is the escape-character equivalent, and
the final one is the raw pattern it represents.
Table 13: Regular Expression Special Patterns
Posix     Escape  Pattern          Meaning
[:alnum:]         [A-Za-z0-9]      Alphanumeric characters
          \w      [A-Za-z0-9_]     Alphanumeric characters plus "_"
          \W      [^A-Za-z0-9_]    Non-word characters
[:alpha:] \a      [A-Za-z]         Alphabetic characters
[:blank:] \s, \t                   Space and tab
          \b                       Word boundaries
[:cntrl:]         [\x00-\x1F\x7F]  Control characters
[:digit:] \d      [0-9]            Digits
          \D      [^0-9]           Non-digits
[:graph:]         [\x21-\x7E]      Visible characters
[:lower:] \l      [a-z]            Lowercase letters
[:print:] \p      [\x20-\x7E]      Visible characters and the space character
1 SSN 3
-------- -------------- -------
Original 123-34-1322 Valid
Original ABC-34-9999 Invalid
Original X123-44-0001 Invalid
Original 123X-Y44-Z0001 Invalid
Original 111-222-111 Invalid
Posix 123-34-1322 Valid
Posix ABC-34-9999 Invalid
Posix X123-44-0001 Invalid
Posix 123X-Y44-Z0001 Invalid
Posix 111-222-111 Invalid
Negating Patterns
Up to this point, the patterns that have been used look for a positive
match. In some cases, you may want to find values that do not match a
pattern. For instance, searching for stations whose entire name is
'West' returns no rows:
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'^West$');
No records found.
Adding the NOT modifier in front of the REGEXP function gives us the
stations that do not begin with West.
SELECT STATION FROM CENTRAL_LINE
WHERE NOT REGEXP_LIKE(STATION,'^West$');
STATION
--------------------
West Ruislip
Ruislip Gardens
South Ruislip
Northolt
Greenford
Perivale
Hanger Lane
Ealing Broadway
West Acton
North Acton
You can also negate some of the searches in a pattern by using the [^...]
syntax where the ^ tells the regular expression not to match the
following characters. The expression [^0-9] would mean match any
characters which are not numeric.
However, regular expressions have something called negative
lookarounds, which basically means finding positions where the pattern
does not match. You create this pattern by adding (?!...) at the point in
the expression where the pattern must not match. The same query (finding
stations that don't start with West) can be written with the lookaround
logic found in the SQL below.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'^(?!West)');
STATION
--------------------
Ruislip Gardens
South Ruislip
Northolt
Greenford
Perivale
Hanger Lane
Ealing Broadway
North Acton
East Acton
White City
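Negative lookaheads work the same way in most regex engines. A Python sketch over a subset of the stations (an illustration only):

```python
import re

stations = ["West Ruislip", "Ruislip Gardens", "West Acton", "North Acton"]

# '^(?!West)' succeeds only where the string does NOT start with "West";
# the lookahead consumes no characters, it only asserts
not_west = [s for s in stations if re.search(r'^(?!West)', s)]
assert not_west == ["Ruislip Gardens", "North Acton"]
```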
Capturing Parentheses
The previous example used something called a negative lookaround with
capturing parentheses. When you place a pattern within a set of
parentheses (...), the string that matches this pattern is "remembered". The strings
that are matched can be used in subsequent parts of your regular
expression. This allows a form of programming within your regular
expression!
Each set of parentheses that is matched is associated with a number,
starting at one and incrementing for each subsequent pattern match. For
instance, the following pattern will have three matches:
^([0-9]{3})-([0-9]{3})-([0-9]{3})$
This is like the SSN example used earlier in this chapter. The difference
in this example is that each block of numbers is the same (3 digits). This
pattern will match any sequence of numbers in the format 123-456-789.
WITH SSNS(SSN) AS (
VALUES
'123-456-789',
'123-555-123',
'890-533-098',
'123-456-456'
)
SELECT SSN,
CASE
WHEN REGEXP_LIKE(SSN,'^([0-9]{3})-([0-9]{3})-([0-9]{3})$')
THEN 'Valid'
ELSE 'Invalid'
END
FROM SSNS;
SSN 2
----------- -------
123-456-789 Valid
123-555-123 Valid
890-533-098 Valid
123-456-456 Valid
These numbers fit the pattern and should be valid. When one of the
capturing parentheses matches, it will remember the string that it
matched. For instance, in the first example (123-456-789), the first match
will find the string '123'. The second match will find '456' and so on. We
can refer to these matched strings with the special control characters \n,
where n is the number of the capturing group (\1 is the first match, \2
the second, and so on).
SSN 2
----------- -------
123-456-789 Valid
123-555-123 Invalid
890-533-098 Valid
123-456-456 Valid
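The Db2 query that produced the result table above is not shown in this excerpt, but the behaviour of a \n backreference can be sketched in Python (the pattern here is illustrative):

```python
import re

# \1 re-matches whatever the first capturing group captured
pattern = re.compile(r'^([0-9]{3})-([0-9]{3})-\1$')

# Third group of digits repeats the first -> the backreference matches
assert pattern.match('123-555-123') is not None

# Third group differs from the first -> no match
assert pattern.match('123-456-789') is None
```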
In many cases, it may be easier to find the patterns that match and then
negate the REGEXP statement!
The (?...) syntax is used for a variety of purposes in regular expressions:
Table 14: Capturing Parentheses
Pattern   Description                    Details
(?: ... ) Non-capturing parentheses      Groups the included pattern, but does not provide
                                         capturing of matching text. More efficient than
                                         capturing parentheses.
(?> ... ) Atomic-match parentheses       First match of the parenthesized subexpression is
                                         the only one tried. If it does not lead to an overall
                                         pattern match, back up the search for a match to a
                                         position before the "(?>".
(?# ... ) Free-format comment            (?# comment )
(?= ... ) Look-ahead assertion           True if the parenthesized pattern matches at the
                                         current input position, but does not advance the
                                         input position.
(?! ... ) Negative look-ahead assertion  True if the parenthesized pattern does not match
                                         at the current input position. Does not advance the
                                         input position.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(West)');
STATION
--------------------
West Ruislip
West Acton
The following SQL is equivalent, except that the matched pattern is not
kept for future use in matching.
SELECT STATION FROM CENTRAL_LINE
WHERE REGEXP_LIKE(STATION,'(?:West)');
STATION
--------------------
West Ruislip
West Acton
10
Statistical Functions
Db2 already has a variety of statistical functions built in. In Db2 11.1, a
number of new functions have been added, including:
• COVARIANCE_SAMP – Sample covariance of a set of number pairs
• STDDEV_SAMP – Sample standard deviation
• VARIANCE_SAMP – Sample variance of a set of numbers
• CUME_DIST – Cumulative distribution
• PERCENT_RANK – Percentile rank
• PERCENTILE_DISC and PERCENTILE_CONT – Percentiles
• MEDIAN – Median value of a set of numbers
• WIDTH_BUCKET – Assignment of values to equal-width histogram buckets
Sampling Functions
The traditional VARIANCE, COVARIANCE, and STDDEV functions have
been available in Db2 for a long time. When computing these values, the
formulae assume that the entire population has been counted (N). The
traditional formula for standard deviation is:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2}$$
N refers to the size of the population and in many cases, we only have a
sample, not the entire population of values.
In this case, the formula needs to be adjusted to account for the
sampling.
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2}$$
1
------------------------------------------
5428750
STDDEV_SAMP returns the sample standard deviation (division by [n-1])
of a set of numbers.
SELECT STDDEV_SAMP(SALARY)
FROM EMPLOYEE
WHERE WORKDEPT = 'A00';
1
------------------------------------------
46863.03180546474203969595277486485
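The difference between the population and sample formulas can be checked outside of Db2. This Python sketch uses a hypothetical salary sample (not the EMPLOYEE table data):

```python
import math
import statistics

salaries = [152750.0, 66500.0, 49250.0, 46500.0, 39250.0]
mean = statistics.mean(salaries)

# Sample standard deviation divides by (n - 1); population divides by n
s     = math.sqrt(sum((x - mean) ** 2 for x in salaries) / (len(salaries) - 1))
sigma = math.sqrt(sum((x - mean) ** 2 for x in salaries) / len(salaries))

assert math.isclose(statistics.stdev(salaries), s)       # STDDEV_SAMP analogue
assert math.isclose(statistics.pstdev(salaries), sigma)  # STDDEV analogue
assert s > sigma  # the sample estimate is always the larger of the two
```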
VARIANCE_SAMP returns the sample variance (division by [n-1]) of a set
of numbers.
SELECT VARIANCE_SAMP(SALARY)
FROM EMPLOYEE
WHERE WORKDEPT = 'A00';
1
------------------------------------------
2196143750
The MEDIAN column function returns the median value in a set of values.
SELECT MEDIAN(SALARY) AS MEDIAN, AVG(SALARY) AS AVERAGE
FROM EMPLOYEE WHERE WORKDEPT = 'E21';
MEDIAN AVERAGE
----------------------------- ------------------------------------
41895 47086.66666666666666666666666666667
CUME_DIST returns the cumulative distribution of a row that is
hypothetically inserted into a group of rows.
SELECT CUME_DIST(47000) WITHIN GROUP (ORDER BY SALARY)
FROM EMPLOYEE WHERE WORKDEPT = 'A00';
1
------------------------------------------
0.5
The PERCENT_RANK column function returns the relative percentile rank
of a row that is hypothetically inserted into a group of rows.
SELECT PERCENT_RANK(47000) WITHIN GROUP (ORDER BY SALARY)
FROM EMPLOYEE WHERE WORKDEPT = 'A00';
1
------------------------------------------
0.4
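The "hypothetically inserted row" semantics of CUME_DIST and PERCENT_RANK can be computed by hand. This Python sketch uses a made-up group of five salaries (the real A00 values are not listed in this excerpt) chosen so the two results above fall out:

```python
salaries = [35000, 42000, 53000, 90000, 150000]
x = 47000  # the hypothetically inserted value

# CUME_DIST: proportion of rows <= x once x has been inserted into the group
cume_dist = (sum(1 for v in salaries if v <= x) + 1) / (len(salaries) + 1)

# PERCENT_RANK: number of rows ranked below x, divided by the original row count
percent_rank = sum(1 for v in salaries if v < x) / len(salaries)

assert cume_dist == 0.5
assert percent_rank == 0.4
```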
PERCENTILE_DISC and PERCENTILE_CONT return the value that corresponds
to the specified percentile given a sort specification, using a discrete
(DISC) or continuous (CONT) distribution respectively.
SELECT PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY SALARY)
FROM EMPLOYEE WHERE WORKDEPT = 'E21';
1
-----------
45370.00
1
----------------------
4.49875000000000E+004
EMPNO SALARY 3
------ ----------- -----------
000010 152750.00 14
000020 94250.00 12
000030 98250.00 13
000050 80175.00 10
000060 72250.00 8
000070 96170.00 13
000090 89750.00 11
000100 86150.00 11
000110 66500.00 7
000120 49250.00 3
We can plot this information by grouping and counting the bucket output.
WITH BUCKETS(EMPNO, SALARY, BNO) AS (SELECT EMPNO, SALARY,
WIDTH_BUCKET(SALARY, 35000, 100000, 13) AS BUCKET
FROM EMPLOYEE ORDER BY EMPNO)
SELECT BNO, COUNT(*) FROM BUCKETS
GROUP BY BNO
ORDER BY BNO ASC;
BNO 2
----------- -----------
0 1
1 10
2 3
3 8
4 2
5 2
6 2
7 5
8 2
10 1
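The bucket arithmetic behind WIDTH_BUCKET is simple enough to sketch directly. This Python fragment (our own reimplementation, not Db2 code) reproduces the bucket numbers shown for several of the employees above, including bucket 14 for the salary above the 100,000 upper bound:

```python
def width_bucket(value, low, high, nbuckets):
    """Equal-width histogram bucket number: 0 below the range,
    nbuckets + 1 at or above the upper bound, otherwise 1..nbuckets."""
    if value < low:
        return 0
    if value >= high:
        return nbuckets + 1
    width = (high - low) / nbuckets
    return int((value - low) / width) + 1

assert width_bucket(152750.00, 35000, 100000, 13) == 14  # above the range
assert width_bucket(94250.00, 35000, 100000, 13) == 12
assert width_bucket(98250.00, 35000, 100000, 13) == 13
assert width_bucket(49250.00, 35000, 100000, 13) == 3
```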
11
Hashing Functions
Hashing functions are typically used to take larger text strings and map
them into a smaller "hashed" value. These hash values can be 4 bytes (HASH4),
8 bytes (HASH8), or up to 64 bytes long (HASH). The reason for using
various HASH algorithms depends on the size of the data and how many
collisions you expect to get. A HASH function does not necessarily result
in a unique value for every string that is hashed. To avoid collisions, a
large HASH value is often used. The tradeoff is the amount of compute
power required to generate the HASH value as well as the size of the
HASH key.
HASH values can be used for a variety of purposes. HASH values can be
used to create artificial partition keys for a table, create a checksum for
transporting the data to an application or another database, or for
indexing phrases. The application must be aware that duplicates could
occur so additional coding may be required.
The following SQL will HASH a phrase using the default (Adler) algorithm.
VALUES HASH4('Hello there');
1
-----------
409338925
Changing the phrase slightly results in a different HASH value.
VALUES HASH4('Hello therf');
1
-----------
409404462
The SQL is modified to use the CRC32 algorithm.
VALUES HASH4('Hello there',1);
1
-----------
-342989177
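Adler-32 and CRC32 are standard checksums, so the HASH4 values can be reproduced outside of Db2 with Python's zlib module. The Adler-32 values below match the two outputs shown above; zlib.crc32 returns an unsigned 32-bit value, while Db2 displays the 4-byte hash as a signed INTEGER, so large CRC values appear negative (we assume, but have not verified, that Db2's CRC32 variant uses the same standard polynomial):

```python
import zlib

# Adler-32 (Db2's default HASH4 algorithm) reproduces the values shown above
assert zlib.adler32(b'Hello there') == 409338925
assert zlib.adler32(b'Hello therf') == 409404462

# Reinterpret an unsigned 32-bit checksum as Db2's signed INTEGER display
crc = zlib.crc32(b'Hello there')
signed = crc - 2**32 if crc >= 2**31 else crc
assert -2**31 <= signed < 2**31
```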
1
--------------------
-844407019926684877
Supplying anything other than a zero for the second argument will result
in an error being returned.
VALUES HASH8('Hello there',1);
1
------------------------------------------------------------------
E8EA7A8D1E93E8764A84A0F3DF4644DE
726C76553E1A3FDEA29134F36E6AF2EA05EC5CCE
4E47826698BB4630FB4451010062FADBF85D61427CBDFAED7AD0F23F239BED89
567683DDBA1F5A576B68EC26F41FFBCC7E718D646839AC6C2EF746FE952CEF4CBE6DEA...
12
JSON Functions
EXTENDING DB2 TO MANIPULATE JSON
DOCUMENTS WITHIN RELATIONAL TABLES
Note that because these functions are experimental, it is possible
that you may generate a set of SQL that may not be
handled properly.
Note: These functions may change in the future to conform to the
SQL standard.
The JSON functions are catalogued under the SYSTOOLS schema. Prior to
FP2, all of the JSON functions needed to be catalogued manually. As of FP2,
the cataloguing is done automatically for you when you upgrade the
instance or perform the initial installation.
If you are at a prior release of Db2 (11.1.1.1 and 10.5) all the Db2 JSON
functions have been placed into the SYSTOOLS schema. This means that
to execute any of these commands, you must prefix the command with
SYSTOOLS, as in SYSTOOLS.JSON2BSON. To remove this requirement, you
must update the CURRENT PATH value to include SYSTOOLS as part of it.
The SQL below will tell you what the current PATH is.
VALUES CURRENT PATH;
1
-------------------------------------------------
"SYSIBM","SYSFUN","SYSPROC","SYSIBMADM","BAKLARZ"
After adding SYSTOOLS to the path (for example, with
SET CURRENT PATH = CURRENT PATH, SYSTOOLS), you won't need to add the
SYSTOOLS schema on the front of any of your SQL statements that refer
to these Db2 JSON functions.
By keeping the JSON data inlined within the data page (rather than in
separate large-object storage), you can improve the retrieval performance
of JSON columns.
This SQL will create a column that is suitable for storing JSON data:
CREATE TABLE TESTJSON
(
JSON_FIELD BLOB(4000) INLINE LENGTH 4000
);
Note that we are assuming that the size of the JSON object will not
exceed 4000 bytes.
JSON_FIELD
--------------------
0316000000024E616D65
If you want to extract the contents of a JSON field, you must use the
BSON2JSON function.
SELECT BSON2JSON(JSON_FIELD) FROM TESTJSON;
1
--------------------
{"Name":"George"}
One thing that you will notice is that the retrieved JSON has been
modified slightly so that all the values have quotes around them to avoid
any ambiguity. This is due to the conversion to BSON format and back to
JSON. Note that we didn't necessarily require the quotes when we
inserted the data. For instance, our original JSON document contained
the following field:
{Name:"George"}
You must ensure that the naming of any fields is consistent between
documents. "Name", "name", and "NAME" are all considered different
fields. One option is to use lowercase field names, or to apply a
consistent convention such as camel case, in all your field definitions.
The important thing is to keep the naming consistent so you can find
the fields in the document.
1
-----------
1
The following SQL will inject a bad value into the beginning of the JSON
field to test the results from the BSON_VALIDATE function.
UPDATE TESTJSON
SET JSON_FIELD = BLOB('!') || JSON_FIELD;
1
-----------
0
The following command will load the records into the JSON_EMP table.
Only the first INSERT is displayed, but there is a total of 42 records
included. A Db2 script file (DB2-V11-JSON-Examples.sql) containing all the
JSON examples can be found in the same directory as this eBook. See the
Appendix for more details on how to get a copy of this file.
INSERT INTO JSON_EMP(EMP_DATA) VALUES JSON2BSON(
'{
"empno":"000070",
"firstnme":"EVA",
"midinit":"D",
"lastname":"PULASKI",
"workdept":"D21",
"phoneno":[7831,1422,4567],
"hiredate":"09/30/2005",
"job":"MANAGER",
"edlevel":16,
"sex":"F",
"birthdate":"05/26/2003",
"pay": {
"salary":96170.00,
"bonus":700.00,
"comm":2893.00
}
}');
1 2
-------------------- ------------------------------------------
YAMAMOTO 64680
If the size of the field being returned is larger than the field specification,
you will get a NULL value returned, not a truncated value.
SELECT JSON_VAL(EMP_DATA,'lastname','s:7')
FROM JSON_EMP
WHERE
JSON_VAL(EMP_DATA,'empno','s:6') = '200170';
1
--------------------
-
In the case of character fields, you may need to specify a larger return
size and then truncate it to get a subset of the data.
SELECT LEFT(JSON_VAL(EMP_DATA,'lastname','s:20'),7)
FROM JSON_EMP
WHERE
JSON_VAL(EMP_DATA,'empno','s:6') = '200170';
1
-------
YAMAMOT
1
-----------
3978
If you specify ":na" after the type specifier, you will get an error if the
field is an array type. Hopefully you already know the format of your
JSON data and can avoid having to check to see if arrays exist. What this
statement will tell you is that one of the records you were attempting to
retrieve was an array type. In fact, all the phone extensions are being
treated as array types even though they have only one value in many
cases.
SELECT JSON_VAL(EMP_DATA, 'phoneno', 'i:na') FROM JSON_EMP;
If you need to access a specific array element in a field, you can use the
"dot" notation after the field name. The first element starts at zero. If we
select the 2nd element (.1) all the employees that have a second
extension will have a value retrieved while the ones who don't will have a
null value.
To retrieve these three fields, you need to explicitly name them since
retrieving pay alone will not work.
SELECT JSON_VAL(EMP_DATA,'pay.salary','i'),
JSON_VAL(EMP_DATA,'pay.bonus','i'),
JSON_VAL(EMP_DATA,'pay.comm','i')
FROM JSON_EMP
WHERE
JSON_VAL(EMP_DATA,'empno','s:6') = '200170';
1 2 3
----------- ----------- -----------
64680 500 1974
If you attempt to retrieve the pay field, you will end up with a NULL
value, not an error code. The reason for this is that the JSON_VAL
function cannot format the field into an atomic value so it returns the
NULL value instead.
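The dot-notation lookup and the NULL-for-non-atomic behaviour can be sketched in Python (the function and document here are our own illustration, not the Db2 implementation):

```python
def json_val(doc, path):
    """Follow a dot-separated path; mimic JSON_VAL by returning None
    when the path is missing or resolves to a non-atomic value."""
    current = doc
    for part in path.split('.'):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return None if isinstance(current, (dict, list)) else current

doc = {"pay": {"salary": 64680, "bonus": 500, "comm": 1974}}

assert json_val(doc, 'pay.salary') == 64680
assert json_val(doc, 'pay') is None           # an object, not an atomic value
assert json_val(doc, 'pay.overtime') is None  # missing field
```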
SELECT JSON_VAL(EMP_DATA,'lastname','s:30'),
JSON_VAL(EMP_DATA,'midinit','u')
FROM JSON_EMP;
The results contain 40 employees who have a middle initial, and two that
do not. The results can be misleading because an employee can have the
midinit field defined, but no value assigned to it:
{
"empno":"000120",
"firstnme":"SEAN",
"midinit":"",
"lastname":"O''CONNELL",...
}
In this case, the employee does not have a middle name, but the field is
present. To determine whether an employee does not have a middle
name, you will need to check for a NULL value (the field does not exist, or
the field is empty) when retrieving the middle initial (9 rows):
SELECT COUNT(*) FROM JSON_EMP
WHERE JSON_VAL(EMP_DATA,'midinit','s:40') IS NULL;
If you only want to know how many employees have the middle initial
field (midinit) that is empty, you need to exclude the records that do not
contain the field (7 rows):
SELECT COUNT(*) FROM JSON_EMP
WHERE JSON_VAL(EMP_DATA,'midinit','s:40') IS NULL AND
JSON_VAL(EMP_DATA,'midinit','u') IS NOT NULL;
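The distinction between "field missing" and "field present but empty" is easy to sketch with plain dictionaries (the sample documents here are hypothetical):

```python
employees = [
    {"midinit": "D"},  # initial present
    {"midinit": ""},   # field present but empty
    {},                # field missing entirely
]

# "IS NULL" analogue: the field is missing OR its value is empty
null_like = [e for e in employees if not e.get("midinit")]
assert len(null_like) == 2

# Present-but-empty only: exclude documents where the field does not exist
empty_only = [e for e in employees if "midinit" in e and e["midinit"] == ""]
assert len(empty_only) == 1
```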
You need to ensure that the data types from both JSON functions are
compatible for the join to work properly. In this case, the department
number and the work department are both returned as 3-byte character
strings. If you decided to use integers instead or a smaller string size, the
join will not work as expected because the conversion will result in
truncated or NULL values.
If you plan on doing joins between JSON objects, you may want to
consider creating indexes on the documents to speed up the join process.
More information on the use of indexes is found at the end of this
chapter.
ID TYPE ID TYPE
7 Object id 16 32-bit integer
8 Boolean 17 Timestamp
9 Date 18 64-bit integer
The next SQL statement will create a table with standard types within it.
CREATE TABLE TYPES
(DATA BLOB(4000) INLINE LENGTH 4000);
The following SQL will generate a list of data types and field names found
within this document.
SELECT 'STRING',JSON_TYPE(DATA, 'string', 2048) FROM TYPES
UNION ALL
SELECT 'INTEGER',JSON_TYPE(DATA, 'integer', 2048) FROM TYPES
UNION ALL
SELECT 'NUMBER',JSON_TYPE(DATA, 'number', 2048) FROM TYPES
UNION ALL
SELECT 'DATE',JSON_TYPE(DATA, 'date', 2048) FROM TYPES
UNION ALL
SELECT 'BOOLEAN', JSON_TYPE(DATA, 'boolean', 2048) FROM TYPES
UNION ALL
SELECT 'ARRAY', JSON_TYPE(DATA, 'array', 2048) FROM TYPES
UNION ALL
SELECT 'OBJECT', JSON_TYPE(DATA, 'object', 2048) FROM TYPES;
1 2
--------- -----------
ARRAY 4
BOOLEAN 8
DATE 9
NUMBER 1
INTEGER 16
STRING 2
OBJECT 3
The JSON_TYPE function will verify that this is an integer field (Type=16).
SELECT JSON_TYPE(DATA,'count',2048) AS TYPE
FROM SANDBOX;
TYPE
-----------
16
You can retrieve an integer value with either the 'i' flag or the 'l' flag. This
first SQL statement retrieves the value as an integer.
SELECT JSON_VAL(DATA,'count','i') FROM SANDBOX;
1
-----------
9782333
We can ask that the value be interpreted as a BIGINT by using the 'l' flag,
so JSON_VAL will expand the size of the return value.
SELECT JSON_VAL(DATA,'count','l') FROM SANDBOX;
1
--------------------
9782333
The next SQL statement will create a field with a BIGINT-sized value. Note
that we don't need to specify anything other than a very big number!
DELETE FROM SANDBOX;
INSERT INTO SANDBOX VALUES JSON2BSON('{"count": 94123512223422}');
The JSON_TYPE function will verify that this is a big integer field
(Type=18).
SELECT JSON_TYPE(DATA,'count',2048) AS TYPE
FROM SANDBOX;
TYPE
-----------
18
Returning the data as an integer type 'i' will fail since the number is too
big to fit into an integer format. Note that you do not get an error
message - a NULL value gets returned.
SELECT JSON_VAL(DATA,'count','i') FROM SANDBOX;
1
-----------
-
Specifying the 'l' flag will make the data be returned properly.
SELECT JSON_VAL(DATA,'count','l') FROM SANDBOX;
1
--------------------
94123512223422
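The 'i' versus 'l' behaviour comes down to the 32-bit and 64-bit integer ranges. A Python sketch of the rule (our own mimic of JSON_VAL, not Db2 code):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def json_val_int(value, flag):
    """Mimic JSON_VAL's 'i' vs 'l' flags: values outside the requested
    integer range come back as None (NULL), not as an error."""
    if flag == 'i':
        return value if INT32_MIN <= value <= INT32_MAX else None
    if flag == 'l':  # BIGINT: 64-bit range
        return value if -2**63 <= value <= 2**63 - 1 else None
    raise ValueError(flag)

assert json_val_int(9782333, 'i') == 9782333
assert json_val_int(94123512223422, 'i') is None  # too big for INTEGER
assert json_val_int(94123512223422, 'l') == 94123512223422
```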
Since we have an integer in the JSON field, we also have the option of
returning the value as a floating-point number (f) or as a decimal number
(n). Either of these options will work with integer values.
SELECT JSON_VAL(DATA,'count','n') AS DECIMAL,
JSON_VAL(DATA,'count','f') AS FLOAT
FROM SANDBOX;
DECIMAL FLOAT
---------------------------------- ----------------------
94123512223422 9.41235122234220E+013
The JSON_TYPE function will verify that this is a numeric field (Type=1).
SELECT JSON_TYPE(DATA,'salary',2048) AS TYPE
FROM SANDBOX;
TYPE
-----------
1
You may wonder why number format (n) results in an answer that has a
fractional component that isn't exactly 92342.20. The reason is that Db2
is converting the value to DECFLOAT(34) which supports a higher
precision number, but can result in fractions that can't be accurately
represented within the binary format. Casting the value to DEC(9,2) will
properly format the number.
SELECT DEC(JSON_VAL(DATA,'salary','n'),9,2) AS DECIMAL
FROM SANDBOX;
DECIMAL
-----------
92342.20
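The underlying issue is that 92342.20 has no exact binary representation, so any binary-derived form carries a hidden fraction; rounding to two decimal places (the analogue of casting to DEC(9,2)) recovers the exact value. A Python sketch using the decimal module:

```python
from decimal import Decimal

# Building a Decimal from the binary float exposes the hidden fraction
assert Decimal(92342.20) != Decimal('92342.20')

# Rounding to 2 places (the DEC(9,2) analogue) recovers the exact value
assert round(Decimal(92342.20), 2) == Decimal('92342.20')
```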
TYPE
-----------
1
TYPE
-----------
8
You can also retrieve a Boolean field as a character or binary field, but the
results are not what you would expect with binary.
Querying the data type of this field using JSON_TYPE will return a value of
9 (date type).
SELECT JSON_TYPE(DATA,'today',2048) FROM SANDBOX;
1
-----------
9
If you decide to use a character string to represent a date, you can use
either the "s:x" specification to return the date as a string, or use "d" to
have it displayed as a date. This first SQL statement returns the date as a
string.
INSERT INTO SANDBOX VALUES
JSON2BSON('{"today":"2016-07-01"}');
1
----------
2016-07-01
1
----------
2016-07-01
1
------------------------------
2016-09-17-06.27.00.945000
Retrieving it as a Date will also work, but the time portion will be
removed.
SELECT JSON_VAL(DATA,'today','d') FROM SANDBOX;
1
----------
2016-09-17
You can also ask for the timestamp value by using the 'ts' specification.
Note that you can't get just the time portion unless you use a SQL
function to cast it.
SELECT JSON_VAL(DATA,'today','ts') FROM SANDBOX;
1
--------------------------
2016-09-17 06:27:00.945000
To force the value to return just the time portion, either store the data as
a time value (HH:MM:SS) string or store a timestamp and use the TIME
function to extract just that portion of the timestamp.
SELECT TIME(JSON_VAL(DATA,'today','ts')) FROM SANDBOX;
1
--------
06:27:00
JSON Strings
For character strings, you must specify what the maximum length is. This
example returns the lastname field as a string up to 10 characters long.
SELECT JSON_VAL(DATA, 'lastname', 's:10') FROM JSON_EMP;
1
----------
HAAS
You must specify a length for the 's' parameter; otherwise, you will get an
error from the function. If the character string is too large to return,
the function will return a null value for that field.
SELECT JSON_VAL(DATA, 'lastname', 's:8') FROM JSON_EMP;
1
--------
HAAS
JSON_TABLE Function
The following query works because we do not treat the phoneno field as
an array; JSON_VAL simply returns its first value:
SELECT JSON_VAL(DATA, 'phoneno', 'i') FROM JSON_EMP;
1
-----------
3978
By default, only the first number of an array is returned when you use
JSON_VAL. However, there will be situations where you do want to
return all the values in an array. This is where the JSON_TABLE function
must be used.
The format of the JSON_TABLE function is:
JSON_TABLE(document, field, type)
The TYPE column in the result indicates the data type of each value, using
the following codes:
ID  TYPE          ID  TYPE
 1  Double        10  Null
 2  String        11  Regular Expression
 3  Object        12  Future use
 4  Array         13  JavaScript
 5  Binary data   14  JavaScript (with scope) — wait
 6  Undefined     15  JavaScript (with scope)
 7  Object id     16  32-bit integer
 8  Boolean       17  Timestamp
 9  Date          18  64-bit integer
The TYPE field is probably something you wouldn't require as part of your
queries since you are already specifying the return type in the function.
The format of the JSON_TABLE function is like JSON_VAL except that it
returns a table of values. You must use this function within a TABLE(...)
specification in the FROM clause. For example, to return the contents of
the phone extension array for just one employee (000230), we can use the
following JSON_TABLE function.
SELECT PHONES.* FROM JSON_EMP E,
TABLE( JSON_TABLE(E.EMP_DATA,'phoneno','i') ) AS PHONES
WHERE JSON_VAL(E.EMP_DATA,'empno','s:6') = '000230';
TYPE VALUE
----------- --------------------
16 2094
16 8999
16 3756
The TABLE( ... ) specification in the FROM clause is used for table
functions. The results that are returned from the TABLE function are
treated the same as a traditional table.
A query that returns the name of every employee together with each of
their extensions requires joining to the table function.
SELECT JSON_VAL(E.EMP_DATA, 'lastname', 's:10') AS LASTNAME,
PHONES.VALUE AS PHONE
FROM JSON_EMP E,
TABLE( JSON_TABLE(E.EMP_DATA,'phoneno','i') ) AS PHONES;
LASTNAME PHONE
---------- --------------------
HAAS 3978
THOMPSON 3476
THOMPSON 1422
KWAN 4738
GEYER 6789
STERN 6423
STERN 2433
PULASKI 7831
PULASKI 1422
PULASKI 4567
Only a subset of the results is shown above, but you will see that there
are multiple lines for employees who have more than one extension.
The results of a TABLE function must be named (AS ...) if you need to
refer to the results of the TABLE function in the SELECT list or in other
parts of the SQL.
You can use other SQL clauses to sort or organize the results. For
instance, we can use the ORDER BY clause to find out which employees
share the same extension. Note how the TABLE function is named
PHONES and the VALUE column is renamed to PHONE.
SELECT JSON_VAL(E.EMP_DATA, 'lastname', 's:10') AS LASTNAME,
PHONES.VALUE AS PHONE
FROM JSON_EMP E,
TABLE( JSON_TABLE(E.EMP_DATA,'phoneno','i') ) AS PHONES
ORDER BY PHONE;
LASTNAME PHONE
---------- --------------------
THOMPSON 1422
PULASKI 1422
SCHNEIDER 1422
O'CONNELL 1533
MEHTA 1533
ALONZO 1533
SCOUTTEN 1682
ORLANDO 1690
You can even find out how many people are sharing extensions! The
HAVING clause tells Db2 to return only groupings where there is more
than one employee with the same extension.
SELECT PHONES.VALUE AS PHONE, COUNT(*) AS COUNT
FROM JSON_EMP E,
TABLE( JSON_TABLE(E.EMP_DATA,'phoneno','i') ) AS PHONES
GROUP BY PHONES.VALUE HAVING COUNT(*) > 1
ORDER BY PHONES.VALUE;
PHONE COUNT
-------------------- -----------
1422 3
1533 3
1793 2
2103 2
2167 2
2890 2
3332 2
3780 2
JSON_LEN Function
The previous example showed how we could retrieve the values from
within an array of a document. Sometimes an application needs to
determine how many values are in the array itself. The JSON_LEN
function is used to figure out what the array count is.
The format of the JSON_LEN function is:
count = JSON_LEN(document,field)
The following query uses JSON_LEN to count the number of phone
extensions for each employee.
SELECT JSON_VAL(EMP_DATA, 'lastname', 's:10') AS LASTNAME,
       JSON_LEN(EMP_DATA, 'phoneno') AS PHONE_COUNT
FROM JSON_EMP;
LASTNAME   PHONE_COUNT
---------- -----------
HAAS 1
THOMPSON 2
KWAN 1
GEYER 1
STERN 2
PULASKI 3
HENDERSON 1
SPENSER 1
JSON_GET_POS_ARR_INDEX Function
The JSON_TABLE and JSON_LEN functions can be used to retrieve all the
values from an array, but searching for a specific array value is difficult to
do. One way to search array values is to extract everything using the
JSON_TABLE function.
SELECT JSON_VAL(E.EMP_DATA, 'lastname', 's:10') AS LASTNAME,
PHONES.VALUE AS PHONE
FROM JSON_EMP E,
TABLE( JSON_TABLE(E.EMP_DATA,'phoneno','i') ) AS PHONES
WHERE PHONES.VALUE = 1422;
A more direct way to search an array is the JSON_GET_POS_ARR_INDEX
function:
position = JSON_GET_POS_ARR_INDEX(document, JSON2BSON('{"field":value}'))
This function only tests for equivalence, and the data type should match
what is already in the field. The return value is the position within the
array where the value was found, with the first element starting at zero.
In our JSON_EMP table, each employee has one or more phone numbers.
The following SQL will retrieve all employees who have the extension
1422:
SELECT JSON_VAL(EMP_DATA, 'lastname', 's:10') AS LASTNAME
FROM JSON_EMP
WHERE JSON_GET_POS_ARR_INDEX(EMP_DATA,
JSON2BSON('{"phoneno":1422}')) >= 0;
LASTNAME
----------
THOMPSON
PULASKI
SCHNEIDER
If we had used quotes around the phone number, the function would not
match any of the values in the table, since the stored values are numeric.
JSON_UPDATE Function
The JSON_UPDATE function modifies a field within a document using the
$set operator. For example, starting with a record that contains
{"phone":[1111,2222,3333]}, the following statement adds a lastname
field.
UPDATE SANDBOX
SET DATA =
JSON_UPDATE(DATA,'{ $set: {"lastname":"HAAS"}}');
Retrieving the document shows that the lastname field has now been
added to the record.
SELECT BSON2JSON(DATA) FROM SANDBOX;
1
----------------------------------------------
{"phone":[1111,2222,3333],"lastname":"HAAS"}
If you specify a field that is an array type and do not specify an element,
you will end up replacing the entire field with the value.
UPDATE SANDBOX
SET DATA =
JSON_UPDATE(DATA,'{ $set: {"phone":9999}}');
1
----------------------------------
{"phone":9999,"lastname":"HAAS"}
Using an index into the array ("phone.0" refers to the first element)
updates just that value. Running this SQL against the original phone data
will work properly.
UPDATE SANDBOX
SET DATA =
JSON_UPDATE(DATA,'{ $set: {"phone.0":9999}}');
1
------------------------------------------------------------
{"phone":[9999,2222,3333]}
To delete a field, you must use $unset instead of $set and use null as
the value for the field. To remove the lastname field from the record:
UPDATE SANDBOX
SET DATA =
JSON_UPDATE(DATA,'{ $unset: {"lastname":null}}');
Indexes can be used to improve access to JSON documents. Consider a
query that retrieves an employee's last name from the JSON document:
LASTNAME
--------------------
HAAS
Without an index, the explain output shows that Db2 must scan the
entire table.
Cost = 13.628412

        Rows
      Operator
        (ID)
        Cost

        1.68
       RETURN
        ( 1)
      13.6284
         |
        1.68
       TBSCAN
        ( 2)
      13.6283
         |
         42
 Table: BAKLARZ.JSON_EMP
        Rows
      Operator
        (ID)
        Cost

        1.68
       RETURN
        ( 1)
      6.81141
         |
        1.68
       FETCH
        ( 2)
       6.8113
      /      \
   1.68       42
  IXSCAN    Table: BAKLARZ.JSON_EMP
   ( 3)
0.00484089
    |
    42
 Index: BAKLARZ.IX_JSON
Db2 can now use the index to retrieve the record.
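The DDL that created the IX_JSON index is not shown above. One way to build such an index (an assumption on our part, since Db2 10.5 and later allow indexes on expressions) is to index the same JSON_VAL expression that appears in the query predicate:

```sql
-- Hypothetical DDL: the original CREATE INDEX statement is not shown.
-- An index over the JSON_VAL expression used in the WHERE clause would
-- allow Db2 to perform the IXSCAN shown in the access plan above.
CREATE INDEX IX_JSON
    ON JSON_EMP (JSON_VAL(EMP_DATA, 'empno', 's:6'));
```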
From a debugging perspective, we can keep both the JSON (text) and
BSON (binary) values in this table if we want. In this design, however,
the trigger will set the JSON column to null after the BSON column has
been populated.
CREATE TABLE BASE_EMP_TXS (
SEQNO INT NOT NULL GENERATED ALWAYS AS IDENTITY,
INFO VARCHAR(4000),
BSONINFO BLOB(4000) INLINE LENGTH 4000
);
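The view and the INSERT trigger that sit on top of this base table are not shown above. A sketch of what they would look like follows; the object names and exact logic are assumptions based on the UPDATE trigger shown below:

```sql
-- Assumed definitions: the EMP_TXS view exposes the BSON column as
-- readable JSON text, and an INSTEAD OF trigger converts incoming JSON
-- to BSON on insert. Names here are illustrative, not from the book.
CREATE OR REPLACE VIEW EMP_TXS AS
    SELECT SEQNO, SYSTOOLS.BSON2JSON(BSONINFO) AS INFO
    FROM BASE_EMP_TXS;

CREATE OR REPLACE TRIGGER I_EMP_TXS
    INSTEAD OF INSERT ON EMP_TXS
    REFERENCING NEW AS NEW_TXS
    FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
    -- Store only the BSON form; the JSON column is left null.
    INSERT INTO BASE_EMP_TXS(INFO, BSONINFO)
        VALUES (NULL, SYSTOOLS.JSON2BSON(NEW_TXS.INFO));
END
```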
On UPDATES, the sequence number remains the same, and the BSON
field is updated with the contents of the JSON field.
CREATE OR REPLACE TRIGGER U_EMP_TXS
INSTEAD OF UPDATE ON EMP_TXS
REFERENCING NEW AS NEW_TXS OLD AS OLD_TXS
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
UPDATE BASE_EMP_TXS
SET (INFO, BSONINFO) = (NULL,
SYSTOOLS.JSON2BSON(NEW_TXS.INFO))
WHERE
BASE_EMP_TXS.SEQNO = OLD_TXS.SEQNO;
END
Applications will only deal with the EMP_TXS view. Any inserts will use
the text version of the JSON and not have to worry about using the
JSON2BSON function since the underlying INSTEAD OF trigger will take
care of the conversion.
The following insert statement only includes the JSON string since the
sequence number will be generated automatically as part of the insert.
INSERT INTO EMP_TXS(INFO) VALUES (
'{
"empno":"000010",
"firstnme":"CHRISTINE",
"midinit":"I",
"lastname":"HAAS",
"workdept":"A00",
"phoneno":[3978],
"hiredate":"01/01/1995",
"job":"PRES",
"edlevel":18,
"sex":"F",
"birthdate":"08/24/1963",
"pay" : {
"salary":152750.00,
"bonus":1000.00,
"comm":4220.00}
}');
Selecting from the EMP_TXS view will return the JSON in a readable
format:
SELECT SEQNO, CAST(LEFT(INFO,50) AS VARCHAR(50)) FROM EMP_TXS;
SEQNO 2
----------- ---------------------------------------------------
1 {"empno":"000010","firstnme":"CHRISTINE","midinit"}
The base table only contains the BSON but the view translates the value
back into a readable format.
An update statement that replaces the entire string works as expected.
UPDATE EMP_TXS SET INFO = '{"empno":"000010"}' WHERE SEQNO = 1;
SELECT SEQNO, CAST(LEFT(INFO,50) AS VARCHAR(50)) FROM EMP_TXS;
SEQNO 2
----------- --------------------------------------------------
1 {"empno":"000010"}
If you want to manipulate the BSON directly (say change the employee
number), you need to refer to the BASE table instead.
UPDATE BASE_EMP_TXS
SET BSONINFO = JSON_UPDATE(BSONINFO,
'{$set: {"empno":"111111"}}')
WHERE SEQNO = 1;
Summary
The current Db2 11.1 release (and Db2 10.5) includes several user-
defined functions (UDFs) that were originally designed to be used by
internal JSON interfaces. Based on feedback from several customers,
we've documented what is used internally within Db2. While these
routines have not been officially published, they are available for
customer use.
Note: Remember that these functions may change in the future to
conform to the SQL standard.
Appendix A: Additional Resources for Db2
IBM Training
IBM is committed to helping our clients achieve the skills and expertise to
take their careers to the next level. We offer a comprehensive portfolio
of technical training and education services designed for individuals,
companies, and public organizations to acquire, maintain, and optimize
their IT skills in IBM Software and IBM Systems. Visit
ibm.com/software/data/education for details and course availability.
BigData University
Learn about Db2 and various big data technologies at your pace and at
your place. Big Data University offers helpful online courses with
instructional videos and exercises to help you master new concepts.
Course completion is marked with a final exam and a certificate. Visit
bigdatauniversity.com.
The Db2 11.1 release delivers several significant enhancements, including
Database Partitioning Feature (DPF) support for BLU columnar technology,
improved pureScale performance and High Availability Disaster Recovery
(HADR) support, and numerous SQL features.

This book was written to highlight many of the new features and functions
that are now available in this release, without you having to search
through various forums, blogs, and online manuals. We hope that this book
gives you more insight into what you can now accomplish with Db2 11.1,
and that you include it on your shortlist of databases to deploy, whether
on premise, in the cloud, or in virtualized environments.

Coverage Includes:
• An overview of the new packaging changes in Db2 11.1, along with
  prerequisites for installing the product
• Enhancements to Db2 BLU, including support for BLU columnar tables in
  a DPF (Database Partitioning Feature) environment
• pureScale capabilities that provide continuous availability in a
  production environment, along with the enhancements that have been
  made to simplify installation and management of a cluster
• SQL and compatibility enhancements

About the Authors

George Baklarz, B. Math, M. Sc., Ph.D. Eng., has spent 31 years at IBM
working on various aspects of database technology. George has written 10
books on Db2 and other database technologies. He is currently part of the
Worldwide Core Database Technical Sales Team.

Enzo Cialini, B.Sc., is a Senior Technical Staff Member and Master
Inventor in the Worldwide Core Database Technical Sales Team, and formerly
the Chief Quality Assurance Architect for Db2 & PureData in the IBM
Toronto Db2 Development Team. He is also a published book author and has
written various papers on Db2. Enzo has 25 years of experience in database
technology, software development, testing, support, competitive analysis,
and production deployments.