DNFS Tech16


dNFS for DBAs

Marcin Przepiorowski
December 2016
About me

Oracle consultant/DBA since 2000

co-developer of OraSASH – free ASH/AWR like repository

Blogger

© 2016 Delphix Corporation 2


Delphix Data as a Service Platform

Collect: Non-disruptively sync with source data. Compress an initial copy and store only the change data.
Control: Mask copies. Manage all changes to the source and its copies via a single point of control.
Consume: Create ten copies in the space of one. Deliver data in minutes with powerful self-service features.

[Diagram labels: Application Files, Databases, Any Storage; Compress, Provision; 10:1 copies for Dev, QA, Stage; Mask, Retain, Branch, Bookmark (Mar. 15, 3:30:15 PM), Integrate, Refresh, Rewind]



Agenda
1 Network
2 Configuration
3 Examples



Michael Coles https://pixabay.com/en/water-hose-garden-wet-gardening-815475/



Network

8 Gb Fiber Channel = 10 Gb Ethernet NFS



Network

Throughput is one dimension to measure.

Latency is even more important.

Latency has an impact on:


- single block reads
- real throughput



100 m

A dog can run 70 km/h, covering 100 m in about 5 sec

A dog can carry a 2 TB SSD drive

Throughput = 2 * 1024 GB / 5 sec = 409 GB/s


Latency – 5 sec
RTT – 10 sec
Other – RFC 1149 IP over Avian Carriers
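
The arithmetic on this slide can be reproduced with a short sketch. The function name is hypothetical, and the binary conversion (1 TB = 1024 GB) and the speed of 20 m/s (100 m in 5 sec) are assumptions chosen to match the slide's numbers.

```python
def courier_link(payload_gb, distance_m, speed_m_per_s):
    """Return (throughput in GB/s, one-way latency in s, round-trip time in s)."""
    latency_s = distance_m / speed_m_per_s      # time for one delivery
    throughput_gb_s = payload_gb / latency_s    # payload moved per second
    return throughput_gb_s, latency_s, 2 * latency_s


# Slide values: a 2 TB drive carried over 100 m at 20 m/s (roughly 70 km/h).
throughput, latency, rtt = courier_link(2 * 1024, 100, 20)
print(f"throughput {throughput:.1f} GB/s, latency {latency:.0f} s, RTT {rtt:.0f} s")
```

The throughput looks excellent, yet every single request still waits a full round trip, which is why latency matters so much for single block reads.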
Network

Recommended latency for (d)NFS < 1 ms


Jumbo Frames enabled



Network

1 TCP stream vs multiple TCP streams



1 TCP stream vs multiple TCP streams

http://www.livemint.com/rf/Image-621x414/LiveMint/Period1/2015/08/11/Photos/ http://www.ishn.com/ext/resources/todaysnews3/traffic-422.jpg



Network

Download Managers



Network

NFS opens one TCP stream


dNFS opens multiple TCP streams



Direct NFS

• dNFS support >=11g
• Unix/Linux, Windows
• Using ODM for file system calls
• Talking directly to filer

File Type                     Supported
Control file                  YES
Data file                     YES
Redo log file                 YES
Archive/Flashback log file    YES
Backup files                  YES
Temp file                     YES
Datapump dump file            YES
OCR files                     NO
spfile                        YES
passwd file                   YES
ASM files                     YES
Voting files                  NO
Audit files                   NO
Database trace files          NO
External tables               YES (12c)



dNFS configuration

Enable

$ cd $ORACLE_HOME/rdbms/lib/
$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/11.2.0.4/db1/lib/libodm11.so; cp /u01/app/oracle/11.2.0.4/db1/lib/libnfsodm11.so /u01/app/oracle/11.2.0.4/db1/lib/libodm11.so

Disable

$ cd $ORACLE_HOME/rdbms/lib/
$ make -f ins_rdbms.mk dnfs_off
rm -f /u01/app/oracle/11.2.0.4/db1/lib/libodm11.so; cp /u01/app/oracle/11.2.0.4/db1/rdbms/lib/libodm11.so.dummy /u01/app/oracle/11.2.0.4/db1/lib/libodm11.s



dNFS configuration

§ No configuration is required for the basic use case.

§ The oranfstab configuration file is optional.

§ But sometimes ALTER DATABASE MOUNT fails with

ORA-00600: internal error code

UEK Kernel bug ID 1460787.1



dNFS configuration

$ORACLE_HOME/dbs/oranfstab

/etc/oranfstab

/etc/mtab

server: Delphix # This is only a label


path: 192.168.166.141 # IP of NFS server
local: 192.168.166.142 # IP of interface on DB server
export: /oraclenfs mount: /oradata1 # mount points



dNFS configuration

§ dNFS looks up each data file path in oranfstab

§ If the path is not found, information from /etc/mtab is used

§ Multipath configuration requires an oranfstab entry for every file system

§ File systems without a matching entry won’t use multipath
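
The lookup order described above can be sketched as follows. This only illustrates the rules on the slide and is not how Oracle implements them: the parser, the function names, and the data file names are hypothetical, and the sample entry reuses addresses and mount points from the deck's multipath example.

```python
import re

# Recognised oranfstab keys; values run to the next whitespace.
PAIR = re.compile(r"\b(server|local|path|export|mount):\s*(\S+)")

SAMPLE = """\
server:DE
local:172.16.169.142 path:172.16.169.141
local:192.168.166.24 path:192.168.166.23
export:/orc_timeflow-79/datafile mount:/mnt/provision/SLOB/datafile
"""


def parse_oranfstab(text):
    """Parse oranfstab-style text into a list of server entries."""
    servers = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]          # drop trailing comments
        pending_export = None
        for key, value in PAIR.findall(line):
            if key == "server":
                servers.append({"server": value, "paths": [], "mounts": {}})
            elif not servers:
                continue                     # ignore keys before the first server
            elif key == "path":
                servers[-1]["paths"].append(value)
            elif key == "export":
                pending_export = value
            elif key == "mount" and pending_export is not None:
                servers[-1]["mounts"][value] = pending_export
                pending_export = None
    return servers


def describe(datafile, servers):
    """Say which oranfstab mount covers a data file, if any."""
    best = None
    for entry in servers:
        for mount in entry["mounts"]:
            prefix = mount.rstrip("/") + "/"
            if datafile.startswith(prefix) and (best is None or len(mount) > len(best[1])):
                best = (entry, mount)
    if best is None:
        return "not in oranfstab: fall back to /etc/mtab, no multipath"
    entry, mount = best
    return f"{mount} via {len(entry['paths'])} path(s)"


servers = parse_oranfstab(SAMPLE)
print(describe("/mnt/provision/SLOB/datafile/system01.dbf", servers))
print(describe("/u02/oradata/users01.dbf", servers))
```

The second file sits on a file system with no oranfstab entry, so in this sketch it takes the /etc/mtab fallback and gets no multipath.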



dNFS configuration

Multipath

server:DE
local:172.16.169.142 path:172.16.169.141
local:192.168.166.24 path:192.168.166.23
export:/orc_timeflow-79/datafile mount:/mnt/provision/SLOB/datafile
export:/orc_timeflow-79/temp mount:/mnt/provision/SLOB/temp
export:/orc_timeflow-79/archive mount:/mnt/provision/SLOB/archive



dNFS configuration

§ MOS
- Recommended Patches for Direct NFS Client (Doc ID 1495104.1)
11.2.0.4 and 12.1.0.2 look OK
- How To Setup DNFS (Direct NFS) On Oracle Release 11.2 (Doc ID 1452614.1)
- Step by Step - Configure Direct NFS Client (DNFS) on Linux (11g) (Doc ID 762374.1)
- Step by Step - Configure Direct NFS Client (DNFS) on Windows (Doc ID 1468114.1)

§ Blogs
- http://blog.oracle48.nl/wordpress/direct-nfs-configuring-and-network-considerations-in-practise/
- http://www.slideshare.net/yvelikanov/sharing-experience-implementing-direct-nfs#



dNFS configuration

§ TCP stack parameters


- Oracle recommends increasing the buffers to 4 MB
- This is not enough for fast LANs (10 Gb)
- Other settings are recommended as well

§ On a fast LAN, 16 MB buffers can be fully utilized

§ Some systems require a change of the NFS block size

https://docs.oracle.com/database/122/LADBI/checking-tcp-network-protocol-buffer-for-direct-nfs-client.htm



dNFS configuration

Linux ( Red Hat >= 6.3 )

net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 16777216 16777216
net.ipv4.tcp_wmem = 4096 4194304 16777216

https://docs.delphix.com/display/DOCS/Target+DB+and+OS+Configuration+Options+for+Improved+Performance
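
A small sketch of how the values above could be compared against a host's current settings. The helper names are hypothetical, and the sample text stands in for the output of `sysctl -a` or the contents of /etc/sysctl.conf; its tcp_rmem value is made up for illustration.

```python
# Recommended values copied from the slide.
RECOMMENDED = {
    "net.ipv4.tcp_timestamps": "1",
    "net.ipv4.tcp_sack": "1",
    "net.ipv4.tcp_window_scaling": "1",
    "net.ipv4.tcp_rmem": "4096 16777216 16777216",
    "net.ipv4.tcp_wmem": "4096 4194304 16777216",
}


def parse_sysctl(text):
    """Parse 'key = value' lines, normalising whitespace inside values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = " ".join(value.split())
    return settings


def differences(current, recommended=RECOMMENDED):
    """Return {key: (current value or None, recommended value)} for mismatches."""
    return {
        key: (current.get(key), want)
        for key, want in recommended.items()
        if current.get(key) != want
    }


# Hypothetical sample: tcp_rmem is too small and tcp_wmem is not set at all.
sample = """
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 6291456
"""

diff = differences(parse_sysctl(sample))
for key, (have, want) in diff.items():
    print(f"{key}: have {have!r}, want {want!r}")
```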



dNFS configuration

Solaris 11

ipadm set-prop -p max_buf=16777216 tcp


ipadm set-prop -p _cwnd_max=4194304 tcp
ipadm set-prop -p send_buf=4194304 tcp
ipadm set-prop -p recv_buf=16777216 tcp

NFS block size – change from 32 kB to 1MB


/etc/system
set nfs:nfs3_bsize=0x100000
https://docs.delphix.com/display/DOCS/Target+DB+and+OS+Configuration+Options+for+Improved+Performance



dNFS configuration

AIX
Disable delayed ACK
tcp_nodelayack=1

NFS
nfs_max_read_size=524288
nfs_max_write_size=524288

Fix for 64k NFS block size


6.1 IV24594
7.1 IV24688
https://docs.delphix.com/display/DOCS/Target+DB+and+OS+Configuration+Options+for+Improved+Performance



Monitoring

NFS

nfsiostat, netstat, ss, wireshark

dNFS

v$dnfs_stats, v$dnfs_channels

netstat, ss, wireshark



Examples
IOPS
[Chart: IOPS (y-axis, 0 to 80,000) versus number of SLOB processes (x-axis: 1, 2, 4, 8, 16, 24, 32, 40, 48); legend: dNFS, dNFS, NFS, NFS]



IOPS – response time – 40 processes

1.0 ms

0.61 ms



IOPS CPU utilization

dNFS NFS

40 k IOPS



Impact of Oracle parameters

FILESYSTEMIO_OPTIONS

- SETALL

- NONE / ASYNCH

NFS can use the OS cache, depending on the value of this parameter

dNFS bypasses the OS file system and the OS cache



IOPS
[Chart: IOPS (y-axis, 0 to 700,000) versus number of SLOB processes (x-axis: 1, 2, 4, 8, 16, 24, 32, 40, 48); legend: NFS big dataset, NFS small dataset, dNFS big dataset, dNFS small dataset]



INVESTIGATION
Real life example

                                           NFS        dNFS
Block #2: noparallel FULL table scan
Stat: query - elapsed (s)             :    57.43      146.04
Stat: query - row count               :    648,300    648,300
Stat: query - physical reads (MB/s)   :    176.38     69.362
Block #3: parallel FULL table scans…
Stat: parallel4 - MB/s                :    179.43     260.75
Stat: parallel8 - MB/s                :    177.24     450.42
Stat: parallel12 - MB/s               :    173.45     459.39
Stat: parallel16 - MB/s               :    171.98     453.24



Examples - NFS - RedHat 6.8 UEK 3.8.13
Block #2: noparallel FULL table scan

Stat: query - elapsed (s) : 39.49


Stat: query - row count : 5,067,786
Stat: query - physical reads (MB/s) : 1,002.59
Block #3: parallel FULL table scans...

Stat: parallel4 - MB/s : 1,174.03


Stat: parallel8 - MB/s : 1,177.87
Stat: parallel12 - MB/s : 1,173.68
Stat: parallel16 - MB/s : 1,176.12



Examples - dNFS - RedHat 6.8 UEK 3.8.13
Block #2: noparallel FULL table scan...

Stat: query - elapsed (s) : 74.71

Stat: query - row count : 5,067,786

Stat: query - physical reads (MB/s) : 529.943

Block #3: parallel FULL table scans...

Stat: parallel4 - MB/s : 1,013.95

Stat: parallel8 - MB/s : 1,152.86

Stat: parallel12 - MB/s : 1,169.18

Stat: parallel16 - MB/s : 1,174.38



Example – table full scan

NFS
                                            Total Wait           % DB
Event                                 Waits       Time Avg(ms)   time Wait Class
------------------------------ ------------ ---------- ------- ------ ----------
direct path read                    384,380       1341       3   97.4 User I/O
DB CPU                                            35.1            2.5

dNFS
                                            Total Wait           % DB
Event                                 Waits       Time Avg(ms)   time Wait Class
------------------------------ ------------ ---------- ------- ------ ----------
direct path read                    164,147       1402       9   98.8 User I/O
DB CPU                                            119.            8.4

Investigation

§ Top function: copy_user_generic_unrolled

§ It is used when there is no optimization at the CPU level

§ Fast string operations (Enhanced REP MOVSB/STOSB) are unsupported.

Linux version 3.8.13-118.14.1.el6uek.x86_64 (mockbuild@x86-ol6-builder-04) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #2 SMP Mon Oct 31 17:33:13 PDT 2016
Command line: ro root=/dev/mapper/vg_dnfstargetdb1-lv_root rd_NO_LUKS
Disabled fast string operations



Investigation

§ perf record -g -F9999 -p 6101

§ perf script | gzip > dnfs2.txt.gz

§ gunzip dnfs2.txt.gz

§ stackcollapse-perf.pl dnfs2.txt > dnfs2_1.folded

§ flamegraph.pl dnfs2_1.folded > dnfs2_1.svg



[Flame graph annotations: Top function, OS call, OS call]



http://www.newsweek.pl/biznes/animacja-reklamy,artykuly,41747,1,1,1.html



Investigation - Network

“bandwidth-delay product refers to the product of a data link's capacity (in bits per second) and its round-trip delay time (in seconds)”

https://en.wikipedia.org/wiki/Bandwidth-delay_product



Investigation - Network

If

BDP (in bytes) > TCP window

then the TCP session will not be able to use all of the available bandwidth

Maximum bandwidth (bits/s) = (window size in bytes * 8) / RTT (s)
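
A worked example of the two formulas, as a sketch. The 10 Gb/s link speed matches the Ethernet speed mentioned earlier in the deck; the 1 ms RTT and the 256 KiB window are assumed values picked for illustration, and the function names are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_s):
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bits_per_s * rtt_s / 8


def max_bandwidth_bits_per_s(window_bytes, rtt_s):
    """Upper bound on the throughput of one TCP stream for a given window."""
    return window_bytes * 8 / rtt_s


link = 10e9            # 10 Gb Ethernet
rtt = 0.001            # assumed round-trip time of 1 ms
window = 256 * 1024    # assumed TCP window of 256 KiB

bdp = bdp_bytes(link, rtt)
ceiling = max_bandwidth_bits_per_s(window, rtt)

print(f"BDP: {bdp:,.0f} bytes, window: {window:,} bytes")
print(f"single-stream ceiling: {ceiling / 1e9:.2f} Gb/s")
print("window limits throughput" if bdp > window else "link limits throughput")
```

With these assumed values the window (262,144 bytes) is smaller than the BDP (1,250,000 bytes), so a single stream tops out at roughly 2.1 Gb/s on a 10 Gb/s link.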



Bytes in flight
[Chart: bytes in flight (y-axis, 0 to 600,000 bytes) over samples 1 to 136 (x-axis); series: NFS, dNFS]
Investigation - Network

TCP settings at the OS level are the same for both runs

and TCP windows can be as big as 16 MB

net.ipv4.tcp_rmem = 4096 16777216 16777216


net.ipv4.tcp_wmem = 4096 4194304 16777216



Investigation - Network

NFS

LADDR           LPORT  RADDR           RPORT  SWND     CWND      RWND
192.168.166.23  2049   192.168.166.24  971    8379904  1744860   4196612

dNFS

LADDR           LPORT  RADDR           RPORT  SWND     CWND      RWND
192.168.166.23  2049   192.168.166.24  36189  71608    44740     4196612
192.168.166.23  2049   192.168.166.24  48924  247052   16777216  4196612
192.168.166.23  2049   192.168.166.24  50202  247052   16777216  4196612
192.168.166.23  2049   192.168.166.24  38596  247052   16777216  4196612



Investigation - Network

strace output of Oracle process

setsockopt(32, SOL_SOCKET, SO_SNDBUF, [262144], 4) = 0
setsockopt(32, SOL_SOCKET, SO_RCVBUF, [262144], 4) = 0
bind(32, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.166.24")}, 16) = 0
connect(32, {sa_family=AF_INET, sin_port=htons(2049), sin_addr=inet_addr("192.168.166.23")}, 16) = 0
setsockopt(32, SOL_SOCKET, SO_SNDBUF, [1056768], 4) = 0
setsockopt(32, SOL_SOCKET, SO_RCVBUF, [1056768], 4) = 0



Investigation - Network

§ The kernel allocates 2 x SO_RCVBUF to leave room for overhead

§ The tcp_adv_win_scale parameter controls the size of the overhead buffer

Default value is 1

TCP Window = buffer - buffer/2^tcp_adv_win_scale

TCP Window = 512k - 512k/2^1 = 256k

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt, http://man7.org/linux/man-pages/man7/socket.7.html
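
The window formula can be evaluated for several tcp_adv_win_scale values with a short sketch. The function name is hypothetical; the 262144-byte request is the SO_RCVBUF value from the strace output, and only positive scale values are handled, as on the slide.

```python
def tcp_window(requested_rcvbuf, adv_win_scale):
    """Usable TCP window for a requested SO_RCVBUF, for positive scale values only."""
    if adv_win_scale <= 0:
        raise ValueError("this sketch only covers tcp_adv_win_scale > 0")
    buffer_bytes = 2 * requested_rcvbuf          # the kernel doubles the request
    overhead = buffer_bytes >> adv_win_scale     # buffer / 2^tcp_adv_win_scale
    return buffer_bytes - overhead


REQUESTED = 262144  # SO_RCVBUF requested by the Oracle process

windows = {scale: tcp_window(REQUESTED, scale) for scale in (1, 2, 4)}
for scale, window in windows.items():
    print(f"tcp_adv_win_scale={scale}: window {window} bytes ({window // 1024}k)")
```

Under these assumptions the ceiling is 256k for a scale of 1, 384k for 2, and 480k for 4, so raising the parameter returns more of the buffer to the advertised window.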



Investigation - Network

tcp_adv_win_scale = 2

Stat: query - physical reads (MB/s) : 721.704

LADDR           LPORT  RADDR           RPORT  SWND    CWND      RWND
192.168.166.23  2049   192.168.166.24  34770  375056  16777216  4196612
192.168.166.23  2049   192.168.166.24  46897  375056  16777216  4196612

tcp_adv_win_scale = 4

Stat: query - physical reads (MB/s) : 712.985

LADDR           LPORT  RADDR           RPORT  SWND    CWND      RWND
192.168.166.23  2049   192.168.166.24  44832  471056  6970492   4196612
192.168.166.23  2049   192.168.166.24  39377  471056  2925996   4196612
Investigation - Latency
[Chart: % of max throughput (y-axis, 0 to 100) versus additional latency (x-axis: 0, 0.1, 0.2, 0.4, 0.5, 1, 2); series: NFS, dNFS]

tc qdisc add dev eth1 root netem delay Xms



Who is the winner?

It depends

Run a test with your workload and network



Thank you for attending my session
Q&A

Marcin Przepiorowski
@pioro

Look on my blog for a white paper
