AIX Virtual Memory Tuning
2/20/2005
Template Documentation
Disclaimer
The suggestions in this presentation are general suggestions formulated by the author, not recommendations from IBM. They should be carefully examined for your environment and tested rigorously before being implemented in production. All environments differ, and requirements vary with application and system nuances. Always use YOUR best judgment.
Unless you:
  Put fewer cars on the road
  Widen the road
  Reroute the cars
you just move the bottleneck to a different location!
[Figure: System Tuning. Labels: disk technology, disk layout, filesystem layout, adapter, LTG, LVM tuning, ioo, high throughput, network (no, nfso), number of processes, async I/O, CPU, schedo, Workload Manager, Virtual Memory Management, vmo, PSALLOC.]
CPU
  tprof, pprof, sar -P -u -q, ps aux, ps, topas
  Number of processes, process priorities, WLM managed
Memory
  svmon, vmtune, vmo, ipcs, PSALLOC
Disk I/O
  lvmstat, iostat, lvm map, wlmmon, wlmstat, ioo
Network
  netpmon, no, lsattr, nfso, entstat, netstat
[Flowchart: Check CPU, then "High CPU %?" (yes/no); on no, Check memory; on no, Check disk. Tools listed: topas, vmstat, sar -q | -u | -P, tprof, pprof, wlmstat; sar -d, topas, iostat, filemon, lvmstat, wlmstat, wlmmon.]
Disk I/O
  Monitor: filemon, fileplace, LVM mapping, lvmstat, iostat, wlmstat, wlmmon
  Tune: ioo, AIO max/min servers, adapter spread, file layout
CPU/Kernel General
  Monitor: topas, vmstat, sar, wlmstat, wlmmon, xmperf / PTX, tprof, pprof, nmon (download), CURT, SPLAT
  Tune: schedo, system parms
Memory
  Monitor: svmon, vmstat, sar, wlmstat, wlmmon, tprof
  Tune: vmo, paging controls
Applications
  Monitor: profiling with tprof, pprof, Xprof; fdpr; CURT, SPLAT
  Tune: database calls, file calls, good programming
Networking
  Monitor: iptrace, ipfilter, ipreport, netpmon, netstat, nfsstat (entstat)
  Tune: no, ISNO, nfso, adapters - chent
Network Options
Network options are displayed by executing the commands:
  no -a
  nfso -a
and changed with the -o flag (for example, no -o option=value).
Prior to AIX 5.2, these settings should be placed where they will be re-executed on boot, e.g., an /etc/rc.tune file or /etc/rc.local.
AIX 5.2 supports retention of permanent and reboot values in /etc/tunables (/etc/tunables/nextboot and /etc/tunables/lastboot).
Interface Specific Network Options (ISNO)
Allows some options to be configured differently for the following network interfaces:
  10/100/1000 BaseT
  10/100 BaseT
  ATM
  Gigabit Ethernet
[Screen capture: no -a output in "name = value" form. Tunables shown: thewall, sockthresh, sb_max, somaxconn, psecache, subnetsarelocal, maxttl, ipfragttl, arpt_killc, arptab_nb, tcp_ndebug, ifsize, arpqsize, ndpqsize, route_expire, send_file_duration, fasttimo, routerevalidate, nbc_limit, nbc_max_cache, nbc_min_cache, nbc_pseg, nbc_pseg_limit, strmsgsz, strctlsz, nstrpush, strthresh, psetimers, psebufcalls, icmpaddressmask, tcp_keepinit, ie5_old_multicast_mapping, rfc1323, pmtu_default_age, pmtu_rediscover_interval, udp_pmtu_discover, tcp_pmtu_discover, ipqmaxlen, directed_broadcast, ipignoreredirects, ipsrcroutesend, ipsrcrouterecv, ipsrcrouteforward, ip6srcrouteforward, ip6_defttl, ndpt_keep, ndpt_reachable, ndpt_retrans, ndpt_probe, ndpt_down, ndp_umaxtries, ndp_mmaxtries, ip6_prune, ip6forwarding, multi_homed, main_if6, main_site6, site6_index, maxnip6q, llsleep_timeout, tcp_timewait, tcp_ephemeral_low, tcp_ephemeral_high, udp_ephemeral_low, udp_ephemeral_high, delayack, delayackports, sack, use_isno.]
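Output in the "name = value" form that no -a prints is easy to capture and compare between systems or before and after a change. The following is a minimal Python sketch of that idea, not part of the original slides; the function name parse_no_output is hypothetical, and the sample values are illustrative only and are not taken from a real system.

```python
def parse_no_output(text):
    """Parse `name = value` lines, as printed by `no -a`, into a dict."""
    options = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip blank lines or anything without '='
        value = value.strip()
        try:
            # Most tunables are integers.
            options[name.strip()] = int(value)
        except ValueError:
            # Keep non-numeric values (for example "{}") as text.
            options[name.strip()] = value
    return options


# Illustrative input only; these values are assumptions for the example.
sample = """
              thewall = 1048576
               sb_max = 1310720
            somaxconn = 1024
        delayackports = {}
"""

opts = parse_no_output(sample)
print(opts["sb_max"])
print(opts["delayackports"])
```

Two such dictionaries (for example, from a test and a production host) can then be compared key by key to spot differing tunables.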
[Figure: Real memory frames (4 KB each) holding pages of three segment types: working segment (backed by paging space), persistent segment, and client segment (client pages: NFS, JFS2).]
[Figure: Page frame table (PFT). Columns: Real Addr (aaa1-aaa4, bbb1-bbb4, ccc1-ccc4), Seg Type (W = working, P = persistent, C = client), Ref?, Mod?. Backing store labels: Paging Space, File System, NFS/JFS2. A second panel, labeled New PFT, shows the remaining frames (Seg Type W W P P C) with their Ref? and Mod? flags.]
minfree (default 120): when the number of frames on the free list falls below this value, the page replacement algorithm wakes up and begins stealing pages.
maxfree (default 128, i.e., minfree + 8): VMM page stealing continues until the free list reaches this value. Once the number of free pages is equal to or greater than maxfree, the algorithm no longer frees pages.
On a large-memory system or SMP, the defaults of 120 and 128 are a very small fraction of the real memory available, so there will be insufficient free pages relative to total system memory to satisfy demand. If memory demand continues after the minfree value is reached, processes may be suspended or killed.
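The minfree/maxfree behavior can be pictured as a simple two-threshold rule. The Python sketch below is a simplified model for illustration only, not the VMM implementation; the function names and the 8 GB system size are assumptions, and stealing is assumed to start when the free list falls below minfree.

```python
PAGE_SIZE_KB = 4  # frame size used on the slides


def frames_to_steal(free_frames, minfree=120, maxfree=128):
    """Return how many frames page replacement would try to free.

    Simplified model: stealing starts only when the free list falls
    below minfree, then continues until the free list is back at maxfree.
    """
    if free_frames >= minfree:
        return 0
    return maxfree - free_frames


def free_list_share(maxfree, real_memory_mb):
    """Fraction of real memory that maxfree frames represent."""
    total_frames = real_memory_mb * 1024 // PAGE_SIZE_KB
    return maxfree / total_frames


print(frames_to_steal(130))   # free list above minfree: nothing to do
print(frames_to_steal(100))   # below minfree: steal up to maxfree
# Hypothetical 8 GB system: the default maxfree is a tiny share of memory.
print(f"{free_list_share(128, 8192):.6%}")
```

With the defaults, the free list target is 128 frames of 4 KB, or 512 KB, which is why the slide calls it very small on a large-memory system.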
[Figure: minperm/maxperm defaults. Real memory is drawn on a 0% to 100% scale, split between computational (Comp) pages and file pages. numperm is shown against the minperm and maxperm thresholds, with a 50% mark.]
The number of memory pools can be determined with:
  vmtune -a (pre-5.2)
  vmstat -v (AIX 5.2)
The nmon tool is similar to "topas", which displays real-time AIX performance statistics. But unlike "topas", nmon presents more information and can capture data for analysis and presentation.
The nmon_analyzer tool analyzes the captured performance data. It can create a spreadsheet showing graphs of performance trends.
NMON Analyzer performs analyses of the nmon data to produce the following:
  Calculation of weighted averages for hot-spot analysis
  Distribution of CPU utilization by processor over the collection interval (useful in identifying single-threaded processes)
  Additional sections for ESS vpaths showing device busy, read transfer size, and write transfer size by time of day
  Total system data rate by time of day, adjusted to exclude double-counting of EMC hdiskpower devices (useful in identifying I/O subsystem and SAN bottlenecks)
  Separate sheets for EMC hdiskpower and FAStT dac devices
  Analysis of memory utilization to show the split between computational and non-computational pages
  Total data rates for each network adapter by time of day
  Summary data for the TOP section showing average CPU and memory utilization for each command
I/O Tuning
Over 35% I/O wait should be investigated.
Oracle databases like async I/O; DB2 and Sybase do not care. A good place to start would be AIO parameters of:
  MINSERVERS = 80
  MAXSERVERS = 200
  MAXREQUESTS = 8192
Recent-technology disks will support higher LTG numbers.
lvmstat (must be enabled prior to usage) provides detailed information on I/O contention.
filemon is an excellent I/O tool (it is trace-based; ensure you turn it off).
numfsbufs and hd_pbuf_cnt can be adjusted to reduce the wait counts reported by vmtune or vmstat -v.
VMSTAT AIX 5
# vmstat -I -t 1 10
kthr     memory      page              faults      cpu         time
-------- ----------- ----------------- ----------- ----------- --------
 r  b  p   avm   fre fi fo pi po fr sr  in  sy  cs us sy id wa hr mi se
 0  0  0 35169 98866  0  0  0  0  0 16 118 231  30  0  1 99  0 12:41:52
 0  1  0 35169 98863  0  0  0  0  0  0 222 100  27  0  0 99  0 12:41:53
 1  1  0 35169 98863  5  0  0  0  0  0 229  88  38  2  0 91  7 12:41:54
 1  1  0 35169 98863  6  0  0  0  0  0 218  58  26  4  5 91  0 12:41:55
 1  1  0 35169 98863  7  0  0  0  0  0 227  58  30  6  0 94  0 12:41:56
 0  1  0 35169 98863  4  0  0  0  0  0 236  72  34  0  0 99  0 12:41:57
 0  1  0 35169 98863  0  9  0  0  0  0 223  72  34  0  0 99  0 12:41:58
 0  1  2 35169 98863 20  7  0  0  0  0 221  60  28  1  0 89 10 12:41:59
 0  1  2 35169 98863 18  4  0  0  0  0 213  58  30  1  5 84 10 12:42:00
 0  1  0 35169 98863  0  0  0  0  0  0 221  72  34  0  0 99  0 12:42:01
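The vmstat -I -t rows above can be split into named fields so that intervals with high I/O wait are easy to pick out. The following Python sketch is illustrative only; the key names sy_calls and cpu_sy are invented here to keep the two "sy" columns apart, and the default threshold of 35 is the I/O wait figure quoted on the I/O Tuning slide.

```python
# Column order of `vmstat -I -t` data rows as shown on the slide. The two
# "sy" columns are renamed (sy_calls, cpu_sy) so the dict keys stay unique.
COLUMNS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
           "in", "sy_calls", "cs", "us", "cpu_sy", "id", "wa", "time"]


def parse_row(line):
    """Turn one data row into a dict; all fields except time become ints."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for key in COLUMNS[:-1]:
        row[key] = int(row[key])
    return row


def high_iowait(rows, threshold=35):
    """Return the timestamps of intervals whose wa exceeds the threshold."""
    return [row["time"] for row in rows if row["wa"] > threshold]


# Two rows copied from the sample output on the slide.
sample = [
    "1 1 0 35169 98863 5 0 0 0 0 0 229 88 38 2 0 91 7 12:41:54",
    "0 1 2 35169 98863 20 7 0 0 0 0 221 60 28 1 0 89 10 12:41:59",
]
rows = [parse_row(line) for line in sample]
print([row["wa"] for row in rows])
print(high_iowait(rows))
```

Neither sample interval comes close to 35% I/O wait, so the second print shows an empty list; lowering the threshold argument shows how the filter behaves.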
VMSTAT AIX 5
/@test1 $ vmstat hdisk0 hdisk1 1 10
kthr  memory       page              faults        cpu
----- ------------ ----------------- ------------- --------
 r  b   avm    fre re pi po fr sr cy  in   sy   cs us sy id
 1  1 51459 110720  0  0  0  0  0  0 208 2484 1177 26 10 64
 3  0 51465 110714  0  0  0  0  0  0 303 5371 1609 26 11 64
 1  0 51465 110714  0  0  0  0  0  0 300 5502 1725 27  8 65
 3  0 51465 110714  0  0  0  0  0  0 305 5273 1613 27  9 64
 1  0 51466 110713  0  0  0  0  0  0 310 5330 1654 21 15 65
 1  1 51467 110712  0  0  0  0  0  0 308 5341 1643 28  7 65
 1  1 51467 110712  0  0  0  0  0  0 313 5392 1665 28 10 62
 1  1 51467 110712  0  0  0  0  0  0 308 5421 1677 23 13 64
 2  0 51467 110712  0  0  0  0  0  0 308 5271 1635 27  8 66
 1  0 51467 110712  0  0  0  0  0  0 302 5432 1697 29  9 62
cpu wa and disk xfer (columns 1 2 3 4), values in source order: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 0 0 5 4 0 0 8 0 0 0 0 0 0
disk xfer: The number of transfers per second to the specified physical volumes that occurred in the sample interval. One to four physical volume names can be specified. Transfer statistics are given for each specified drive in the order specified. This count represents requests to the physical device. It does not imply an amount of data that was read or written. Several logical requests can be combined into one physical request.
What's Changed?
Consolidated access to performance tuning values in SMIT and Web-based System Manager
Perf toolbox and iostat support for ESS vpaths
Xprofiler (GUI-based profiling tool) included in AIX base
Performance tools support for LPAR, large pages and memory affinity
New thread analysis tools: CURT and SPLAT
tprof enhancements:
  Support for emulation and alignment interrupts
  Improved threads support
  Multiple process profiling
Command Consistency
vmtune is replaced by vmo and ioo
schedtune is replaced by schedo
no and nfso use the same syntax
Options for display or change
Ability to control changes now, next boot, all
Ability to return to defaults, check consistency, save or propagate
Commands supported from SMIT or WSM
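Because the tuning commands share one flag convention, a change "now", "next boot", or "all" can be expressed the same way for each of them. The Python sketch below only builds the command line; it assumes the AIX 5.2 flags -o tunable=value, -r (next boot) and -p (current and next boot), and the function name and the minfree value of 960 are hypothetical examples, not recommendations.

```python
TUNING_COMMANDS = {"vmo", "ioo", "schedo", "no", "nfso"}


def tuning_command(command, tunable, value, when="now"):
    """Build the argument list for changing one tunable.

    when: "now"      -> change the current value only
          "nextboot" -> change the value applied at the next reboot (-r)
          "all"      -> change both current and reboot values (-p)
    """
    if command not in TUNING_COMMANDS:
        raise ValueError(f"unknown tuning command: {command}")
    flags = {"now": [], "nextboot": ["-r"], "all": ["-p"]}[when]
    return [command, *flags, "-o", f"{tunable}={value}"]


# Hypothetical value, shown only to illustrate the three scopes.
for scope in ("now", "nextboot", "all"):
    print(" ".join(tuning_command("vmo", "minfree", 960, when=scope)))
```

Building the argument list separately from running it makes it simple to log or review a change before it is applied.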
Network Tuning: no -a
AIX 5.2/5.3 Tuning: /etc/tunables
Promotes reusability
Flags are now consistent
Automatic saving of parameters
  tuncheck - checks ranges, dependencies, bosboot if required
  tunsave - saves current values to a file (optionally nextboot)
  tunrestore - restores from a file (now or at reboot)
  tundefault - restores to default values
# lparstat 5 10
System configuration: type=Shared mode=Capped smt=On lcpu=2 mem=2048 psize=1.0 ent=0.50

%user  %sys %wait %idle physc %entc lbusy   app  vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
  4.8   1.2   0.0  94.0  0.04   7.0   1.7   0.9  1378     0
 21.8   1.8   0.0  76.4  0.13  26.3  13.3   0.8  1580     0
 31.2   2.2   0.0  66.5  0.19  37.0  16.1   0.8  1461     0
 84.9   5.4   0.0   9.7  0.50  99.4  48.7   0.5  1472     0
 85.1   5.4   0.0   9.5  0.50  99.5  48.1   0.5  1477     0
 77.1   4.9   0.0  18.0  0.45  90.2  44.5   0.4  1546     0
  2.9   6.2   0.0  90.9  0.06  11.4   2.2   0.9  1425     1
  4.8  13.6   0.0  81.6  0.11  22.6  10.0   0.8  1810     0
  4.4  12.3   0.0  83.3  0.10  20.5  10.5   0.9  1773     1
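In the lparstat sample, %entc is the share of the partition's entitlement (ent=0.50) that was actually consumed (physc). A minimal Python sketch of that relationship follows; the function name is hypothetical, and because lparstat prints physc rounded to two decimals, values recomputed this way only approximate the %entc column (for example 90.0 against the reported 90.2).

```python
def entitlement_used(physc, ent):
    """Percentage of entitled capacity consumed, from physc and ent."""
    if ent <= 0:
        raise ValueError("entitlement must be positive")
    return 100.0 * physc / ent


# physc values taken from the sample output, with ent=0.50
for physc in (0.13, 0.45, 0.50):
    print(physc, round(entitlement_used(physc, 0.50), 1))
```

In capped mode, as in this sample, physc cannot exceed ent, so the result stays at or below 100.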
lparstat: 3 MODES
  monitoring mode (default)
  information (-i): shows static configuration information
  detailed hypervisor (-H): breakdown of hypervisor time by hcall type
[Screen capture: topas on a shared-processor partition. Panels shown: CPU utilization bars with %Entc= 1.2; network (I-Pack, O-Pack, KB-In, KB-Out); disk (KBPS, TPS, KB-Read, KB-Writ); processes (PID, CPU%, PgSp, Owner, all owned by root); PAGING SPACE (Size 512 MB, 0.6% used, 99.3% free). Press "h" for help, "q" to quit.]
New metrics are added automatically when running in shared mode.
CPU utilization metrics are automatically calculated using new PURR-based data and formulas when running in SMT or shared mode.
Filesystem buffers:
Insufficient buffers will degrade I/O performance. The default AIX setting for these buffers is typically too low for database servers.
JFS and JFS2 are tuned separately: JFS uses vmtune -b, while JFS2 uses vmtune -Z.
Be careful not to set the value too high if running a 32-bit kernel with a large number of filesystems (50+). The buffer setting is per filesystem, and you can run out of kernel memory if it is set too high. (This does not apply to the 64-bit kernel, which supports larger kernel memory sizes.)
Tune the filesystem buffers when the system is under peak load. Run the following command multiple times and watch whether the count grows:
  AIX 5.1: /usr/samples/kernel/vmtune -a | grep fsbufwaitcnt
  AIX 5.2: vmstat -v | grep "filesystem I/Os blocked with no fsbuf"
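Running the command several times matters because the counter is cumulative: what indicates a buffer shortage is growth between samples, not the absolute value. The Python sketch below illustrates that comparison on captured vmstat -v text; the function names are hypothetical and the sample counts are invented for the example.

```python
import re


def blocked_fsbuf_count(vmstat_v_output):
    """Extract the 'filesystem I/Os blocked with no fsbuf' counter."""
    match = re.search(r"^\s*(\d+)\s+filesystem I/Os blocked with no fsbuf",
                      vmstat_v_output, re.MULTILINE)
    if match is None:
        raise ValueError("fsbuf counter not found")
    return int(match.group(1))


def counter_growth(samples):
    """Return the increase in the counter between consecutive samples."""
    counts = [blocked_fsbuf_count(sample) for sample in samples]
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Invented counts standing in for three runs of vmstat -v under load.
samples = [
    "      1520 filesystem I/Os blocked with no fsbuf",
    "      1555 filesystem I/Os blocked with no fsbuf",
    "      1555 filesystem I/Os blocked with no fsbuf",
]
print(counter_growth(samples))
```

A steadily positive growth under peak load suggests the buffers are too small; a flat counter suggests the current setting is adequate.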
Disk Layout:
The most important I/O tuning step is to spread all of the data over all of the physical disk drives*. If you have a SAN, work closely with the SAN administrator to understand the logical to physical disk layout. In a SAN, two or more hdisks may reside on the same physical disk.
(* The one exception is when you back up to disk. Be sure the backup disks are on a separate storage system to avoid having a single point of failure.)
Queue depth for fibre channel adapter:
This setting depends on the storage vendor. For IBM Shark storage, I set this around 100. If using non-IBM storage, check with your vendor for their queue depth recommendation. High queue depth settings have been known to cause data corruption on some non-IBM storage. If unsure, use the default value.
Async I/O:
Improves write performance for JFS and JFS2 file systems. It does not apply to raw partitions. AIO is implemented differently in AIX 5.1 and 5.2. In 5.1, the min/max AIO settings are for the entire system. In 5.2, the AIO settings are per CPU. In AIX 5.1, I set the max server to 1000. On a 5.2 system, divide 1000 by the number of CPUs. I tend to over-configure AIO, as changing it requires a reboot. Over-configuring max server doesn't use any extra resources, as AIO servers are only created when needed. The max server just sets the maximum, not the actual number used. If you plan to use DLPAR and dynamically add CPUs, contact Supportline to discuss the implications.
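The AIO sizing rule above (a system-wide maximum of 1000 on AIX 5.1, the same total divided by the number of CPUs on 5.2) is simple arithmetic. The Python sketch below restates it; the function name and the per_cpu switch are hypothetical, and the floor of 1 is an added safeguard rather than something the slide states.

```python
def aio_maxservers(system_wide_target, ncpus, per_cpu):
    """Return the max server setting for a system-wide target.

    per_cpu=False models AIX 5.1, where the setting covers the whole system.
    per_cpu=True models AIX 5.2, where the setting applies per CPU.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    if per_cpu:
        # Integer division; keep at least one server per CPU.
        return max(1, system_wide_target // ncpus)
    return system_wide_target


print(aio_maxservers(1000, 4, per_cpu=False))  # AIX 5.1 style
print(aio_maxservers(1000, 4, per_cpu=True))   # AIX 5.2 style
```

If CPUs are added later with DLPAR, the per-CPU setting yields a larger system-wide total, which is one reason the slide advises discussing that case with Supportline.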
                                                        [Entry Fields]
Maximum number of PROCESSES allowed per user            [10000] ***
Maximum number of pages in block I/O BUFFER CACHE       [20]
Maximum Kbytes of real memory allowed for MBUFS         [0]
Automatically REBOOT system after a crash               false
Continuously maintain DISK I/O history                  true ***
HIGH water mark for pending write I/Os per file         [0]
LOW water mark for pending write I/Os per file          [0]
Amount of usable physical memory in Kbytes              524288
State of system keylock at boot time                    normal
Enable full CORE dump                                   false
Use pre-430 style CORE dump                             false
CPU Guard                                               enable ***
ARG/ENV list size in 4K byte blocks                     [6]
Default max processes per user (128) is too low
Continuously maintain disk I/O history: lets sar and iostat record disk activity
CPU Guard: CPU deallocation
Type or select values in entry fields. Press Enter AFTER making all desired changes.

                                          [Entry Fields]
Ethernet Adapter                          ent0
Description                               IBM 10/100 Mbps Ethern
Status                                    Available
Location                                  10-60
TRANSMIT queue size                       [8192]
HARDWARE RECEIVE queue size               [256]
RECEIVE buffer pool size                  [384]
Media Speed                               10/100,full-duplex ****
Inter-Packet Gap                          [96]
Enable ALTERNATE ETHERNET address         no
ALTERNATE ETHERNET address                [0x000000000000]
Enable Link Polling                       no
Time interval for Link Polling            [500]
Apply change to DATABASE only             no
The JFS2 cio (concurrent I/O) option removes i-node serialization.
With JFS, limit the size of datafiles (containers) on change-intensive tablespaces to 2 GB.
Redbook References
Managing AIX Server Farms,
SG24-6606-00
Other References
Performance Management Guide
http://publib16.boulder.ibm.com/pseries/en_US/infocenter/base/aix52.htm SG24-6039-00
Direct I/O:
http://www-106.ibm.com/developerworks/eserver/articles/DirectIO.html
SUMMARY
Each environment will have different challenges.
Rules of thumb are just that: suggestions.
If you don't know what your performance was BEFORE you made the change, you won't know what effect you had on performance.
Carefully define ALL boundaries that you must operate under.
Best-case throughput is always controlled by the slowest common denominator!