Session 10516: WebSphere Application Server for z/OS - L2 Update


WebSphere Application Server for

z/OS - Level 2 Update


Michael Stephen
IBM
Friday, March 16, 2012 8:00 AM
Session #: 10516
Outside (of WebSphere) factors

Inside (WebSphere) factors

Repeat (from last SHARE) factors


Outside (of WebSphere) factors
MQ APAR IZ94777 causing loop in
WebSphere Control Region

• MQ APAR IZ94777
• WebSphere App Server looping using high CPU
• MQ connectivity lost
• high GC in CR
• Issue seen at MQ-JMS 7.0.1.4 level (zWAS 7.0.0.17)
• Fixed in MQ-JMS 7.0.1.6 level (zWAS 7.0.0.21)
• Several error symptoms listed in IZ94777
• How to tell what level of MQ-JMS is running:
• BBOO0222I: WMSG1611I: The installed level of the WebSphere MQ
messaging provider is 7.0.1.5.
• Techdoc from MQ:
• http://www.ibm.com/support/docview.wss?uid=swg21248089
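
A minimal sketch of the level check described above, assuming the server job log is available as text and that WMSG1611I has the wording shown on this slide. The function names are hypothetical; the 7.0.1.6 threshold is the fixed level quoted above.

```python
import re

# Level at which this slide says the loop is fixed (MQ-JMS 7.0.1.6).
FIXED_LEVEL = (7, 0, 1, 6)

# WMSG1611I wording as shown on the slide; \s+ tolerates the line wrap.
_LEVEL_RE = re.compile(
    r"WMSG1611I: The installed level of the WebSphere MQ\s+"
    r"messaging provider is (\d+(?:\.\d+)*)\."
)


def mq_jms_level(joblog_text):
    """Return the MQ-JMS level as a tuple of ints, or None if not found."""
    match = _LEVEL_RE.search(joblog_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def has_iz94777_fix(level):
    """True when the level is at or above the fixed level."""
    return level >= FIXED_LEVEL
```

Called on the message shown above, `mq_jms_level` yields the level 7.0.1.5, which is below the fixed level.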
DB2 APAR PM56361 can cause
WebSphere native out of storage

• ABEND04E in Servant Region


• IEA794I SVC DUMP HAS CAPTURED: DUMPID=002 REQUESTED BY
JOB (WSP1A11S) DUMP TITLE=D1TP,ABND=04E-00E20015,U=xxxxxx,
M=C8 ,C=101.ASMC-D SNVCRTH,M=DSNVCFRR,LOC=DSNSLD1
.DSNSVSTK+05C6
• exception seen in SR:
• Exception caught in DBServlet: com.ibm.db2.jcc.am.SqlException:
[jcc][50053][12312][3.61.96] T2zOS exception:
[jcc][T2zos]T2zosConnection.flowConnect:initRRSAFAttach:2347: Abend occurred in RRSAF,
Driver successfully retry, RRSAF Call:IDENTIFY, Subsystem ID:D1TP, Plan Name:,
Pklist:NULLID.*, Error Message:
"jccFunc:initRRSAFAttach,rrsaf:IDENTIFY,sig:18,acode:0004E000,reas:00E20015,
attach:20349728,tcb:2034AF28" ERRORCODE=-4499, SQLSTATE=null
• 8 bytes of storage orphaned in SubPool 229 Key 7

WLM APAR OA38367
WAS SERVERS NOT PROCESSING
TRANSACTIONS AFTER POLICY ACTIVATION

• Change classification rules
• change the service class where WebSphere transactions are classified
• New WLM policy installed and activated
• Work in the new service class workload times out
• Service Class is bound to a Servant Region
• New Service Class is not getting bound to a Servant Region

LE APAR PM38867 - DB2 04E ABEND, SIGABND
SIGNAL NOT RAISED BY LE

• DB2 shows:
+DSNX908I DSNX9TIM PROCEDURE OR FUNCTION xxxxx WITH LOAD MODULE
xxxxx EXCEEDED CPU RESOURCE LIMIT SSN= xxxx PROC=DB2xxxxx
ASID=nnn WLM_ENV=DB2xxxxx

• ABEND04E may be ‘expected’ in certain situations


• prior to z/OS 1.12 was handled by LE

• post z/OS 1.12 percolates to WebSphere

• WebSphere SR ABENDS
03.12.14 STC25715 BPXP018I THREAD 21FF9E0000000046, IN PROCESS 66477,
ENDED WITHOUT BEING UNDUBBED WITH COMPLETION CODE 0404E000,
AND REASON CODE 00E50013

zFS APAR OA37950 can cause Poor
performance / High GCP usage in WebSphere

• zFS HIPER OA37950
• z/OS 1.11, z/OS 1.12, and z/OS 1.13
• can occur when the zFS vnodecache is defined with too small a value
• for z/OS 1.13 with SYSPLEX_AWARE zFS mounts, a minimum vnodecache value of 32000 is recommended
• occurs under high I/O in USS (typical for WAS z/OS workloads)
• file I/O operations on USS are slow and have high GCP usage
• file I/O is GCP workload, so it can't be offloaded to zAAPs
• Unnoticed, this problem can drive a WebSphere z/OS LPAR into GCP MSU capping

zFS APAR OA37950 can cause Poor
performance / High GCP usage in WebSphere

• Symptoms:
• WebSphere App Server restart times increase
• WebSphere App Servers show higher zAAP_On_GCP usage in
RMF Mon III
• zFS shell command 'zfsadm query -vnodecache' reports a higher number
of vnodes in use than the actual configured size (66k vs. 5k)
• zFS shell command 'zfsadm query -usercache' reports zero or a
very small number of allocated segments in the end section:
Dataspace  Allocated   Free
Name       Segments    Pages
--------   ----------  ----------
ZFSUCD00            0        4000
ZFSUCD01            1        3999
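
A rough sketch of how the end section of the usercache report could be screened for this symptom. It assumes the three-column layout shown above (dataspace name, allocated segments, free pages) and the ZFSUCD name prefix seen in the sample; the function name and the `max_segments` threshold are hypothetical.

```python
def low_usercache_dataspaces(report_text, max_segments=1):
    """Return (name, segments, free_pages) for dataspaces whose allocated
    segment count is at or below max_segments (a hypothetical threshold)."""
    flagged = []
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows look like: ZFSUCD00 0 4000
        if (len(fields) == 3 and fields[0].startswith("ZFSUCD")
                and fields[1].isdigit() and fields[2].isdigit()):
            segments, free_pages = int(fields[1]), int(fields[2])
            if segments <= max_segments:
                flagged.append((fields[0], segments, free_pages))
    return flagged
```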

Problem Prevention tips from zFS L2

• From zFS Support team:
• Shut down properly using F OMVS,SHUTDOWN and let it complete
• If the file system will grow >4G
• Define with extended format / extended addressability
• zFS file systems are VSAM linear data sets and need a data class definition for Extended Addressability (EA)
• BACKUP, using logical dump, not physical dump of volume
• APARs OA37950 and OA37796 should be applied
• Especially if zFS is in a sysplex

Inside (WebSphere) factors
PE APAR PM58377

• PROBLEMS USING ADMIN CONSOLE AFTER MOVING FROM FIX PACK 7.0.0.19 TO 7.0.0.21
• Admin console panels may be missing server information
• Error 404
• An error occurred while processing
request:%2Fibm%2Fconsole%2Fwebcontainer.config.view
• Message:SRVE0190E: File not found:
• [Exception in:null] null
• In DefinitionsXmlParser parse Exception occurred org.xml.sax.SAXParseException:
The value of attribute "extends" associated with an element type "definition"
must not contain the '<' character.

PE APAR PM58377

• Local Fix #1
• rebuild console-defs.idx using iscdeploy.sh -restore
• If this is a base server, then stop the application server and perform the following:
1. cd /<WAS_HOME>/AppServer/profiles/default/bin
2. ./iscdeploy.sh -restore
3. Copy the output to a text file.
4. Restart the application server.
• If this is an ND environment, then stop the deployment manager and perform the following:
1. cd /<WAS_HOME>/DeploymentManager/profiles/default/bin
2. ./iscdeploy.sh -restore
3. Copy the output to a text file.
4. Restart the deployment manager.
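
Steps 1 to 3 above could be scripted roughly as follows. This is only a sketch: the function name and the output file location are hypothetical, `bin_dir` stands for the profile bin directory named in step 1, and the server (or deployment manager) is assumed to be stopped already.

```python
import os
import subprocess


def restore_console_defs(bin_dir, output_file):
    """Run iscdeploy.sh -restore from bin_dir and save its output (steps 1-3).

    Returns the script's exit code. Stopping and restarting the server
    remain manual steps.
    """
    script = os.path.join(bin_dir, "iscdeploy.sh")
    result = subprocess.run(
        ["/bin/sh", script, "-restore"],
        cwd=bin_dir,                 # step 1: cd to the profile bin directory
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,    # keep error text in the same file
        universal_newlines=True,
    )
    with open(output_file, "w") as out:   # step 3: copy output to a text file
        out.write(result.stdout)
    return result.returncode
```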

PE APAR PM58377

• Local Fix #2
• relink console-defs.idx in the config root and install root.
Note: each of these commands should be entered on one line

1. rm <config_root>/systemApps/isclite.ear/isclite.war/WEB-INF/console-defs.idx

2. ln -s <install_root>/systemApps/isclite.ear/isclite.war/WEB-INF/console-defs.idx <config_root>/systemApps/isclite.ear/isclite.war/WEB-INF/console-defs.idx
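
The two commands above can also be expressed as a short script, which avoids the one-line entry problem. This is a sketch, not the APAR's own procedure: the function name is hypothetical, and `config_root` and `install_root` are the same placeholders used in the commands.

```python
import os

# Relative location of the index file under both roots, as in the commands above.
REL_PATH = os.path.join("systemApps", "isclite.ear", "isclite.war",
                        "WEB-INF", "console-defs.idx")


def relink_console_defs(config_root, install_root):
    """Replace the config-root console-defs.idx with a symlink to the
    install-root copy. Returns the path of the new link."""
    link = os.path.join(config_root, REL_PATH)
    target = os.path.join(install_root, REL_PATH)
    if os.path.lexists(link):
        os.remove(link)           # command 1: rm
    os.symlink(target, link)      # command 2: ln -s
    return link
```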

PM58366 – WebSphere V8 Server Startup
may hang in shell utility (z/OS only)

• From JCL of the startup of the server:


//APPLY EXEC PGM=BPXBATCH,REGION=0M,
// PARM='SH &ROOT./&ENV..HOME/bin/applyPTF.sh inline'
IEFC653I SUBSTITUTION JCL - PGM=BPXBATCH,REGION=0M,PARM='SH
/WebSphere/ND/WAS00.WAS00.BBODMGR.HOME/bin/applyPTF.sh inline'

• postinstall actions for ifixes or FixPacks


• Prevents JCL from moving to the next step, which initializes the runtime
• you can use the /bin/ps -ef command (as UID 0) to find a /bin/chmod
command that is not progressing over a period of time (minutes / hours)
/bin/chmod -R a+rx,u+w,g+w <WAS_HOME>/profiles/default/properties/service/productDir

• Note: /bin/ps does not present the entire command line, so the above path name may be truncated
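
One way to spot a chmod that is "not progressing" is to take two /bin/ps -ef snapshots some minutes apart and keep the chmod PIDs present in both. The sketch below assumes only that the PID is the second column of each ps line; the function names are hypothetical.

```python
def chmod_pids(ps_output):
    """Return the PIDs of /bin/chmod processes found in `ps -ef` output."""
    pids = set()
    for line in ps_output.splitlines():
        fields = line.split()
        # Second column is the PID; the header line fails the isdigit() check.
        if len(fields) >= 2 and fields[1].isdigit() and "/bin/chmod" in line:
            pids.add(int(fields[1]))
    return pids


def stuck_chmod_pids(first_snapshot, second_snapshot):
    """PIDs of chmod processes that appear in both snapshots."""
    return chmod_pids(first_snapshot) & chmod_pids(second_snapshot)
```

Because ps may truncate the command line, the match is on `/bin/chmod` alone rather than on the full productDir path.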
PM58366 – WebSphere V8 Server Startup
may hang in shell utility (z/OS only)

• Workarounds are available (documented in the APAR)

Workaround 1:
It is safe to use /bin/kill -9 against the PID (process ID) for
the /bin/chmod utility. This will terminate the chmod command
without causing harm to the processing being performed by
applyPTF.sh. The server will then complete its startup.

Workaround 2:
Examine the number of files in directory
<WAS_HOME>/profiles/default/properties/service/productDir/PreConfigActions/logs
Each server startup will leave a file in this directory of the form
postinstallerConfigActions#############.log.

Back up these files to another location, and then delete them.


This has the effect of substantially reducing the number of files being processed
by /bin/chmod, and will probably avoid the hang.
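
Workaround 2 could be automated along these lines. This is a sketch under assumptions: the function name and the backup location are hypothetical, and the file name pattern is taken from the form given above. Moving each file to the backup directory covers both the backup and the delete.

```python
import glob
import os
import shutil


def archive_postinstaller_logs(logs_dir, backup_dir):
    """Move postinstallerConfigActions*.log files out of logs_dir.

    Returns the names of the files that were moved.
    """
    os.makedirs(backup_dir, exist_ok=True)
    moved = []
    pattern = os.path.join(logs_dir, "postinstallerConfigActions*.log")
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        shutil.move(path, os.path.join(backup_dir, name))
        moved.append(name)
    return moved
```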

Loop during migration job BBOWMG3B during
PREUPGRD step

• Migration code has issues when a directory name is a single letter followed by a colon, e.g. c:
• interpreted as a ‘/’ and an infinite loop gets triggered when searching for
‘config’ since ‘c:’ is analyzed first and finds the intended file
• Will be fixed in a future release
• Pervasive throughout the Migration code
• Upcoming Doc change
• Work around by renaming or deleting any directories with a name like a:, b:, c:,
etc. that may reside under the profile home directory.
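
Before running the migration job, the profile home could be scanned for such directories. A minimal sketch, assuming "a name like a:, b:, c:" means exactly one letter followed by a colon; the function name is hypothetical.

```python
import os
import re

# Exactly one letter followed by a colon, e.g. "c:".
_DRIVE_LIKE = re.compile(r"^[A-Za-z]:$")


def find_drive_like_dirs(profile_home):
    """Return paths of directories under profile_home named like 'c:'."""
    hits = []
    for root, dirs, _files in os.walk(profile_home):
        for name in dirs:
            if _DRIVE_LIKE.match(name):
                hits.append(os.path.join(root, name))
    return sorted(hits)
```

The script only reports; renaming or deleting the directories is left to the administrator.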

WebSphere Application Server creates
files with 660 permission

• FixPack 7.0.0.17
• Temp files generated by WebSphere applications may no longer be
readable by other applications

• WebSphere Application Server Version 7 and above declare the server umask differently than Versions 6.1 and prior do
• V6.1 and below used env variable _EDC_UMASK_DFLT
• V7 and above use new env variable _BPX_BATCH_UMASK
• Doc APAR PK88245 (6/8/09) describes this change
• If you do not supply a _BPX_BATCH_UMASK variable, then the server's resulting umask value will allow new files it creates to be world readable

WebSphere Application Server creates
files with 660 permission

• APAR PM32622 (7.0.0.17)


• changes default _BPX_BATCH_UMASK value to 007
• if you do NOT have a specific value set for _BPX_BATCH_UMASK
• new files created by the server will no longer be world readable
• ICH408I messages may be seen when other applications attempt to read files
created by the WebSphere App Server
• You can set _BPX_BATCH_UMASK variable to generate desirable umask
• a value of 022 will cause the files created by the server to have read and
execute bits set on for "other"
• Create in Admin console
• Environment > WebSphere variables
• Select correct ‘scope’ of the variable

• http://www.ibm.com/support/docview.wss?uid=swg21572240&acss=wasz121511
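
The 660 in the slide title follows from standard umask arithmetic: the umask bits are cleared from whatever mode the creating program requests. A small worked example (the helper name is hypothetical; the requested modes 666 and 777 are the usual ones for plain files and for directories or executables):

```python
def resulting_mode(requested_mode, umask):
    """Permission bits left after the umask is applied."""
    return requested_mode & ~umask & 0o777


# Default after PM32622: umask 007 on a file requested as 666 gives 660.
assert resulting_mode(0o666, 0o007) == 0o660

# With _BPX_BATCH_UMASK=022, "other" keeps read (and execute, if requested).
assert resulting_mode(0o666, 0o022) == 0o644
assert resulting_mode(0o777, 0o022) == 0o755
```

With 007, the "other" bits are always cleared, which is why other applications see ICH408I when reading files the server created.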

Idle Server using CPU ??

• Why is my server using CPU when the applications are not being used ??
• Multiple tuning possibilities
• Application
• Application Server
• Node Agent
• High Availability Manager
• More detail in Whitepaper
• WebSphere Application Server - Idle Server Tuning
• http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101894
• http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/WP101894

Idle Server using CPU ??

• Node Agent
• Automatic file Synchronization
• Node Agent and Deployment Manager
• Security NEEDS sync
• Propagate updated Certs / LTPA Changes
• Application Deployment
• you will have to remember to sync manually
• High Availability Manager
• HA runs in every App/Proxy Server, Node Agent and Deployment
Manager in a cell
• cells can be divided into several high availability domains, a.k.a. core groups
• Disable IF you can: some WAS services/features, as well as some stack
products, use the HA Manager
Idle Server using CPU ??

• Application Server
• Start components as needed
• Dynamic cache service background processing
• EJB cache and pool background processing

• Application
• Class Loading and Update Detection
• JSP (Java Server Pages) Reloading
• good for development, production apps should be stable

• if disabled, the application will have to be stopped and restarted manually if updated classes or changes to JSPs occur

Idle Server using CPU ??
Servlet class reloading

Deployment.xml                ibm-web.xmi        Load new servlet class
Reload enabled, interval      reloadingEnabled
------------------------      ----------------   ----------------------
false                         false              no
false                         true               Yes, interval in xmi
true & interval >0            false              Yes, interval in XML
true & interval >0            true               Yes, interval in XML
true & interval=0             true or false      no

JSP Reloading

Deployment.xml                      Ibm-web-ext.xmi jspattributes   Change JSP (translate,
reloadEnabled                       reloadEnabled, interval >0      recompile, reload)
---------------------------------   -----------------------------   ----------------------
true or false; interval = 0 or >0   false                           No
true or false; interval = 0 or >0   true                            Yes, interval in xmi
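
The two reloading tables above can be read as simple decision functions. The sketch below encodes them as given on the slide; the function and parameter names are hypothetical and do not correspond to product APIs.

```python
def servlet_reload_interval_source(deploy_reload_enabled, deploy_interval,
                                   xmi_reloading_enabled):
    """Servlet class table: return 'XML' or 'xmi' for the file whose
    interval applies, or None when new servlet classes are not loaded."""
    if deploy_reload_enabled:
        # Deployment.xml wins when enabled; interval=0 means no reloading.
        return "XML" if deploy_interval > 0 else None
    return "xmi" if xmi_reloading_enabled else None


def jsp_changes_picked_up(xmi_jsp_reload_enabled):
    """JSP table: only the jspattributes reloadEnabled setting matters,
    whatever Deployment.xml says."""
    return bool(xmi_jsp_reload_enabled)
```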
Migration - IbmPKIX Trustmanager
Revocation Checking enabled
Problem: After migrating to WebSphere V7.0, SSL communication fails with message:

CWPKI0022E: SSL HANDSHAKE FAILURE:


The extended messages indicated the exception is:

PKIX path validation failed: java.security.cert.CertPathValidatorException:


The revocation status of the certificate with subject
(CN=company.hostname, OU=company, O=company L=NYC, ST=NY, C=US)
could not be determined.

Cause:
In WebSphere V6.1 the TrustManager enabled by default is IbmX509
In WebSphere V7.0 the TrustManager enabled by default is IbmPKIX
For some customers, revocation checking was enabled in V6.1 for the IbmPKIX TrustManager,
but not enforced since the IbmX509 TrustManager was in use.
Solution:
•Disable Revocation checking if not needed (most common solution)
•Diagnose why the revocation status could not be determined (multiple reasons)
http://www14.software.ibm.com/webapp/wsbroker/redirect?version=…
…compass&product=was-nd-zos&topic=csec_sslx509certtrustdecisions
Migration - IbmPKIX Trustmanager
Revocation Checking
Prior to migrating to WebSphere V7.0,
if revocation checking is enabled, disable if not needed

Click Security > SSL certificate and key management.


Under Related Items, click Trust managers.
Click IbmPKIX.

Under Additional Properties, click Custom properties and set


com.ibm.jsse2.checkRevocation=false
Security Bulletin for WebSphere
Application Server

• Consolidated link you can use to obtain security risk assessment information for APARs that are considered Security Integrity
• http://www.ibm.com/support/docview.wss?uid=swg21368398
• PM53930: Collisions in HashTable May Cause DoS Vulnerability
• http://www.ibm.com/support/docview.wss?uid=swg24031821
• Remember L2 cannot give any information beyond what is published externally in the FLASH

Repeat (from last SHARE) factors
Versions, Dates, and Service Levels...

Version       GA          End of Marketing   End of Support
-----------   ---------   ----------------   --------------
Version 6.0   3/25/2005   2/23/2009          9/30/2010
Version 6.1   6/30/2006   7/25/2011          9/30/2012
Version 7     9/26/2008
Version 8     6/17/2011

• If delivered by Stack Products, EOS is that of the Stack Product
• Service Level Naming Convention Change
• V6.1, V7 – even numbers are z/OS ONLY, odd numbers are common
• V8 – all levels are common

• http://www.ibm.com/support/docview.wss?uid=swg21570083
• http://www.ibm.com/software/support/lifecycle/index_a_z.html
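
The naming convention above can be written as a small rule. A sketch only: the function name is hypothetical, V7 is assumed to mean 7.0.x.x levels, and versions other than 6.1, 7 and 8 are rejected because the slide does not cover them.

```python
def service_level_scope(level):
    """Classify a service level such as '7.0.0.17' per the slide's rule."""
    parts = [int(p) for p in level.split(".")]
    if tuple(parts[:2]) in ((6, 1), (7, 0)):
        # V6.1 and V7: even last number is z/OS only, odd is common.
        return "z/OS only" if parts[-1] % 2 == 0 else "common"
    if parts[0] == 8:
        return "common"   # V8: all levels are common
    raise ValueError("convention not stated for " + level)
```

For example, 7.0.0.17 and 7.0.0.21, both mentioned earlier in this deck, classify as common levels.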
Transaction Partner Logs
• When they have entries in them, they can cause problems
• At server startup, the server checks whether there are any entries to recover
• If so, it will try to recover them, and KEEP TRYING until it can
• this has been found to cause high CPU (depending on how many are out there)
• BBOT0009I: TRANSACTION SERVICE RESTART UR STATUS COUNTS
FOR W6SR02A: IN-BACKOUT=0, IN-DOUBT=0, IN-COMMIT=0
• If there are entries, you will have to resolve them with RRS
• STOP the WebSphere App Server
• delete URs associated with this server
• delete partner logs (log1 and log2)
<WAS_HOME>/profiles/default/tranlog/cellname/clustername/servername/transaction/partnerlog/
• Start the WebSphere App Server
• you may see a message about ‘epoch mismatch’ now that the RRS and WAS logs are out of sync

infocenter article: Updating resources for an application server
http://www14.software.ibm.com/webapp/wsbroker/redirect?version=matt&product=was-nd-zos&topic=trun_svr_updateresource
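
To check from a job log whether a server came up with URs to recover, the BBOT0009I counts can be extracted as sketched here. This assumes the message has the wording shown above; the function names are hypothetical.

```python
import re

# Matches the three counters in BBOT0009I, e.g. "IN-DOUBT=0".
_COUNT_RE = re.compile(r"(IN-BACKOUT|IN-DOUBT|IN-COMMIT)=(\d+)")


def ur_status_counts(message):
    """Return a dict of the UR status counts found in a BBOT0009I message."""
    return {name: int(value) for name, value in _COUNT_RE.findall(message)}


def has_urs_to_recover(message):
    """True when any of the UR status counts is non-zero."""
    return any(ur_status_counts(message).values())
```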
What’s Slowing WebSphere Down ??

• RACF AUDIT was active for the following classes:


• DIRACC, DIRSRCH, FSOBJ, FSSEC - AUDIT ALL.
• None of these classes were RACLISTed
• Issued command SETR LOGOPTIONS(NEVER(DIRACC))
• for all above classes to turn off auditing
• Following the change
• Portal restarted in 4 minutes compared to 30 minutes
• F ZFS,QUERY,ALL showed avg access time 0.003 instead of 1.6
• CPU usage returned to normal, which means that the zAAPs were
being used instead of the GCP.
• When running a load the GCP% is now close to zero.
"The total response times are now excellent"
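
Since the SETR command was issued once per class, the full set can be generated from the class list on this slide. A trivial sketch; the function name is hypothetical and the command form is the one quoted above.

```python
# Classes that had AUDIT ALL active, as listed on the slide.
AUDITED_CLASSES = ("DIRACC", "DIRSRCH", "FSOBJ", "FSSEC")


def setr_never_commands(classes=AUDITED_CLASSES):
    """Build one SETR LOGOPTIONS(NEVER(...)) command per class."""
    return ["SETR LOGOPTIONS(NEVER({}))".format(name) for name in classes]


for command in setr_never_commands():
    print(command)
```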

JESSPOOL management

• WebSphere Application Server for z/OS provides several improved message routing capabilities:
• Routing BBO messages to specific SPOOL datasets instead of to SYSLOG, thereby relieving the "clutter" on SYSLOG
• Spinning off SYSOUT and SYSPRINT data sets to relieve spool resources
• Routing these datasets to HFS files instead of to JES Spool
• App Developers like this (UNIX flat files)
• http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD103695
• Techdoc describes how to implement these facilities
• Includes a sample python script to update the WebSphere variables.
QUESTIONS ??
