P1-P2 H I: Revision History

P1-P2 HANDLING INSTRUCTIONS
Authors Ramanesh Venkatarayar / Sudhir Kalavagunta

Department Operations : Database Support
Version 1.0
Date 23-Apr-2014
Revision History:
Version Date Remarks
1.0 23-Apr-2014 Initial Document created
1.1 03-Jun-2014 Shrirang Samudralwar updated with OS commands and SQL
Scripts
1.2 28-Jul-2014 Corrections and Reviewed
Reviewers / Approvers:
Name Role
Venkat Chandragiri Reviewer / Approver
Anthony Sanchez Reviewer/ Approver
Ernie Lacson Reviewer/ Approver
Jeff Oliveto Reviewer/ Approver
Sendhil Thanigaimalai Reviewer/ Approver
Mona Mohan George Reviewer/ Approver
Luven Desouza (IMC) Reviewer
Deborah Kuznik (Rolta PM) Reviewer
Symptoms: Application started showing up high latency , Applications

experiencing connection issues and OR Database server load is high and or
having high number of connections
DB Threshold :
 Load average of 50 and beyond for the database Server (depends on no. of CPU’s in that server)
 Total number of Active Database sessions exceed 750
Triggering Point :
 Rolta / App – who notices the issue first , should take it up with IMC for further actions.
Phase – I ( 0 – 20 Minutes )
 Inform IMC and make sure you get the L2 App team and monitoring team involved immediately.
Find out if monitoring is showing any services down. Troubleshooting should be a joint effort.
 Have Application Team to run the following connection checks – if Application is experiencing
connectivity issues
o Oracle_Client_Home is installed
 Perform TNSPING <SCAN NAME> , (< 100ms)
 Connect to database using application username thru sqlplus
o Perform JDBC based connection check (Example ..)
o Perform telnet <SCAN NAME> 1521
o nslookup SCAN NAME , should resolve with 3 IP’s
o Ping SCAN NAME (4 times) , it should result in resolving 3 different IP’s of Scan. (< 50ms)
What DBA Team need to do (DBA to report the below findings to IMC and update in Jira) :
 Check Insight tickets for that database (session wait, log scan , unresponsive, monitor fault)
Steps: Insight tool is configured for all Prod/Non-Prod databases. Will get tickets in case of
session waits, log scan, Unresponsive, Monitor faults, Table space space issues etc.
 Run the Health Check report (Version X) from the $HOME/rolta/directory

o Check Database Alert Log / ASM Alert Logs for recent Errors
o Check Listener / Scan Listener logs for connections coming in / Errors
o Check the DB registration with Scan and Listener (all listeners in grid home)
o Check connecting from Rolta Monitoring Servers thru Scan / direct node (< 100ms)
Steps: cd $HOME/rolta
$ sh hourly_db_status_report.sh
Generated report will be delivered to all ROLTA DBAs mailbox and from that report
above parameters can be validated.
 Capture the server load from OSWATCHER archive directory for the last hour or for last 15
minutes using TOP command:
2
Steps: From OS prompt and Oracle login, execute below command:
$ top OR $ uptime
load average: 12.92, 14.07, 14.37
It will show current load average, 5 minutes, 15 minutes before
TOP will give other useful info like highly consuming OS processes ( Oracle/root or other users)
and Memory usage/RAM and Swap usage.
 Check the OS logs for any error messages (/var/log/message)
Steps: Logon to server with root and then go to /var/log folder and tail 30 lines, to check for any
errors or messages.
[root@ashorap1n4 log]# tail -30 messages
 Check the status of CRS / ASM
Steps: Check ASM instance is running or not using:
$ ps –ef|grep pmon
OR
Check ASM background processes using:
$ ps –ef|grep asm
Logon to ASM instance by sourcing it and then check whether its running or not using SQL
commands.
FOR CRS status, need to execute below command with root login:
# ps –ef|grep crs
It will show CRS daemon is running with crsd.bin from GRID HOME.
[c5195962@ashorap1n4 ~]$ ps -ef|grep crs

c5195962 2093 32567 0 08:16 pts/1 00:00:00 grep crs
root 20057 1 0 May01 ? 02:01:27 /app/oracle/11.2.0/grid/bin/crsd.bin reboot
3
Execute below command for checking CRS status (With root login and from $GRID_HOME/bin).
It will show Cluster resources and status on different nodes of the cluster.
# ./crs_stat –t
Sample Output:
Name Type Target State Host

---------------------------------------------------------------------------------
ora.DG01.dg ora....up.type ONLINE ONLINE ashorap1n2
ora....ER.lsnr ora....er.type ONLINE ONLINE ashorap1n2
ora....N1.lsnr ora....er.type ONLINE ONLINE ashorap1n4
ora....SM4.asm application ONLINE ONLINE ashorap1n2
 Check and get the Log file sync elapsed time
Steps: Go to Alert log location and execute below command

$ ls -ltr *lgwr*
Look for the latest lgwr .trc file
Example:
oracle@ashorap1n4[ASHPCE1C1]$ ls -ltr *lgwr*

-rw-r----- 1 oracle asmadmin 20475 Jun 2 07:35 ASHPCE1C1_lgwr_21330.trm
-rw-r----- 1 oracle asmadmin 180906 Jun 2 07:35 ASHPCE1C1_lgwr_21330.trc
oracle@ashorap1n4[ASHPCE1C1]$ tail -30 ASHPCE1C1_lgwr_21330.trc
*** 2014-06-02 05:53:03.402

Warning: log write elapsed time 628ms, size 585KB
*** 2014-06-02 05:53:08.475

Check how much time lgwr process is taking to synch from Log buffer to Log files.
 In case of ASM – Run IOSTAT in ASMCMD prompt
Steps: Source ASM instance and execute asmcmd. Execute iostat command.
4
Following is the sample output:
ASMCMD> iostat
Group_ Name Dsk_Name Reads Writes
DG01 POD1_PROD02_DG01_01 8656176320512 351804456960
DG01 POD1_PROD02_DG01_02 7886433042432 352465548288
DG01 POD1_PROD02_DG01_03 9159568773120 361441565696
DG01 POD1_PROD02_DG01_04 9008070394368 373149604864
DG01 POD1_PROD02_DG01_05 7625927325184 353863700992
DG01 POD1_PROD02_DG01_06 7647393869312 358348650496
DG01 POD1_PROD02_DG01_07 8277035104256 345368769536
DG01 POD1_PROD02_DG01_08 7873794318336 357313810944
 Check File system usage (df –kh) , look for any 100% filled mount point
Steps:
$ df –kh
Always look for Used/Available and used%. Need to ensure that used% should not cross
threshold limit, which is generally 90%.
 Check Flash Recovery Area size utilization
Steps: Execute below commands to check FRA utilization:
SQL> show parameter DB_RECOVERY_FILE_DEST
NAME TYPE VALUE

------------------------------------ ----------- ---------------------
db_recovery_file_dest string +DG4
db_recovery_file_dest_size big integer 200G
Since db_recovery_file_dest is set to Disk group +DG4, logon to ASM instance and then check
space using asmcmd
 Check Archive log destination utilization
Steps: Log on to sqlplus as SYSDBA. Example below:
SQL> archive log list;

Database log mode Archive Mode
Automatic archival Enabled
Archive destination +DG04
Alternatively can check archive destination by using “show parameter archive”
5
1. If it’s a ASM Disk group then check Total_MB, Free_MB using below command:
SQL> select GROUP_NUMBER,NAME,TOTAL_MB,FREE_MB from v$asm_diskgroup;
2. If Archive destination is server mount, then need to check with df –k command:
 Check RMAN backups running (other than arch log backups) , Terminate backups
Steps: With Oracle login, execute command:
$ ps –ef|grep rman
Above command will show all backups running on server.
For checking level0/level1 backups running on server, check:
$ ps –ef|grep level0*
$ ps –ef|grep level1*
We can validate scheduled backups configured on server using:
$ crontab –l | grep rman
 Check for EXP / Imports running , Consult with CFDBA team for termination.
Steps: Check for Exports/Imports running on server using below command:
$ ps –ef|grep expdp
$ ps –ef|grep impdp
Discuss with CFDBA team and if export/import is not critical, then we can kill the job during
critical issues.
 Query for TOP Wait events from Database for last 30 minutes (or 1 hour of the issue was more
than an hour)
Steps: Use ASH views to query for top wait events for last 1 Hr. using below commands:
TOP events:
SQL> select event,

sum(wait_time +time_waited) ttl_wait_time
from v$active_session_history
where sample_time between sysdate - 60/2880 and sysdate
6
group by event order by 2;
Top sessions:
select sesion.sid,
sesion.username,
sum(ash.wait_time + ash.time_waited)/1000000/60 ttl_wait_time_in_minutes
from v$active_session_history ash, v$session sesion
where sample_time between sysdate - 60/2880 and sysdate
and ash.session_id = sesion.sid
group by sesion.sid, sesion.username order by 3 desc;
Top queries:
SELECT active_session_history.user_id,
dba_users.username,sqlarea.sql_text,
SUM(active_session_history.wait_time +
active_session_history.time_waited)/1000000 ttl_wait_time_in_seconds
FROM v$active_session_history active_session_history,
v$sqlarea sqlarea,dba_users
WHERE active_session_history.sample_time BETWEEN SYSDATE - 1 AND SYSDATE
AND active_session_history.sql_id = sqlarea.sql_id
AND active_session_history.user_id = dba_users.user_id
and dba_users.username <>'SYS'
GROUP BY active_session_history.user_id,sqlarea.sql_text, dba_users.username
ORDER BY 4 DESC;
Top segments
SELECT dba_objects.object_name,
dba_objects.object_type,
active_session_history.event,
SUM(active_session_history.wait_time +
active_session_history.time_waited) ttl_wait_time
FROM v$active_session_history active_session_history,
dba_objects
WHERE active_session_history.sample_time BETWEEN SYSDATE - 1 AND SYSDATE
AND active_session_history.current_obj# = dba_objects.object_id
GROUP BY dba_objects.object_name, dba_objects.object_type, active_session_history.event
ORDER BY 4 DESC;
Most IO:
SELECT sql_id, COUNT(*)

FROM gv$active_session_history ash, gv$event_name evt
WHERE ash.sample_time > SYSDATE - 1/24
AND ash.session_state = 'WAITING'
7
AND ash.event_id = evt.event_id

AND evt.wait_class = 'User I/O'
GROUP BY sql_id ORDER BY COUNT(*) DESC;
Phase – II (20 – 60 Minutes)

Session/User Info from gv$session:
Check and record total number of database connections with Active and Inactive status. Find out what
queries the active sessions are running. Are they all the same? From which schemas? From which app
servers?
Execute below commands to find above info:
i). Session/User Info from gv$session:

set lines 1000;
set pages 500;
col machine format a30;
col event format a30;
col username format a26;
select
sid,serial#,username,event,machine,status,sql_id,last_call_et,to_char(logon_time,'mm/dd/yyyy
hh24:mi:ss') from gv$session where username is not null and type <> 'BACKGROUND' and
username not like '%SYS%' order by logon_time;
ii). Event count based on v$session_wait:

set linesize 800
set pages 500
col event for a30
col event for a50
select Inst_ID,event,state,count(*) from gv$session_wait group by Inst_ID,event,state order by
count(*) desc;
iii). SQL version count greater than 500:

select count(*),sql_id from gv$sql group by sql_id having count(*) > 1000;
iv). Find SQLs through PID/SID at OS/Oracle level:

set lines 1000
set pages 100
set heading on
SET COLSEP |
column su format a20 heading 'User ID'
column stat format a8 heading 'Session Status'
column sid format 99999 heading 'sid'
column ssid format 999999 heading 'Session ID'
column sser format 999999 heading 'Serial No'
8
column spid format 9999999

column machine format a26
column txt format a25 heading 'SQL Text'
column username format a35
SELECT s.username su,s.status,s.sid,s.serial# Serial#,lpad(p.spid,7)
PID,sa.sql_id,Round(s.last_call_et/60,2) Minutes,substr(sa.sql_text,1,540)
txt,prev_sql_id,s.machine,s.last_call_et
FROM v$process p, v$session s, v$sqlarea sa
WHERE p.addr=s.paddr AND s.username is not null
AND s.sql_address=sa.address(+) AND s.sql_hash_value=sa.hash_value(+)
and s.username not like '%SYS%' and s.status='ACTIVE' ORDER BY s.last_call_et;
 In case of LMS DB - Check for SQL_ID’s marked hot

o Check version count of SQL_ID’s
o If > 500 , pin them and add them to startup trigger
Execute below query:
i). SQL version count greater than 500:

select count(*),sql_id from v$sql group by sql_id having count(*) > 500;
Procedure name is: Markhot_objects

Trigger name: pin_markhot_objects
 Check for Blocking locks in the database.

o Note down the Machine, SQL_ID, Last_Call_ET, wait time, Event , Status of the blocked
and Blocking sessions
o Kill Blocking Sessions , Ensure that blocking session count does not increase , If it
increases then get the Machine and SQL_ID from which the connection was initiated.
o Keep terminating the blocking locks if wait times exceeds more than a minute.
Execute below queries:
i). Blocking sessions query:

set linesize 300
set pagesize 100
col blocking_machine format a20
col blocking_session_cur_sql format a40
col blocking_session_prev_sql format a40
col locked_table format a20
col blocked_machine format a20
col blocked_sql format a40
select * from
(select
(select machine from v$session where sid = s.blocking_session)
blocking_machine,s.blocking_session blocking_session,
(select sql.sql_text from v$sql sql, v$session s2 where sql.sql_id =
s2.sql_id and s2.sid = s.blocking_session and rownum < 2)
9
blocking_session_cur_sql,(select sql.sql_text from v$sql sql,

v$session s2 where sql.sql_id = s2.prev_sql_id and s2.sid =
s.blocking_session and rownum < 2) blocking_session_prev_sql,
obj.name locked_table, s.machine blocked_machine,
s.sid blocked_session, (select sql_text from v$sql
where
sql_id = s.sql_id and rownum < 2) blocked_sql,
s.seconds_in_wait blocked_seconds
from
v$session s, v$lock lck, sys.obj$ obj
where
s.blocking_session is not null
and s.blocking_session = lck.sid
and s.seconds_in_wait > 300
and obj.obj# = lck.id1
)
where replace(upper(blocked_sql),locked_table) != upper(blocked_sql)
and blocking_session not in
select sid from v$session where blocking_session is not null);
ii).
set lines 1000;
set pages 400;
col sql_text format a35
col username format a15
column machine format a24;
column event format a24;
select username,sid,serial#,machine,event,status,s.sql_id,
seconds_in_wait/60 mins,ss.sql_text,blocking_session
from gv$session s,v$sql ss
where type <> 'BACKGROUND' and
username not like ('%SYS%') and
username is not null and status<>'KILLED'
and seconds_in_wait> 60 and s.sql_id=ss.sql_id and
blocking_session is not null order by 8;
 Check for library cache events and analyze if it creates issue in the database
o Flush Shared pool (after permission from Bridge)
i). Library cache locks query:
set linesize 1000;

select inst_id,sid,serial#,username,
to_char(logon_time,'mm/dd/yyyy hh24:mi'),
seconds_in_wait,sql_id,machine,event,status
10
from gv$session where event like '%library cache%' and

status<>'KILLED';
ii). Flush shared pool:

SQL> Alter system flush shared_pool;
11
 Check for number of invalid objects , owner - object type wise:
select owner,object_type,count(*) from dba_objects where

status='INVALID' group by owner,object_type order by
owner,object_type;
 Count of Application logins from application tables company wise (last 24 hours , by hours)
select access_hour,count(*) from

sfv4.sf_sysaudit s,sfv4.sf_companies c
where s.company_id=c.company_id
and request_type=1
and upper(substr(company_schema,1,length(company_schema)-
1))=upper('&schema')
--and trunc(access_date)=to_date('01/08/2014', 'mm/dd/yyyy')
and trunc(access_date)=trunc(sysdate)
and access_month=(to_number(to_char(sysdate,'mm')) - 1 )
and access_day=to_number(to_char(sysdate,'dd'))
--and access_month=8
--and access_year=2014
group by access_hour
order by access_hour
 Check for number of Sessions that does not run any sql statements (SQL_ID is null)
select sid,serial#,username,event,last_call_et,machine from

v$session where sql_id is null;
 Check and Capture the ACTIVE Sessions details

(SID,SERIAL#,MACHINE,USERNAME,LOGON_TIME,EVENT,LAST_CALL_ET,SQL_ID)
set lines 1000;

set pages 500;
col username format a26;
select sid,serial#,username,event,machine,sql_id,last_call_et,
to_char(logon_time,'mm/dd/yyyy hh24:mi:ss')
from gv$session where username is not null and
type <> 'BACKGROUND' and status='ACTIVE' and
username not like '%SYS%' order by logon_time;
12
 Check for long running sessions in the database
set heading on;

SET PAGESIZE 1000;
SET LINESIZE 1000;
COLUMN sid FORMAT 99999
COLUMN serial# FORMAT 9999999
column LAST_CALL_ET format 99999999
COLUMN Logon_Time FORMAT a25
COLUMN MACHINE FORMAT a25
COLUMN Username Format a25
column event format a30
column status format a10
column Inst_id format 9999
SET COLSEP |
Select s.Inst_ID,s.sid,s.serial#,s.username,s.event,s.machine,
TO_CHAR(s.last_call_ET/60,99999999999.99) Session_Active_Minutes
from gV$session s where s.status = 'ACTIVE' and
s.username NOT IN ('SYS','SYSTEM') and
(s.username is not null) and s.last_call_ET/60>5 and
rownum <16 Order by s.last_call_ET/60 DESC;
 If the total number of sessions / processes exceed > 800 then we have to pay more attention
and make sure that server load does not shoot up beyond 100
set lines 1000

set pages 100
set heading on
SET COLSEP |
column su format a20 heading 'User ID'
column stat format a8 heading 'Session Status'
column sid format 99999 heading 'sid'
column ssid format 999999 heading 'Session ID'
column sser format 999999 heading 'Serial No'
column spid format 9999999
column machine format a26
column txt format a25 heading 'SQL Text'
column username format a35
SELECT s.username su,s.status,s.sid,s.serial#
Serial#,lpad(p.spid,7) PID,
sa.sql_id,Round(s.last_call_et/60,2) Minutes,
substr(sa.sql_text,1,540) txt,
prev_sql_id,s.machine,s.last_call_et
FROM v$process p, v$session s, v$sqlarea sa
WHERE p.addr=s.paddr AND s.username is not null
13
AND s.sql_address=sa.address(+) AND

s.sql_hash_value=sa.hash_value(+)
and s.username not like '%SYS%' and s.status='ACTIVE' and
s.last_call_et>800 ORDER BY s.last_call_et;
 Notify long running jobs to Apps team with session start time, current sql , event , machine
and last_call ET and ask for instructions to terminate
set heading on;

SET PAGESIZE 1000;
SET LINESIZE 1000;
COLUMN sid FORMAT 99999
COLUMN serial# FORMAT 9999999
column LAST_CALL_ET format 99999999
COLUMN Logon_Time FORMAT a25
COLUMN MACHINE FORMAT a25
COLUMN Username Format a25
column event format a30
column status format a10
column Inst_id format 9999
SET COLSEP |
Select s.Inst_ID,s.sid,s.serial#,s.username,s.event,s.machine,
TO_CHAR(s.last_call_ET/60,99999999999.99) Session_Active_Minutes
from gV$session s where s.status = 'ACTIVE'
and s.username NOT IN '%SYS%' and
(s.username is not null) and
s.last_call_ET/60> 15 and
rownum <16 Order by s.last_call_ET/60 DESC;
 The key here is to identify whether the load is attributed to one company or one feature. If so,
there is a possibility to have the company stop the job and re-schedule during non-peak
season
If too many long running jobs are scheduled over same period of time during the peak business
hours (Like EUSF1 – Quartz jobs), then we need to request Apps team to schedule Non-critical
jobs during night time when most of the DB resources are idle.
 Find out if you can attribute the load to a schema/company. If you can, focus on controlling SQL
from the one schema. They could be just one job.
 Find out if load is coming from customer facing app servers, quartz servers, or reporting servers.
 Generate AWR’s in text version and do quick analysis.

o TOP 5 events (Foreground and background)
o Review TOP SQL’s by elapsed
o Review TOP SQL’s by execution
14
Following different AWR reports can be generated:
i). Normal AWR report on Standalone databases – awrrpt.sql

ii). RAC AWR report for entire RAC nodes - awrgrpt.sql
iii). AWR Difference report - awrddrpt.sql
iv). AWR specific SQL ID report for analysis – awrsqrpi.sql
15
 Check “free –m “ and swapping (OS Commands)
$ free –m
$ From /var/log/sa, execute command – sar –S ( With root login)
Following is sample output:
$ sar -S 1 3
Linux 2.6.19-194.el5SMP (host) 09/26/2012 _i686_ (8 CPU)
07:31:06 AM kbswpfree kbswpused %swpused kbswpcad %swpcad
07:31:07 AM 8385920 0 0.00 0 0.00
07:31:08 AM 8385920 0 0.00 0 0.00
07:31:09 AM 8385920 0 0.00 0 0.00
Average: 8385920 0 0.00 0 0.00
 Check hugepages – value from /proc/meminfo , /etc/sysctl .conf
 Check “Puppet “
o ps –ef |grep puppet , Logs /var/log/puppet* , look for current timestamp file and
review.
o Check “ulimit “ (Add reference ulimit values here ..)
From OS prompt execute below commands. Primarily look for following parameter
settings at OS level.
$ ulimit –a
Sample output:
stack size (kbytes, -s) 32768

open files (-n) 131072
cpu time (seconds, -t) unlimited
max user processes (-u) 131072
virtual memory (kbytes, -v) unlimited
open files (-n) 131072
16
Storage specific checks
 From Phase 1 – Review the IOSTAT results

 Review OSWatcher IOSTAT
 Copy 200mb file from local storage to SAN / NAS – Threshold < 2 seconds
Execute command from OS prompt:
$ time dd if=200mb.file of=200mb.file2 bs=1024k count=200
 Review Logfile Sync from “lgwr_X “.trc file
Go to Alert log location and execute:
$ ls –ltr *lgwr*
$ tail –30 ASHPCE1D3_lgwr_7112.trc
Sample output:

kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1
current_sync_count_delta=32 switch_sync_count_delta=68
*** 2014-06-03 00:53:24.337

Log file sync switching to post/wait
Current approximate redo synch write rate is 10 per sec
Check how much time its taking to sync log files.
 Run – DB_Read_Write_SelfTest (PLSQL Block to be developed)
Network related Checks
 DBA’s to Check  High number of - TNS Timeout errors in listener / alert log – Indication of
probable issue in network layer
o Count of connections for the hour from Scan / listener logs
 More inputs from Network / Infra team needed here
What Application Team need to do
 Check for Quartz / Reports Queue , Threshold ?

 Check for any abnormal conditions
17
Phase – III (Deep dive troubleshooting and issue resolution)

Engage – Other Operation teams as appropriate
 L3 System Admin team for Server and H/w Health Check

 L3 Storage Team for Storage Health Check
 L3 Network Team for Network Health Check
 Engineering Teams
SQL_ID Specific Checks
If one single SQL_ID is running in more sessions and are active do the following checks
 Get the Explain Plan of the SQL_ID
Execute below query for specific SQL ID having issue reported:
SQL> select * from TABLE(dbms_xplan.display_awr('0srj720xy3mx7'));
 Get Historical execution of the SQL_ID from AWR Table
col snap_id format 999999

col BEGIN_INTERVAL_TIME format a25 heading "Snapshot|Start Time"
select sq.snap_id, sn.BEGIN_INTERVAL_TIME, sq.SQL_ID, sq.EXECUTIONS_DELTA,

sq.DISK_READS_DELTA, sq.BUFFER_GETS_DELTA, sq.CPU_TIME_DELTA
from DBA_HIST_SQLSTAT sq, dba_hist_snapshot sn
where sql_id='&1'
and sq.snap_id = sn.snap_id
and sq.dbid = sn.dbid
and sq.instance_number = sn.instance_number
and BEGIN_INTERVAL_TIME > trunc(sysdate) - 7
order by snap_id
 Check the last stats collected on the tables
SQL> select owner,table_name,last_analyzed,sample_size from dba_tables order by

last_analyzed;
In the where clause we can specify for specific owner or table name as per the requirement.
 Involve Engineering team member.
18

P1-P2 H I: Revision History

Uploaded by

Copyright:

Available Formats

P1-P2 H I: Revision History

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

P1-P2 H I: Revision History

Uploaded by

Copyright:

Available Formats

P1-P2 HANDLING INSTRUCTIONS

Authors Ramanesh Venkatarayar / Sudhir Kalavagunta

Symptoms: Application started showing up high latency , Applications

 Run the Health Check report (Version X) from the $HOME/rolta/directory

Steps: From OS prompt and Oracle login, execute below command:

load average: 12.92, 14.07, 14.37

It will show current load average, 5 minutes, 15 minutes before

 Check the OS logs for any error messages (/var/log/message)

[root@ashorap1n4 log]# tail -30 messages

 Check the status of CRS / ASM

Steps: Check ASM instance is running or not using:

Check ASM background processes using:

[c5195962@ashorap1n4 ~]$ ps -ef|grep crs

Name Type Target State Host

 Check and get the Log file sync elapsed time

Steps: Go to Alert log location and execute below command

oracle@ashorap1n4[ASHPCE1C1]$ ls -ltr *lgwr*

oracle@ashorap1n4[ASHPCE1C1]$ tail -30 ASHPCE1C1_lgwr_21330.trc

*** 2014-06-02 05:53:03.402

*** 2014-06-02 05:53:08.475

 In case of ASM – Run IOSTAT in ASMCMD prompt

Following is the sample output:

 Check Flash Recovery Area size utilization

Steps: Execute below commands to check FRA utilization:

SQL> show parameter DB_RECOVERY_FILE_DEST

NAME TYPE VALUE

 Check Archive log destination utilization

Steps: Log on to sqlplus as SYSDBA. Example below:

SQL> archive log list;

Alternatively can check archive destination by using “show parameter archive”

SQL> select GROUP_NUMBER,NAME,TOTAL_MB,FREE_MB from v$asm_diskgroup;

2. If Archive destination is server mount, then need to check with df –k command:

Steps: With Oracle login, execute command:

Above command will show all backups running on server.

For checking level0/level1 backups running on server, check:

We can validate scheduled backups configured on server using:

$ crontab –l | grep rman

Steps: Check for Exports/Imports running on server using below command:

SQL> select event,

group by event order by 2;

SELECT sql_id, COUNT(*)

AND ash.event_id = evt.event_id

Phase – II (20 – 60 Minutes)

Execute below commands to find above info:

i). Session/User Info from gv$session:

ii). Event count based on v$session_wait:

iii). SQL version count greater than 500:

iv). Find SQLs through PID/SID at OS/Oracle level:

column spid format 9999999

 In case of LMS DB - Check for SQL_ID’s marked hot

Execute below query:

i). SQL version count greater than 500:

Procedure name is: Markhot_objects

 Check for Blocking locks in the database.

Execute below queries:

i). Blocking sessions query:

blocking_session_cur_sql,(select sql.sql_text from v$sql sql,

Execute below query:

oracle@ashorap1n4[ASHPCE1C1]$ ls -ltr lgwr