Isilon Troubleshooting Guide: File Systems - Locking

This troubleshooting guide applies to OneFS 7.2 - 8.0.
Revised December 5, 2016

IMPORTANT!
If you arrived at this guide from a Protocols guide, consult a coach or SME.

Start

Are you connecting to the cluster over SMB, NFS, HTTP, or FTP?
    Yes: Go to Isilon Troubleshooting Guide: Protocols - Protocol Routing.
    No: Continue to the next question.

Is the cluster unresponsive to isi commands?
    Yes: Go to 2A.
    No: Continue to the next question.

Is the issue related to a node that has split from the cluster, and that cannot reconnect?
    Yes: Go to 4A.
    No: Continue to the next question.

Are hangdumps appearing in the /var/log/messages log?
    Yes: Go to 5A.
    No: Consult a coach. If a coach is not available, consult your supervisor or manager for further direction.
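
When you are working from a saved copy of the messages log, the hangdump check in the last question can be sketched as a simple text scan. This is a minimal Python sketch, not an Isilon tool. The function names are hypothetical, and it assumes that hangdump entries contain the word "hangdump" in some letter case, which this guide does not confirm.

```python
import re
from typing import Iterable, List

# Assumption: hangdump-related entries contain the word "hangdump"
# (any letter case). The exact message format is not given in this guide.
HANGDUMP_PATTERN = re.compile(r"hangdump", re.IGNORECASE)


def find_hangdump_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a hangdump, in their original order."""
    return [line.rstrip("\n") for line in lines if HANGDUMP_PATTERN.search(line)]


def scan_log_file(path: str) -> List[str]:
    """Read a saved copy of a messages log and return its hangdump lines."""
    # errors="replace" keeps the scan going if the log holds undecodable bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return find_hangdump_lines(handle)
```

An empty result from scan_log_file corresponds to the "No" branch of the question. Any returned lines should still be read by a person before following the "Yes" branch.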

We appreciate your help in improving this document.
Submit your feedback at http://bit.ly/isi-docfeedback.
© Copyright EMC Corporation. All rights reserved.
Indeterminate transactions

2A

Check for indeterminate transactions by running the following command:

isi_for_array -sX sysctl efs.journal.indeterminate_txns

See the example output at the end of this section.

Did the command return anything other than zero?

    Yes: Refer to:
    OneFS: Node expectedly reboots and/or any of the following errors are seen in messages: "Double failure detected for txn_p" "txn (X:xxxxxxxxxx) is not resolved" "error = 98dexitcode: XXXX: EJDEADLOCK", 467837

    Caution:
    After initiating a Code Red Engagement, per the previous KB, do not make changes to the cluster until you get a response to your escalation.
    Continue through this guide, checking for known issues and gathering as much information as you can.

    No: Refer to:
    Isilon OneFS: Nodes that have run for more than 248.5 consecutive days may restart without warning which may lead to potential data unavailability, 462835
    and
    UPDATE: ETA 202452: Isilon OneFS: Nodes that have run for 497 consecutive days may restart without warning, 301837

Go to 3A

Example isi_for_array -sX sysctl efs.journal.indeterminate_txns output:


cluster-2# isi_for_array -sX sysctl efs.journal.indeterminate_txns
cluster-1: efs.journal.indeterminate_txns: 0
cluster-2: efs.journal.indeterminate_txns: 0
cluster-3: efs.journal.indeterminate_txns: 0
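
On a cluster with many nodes, the "anything other than zero" check is easy to misread by eye. The following is a minimal Python sketch that parses saved command output and lists the nodes reporting a nonzero count. The function names are hypothetical, and the sketch assumes every result line has the shape shown in the example output above (node name, sysctl name, count, separated by colons).

```python
from typing import Dict

SYSCTL_NAME = "efs.journal.indeterminate_txns"


def parse_indeterminate_txns(output: str) -> Dict[str, int]:
    """Map each node name to its indeterminate transaction count."""
    counts: Dict[str, int] = {}
    for line in output.splitlines():
        # Assumed line shape: "<node>: efs.journal.indeterminate_txns: <count>"
        parts = [part.strip() for part in line.split(":")]
        if len(parts) != 3 or parts[1] != SYSCTL_NAME:
            continue  # skips the echoed command line, blank lines, and noise
        try:
            counts[parts[0]] = int(parts[2])
        except ValueError:
            continue  # value was not an integer; leave it for manual review
    return counts


def nodes_with_indeterminate_txns(output: str) -> Dict[str, int]:
    """Return only the nodes whose count is something other than zero."""
    parsed = parse_indeterminate_txns(output)
    return {node: count for node, count in parsed.items() if count != 0}


# The example output from this guide, used here as test input.
EXAMPLE_OUTPUT = """\
cluster-2# isi_for_array -sX sysctl efs.journal.indeterminate_txns
cluster-1: efs.journal.indeterminate_txns: 0
cluster-2: efs.journal.indeterminate_txns: 0
cluster-3: efs.journal.indeterminate_txns: 0
"""
```

Calling nodes_with_indeterminate_txns(EXAMPLE_OUTPUT) returns an empty dictionary, which corresponds to the "No" branch of the question in 2A. Lines that do not match the assumed shape are skipped rather than guessed at, so compare the number of parsed nodes against the cluster size before relying on the result.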

Deadlocks

Note
Two common symptoms of deadlocks:
• isi commands are unresponsive
• Clients cannot access the cluster

CAUTION!
If isi commands are not responding when you receive the case, do not run any additional isi commands as you try to troubleshoot.

For more information, see:
Researching the causes of lock contention and deadlock, 471792
How to recover from a cluster-wide deadlock, 303990

Note
If the terminal appears blank, press Ctrl + C. This method prevents running a command unintentionally.

3A

Is more than one node in the cluster unresponsive?

    Yes: Go to OneFS: How to recover from a cluster-wide deadlock?, 303990

        Did documentation solve the problem?
            Yes: End
            No: Consult a coach. If a coach is not available, consult your supervisor or manager for further direction.

    No: Do hangdumps appear in the /var/log/messages log?
            Yes: Go to 4A
            No: Consult a coach. If a coach is not available, consult your supervisor or manager for further direction.

Merge Lock

4A

Verify that a shared merge lock is failing by running the following command:

isi_for_array -s sysctl efs.gmp.merge_lock_state

Output similar to the following appears:

node 1: efs.gmp.merge_lock_state: NO_EXCLUSIVE
node 2: efs.gmp.merge_lock_state: NO_EXCLUSIVE
node 3: efs.gmp.merge_lock_state: EXCLUSIVE_WAITING

Note the output for node 3, whose state is EXCLUSIVE_WAITING. This status indicates a merge lock failure. (There may be more than one node in this state.)
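
Because more than one node may be in this state, it can help to list the affected nodes from saved output rather than scanning it by eye. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the line shape is taken from the sample output above, and EXCLUSIVE_WAITING is the only state treated as a failure because it is the only one this guide identifies as such.

```python
from typing import Dict, List

SYSCTL_NAME = "efs.gmp.merge_lock_state"
# The one state this guide identifies as a merge lock failure.
FAILURE_STATE = "EXCLUSIVE_WAITING"


def parse_merge_lock_states(output: str) -> Dict[str, str]:
    """Map each node label to its reported merge lock state."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        # Assumed line shape: "<node>: efs.gmp.merge_lock_state: <STATE>"
        parts = [part.strip() for part in line.split(":")]
        if len(parts) == 3 and parts[1] == SYSCTL_NAME and parts[0] and parts[2]:
            states[parts[0]] = parts[2]
    return states


def nodes_in_merge_lock(output: str) -> List[str]:
    """Return the nodes whose state is EXCLUSIVE_WAITING, in output order."""
    states = parse_merge_lock_states(output)
    return [node for node, state in states.items() if state == FAILURE_STATE]


# The sample output from this guide, used here as test input.
SAMPLE_OUTPUT = """\
node 1: efs.gmp.merge_lock_state: NO_EXCLUSIVE
node 2: efs.gmp.merge_lock_state: NO_EXCLUSIVE
node 3: efs.gmp.merge_lock_state: EXCLUSIVE_WAITING
"""
```

For the sample output, nodes_in_merge_lock(SAMPLE_OUTPUT) returns a list containing only "node 3". The returned nodes are the ones on which the information-gathering command in the next step would be run.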

Are any nodes in a merge lock state?

    No: Rule out hardware issues. Go to Isilon Troubleshooting Guide: Hardware - Top Level and continue troubleshooting.
        If hardware was not the cause of the issue, consult a coach. If a coach is not available, consult your supervisor or manager for further direction.

    Yes: Run the following command on each node that has a merge lock:

        isi_bug_info > /var/crash/isi_bug_info.txt

        This command gathers information about the merge lock and redirects it to a text file in the /var/crash directory.

        Consult a coach. If a coach is not available, consult your supervisor or manager for further direction.

Hangdumps

5A

Is this an active issue, or are you performing post-event root-cause analysis?

    Active issue: Consult a coach. If a coach is not available, consult your supervisor or manager for further direction.

    Post-event: Complete the following steps.

        Establish an SSH connection to:

        elvis.igs.corp

        Change the directory to the location of the log set by running the following command, where <path name> is the path to the cluster logs:

        cd /logs/<path name>

        Note
        The path name will be included in the case notes. Alternatively, you can locate the path name with the Log Search engine.

        Run the following command to launch the log analysis application:

        nilp

        At the prompt, type hang, and then press Enter.

        Type the number that corresponds to the date of the hangdump you want to graph, and then press Enter.

        Go to 6A

Hangdumps, continued

6A

Follow the instructions provided in the "Identifying the cause of lock contention or deadlocks" section of Researching the causes of lock contention and deadlock, 471792.

In the log directory, find the folder called lockviz. Within that directory, find the .svg file. This file contains output similar to the diagram in Fig. 1.
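
If the log set is large, the search for the lockviz folder and its .svg file can be sketched as a directory walk. This is a minimal Python sketch, not part of the Isilon tooling. The function name is hypothetical, and the sketch assumes only what the step above states: a folder named lockviz somewhere under the log directory, containing one or more .svg files.

```python
from pathlib import Path
from typing import List


def find_lockviz_svgs(log_dir: str) -> List[Path]:
    """Return every .svg file found inside a folder named lockviz under log_dir."""
    root = Path(log_dir)
    if not root.is_dir():
        return []  # nothing to search if the log directory does not exist
    matches = set()
    # Look at every descendant named "lockviz" and keep only the directories.
    for folder in root.rglob("lockviz"):
        if folder.is_dir():
            matches.update(p for p in folder.rglob("*.svg") if p.is_file())
    return sorted(matches)
```

For example, find_lockviz_svgs("/logs/<path name>"), with the placeholder replaced by the path from the case notes, returns the candidate diagram files in sorted order. An empty list means no lockviz folder with an .svg file was found under that directory.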

Examine the diagram to identify the source of the lock contention. Add your findings to the case notes, and then consult a coach. If a coach is not available, consult your supervisor or manager for further direction.

Fig. 1: Lock contention diagram

How to interpret lock contention diagrams

• The green boxes identify the two entities in contention for a lock. The entity at the top of the diagram is trying to obtain the lock. It is known as the locker. The entity at the bottom of the diagram has possession of the contested object. This entity is known as the owner.
• The red oval is known as a waiter. Waiters are lockers that have requested a lock, but not yet acquired it.
• The yellow diamond identifies the object the locker and owner are contesting.
• The blue oval identifies the type of lock the owner has placed on the contested object.

For a more thorough explanation, see a coach or SME.
