Deadlocks Concept 2


At one customer site, users in the Engineering group were intermittently unable to
log in to Teamcenter (Tc).
When the problem occurred, Oracle's v$lock, v$session, and related views showed that
tcserver.exe had kept the Engineering group record in ppom_object locked for about 2
hours.
While that record remains locked in the database, users belonging to the
Engineering group cannot log in to Tc and receive error ORA-30006
(PPOM_OBJECT:RhCh8vpiRnTm7D).

Solution
1) Avoid running the clearlocks utility during production / working hours (POM
removes locks held by dead sessions automatically).

If a cron job or scheduled task runs clearlocks as a maintenance activity, remove
the utility from that cron job or scheduled task.

2) Turn off external transactions (they are completely turned off from Tc 13.1
onward) by setting the variable TC_DISABLE_TRANSACTIONS=1.

How the locking mechanism works:

Lock challenger example: User1 locks Itemrevision1, and then User1's session dies.

When User2 tries to lock Itemrevision1 and finds it locked by User1's session,
POM challenges User1's session to determine whether it is dead or alive.
If User1's session is alive, User2's session sleeps for <n seconds> and tries
again. If User1's session is dead, POM clears the lock automatically and gives
it to User2.
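
The challenge-and-retry behavior described above can be sketched as a small simulation. This is only an illustration of the idea, not POM's actual implementation: the Session and LockTable classes, the alive flag, and the retry parameters are all hypothetical names introduced for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical stand-in for a Teamcenter server session.
    name: str
    alive: bool = True


@dataclass
class LockTable:
    # Maps an object id to the session currently holding its lock.
    holders: dict = field(default_factory=dict)

    def acquire(self, obj, requester, retry_seconds=0.0, max_retries=3):
        """Try to lock obj for requester, challenging the current holder."""
        for _ in range(max_retries + 1):
            holder = self.holders.get(obj)
            if holder is None or holder is requester:
                self.holders[obj] = requester
                return True
            if not holder.alive:
                # Holder failed the challenge: clear its lock and hand it over.
                self.holders[obj] = requester
                return True
            # Holder is alive: sleep, then try again.
            time.sleep(retry_seconds)
        return False


user1 = Session("User1")
user2 = Session("User2")
locks = LockTable()

locks.acquire("Itemrevision1", user1)
blocked = locks.acquire("Itemrevision1", user2)  # User1 alive: retries run out
user1.alive = False                               # User1's session dies
granted = locks.acquire("Itemrevision1", user2)  # challenge finds a dead holder
```

In this sketch User2 is refused while User1 is alive and is granted the lock once User1's session is marked dead, without any separate cleanup run.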

Historically, the only way to remove locks held by dead sessions in Teamcenter was
to run clearlocks [-verbose]. Starting with Teamcenter 11.x, this behavior has
changed: POM has a lock challenger that automatically removes any locks held by
dead sessions.

Because POM clears these locks automatically, running clearlocks during production
hours is not recommended.

Similarly, the clear timestamp functionality can cause lock contention if it is run
during production hours. To prevent this, from Teamcenter 11.x onward the clear
timestamp functionality has been modified to clear old timestamps.

-----------------------------------------------------------------------------------

For the last two days, users have frequently been hitting the following error:
"A database action has failed. The database may have timed out waiting for another
user's lock to be released or the database may be unavailable due to a hardware or
network failure. Please try your action again later. If this error continues,
please contact your system administrator."

At first, some users' sessions freeze while performing an action such as copy and
paste.
After some time, some users are no longer able to log in to the 4T RAC.
What are the possible cause and solution?

An IR in the Solution Center suggests modifying the TC_retry_time and
TC_max_number_of_retries preferences:

http://gtac/view.php?sort=desc&p=1&q=A+database+action+has+failed.+The+database+may+have+timed+out+waiting+for+another+user%27s+lock+to+be+released&file_type=text&i=ir-9511412&k=9&o=0

The current values of these preferences are TC_retry_time=2 and
TC_max_number_of_retries=12.
Can we change these values, and if so, what values should we set to try to resolve
this issue?
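
To put the current values in perspective, the following is a rough back-of-the-envelope sketch. It assumes that TC_retry_time is a fixed delay in seconds between attempts and that TC_max_number_of_retries is the number of attempts; both readings are assumptions for illustration, and the helper name is hypothetical.

```python
def max_lock_wait_seconds(retry_time, max_retries):
    """Upper bound on time spent waiting for a lock, assuming a fixed delay per retry."""
    return retry_time * max_retries


# Values quoted in the question above.
current_wait = max_lock_wait_seconds(2, 12)
print(f"Approximate maximum wait with current settings: {current_wait} s")
```

Under these assumptions, a session gives up after well under a minute, so raising either preference only lengthens the wait and does not remove whatever is holding the lock.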

Answer:
Please check whether the clearlocks or list_users utilities are being run by custom
code, scheduled jobs, etc. during production working hours.
Is any activity, such as reindexing, being performed on the DB during those hours?

Stop any background cron job, scheduled job, or maintenance activity that runs
utilities such as clearlocks or list_users during production hours.

This issue appears to be caused by external transactions, a new functionality
introduced in Teamcenter 11.
This feature is controlled by the high-level application, and it is known to cause
this issue.

You can turn off external transactions by setting TC_DISABLE_TRANSACTIONS=1.

Also, an enhanced locking mechanism was introduced in TC 11.6.0.8, TC 12.1.0.6, and
TC 12.2, which takes care of this lock timeout if it is caused by the external
transaction functionality.
===================================================================================

The customer does not want to kill all processes on the server with
clearlocks -assert_all_dead.
What is the correct command line syntax to run clearlocks for a single machine?

Solution
Please use the syntax: clearlocks -assert_dead -u= -p= -g= <NodeName>

===================================================================================

Questions regarding switching of Oracle RAC Nodes
Symptom
The customer is using Oracle 12.2 RAC for the production environment.
When Oracle RAC switches between Node1 and Node2, are already-connected Teamcenter
sessions switched as well, or do they need to be restarted?

Oracle RAC is used to avoid downtime during Windows Server maintenance; however,
already-connected Teamcenter sessions are being affected. The customer expects that
Teamcenter sessions should not be affected when the Oracle node is switched. Please
clarify the Teamcenter behavior for Oracle RAC nodes.

Does a Teamcenter session need to be restarted after switching nodes in a RAC
configuration?

Please recommend an operation that will resend a request to Oracle to confirm the
"fail over".

Solution
The Teamcenter sessions should "fail over" to the node that remains up, with some
qualifications.

A session loses its Oracle connection when the original node goes down, and it only
tries to reconnect when there is an Oracle request. We therefore recommend:
- Don't run clearlocks or list_users while waiting for sessions to fail over.
Until they try to reconnect, they are listed as not connected and are therefore
tidied up. If their locks are then used in another session, the original session is
marked read-only upon reconnect.
- Set sqlnet expire_time=10 (per the Tdoc on Oracle config) so that there is a
keep-alive on the connection and a lost connection is detected.

Do you mean that if the user performs no operation in the Teamcenter client that
sends a request to Oracle, the session will be tidied up?
>> The TC server holds the Oracle connection. If the connection is dropped (on
fail-over), the reconnect only happens when the TC client makes a request of the TC
server that requires a database call. The DB client (Oracle OCI) returns an error
saying the connection is lost, and that prompts a reconnect. If the connection
disappears and the TC server for a given user is not active, it shows up as not
connected and is at risk of being tidied up by clearlocks.
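
The reconnect-on-demand behavior described in the answer above can be illustrated with a minimal sketch. It is not Teamcenter or OCI code: the TcServerSketch class, the ConnectionLost exception, and the method names are hypothetical and exist only for this example.

```python
class ConnectionLost(Exception):
    """Stands in for the error the DB client reports on a dropped connection."""


class TcServerSketch:
    def __init__(self):
        self.connected = True
        self.reconnects = 0

    def drop_connection(self):
        # Simulates the RAC node going down; nothing notices until the next DB call.
        self.connected = False

    def _db_call(self, sql):
        if not self.connected:
            raise ConnectionLost(sql)
        return f"ok: {sql}"

    def handle_request(self, sql):
        """Serve a client request that needs the database."""
        try:
            return self._db_call(sql)
        except ConnectionLost:
            # The error is what prompts the reconnect to the surviving node.
            self.connected = True
            self.reconnects += 1
            return self._db_call(sql)


server = TcServerSketch()
server.drop_connection()
idle_state = server.connected  # still False: an idle session looks "not connected"
result = server.handle_request("select 1 from dual")
```

The point of the sketch is the window between drop_connection and the next handle_request: during that time the session appears not connected, which is when a clearlocks or list_users run could tidy it up.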

Can you please explain the scenario in which a lock is used in another session and
the original session ends up read-only upon reconnect? We would like to understand
this scenario in order to avoid it. For the end customer, the session should simply
continue as it did before the node switch.

>> This follows from the notes above. If a session is seen as not connected, then
another user who has reconnected and asks for a lock will check whether the owning
session is live (running the clearlocks logic on that one process) before reporting
inst-in-use. If the process is considered dead, it is tidied up and the lock is
granted to the connected session.

To demonstrate this, you need an object that stays locked under the user's control.
Most object edits happen entirely within a single SOA trip, but one example that
comes to mind is PSM with a BomViewRevision (BVR):
1) Add an occurrence so that the BVR gets locked; it should stay locked until "save
PSM session".
2) Fail over after getting the lock and leave that session quiet.
3) In another session, try to edit the same BVR.
The second session should find that the BVR is locked, check the original session,
see that it is "dead", tidy it up, and be given the lock.
===================================================================================

When attempting to patch from Teamcenter 10.1.1.1 to 10.1.6, TEM hangs at the
step where the update runs the clearlocks utility. No errors are reported in the
UI, and TEM does not quit, exit, or report errors in the log files.

Solution
Investigation of the upgrade logs showed that clearlocks was trying to kill
processes on multiple node names that were no longer in use.
Furthermore, running
clearlocks -verbose
from a Tc command prompt also hung and did not complete. Testing a variety of
command line utilities showed that others that hit the database, e.g. list_users,
also hung.

A SQL query for the number of process locks showed that over 250,000 locks existed.
The majority of them pointed to the node name no longer in use.

After TEM was allowed to run to completion (so that the clearlocks utility
finished), the same SQL query showed that these locks had been cleared. Once the
patch completed, Teamcenter functioned properly.

===================================================================================

Set Password:

1) Set the password in an environment variable, e.g.

set PWF=infodba

2) Run the encryptpwf command:

install -encryptpwf -e=PWF -f=pwf.dat

3) Unset the environment variable immediately after encryption.

4) Run the clearlocks command with the encrypted password file:

clearlocks -assert_all_dead -u=infodba -pf=pwf.dat -g=dba

=============================================================================
Clearlocks is trying to tidy up old timestamps and found around 8.7 million rows
to delete, which is why it is taking so long:

SELECT puid FROM POM_TIMESTAMP WHERE pdbtimestamp < (systimestamp AT TIME ZONE 'GMT' - 96/24);
INFO - 2018/09/16-02:43:21.422 UTC - NoId - ===>Took 214.623 seconds to execute that SQL (returning 8735657 rows)

As a possible workaround, they can set TC_TIMESTAMP_THRESHOLD=96000.
With that setting, clearlocks should not find anything to tidy up and will leave the
table contents as is.
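
The effect of the larger threshold can be checked with simple date arithmetic. The query in the log subtracts 96/24 from systimestamp, which in Oracle date arithmetic is 4 days (96 hours). The sketch below assumes that TC_TIMESTAMP_THRESHOLD is expressed in hours and takes the place of that 96; this is an assumption based on the query text, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def tidy_cutoff(now, threshold_hours):
    """Timestamps older than the returned moment would be selected for tidy-up."""
    return now - timedelta(hours=threshold_hours)


# Time of the log line quoted above.
now = datetime(2018, 9, 16, 2, 43, 21, tzinfo=timezone.utc)

default_cutoff = tidy_cutoff(now, 96)     # 4 days back
raised_cutoff = tidy_cutoff(now, 96000)   # 4000 days back, roughly 11 years

print("cutoff with 96 hours:   ", default_cutoff.isoformat())
print("cutoff with 96000 hours:", raised_cutoff.isoformat())
```

Under this assumption the raised threshold moves the cutoff back about 11 years, so rows would only qualify if the table holds timestamps older than that.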
=============================================================================
