As database application developers, we run into deadlock errors fairly often. A "deadlock detected" message is a clue that the problem lies in the application design. Knowing that doesn’t fix anything, though, and the herculean task is reproducing the deadlock. Proper exception handling and error logging can point us to the database objects involved in the deadlock.
Let’s take a look at a simple example of a deadlock using a single table with two rows:
SQL> CREATE TABLE t_test(col_1 NUMBER, col_2 NUMBER);
Table created
SQL> INSERT INTO t_test VALUES(1,2);
1 row inserted
SQL> INSERT INTO t_test VALUES(3,4);
1 row inserted
SQL> COMMIT;
Commit complete
SQL> SELECT * FROM t_test;
COL_1 COL_2
---------- ----------
1 2
3 4
Note the time of each statement; I have run set time on timing on in every session to make the sequence easier to follow.
SESSION : 1
12:16:06 SQL> UPDATE t_test SET col_1 = 5 WHERE col_2=2;
1 row updated.
Elapsed: 00:00:00.00
SESSION : 2
12:16:04 SQL> UPDATE t_test SET col_1 = 6 WHERE col_2=4;
1 row updated.
Elapsed: 00:00:00.00
12:16:31 SQL> UPDATE t_test SET col_1 = 7 WHERE col_2=2;
At this point, SESSION 2 keeps waiting.
SESSION : 1
12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;
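At this point, SESSION 2 is the victim of the deadlock: its second UPDATE fails with ORA-00060 and only that statement is rolled back, so its first UPDATE still holds its row lock. SESSION 1 is still waiting.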
Let’s look at the session details from SESSION 2 –
12:22:15 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';
SID STATUS PROGRAM SQL_ID STATE WAIT_CLASS BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- --------------- ----------- ----------------------------------------------------------------
14 ACTIVE sqlplus.exe 60qmqpmbmyhxn WAITED SHORT TIME Network NOT IN WAIT SQL*Net message to client
134 ACTIVE sqlplus.exe 5x0zg4qwus29v WAITING Application VALID enq: TX - row lock contention
Elapsed: 00:00:00.00
12:22:18 SQL>
So the v$session details, when viewed from SESSION 2 (SID 14), show its status as ACTIVE. This is simply because SID 14 is the session executing the query.
Let’s look at the session details from another session; let’s call it SESSION 3. Remember, SESSION 1 is still waiting.
SQL> set time on timing on
12:24:41 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';
SID STATUS PROGRAM SQL_ID STATE WAIT_CLASS BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- ---------- ----------- ------------------------------
13 ACTIVE sqlplus.exe 60qmqpmbmyhxn WAITED SHORT TIME Network NOT IN WAIT SQL*Net message to client
14 INACTIVE sqlplus.exe WAITING Idle NO HOLDER SQL*Net message from client
134 ACTIVE sqlplus.exe 5x0zg4qwus29v WAITING Application VALID enq: TX - row lock contention
Elapsed: 00:00:00.01
12:24:44 SQL>
So, seen from another session, SESSION 2 (SID 14) is INACTIVE. SESSION 1 (SID 134) is still WAITING on the event enq: TX - row lock contention.
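To see who is blocking whom directly, rather than inferring it from the wait event, v$session also exposes a blocking_session column. The query below is a minimal sketch that can be run from any session with access to v$session; in the scenario above, one would expect SID 134 to list SID 14 as its blocker.

```sql
-- List every session that is currently blocked, together with its blocker.
SELECT sid,
       serial#,
       blocking_session,          -- SID of the session holding the lock
       blocking_session_status,
       event,
       seconds_in_wait,
       sql_id                     -- statement the blocked session is stuck on
FROM   v$session
WHERE  blocking_session IS NOT NULL;
```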
Let’s commit SESSION 2 –
12:22:18 SQL> commit;
Commit complete.
Elapsed: 00:00:00.01
12:25:43 SQL>
At this point, the lock that SESSION 1 was waiting on is released and its UPDATE completes. Let’s commit SESSION 1 as well:
12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;
1 row updated.
Elapsed: 00:08:27.29
12:25:43 SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
12:26:26 SQL>
Elapsed: 00:08:27.29 shows how long SESSION 1 waited until SESSION 2 committed.
To summarize, here is the entire story of session 1 –
12:16:06 SQL> UPDATE t_test SET col_1 = 5 WHERE col_2=2;
1 row updated.
Elapsed: 00:00:00.00
12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;
1 row updated.
Elapsed: 00:08:27.29
12:25:43 SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
12:26:26 SQL>
To summarize, here is the entire story of session 2 –
12:16:04 SQL> UPDATE t_test SET col_1 = 6 WHERE col_2=4;
1 row updated.
Elapsed: 00:00:00.00
12:16:31 SQL> UPDATE t_test SET col_1 = 7 WHERE col_2=2;
UPDATE t_test SET col_1 = 7 WHERE col_2=2
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
Elapsed: 00:00:24.47
12:22:15 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';
SID STATUS PROGRAM SQL_ID STATE WAIT_CLASS BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- --------------- ----------- ----------------------------------------------------------------
14 ACTIVE sqlplus.exe 60qmqpmbmyhxn WAITED SHORT TIME Network NOT IN WAIT SQL*Net message to client
134 ACTIVE sqlplus.exe 5x0zg4qwus29v WAITING Application VALID enq: TX - row lock contention
Elapsed: 00:00:00.00
12:22:18 SQL> commit;
Commit complete.
Elapsed: 00:00:00.01
12:25:43 SQL>
Now, let’s see which changes actually persisted in the table:
12:25:43 SQL> select * from t_test;
COL_1 COL_2
---------- ----------
5 2
8 4
Elapsed: 00:00:00.00
12:30:36 SQL>
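Both rows carry SESSION 1’s values. No transaction was rolled back as a whole: Oracle rolled back only SESSION 2’s failed statement, SESSION 2’s first update (col_1 = 6) was committed, and SESSION 1’s second update then overwrote it with 8.
Oracle usually detects a deadlock within a few seconds and raises ORA-00060 in one of the sessions.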
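The deadlock in this example arises because the two sessions lock the same two rows in opposite order. One common way to remove it, shown here only as a sketch, is to agree on a fixed order in which rows are touched. The "ascending col_2" convention below is an assumption made for illustration, not something the original application defines.

```sql
-- Assumed convention for this sketch: always touch rows in ascending col_2 order.

-- SESSION 1
UPDATE t_test SET col_1 = 5 WHERE col_2 = 2;
UPDATE t_test SET col_1 = 8 WHERE col_2 = 4;
COMMIT;

-- SESSION 2 (same rows, same order)
UPDATE t_test SET col_1 = 7 WHERE col_2 = 2;  -- waits here if SESSION 1 is still open
UPDATE t_test SET col_1 = 6 WHERE col_2 = 4;
COMMIT;
```

With this ordering, SESSION 2 blocks on its first statement while holding no locks of its own, so the wait is an ordinary row lock wait that ends when SESSION 1 commits, not a cycle.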
Conclusion
In my opinion, the best way to capture the session details of a deadlock is to log them as verbosely as possible. Otherwise, it is a nightmare for a DBA to investigate without proper information. For that matter, even a developer would find it a herculean task to find and fix the actual design flaw if the deadlock error details are not logged. To conclude in one line: a deadlock is due to a design flaw; Oracle is just the victim, and the application is the culprit. Deadlocks are scary, but they point out design flaws that must be rectified sooner or later.
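As a rough illustration of what verbose logging could look like, the sketch below traps ORA-00060 in PL/SQL and records the session details and error stack. The deadlock_log table and the log_deadlock procedure are hypothetical names invented for this example; adapt the columns to whatever your application already logs.

```sql
-- Hypothetical log table; the name and columns are assumptions for this sketch.
CREATE TABLE deadlock_log (
  logged_at   TIMESTAMP DEFAULT SYSTIMESTAMP,
  sid         NUMBER,
  db_user     VARCHAR2(128),
  module      VARCHAR2(64),
  error_stack VARCHAR2(4000),
  backtrace   VARCHAR2(4000)
);

-- Hypothetical helper: writes the log row in an autonomous transaction so the
-- entry survives even when the calling transaction is rolled back.
CREATE OR REPLACE PROCEDURE log_deadlock (
  p_error_stack IN VARCHAR2,
  p_backtrace   IN VARCHAR2
) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO deadlock_log (sid, db_user, module, error_stack, backtrace)
  VALUES (
    SYS_CONTEXT('USERENV', 'SID'),
    SYS_CONTEXT('USERENV', 'SESSION_USER'),
    SYS_CONTEXT('USERENV', 'MODULE'),
    SUBSTR(p_error_stack, 1, 4000),
    SUBSTR(p_backtrace, 1, 4000)
  );
  COMMIT;
END log_deadlock;
/

-- The two updates SESSION 2 ran in the example, wrapped in a handler.
DECLARE
  e_deadlock EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_deadlock, -60);  -- map ORA-00060 to a named exception
BEGIN
  UPDATE t_test SET col_1 = 6 WHERE col_2 = 4;
  UPDATE t_test SET col_1 = 7 WHERE col_2 = 2;
  COMMIT;
EXCEPTION
  WHEN e_deadlock THEN
    log_deadlock(DBMS_UTILITY.FORMAT_ERROR_STACK,
                 DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    ROLLBACK;  -- release the row locks this transaction still holds
    RAISE;     -- let the caller decide whether to retry
END;
/
```

Rolling back in the handler matters here: as the walkthrough showed, the other session keeps waiting until the victim ends its transaction.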
Hope it helps!