Tags

, , ,

Being a database application developer, a lot of times we come across deadlock errors. We know that a deadlock detected message is actually a clue to understand that it is a application design level issue. But just saying that doesn’t fix anything, and the herculean task is to replicate the deadlock situation. Proper exception handling and error logging could lead us to the database objects involved in creating the deadlock.

Let’s take a look at a simple example of deadlock situation using a single table with two rows :

SQL> CREATE TABLE t_test(col_1 NUMBER, col_2 NUMBER);

Table created
SQL> INSERT INTO t_test VALUES(1,2);

1 row inserted
SQL> INSERT INTO t_test VALUES(3,4);

1 row inserted

SQL> COMMIT;

Commit complete

SQL> SELECT * FROM t_test;

     COL_1      COL_2
---------- ----------
         1          2
         3          4

Note the time of each transaction, I have set time on timing on for a better understanding.

SESSION : 1

12:16:06 SQL> UPDATE t_test SET col_1 = 5 WHERE col_2=2;

1 row updated.

Elapsed: 00:00:00.00

SESSION : 2

12:16:04 SQL> UPDATE t_test SET col_1 = 6 WHERE col_2=4;

1 row updated.

Elapsed: 00:00:00.00
12:16:31 SQL> UPDATE t_test SET col_1 = 7 WHERE col_2=2;

At this point, SESSION 2 keeps waiting.

SESSION : 1

12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;

At this point, SESSION 2 is the victim of deadlock, SESSION 1 is still waiting.

Let’s look at the session details from SESSION 2

12:22:15 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';

       SID STATUS   PROGRAM         SQL_ID        STATE               WAIT_CLASS      BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- --------------- ----------- ----------------------------------------------------------------
        14 ACTIVE   sqlplus.exe     60qmqpmbmyhxn WAITED SHORT TIME   Network         NOT IN WAIT SQL*Net message to client
       134 ACTIVE   sqlplus.exe     5x0zg4qwus29v WAITING             Application     VALID       enq: TX - row lock contention

Elapsed: 00:00:00.00
12:22:18 SQL>

So, v$session details when viewed in SESSION 2, i.e. SID 14, says the status is ACTIVE.

Let’s look at the session details from another session, lets call it SESSION 3 for the sake. Remember, SESSION 1 is still waiting.

SQL> set time on timing on
12:24:41 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe'

       SID STATUS   PROGRAM         SQL_ID        STATE               WAIT_CLASS BLOCKING_SE EVENT
---------- -------- --------------- ------------- ------------------- ---------- ----------- ------------------------------
        13 ACTIVE   sqlplus.exe     60qmqpmbmyhxn WAITED SHORT TIME   Network    NOT IN WAIT SQL*Net message to client
        14 INACTIVE sqlplus.exe                   WAITING             Idle       NO HOLDER   SQL*Net message from client
       134 ACTIVE   sqlplus.exe     5x0zg4qwus29v WAITING             Applicatio VALID       enq: TX - row lock contention
                                                                      n


Elapsed: 00:00:00.01
12:24:44 SQL>

So, for other sessions, SESSION 2, i.e. SID 14, is INACTIVE. SESSION 1 is still WAITING with event enq: TX - row lock contention.

Let’s commit SESSION 2

12:22:18 SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
12:25:43 SQL>

At this point, the lock is released for SESSION 1, let’s commit session 1 as well –

12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;

1 row updated.

Elapsed: 00:08:27.29
12:25:43 SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
12:26:26 SQL>

Elapsed: 00:08:27.29 shows SESSION 1 was waiting that long till SESSION 2 was committed.

To summarize, here is the entire story of session 1

12:16:06 SQL> UPDATE t_test SET col_1 = 5 WHERE col_2=2;

1 row updated.

Elapsed: 00:00:00.00
12:16:15 SQL> UPDATE t_test SET col_1 = 8 WHERE col_2=4;

1 row updated.

Elapsed: 00:08:27.29
12:25:43 SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
12:26:26 SQL>

To summarize, here is the entire story of session 2

100:16:04 SQL> UPDATE t_test SET col_1 = 6 WHERE col_2=4;

1 row updated.

Elapsed: 00:00:00.00
12:16:31 SQL> UPDATE t_test SET col_1 = 7 WHERE col_2=2;
UPDATE t_test SET col_1 = 7 WHERE col_2=2
                                  *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource


Elapsed: 00:00:24.47
12:22:15 SQL> select sid,status,program,sql_id, state, wait_class, blocking_session_status, event from v$session where schemaname='LALIT' and program='sqlplus.exe';

       SID STATUS   PROGRAM         SQL_ID        STATE               WAIT_CLASS BLOCKING_SE EVENT
----------   -------- --------------- ------------- ------------------- --------------- ----------- ----------------------------------------------------------------
        14 ACTIVE   sqlplus.exe     60qmqpmbmyhxn WAITED SHORT TIME   Network         NOT IN WAIT SQL*Net message to client
       134 ACTIVE   sqlplus.exe     5x0zg4qwus29v WAITING             Application     VALID       enq: TX - row lock contention

Elapsed: 00:00:00.00
12:22:18 SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
12:25:43 SQL>

Now, let’s see which transaction actually got rolled back and which got committed

12:25:43 SQL> select * from t_test;

     COL_1      COL_2
---------- ----------
         5          2
         8          4

Elapsed: 00:00:00.00
12:30:36 SQL>

Usually, Oracle takes a second or two to detect a deadlock and throws the error.

Conclusion

In my opinion, the best way to know the session details of a deadlock is to log the details as verbose as possible. Else, it is a nightmare for a DBA to investigate without proper information logged. For that matter, even a Developer would find it to be an herculean task to rectify and fix the actual design flaw if the deadlock error details are not logged verbosely. And to conclude with a one liner statement, A deadlock is due to design flaw, Oracle is just the victim and the application being the culprit. Deadlocks are scary, but they point out the design flaws that must be rectified sooner or later.

Hope it helps!