Monday, February 20, 2017

How to Resolve In-Doubt Transactions

In-Doubt Transactions

Distributed transactions perform DML on multiple databases which is a bit more complicated task because the database must coordinate the consitency in those seperate or even perhaps between different DBMSs (like Oracle - MS SQL). To ensure the transaction atomicity, Oracle implements a 2-phase commit mechanism through which the distributed transactions undergo some phases like prepare, commit, forget, etc. This phases constitute the hand-shake mechanism of the distributed transaciton.

However, sometimes things may go wrong (due to some network, system problem or even a reconfiguration of the underlying objects) and one of the phases fails while others are ok. Here, we say that the transaction becomes in-doubt. Normallly this problem should be handled by the RECO process itself, but in some cases this cannot br performed.

 Why RECO cannot perform in some cases?

One of the databases involved in the distributed transaction might be unreachable (netowrk, system issues etc.) while the RECO was trying to resolve the problem (even when retrying to recover). (UNSTUCK)

The lookup tables of the “2-phase commit” mechanism might become inconsistent with the transaction itself.  (STUCK)


Handling UnStuck Transactions

Hopefully, there is no inconsistency between the lookup tables and the transaction and the following code resolves the problem:

To see the waiting transactions -> DBA_2PC_PENDING view.
SQL> select local_tran_id,global_tran_id, state,mixed, commit# from dba_2pc_pending;

97.33.166765     ORCL.781a8889.97.33.166765     prepared                 no    60787107482

Here,  '97.33.166765' is the transaction id of the distributed transaction, which will be used in the following commands.

If the state of the transaction is “prepared” and there is no inconsistency, the transaction can be forced to rollback, or maybe if the underlying problem which caused the in-doubt transaction is resolved the transaction can be forced to commit as follows:
SQL> ROLLBACK FORCE  '97.33.166765' /* ->replace with ur own trx_id */
or
SQL> COMMIT FORCE  '97.33.166765' /* ->replace with ur own trx_id */

Note: If the command hangs, go to the “Handling Stuck DBA_2PC_PENDING” section.

If the state of the transaction is “collecting” and you execute the above command, you may see an error like:
ERROR at line 1:
ORA-02058: no prepared transaction found with ID 97.33.166765

Execute the following command to purge the transaction
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('97.33.166765'); /* ->replace with ur own trx_id */

PL/SQL procedure successfully completed.

Test to confirm that the transaction has gone
SQL> SELECT * FROM DBA_2PC_PENDING;

No rows returned.

The DBA_2PC_PENDING view is a lookup view, so it might be misleading sometimes. The actual transaction entry view is X$KTUXE ([K]ernel [T]ransaction [U]ndo Transa[X]tion [E]ntry).
This view gives info about the state of transactions that require transaction recovery:

X$KTUXE
COLUMN
TYPE
DESCRIPTION
ADDR
RAW
Address of this row in SGA
INDX
NUMBER
Index of this row in SGA
INST_ID
NUMBER
Instance Number
KTUXEUSN
NUMBER
Undo Segement #
KTUXESLT
NUMBER
Slot Number
KTUXESQN
NUMBER
Wrap Number
KTUXERDBF
NUMBER
Relative File
KTUXERDBB
NUMBER
Relative Block
KTUXESCNB
NUMBER
SCN Base prepare/commit
KTUXESCNW
NUMBER
SCN Wrap prepare/commit
KTUXESTA
VARCHAR2(16)
Transaction Status
KTUXECFL
VARCHAR2(24)
Transaction Flags
KTUXEUEL
NUMBER
Link to Coommit List

The concat of KTUXEUSN, KTUXESLT and KTUXESQN gives us the transacion number:

KTUXEUSN. KTUXESLT. KTUXESQN  = 97.33.166765

So, we can query the transaction view like:
SQL> SELECT * FROM X$KTUXE WHERE
           KTUXEUSN=97
AND KTUXESLT=33
AND KTUXESQN =166765;

No rows returned.

It should return no value, since the transaction has gone...

Handling Stuck Transactions

Our ultimate goal is not seeing the transaction in X$KTUXE table;  and ensuring that the dictionary tables like PENDING_TRANS$ to be consistent with this information.

Stuck transactions can be examined under the below conditions:

Cond 1: DBA_2PC_PENDING view have entries about our transaction but there is no transaction in reality

The condition is that; when we issue select to the dictionary views like the DBA_2PC_PENDING, PENDING_TRANS$, etc. we see the transaction, but the transaction does not exist in X$KTUXE view.

If the state of the  transaction  (in DBA_2PC_PENDING) is committed,  rollback forced or commit forced then it can be cleaned by:
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('97.33.166765');  /* ->replace with ur own trx_id */

PL/SQL procedure successfully completed.

If the state of the  transaction is prepared, we have to clean manually as follows:
SQL> DELETE FROM SYS.PENDING_TRANS$ WHERE LOCAL_TRAN_ID = '97.33.166765'; /* ->replace with ur own trx_id */
SQL> DELETE FROM SYS.PENDING_SESSIONS$ WHERE LOCAL_TRAN_ID ='97.33.166765' ; /* ->replace with ur own trx_id */
SQL> DELETE FROM SYS.PENDING_SUB_SESSIONS$ WHERE LOCAL_TRAN_ID = '97.33.166765'; /* ->replace with ur own trx_id */
SQL> COMMIT;

Note:  DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY command does not purge the actual transaction in X$KTUXE view. It touches the dictionary views.
To really purge the transaction, we need a commit or rollback. In the above case, however, there is no real transaction so we don’t need to worry about rollback or commit, all we do is cleaning the dictionary...


Cond 2: DBA_2PC_PENDING view does NOT have entries about our transaction but there IS A transaction

This is something like a orphan transaction that the dictionary is not aware of.

Trying to force commit or rollback this transaction may result in error like below, since the dictionary is not aware:
SQL> ROLLBACK FORCE  '97.33.166765' /* ->replace with ur own trx_id */

ORA-02058: no prepared transaction found with ID 97.33.166765

What we need to do at this point is; recovering our transaction from being an orphan by inserting some dummy records into dictionay tables (so the views...) and then force a rollback or commit:
You do not have to change the parameters in the insert command other than the transaction id.
SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

SQL> INSERT INTO PENDING_TRANS$ (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME,RECO_TIME)
VALUES
(
     '97.33.166765',   /* ->replace with ur own trx_id */
     306206,           
     'XXXXXXX.12345.1.2.3', 
     'prepared','P',        
     hextoraw( '00000001' ), 
     hextoraw( '00000000' ), 
     0, sysdate, sysdate
);


SQL> INSERT INTO PENDING_SESSIONS$
VALUES
(
      '97.33.166765',  /* ->replace with ur own trx_id */
      1, hextoraw('05004F003A1500000104'),
      'C', 0, 30258592, '',
      146
);

                 
COMMIT;

Now, we should be able to rollback or commit.
SQL> ROLLBACK FORCE  '97.33.166765' /* ->replace with ur own trx_id */
or
SQL> COMMIT FORCE  '97.33.166765' /* ->replace with ur own trx_id */

Lastly, we remove the dummy entry from the dictionary:
SQL> ALTER SYSTEM ENABLE  DISTRIBUTED RECOVERY;

SQL> ALTER SYSTEM SET "_smu_debug_mode" = 4;

SQL> COMMIT;

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('97.33.166765');  /* ->replace with ur own trx_id */

SQL> ALTER SYSTEM SET "_smu_debug_mode" = 0;

SQL> COMMIT;


Check to see whether the transaction has gone:
SQL> SELECT * FROM X$KTUXE WHERE
           KTUXEUSN=97
AND KTUXESLT=33
AND KTUXESQN =166765;

No rows returned.


Cond 3: DBA_2PC_PENDING has entry and there is a transaction but COMMIT or ROLLBACK HANGS!

In the situation, where COMMIT FORCE or ROLLBACK FORCE hangs,

Trying to purge the transaction will give an error like:
SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('97.33.166765');  /* ->replace with ur own trx_id */

ERROR at line 1:
ORA-06510: PL/SQL: unhandled user-defined exception
ORA-06512: at "SYS.DBMS_TRANSACTION", line 94
ORA-06512: at line 1

Solution: The solution is the combination of Cond1 and Cond2:

First, delete the dictionary entries.
SQL> DELETE FROM SYS.PENDING_TRANS$ WHERE LOCAL_TRAN_ID = '97.33.166765'; /* ->replace with ur own trx_id */
SQL> DELETE FROM SYS.PENDING_SESSIONS$ WHERE LOCAL_TRAN_ID ='97.33.166765' ; /* ->replace with ur own trx_id */
SQL> DELETE FROM SYS.PENDING_SUB_SESSIONS$ WHERE LOCAL_TRAN_ID = '97.33.166765'; /* ->replace with ur own trx_id */
SQL> COMMIT;

Then, insert dummy record, force commit and finally purge the transaction.
SQL> ALTER SYSTEM DISABLE DISTRIBUTED RECOVERY;

SQL> INSERT INTO PENDING_TRANS$ (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME,RECO_TIME)
VALUES
(
     '97.33.166765',   /* ->replace with ur own trx_id */
     306206,           
     'XXXXXXX.12345.1.2.3', 
     'prepared','P',        
     hextoraw( '00000001' ), 
     hextoraw( '00000000' ), 
     0, sysdate, sysdate
);


SQL> INSERT INTO PENDING_SESSIONS$
VALUES
(
      '97.33.166765',  /* ->replace with ur own trx_id */
      1, hextoraw('05004F003A1500000104'),
      'C', 0, 30258592, '',
      146
);

                 
COMMIT;

SQL> COMMIT FORCE  '97.33.166765' /* ->replace with ur own trx_id */

SQL> EXECUTE DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('97.33.166765');  /* ->replace with ur own trx_id */


No comments:

Post a Comment