Wednesday, December 10, 2008

Disentangling Nested Transactions in TSQL

The following TSQL error (# 266) surfaced while using TSQLUnit to test a recently altered stored procedure for a legacy database:
"Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. Previous count = 1, current count = 0."
After struggling for several hours to pinpoint the problem, the debugging and troubleshooting process provided me with a better understanding of how nested transactions are managed in SQL Server. First and foremost, truly just "one" transaction can exist per connection, a fact previously not known to me. This ultimately plays a pivotal role in the above error as revealed in the the Sybase Product Manual entry on Error 266:
When a stored procedure is run, Adaptive Server maintains a count of open transactions, adding 1 to the count when a transaction begins, and subtracting 1 when a transaction commits. When you execute a stored procedure, Adaptive Server expects the transaction count to be the same before and after the stored procedure execution. Error 266 occurs when the transaction count is different after execution of a stored procedure than it was when the stored procedure began...
Error 266 occurs when you are using nested procedures, and procedures at each level of nesting include begin, commit, and rollback transaction statements. If a procedure at a lower nest level opens a transaction and one of the called procedures issues a rollback transaction, Error 266 occurs when you exit the nested procedure.
The sproc under test is the "lower nest level" procedure in relation to the TSQLUnit sproc. What I discovered was that the TSQL for most of the existing stored procedures in the legacy database (such as the one being tested) contained this statement:
which was commonly found in two separate and distinct locations in the code:
  1. at very beginning of the sproc before anything interesting occurs
  2. at the very end when it's too late for it to be effective
As an example:
create proc someProc
/* main body of the sproc */
This slightly odd coding convention was interfering with TSQLUnit's native stored procs' ability to perform rollbacks of all changes once an individual unit test completes execution. The aforementioned rollback statement sabotaged the outer transaction of the TSQLUnit sproc (or any other calling sproc, for that matter) by resetting the @@trancount value causing the error to be raised.

Truthfully, unless I'm missing something, that rollback code does not really provide any benefit for the original sprocs themselves. I do not fully understand the rationale for placing these lines of code in most of the database's sprocs. Perhaps it is some overcautious (and overzealous) attempt to handle any unforeseen data failures forcing cleanups at every step. However, nothing indicates this to be true or probable. Point # 2 listed above is especially puzzling since in most cases an explicit COMMIT has already taken place right before it reaches the offending bit of code. Once a commit occurs, why bother attempting a rollback?

The bug fix was simply to remove all occurrences of the rollback transaction code since it caused more harm than good. Maybe the reason for its existence was to intentionally prevent unwanted calls from other sprocs thereby keeping them isolated and independent. As with most legacy code written by others long gone, I (or any other maintenance developer that comes after me) may never know.