Sunday, November 30, 2008

Speak to me...Interpreting C#

I came across A C# REPL (in Clojure), which discusses a means of interacting with and running C# code via an interactive command line using Clojure and IKVM.NET. Now, Clojure I have heard of (a Lisp implementation that runs on the JVM), but this is my first time hearing about IKVM.NET (which I now know to be a JVM implementation for .NET). The post describes how combining these two technologies gives you the potential of working with a static language like C# in a way that is quite common in the world of dynamic languages such as Python, Ruby, Boo, etc.

Having an interactive code interpreter is a huge productivity boost. It allows you to easily run and test your code as you write and modify it, without paying the dreaded compilation tax that can disrupt your development flow. This dramatically tightens and shortens the feedback loop on how well your code works, from confirming it behaves as intended per the spec to identifying runtime bugs much more quickly than with the development process common to traditionally compiled static languages. (As an aside, these are generally the same reasons given for creating and maintaining automated unit tests. Same goal, different methods.)

I am not sure how well this C#/Clojure/IKVM.NET approach works or how well it realistically performs (interpreted code typically runs slower). What is certain is that this unusual implementation requires the use of the very foreign-looking Lisp parentheses. As someone who does not program in Lisp, I will openly admit that it strikes me as strange to see parentheses wrapped around C#, but the general concept of a REPL for C# overshadows even this syntactic oddity. This quote sums up its overall appeal in the world of C#:

A REPL is a Read-Eval-Print Loop, which is a fancy way of saying "an interactive programming command line". It's like the immediate window in the Visual Studio debugger on steroids, and its absence is one of the increasing number of things that makes C# painful to use as I gain proficiency in more advanced languages.
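To make the quote concrete, here is a toy sketch of the read-eval-print cycle in a few lines of Python. This is purely an illustration of the concept (the function name and use of eval are my own, not how any real REPL is implemented):

```python
# A toy Read-Eval-Print Loop: read a line, evaluate it, print the result,
# and loop. Real REPLs add persistent state, error handling, statements
# (not just expressions), and multi-line input.
def tiny_repl(lines):
    results = []
    for line in lines:        # Read
        value = eval(line)    # Eval (expressions only, for simplicity)
        print(value)          # Print
        results.append(value)
    return results            # ...and Loop back around

results = tiny_repl(["1 + 1", "'a' * 3"])
```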

This is precisely how I felt when I initially started to learn and use Python. Suddenly, coding in C# with Visual Studio seemed comparatively restrictive. Similarly, whenever I have had to touch any VBA code (yes, that does happen from time to time), I customarily inhabit the VB Editor's Immediate Window (IW), pushing its limits by using it in a manner similar to how I code in Python.

For example, in the VB language, not only is it unnecessary to declare the data type of a variable, it is unnecessary to explicitly declare the variable itself (normally done with the 'Dim' keyword), so long as you omit the 'Option Explicit' statement. The first time a value is assigned to a variable, the variable is automatically and implicitly created, much as a name is bound on first assignment in Python. As a result, you can somewhat attain that same level of interaction with code in VB (via the IW) as you would in Python (via its standard interpreter), potentially gaining the productivity benefit of writing less code than in statically typed languages.
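For comparison, here is a minimal sketch of the same behavior on the Python side, where a name likewise comes into existence at its first assignment, with no declaration and no type annotation:

```python
# No declaration needed: the name 'counter' is created by this assignment.
counter = 1

# The name can even be rebound to a value of a different type later on.
counter = "one"

print(counter)
```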

My increased reliance on the Immediate Window also extends to Visual Studio when coding in C#, but it requires more work and syntax overhead than the IW in the old VB Editor, and overall it is not quite the same experience as in Python. Regardless, as I have previously written, frequent use of the VB Editor's IW led me to lean heavily on the one in Visual Studio whenever coding in C#. Prior to that, I had somewhat forgotten it even existed. In fact, in VS 2005 the IW is sometimes missing and difficult to bring up when not in debug mode (allegedly this is also true of VS 2008). This is discouraging, as it probably contributes to most .NET developers not favoring its use in more situations.

While I have seen other attempts at providing an interactive console for C#, the following are ones I have noted to possibly try out in the very near future:
I am extremely curious if (and hopeful that) Microsoft will provide an improved implementation for VS's IW when the more dynamic C# 4.0 becomes mainstream. (How long before we have an official C#Script? It worked for VB and VBScript.)

Friday, November 14, 2008

Unit testing databases with TSQLUnit

In preparation for doing a lot of database work (think schema/DDL changes) over the next several months, I thought it would be important to define methodologies and tools to manage the development process. Having heavily applied a test-driven development (TDD) approach in a recent project for an ASP.NET C# web application, it seemed logical, and now quite natural, to continue pursuing a similar path and mindset, but one targeted towards database development.

TSQLUnit and TDD

To facilitate a smooth flow with TDD, it was key to identify a suitable testing framework to support my goal. After some research and comparisons, I eventually settled on TSQLUnit, an xUnit framework for MS SQL Server. It seems that for every programming language out there, someone eventually creates an xUnit-style testing framework for it. Why should SQL Server be any different?

Database testing could be performed from the application side, but the nature of the development here was primarily schema/DDL changes (e.g. mostly dropping tables, removing columns, etc.) involving little or no business logic. Otherwise, app code would be the better, more appropriate candidate for testing the db. This particular scenario lent itself to finding something as close to the database as possible, ideally where the unit tests could be written in and supported by the TSQL language itself. The intent was to follow one of the core tenets of any xUnit framework: do the testing in the same language as the code under test. TSQLUnit seemed the best fit.

Regardless, it is quite probable that in the future I might disregard all of the above and switch to testing all aspects of the database from C#. This is not because of TSQLUnit, but because the TSQL language itself can be so cumbersome to work with, especially if you are accustomed to the power and flexibility of an imperative language such as C#. For the moment, it satisfies my current testing needs, so it will suffice.

A Few Key Features of TSQLUnit

One feature of TSQLUnit that is critical for any type of database testing is the ability to roll back changes after each test run completes, ensuring that the database is in a nice clean state before the next test starts. This means the data will not remain altered when the next test runs, potentially tainting its results. This applies not only to data/DML changes (inserts, updates, deletes) but to DDL changes (creates, alters, drops) as well. For example, if Test A creates a table or alters a column of an existing table, those schema changes will be rolled back and undone before Test B starts to execute. The tests themselves are therefore isolated from each other, meaning all of the unit tests can run independently.
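TSQLUnit achieves this with SQL Server transactions. As a rough illustration of the pattern only (using Python's sqlite3 module as a stand-in, not SQL Server or TSQLUnit itself), changes made inside a transaction, including DDL, vanish when the transaction is rolled back:

```python
# Sketch of the rollback-per-test pattern: run a test's DDL and DML inside
# a transaction, then roll back so the next test starts from a clean state.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # manage transactions explicitly
cur = conn.cursor()

cur.execute("BEGIN")
cur.execute("CREATE TABLE scratch (id INTEGER)")   # DDL inside the "test"
cur.execute("INSERT INTO scratch VALUES (1)")      # DML inside the "test"
cur.execute("ROLLBACK")                            # end of test: undo it all

# The scratch table no longer exists after the rollback.
tables = [row[0] for row in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
```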

In addition, TSQLUnit supports the xUnit features of Setup/Teardown for a test suite. For example, if you have five tests that are logically related and depend on the same preconditions, you can prepare all of your data in a single, shared 'Setup' fixture and TSQLUnit will automatically run it before each associated test, keeping the tests free of co-dependencies. Note that one obvious drawback of repeating the same setup multiple times can be performance, if the tests rely on loading large volumes of data. As for 'Teardown', it provides essentially the same functionality as 'Setup', except that it executes at the end of each test rather than at the beginning.

TSQLUnit is fairly decent; my initial struggles were not with the framework itself but with the usual problems of database testing, such as setting up data, rolling back changes, etc. It is a lot of work because, unlike in application code, it is hard to stub out dependencies on database objects such as tables, stored procedures, etc. (Especially when dealing with a 225+ column table in the good ol' legacy db I was so fortunate to be tasked with modifying.)

Running TSQLUnit using NAnt

Once a few initial unit tests were created using TSQLUnit, the process of running them was incorporated into the project's NAnt database build script. Just like the database build itself, the unit tests make use of the NAntContrib <sql> task for their creation and execution. These are the high-level targets defined in the build script:
<!-- after building database including applying change scripts -->

<call target="InstallTSQLUnit"/>
<call target="LoadUnitTests"/>
<call target="RunUnitTests"/>

'InstallTSQLUnit' runs a single sql script against your target db, creating all of the necessary tables, stored procedures, etc. that the TSQLUnit framework requires to function. This is simply the framework's own source code being integrated into the database.

'LoadUnitTests' runs the sql scripts that contain the tests themselves, located in a 'Tests' folder in the project's directory. To elaborate, it creates all of the unit test sprocs and, if necessary, injects any test data into the database that is not handled by the Setup fixtures.

Finally, 'RunUnitTests' calls the TSQLUnit command to run all of the unit tests inside the database:
<target name="RunUnitTests" description="Run all unit tests "
if="${installUnitTesting}">
<sql connstring="${connectionString}" transaction="true" delimiter=";" delimstyle="Line">
exec tsu_runTests;
</sql>
</target>
The NAnt db build is configurable so that you can specify whether to include or exclude the unit tests. This might be necessary if you simply want to build the database and nothing more, perhaps to reduce build time when the database is intended for some purpose other than testing.

In the command-line window, as NAnt logs the progress of the database build, at the end it will bubble up the output results of TSQLUnit indicating whether the tests passed or failed:
...
ExecuteSql:

[echo] loading .\Database\Tests\ut_DeleteTables_VerifyTablesDropped.sql

RunUnitTests:

[sql] ================================================================================
[sql] --------------------------------------------------------------------------------
[sql]      SUCCESS!
[sql]
[sql]      14 tests, of which 0 failed and 0 had an error.
[sql]  Summary:
[sql]  Run tests ends:Feb 20 2008 11:44AM
[sql] --------------------------------------------------------------------------------
[sql] Testsuite:  (14 tests ) execution time: 76 ms.
[sql] ================================================================================
[sql]  Run tests starts:Feb 20 2008 11:44AM
[sql] ================================================================================

BUILD SUCCEEDED

Total time: 6.1 seconds.
Currently, the overall database build does not fail when the unit tests themselves fail, since NAnt cannot catch and interpret TSQLUnit's results. The codeproject article MS SQL Server script validation with NAnt Task seems to provide that level of integration between NAnt and SQL Server unit testing. Perhaps this functionality will be implemented later but, for now, it is good enough.

Other related posts on TSQLUnit:

It's True (And Not False): Assert Equality in TSQLUnit

Database schema changes unit tested using TSQLUnit

Sunday, November 9, 2008

Relax and "unwind" with recursion

I was recently looking to change some values, which I was not too fond of, in test data scripted as TSQL insert statements. This data is loaded into development and test builds of a database for a system that I work on.

The data changes center around some arbitrary ID numbers. These values are not system generated; they can be considered natural primary keys since they are defined externally, outside the system, and are used and referenced by end-users, carrying meaning in the business domain. However, for development and testing purposes, the values can be anything.

At first, the numbers reflected actual values that preexisted outside the testing and dev environments, but I found them too difficult to remember when testing the application manually or when defining automated unit test case scenarios. Therefore, I decided to replace the existing ID numbers with a sequence of numbers I can recall more easily:
1234567890, 2345678901, 3456789012, ..., 0123456789
I figured this list of numbers would be easier to remember on the fly than randomly generated ones. You can see an obvious pattern here, which lends the values to being committed to (human) memory. The first number in the list is the simple 1234567890. Each subsequent number is the same except that the first digit moves from the first position to the last, as compared to the preceding value in the list. For example, the '1' in 1234567890 gets relocated to the end to become 2345678901.

I decided to write a simple Python script to update all of the data sql files with these new numbers. Now, I could quite obviously have created the list manually in my script in very short order, such as:
new_id_numbers = ['1234567890', '2345678901', '3456789012', ..., '0123456789']
However, I always like to find opportunities to challenge my programming abilities, even in inconsequential situations such as this one. The challenge I created for myself was to see how I could programmatically generate this exact same list. Evidently, a pattern exists in these numbers, implying code can be written to produce this particular set of values without manually typing them out.

My first impulse was that I could write some kind of iterative loop to generate the list:
# iteration version
def get_swapped_numbers(numbers):
    first = numbers[0]
    while True:
        if len(numbers) == len(first) or len(first) == 0:
            return numbers
        last = numbers[-1]
        new = last[1:] + last[:1]
        numbers.append(new)

>>> initial_value_list = ['22566']
>>> print get_swapped_numbers(initial_value_list)
['22566', '25662', '56622', '66225', '62256']
It works. Nothing unusual here. However, I recognized that this particular numerical pattern is indeed recursive, in that each value in the list depends on the previous value, and that recursion could be used to solve this as well. Like probably most programmers, I had never used recursion in actual code, but it is one of those fundamental concepts in computer science that I felt I needed to better understand and explore. This simple pattern provided an excellent opportunity to flex my recursive muscles.

After mulling for some time over how to implement a recursive function for this specific problem, here is what I eventually churned out:
# recursion version
def get_swapped_numbers(numbers):
    first = numbers[0]
    if len(numbers) == len(first) or len(first) == 0:
        return numbers
    last = numbers[-1]
    new = last[1:] + last[:1]
    numbers.append(new)
    return get_swapped_numbers(numbers) # recursion!!

>>> initial_value_list = ['22566']
>>> print get_swapped_numbers(initial_value_list)
['22566', '25662', '56622', '66225', '62256']
To build your recursive function, the most important piece is that somewhere in the body of the function the function must call itself by name:
    return get_swapped_numbers(numbers) # recursion!!
Without it, well...you just have a plain ol' vanilla function.

The other very important piece of a recursive function is some kind of conditional statement that acts like a guard clause, commonly referred to as the 'base case' in recursion lingo. The base case causes the recursive calls to stop and begin to "unwind" as they spiral back up to the first initial call. If a base case is not included, you run the risk of unleashing an ugly infinite loop.

The base case is really no different than the break/return point in the iteration example:
    if len(numbers) == len(first) or len(first) == 0:
        return numbers
In fact, any recursive function can be rewritten as an iterative loop (sometimes with the help of an explicit stack). This is probably why most programmers do not use recursion: you can achieve the same results using more familiar, day-to-day techniques. In addition, in most languages recursion traditionally performs slower than its loop counterpart.

So, why even use recursion when iteration suffices? The real benefit is that sometimes a recursive function ends up being more readable and clearer in intent than an iterative one (although in my example it is a wash in this respect). Certain kinds of recursion (e.g. tail calls) are also optimized away by some compilers, so recursion can occasionally deliver comparable performance with the added bonus of improved readability.
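CPython is worth a caveat here: it does not perform tail-call optimization, so even a tail-recursive function like the one above consumes a stack frame per call and is capped by the interpreter's recursion limit (the function name below is my own illustration):

```python
import sys

# A tail call in form, but CPython will not optimize it away -- each call
# still pushes a new stack frame.
def count_down(n):
    if n == 0:
        return "done"
    return count_down(n - 1)

print(sys.getrecursionlimit())   # defaults to 1000 in CPython
print(count_down(500))           # well under the limit, so this is fine
# count_down(10 ** 6) would raise a "maximum recursion depth exceeded" error
```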

Truthfully, writing the loop was a lot easier than the recursive version. I suspect the reason is that writing loops is so ingrained and second nature to me, having created so many over the years, coupled with the fact that this was my first real attempt at implementing a true recursive function. Admittedly, it was a bit trippy mentally working through each function call to ensure I avoided the dreaded infinite loop. However, the same could be said of when I first learned to write loops many years ago. With more time and practice, I could probably create a recursive function without exerting any more thought than I would creating a loop. All in all, even if I never again use recursion in my own code, it is still an important technique for programmers to understand and recognize, especially when encountered in code you did not write.