Wednesday, December 10, 2008

Disentangling Nested Transactions in TSQL

The following TSQL error (# 266) surfaced while using TSQLUnit to test a recently altered stored procedure for a legacy database:
"Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. Previous count = 1, current count = 0."
After struggling for several hours to pinpoint the problem, the debugging and troubleshooting process gave me a better understanding of how nested transactions are managed in SQL Server. First and foremost, only one actual transaction can exist per connection; nested BEGIN TRANSACTION statements merely increment @@TRANCOUNT rather than open new, independent transactions. This fact, previously unknown to me, plays a pivotal role in the above error, as revealed in the Sybase Product Manual entry on Error 266:
When a stored procedure is run, Adaptive Server maintains a count of open transactions, adding 1 to the count when a transaction begins, and subtracting 1 when a transaction commits. When you execute a stored procedure, Adaptive Server expects the transaction count to be the same before and after the stored procedure execution. Error 266 occurs when the transaction count is different after execution of a stored procedure than it was when the stored procedure began...
Furthermore:
Error 266 occurs when you are using nested procedures, and procedures at each level of nesting include begin, commit, and rollback transaction statements. If a procedure at a lower nest level opens a transaction and one of the called procedures issues a rollback transaction, Error 266 occurs when you exit the nested procedure.
In this scenario, the TSQLUnit sproc is the "lower nest level" procedure that opens the transaction, and the sproc under test is the called procedure that issues the rollback. What I discovered was that the TSQL for most of the existing stored procedures in the legacy database (such as the one being tested) contained this statement:
IF @@TRANCOUNT <> 0 ROLLBACK TRANSACTION
which was commonly found in two distinct locations in the code:
  1. at the very beginning of the sproc, before anything interesting occurs
  2. at the very end, when it is too late to be effective
As an example:
create proc someProc
as
begin
    IF @@TRANCOUNT <> 0 ROLLBACK TRANSACTION

    /* main body of the sproc */

    IF @@TRANCOUNT <> 0 ROLLBACK TRANSACTION
end
This slightly odd coding convention was interfering with the ability of TSQLUnit's native stored procs to roll back all changes once an individual unit test completes execution. The rollback statement sabotaged the outer transaction of the TSQLUnit sproc (or any other calling sproc, for that matter) by rolling it back and resetting @@TRANCOUNT to zero, which caused the error to be raised when the called sproc exited.
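Here is a minimal sketch (with hypothetical procedure names) of how that statement reproduces the error when a caller already owns a transaction:

create proc innerProc
as
    -- unconditionally rolls back, even when the caller owns the transaction
    IF @@TRANCOUNT <> 0 ROLLBACK TRANSACTION
go

create proc outerProc
as
begin
    begin transaction   -- @@TRANCOUNT goes from 0 to 1
    exec innerProc      -- the rollback inside drops @@TRANCOUNT back to 0,
                        -- so error 266 is raised as innerProc exits
    commit transaction  -- never balances: there is no open transaction left
end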

Truthfully, unless I'm missing something, that rollback code does not provide any real benefit to the original sprocs themselves. I do not fully understand the rationale for placing these lines in most of the database's sprocs. Perhaps it is some overcautious (and overzealous) attempt to handle unforeseen data failures by forcing cleanups at every step; however, nothing indicates this to be true or probable. Point # 2 listed above is especially puzzling since in most cases an explicit COMMIT has already taken place right before the offending bit of code is reached. Once a commit occurs, why bother attempting a rollback?
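For completeness, if a sproc genuinely needs to clean up after its own failures without clobbering a caller's transaction, the conventional pattern is to record @@TRANCOUNT on entry and use a savepoint when a transaction is already open. A minimal sketch (with hypothetical names, and not the fix that was actually applied here):

create proc someProc
as
begin
    declare @startTranCount int
    set @startTranCount = @@TRANCOUNT

    if @startTranCount = 0
        begin transaction                    -- we own the transaction
    else
        save transaction someProcSavepoint   -- mark a savepoint inside the caller's transaction

    /* main body of the sproc */

    if @@ERROR <> 0
    begin
        -- undo only our own work, leaving the caller's @@TRANCOUNT intact
        if @startTranCount = 0
            rollback transaction
        else
            rollback transaction someProcSavepoint
        return 1
    end

    if @startTranCount = 0
        commit transaction                   -- @@TRANCOUNT exits as it entered
end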

The bug fix was simply to remove all occurrences of the rollback transaction code since it caused more harm than good. Maybe the reason for its existence was to intentionally prevent unwanted calls from other sprocs, thereby keeping them isolated and independent. As with most legacy code written by others long gone, I (or any maintenance developer who comes after me) may never know.

Sunday, November 30, 2008

Speak to me...Interpreting C#

I came across A C# REPL (in Clojure) which discusses a means of interacting with and running C# code via an interactive command line by using Clojure and IKVM.NET. Now, Clojure I have heard of (a Lisp implementation that runs on the JVM) but this is my first time hearing about IKVM.NET (which I now know to be an implementation of the JVM that runs on .NET). The post describes how combining these two technologies gives you the potential of working with a static language like C# in a way that is quite common in the world of dynamic languages such as Python, Ruby, Boo, etc.

Having an interactive code interpreter is a huge productivity boost. It allows you to easily run and test your code as you write and modify it, without paying the dreaded compilation tax that can disrupt your development flow. This dramatically tightens and shortens the feedback loop on how well your code works: you learn sooner whether it behaves as intended and meets spec requirements, and you identify runtime bugs much more quickly than with the development process common to traditionally compiled static languages. (As an aside, these are generally the same reasons given for creating and maintaining automated unit tests. Same goal, different methods.)

I am not sure how well this C#/Clojure/IKVM.NET approach works or how well it realistically performs (typically, interpreted languages are slower). What is certain is that this unusual implementation requires the use of the very foreign-looking Lisp parentheses. As someone who does not program in Lisp, I will openly admit it strikes me as kind of strange to see parentheses mixed with C#, but the general concept of a REPL for C# overshadows even this syntactical oddity. This quote sums up its overall appeal in the world of C#:

A REPL is a Read-Eval-Print Loop, which is a fancy way of saying "an interactive programming command line". It's like the immediate window in the Visual Studio debugger on steroids, and its absence is one of the increasing number of things that makes C# painful to use as I gain proficiency in more advanced languages.

This is precisely how I felt when I initially started to learn and use Python. Suddenly, coding in C# with Visual Studio seemed comparatively restrictive. Similarly, whenever I have had to touch any VBA code (yes, that does happen from time to time) I customarily inhabit the VB Editor's Immediate Window (IW), pushing its limits by attempting to use it in a manner similar to how I code in Python.

For example, in the VB language, not only is it not required to declare the data type of a variable, it is even unnecessary to explicitly declare the variables themselves (usually this is done using the 'Dim' keyword, but it can be avoided simply by not including the 'Option Explicit' statement). Consequently, the first time a value is assigned to a variable, the variable is automatically and implicitly created, just as in Python. As a result, you can somewhat attain that same level of interaction with code in VB (via the IW) as you would in Python (via its standard interpreter), potentially gaining the productivity benefits of writing less code in contrast to strongly typed languages.
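To illustrate the Python side of the comparison, here is a trivial interactive sketch:

>>> x = 5          # no declaration, no type: assignment alone creates the variable
>>> x = x + 1
>>> print x
6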

My increased reliance on the Immediate Window also extends to Visual Studio when coding in C#, but it requires more work and syntax overhead than the IW in the old VB Editor. Overall, it is not quite the same experience as in Python. Regardless, as I have previously written, frequent use of the VB Editor's IW led me to lean heavily on the one in Visual Studio whenever coding in C#. Prior to that, I had somewhat forgotten it even existed. In fact, in VS 2005, the IW is sometimes missing and difficult to view when not in debug mode (this is allegedly also true for VS 2008). This is discouraging, as it probably contributes to most .NET developers not favoring its use in more situations.

While I have seen other attempts at providing an interactive console for C#, the following are ones I have noted to possibly try out in the very near future:
I am extremely curious if (and hopeful that) Microsoft will provide an improved implementation for VS's IW when the more dynamic C# 4.0 becomes mainstream. (How long before we have an official C#Script? It worked for VB and VBScript.)

Friday, November 14, 2008

Unit testing databases with TSQLUnit

In preparation for doing a lot of database work (think schema/DDL changes) over the next several months, I thought it would be important to define methodologies and tools to manage the development process. Having heavily applied a test-driven development (TDD) approach in a recent project for an ASP.NET C# web application, it seemed logical, and now quite natural, to continue pursuing a similar path and mindset, but one targeted toward database development.

TSQLUnit and TDD

To facilitate a smooth flow with TDD, it was key to identify a suitable testing framework to support my goal. After some research and comparisons, I eventually settled on TSQLUnit, an xUnit framework for MS SQL Server. It is interesting to note that for every programming language that exists out there, it is expected that someone will eventually create an xUnit style testing framework for it. Why should SQL Server be any different?

Database testing could be performed from the application side, but the nature of the development was primarily schema/DDL changes (e.g. mostly dropping tables, columns, etc.) involving very little or no business logic. Otherwise, app code would be a better and more appropriate candidate for testing the db. This particular scenario lent itself to finding and using something that was as close to the database as possible, ideally where the unit tests could be written in the TSQL language itself. The intent was to follow one of the core tenets of any xUnit framework, which is to do testing in the same language as the code being changed. TSQLUnit seemed the best fit.

Regardless, it is quite probable that in the future I might disregard all of the above and still switch to testing all aspects of the database using C#. This is not because of TSQLUnit but because the TSQL language itself can be so cumbersome to work with, especially if you are accustomed to the power and flexibility of a non-declarative programming language such as C#. For the moment, it satisfies my current testing needs, so it will suffice.

A Few Key Features of TSQLUnit

One feature of TSQLUnit that is critical for any type of database testing is the ability to roll back changes after each test run completes, ensuring that the database is in a clean state before the next test starts. This means the data will not remain altered when the next test runs, which could otherwise taint its results. Not only does this apply to all data/DML changes (inserts, updates, deletes) but to DDL changes (creates, alters, drops) as well. For example, if Test A creates a table or alters a column of an existing table, those schema changes will be rolled back and undone before Test B starts to execute. The tests themselves are therefore isolated from each other, meaning all of the unit tests can run independently.
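As a sketch of what such a test might look like (the table and column names here are hypothetical, and this assumes TSQLUnit's tsu_failure reporting procedure and its ut_ naming convention):

create proc ut_Schema_testDropObsoleteColumn
as
begin
    -- exercise the DDL change under test
    alter table SomeLegacyTable drop column ObsoleteColumn

    -- assert that the column is gone
    if exists (select * from syscolumns
               where id = object_id('SomeLegacyTable')
                 and name = 'ObsoleteColumn')
        exec tsu_failure 'ObsoleteColumn was not dropped.'

    -- TSQLUnit rolls back the ALTER TABLE after the test completes,
    -- so the next test sees the original schema
end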

In addition, TSQLUnit supports the xUnit features of Setup/Teardown for a test suite. For example, if you have five tests that are logically related and depend on the same preconditions, you can prepare all of your data in a single, shared 'Setup' fixture and TSQLUnit will automatically run it before each associated test, thereby ensuring no co-dependencies between the tests. Note that one obvious drawback of repeating the same setup multiple times might be performance, if the tests rely on loading large volumes of data. As for 'Teardown', it provides essentially the same functionality as 'Setup', with the one difference of executing at the end of each test rather than at the beginning.
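For instance, a shared fixture might look like this sketch (assuming TSQLUnit's convention, if I recall it correctly, of a ut_<suite>_setup procedure that runs before each test in the suite; the table and values are hypothetical):

create proc ut_Tasks_setup
as
begin
    -- shared preconditions for every test in the 'Tasks' suite
    insert into Task (TaskId, Team) values (1, 'Old Team')
    insert into Task (TaskId, Team) values (2, 'Old Team')
end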

TSQLUnit is fairly decent, but I initially struggled not with the framework itself but with the usual problems of database testing, such as setup of data, rollback of changes, etc. It is a lot of work because, unlike in application code, it is hard to stub out dependencies on database objects such as tables, stored procedures, etc. (Especially when dealing with a 225+ column table in the good ol' legacy db I was so fortunate to be tasked with modifying.)

Running TSQLUnit using NAnt

Once a few initial unit tests were created using TSQLUnit, the process of running them was incorporated into the project's NAnt database build script. Just like the database build itself, the unit tests make use of the NAntContrib <sql> task for their creation and execution. These are the high-level targets defined in the build script:
<!-- after building database including applying change scripts -->

<call target="InstallTSQLUnit"/>
<call target="LoadUnitTests"/>
<call target="RunUnitTests"/>

'InstallTSQLUnit' is a single sql script that runs against your target db, creating all of the necessary tables, stored procedures, etc. that the TSQLUnit framework requires to function. This is simply the framework's own source code, integrated into the database.

'LoadUnitTests' runs the sql scripts that contain the tests themselves, located in a 'Tests' folder in the project's directory. To elaborate, it creates all of the unit test sprocs along with, if necessary, injecting any test data into the database that is not handled by the Setup fixtures.

Finally, 'RunUnitTests' calls the TSQLUnit command to run all of the unit tests inside the database:
<target name="RunUnitTests" description="Run all unit tests"
        if="${installUnitTesting}">
    <sql connstring="${connectionString}" transaction="true"
         delimiter=";" delimstyle="Line">
        exec tsu_runTests;
    </sql>
</target>
The NAnt db build is configurable so that you can specify whether to include or exclude the unit tests. This might be necessary if you simply want to build the database and nothing more, perhaps to reduce build time when the database is intended for a purpose other than testing.

In the command-line window, as NAnt logs the progress of the database build, at the end it will bubble up the output results of TSQLUnit indicating whether the tests passed or failed:
...
ExecuteSql:

[echo] loading .\Database\Tests\ut_DeleteTables_VerifyTablesDropped.sql

RunUnitTests:

[sql] ================================================================================
[sql] --------------------------------------------------------------------------------
[sql]      SUCCESS!
[sql]
[sql]      14 tests, of which 0 failed and 0 had an error.
[sql]  Summary:
[sql]  Run tests ends:Feb 20 2008 11:44AM
[sql] --------------------------------------------------------------------------------
[sql] Testsuite:  (14 tests ) execution time: 76 ms.
[sql] ================================================================================
[sql]  Run tests starts:Feb 20 2008 11:44AM
[sql] ================================================================================

BUILD SUCCEEDED

Total time: 6.1 seconds.
Currently, the overall database build does not fail if the unit tests themselves fail, since NAnt cannot catch and interpret TSQLUnit results. This codeproject article, MS SQL Server script validation with NAnt Task, seems to provide that level of integration between NAnt and SQL Server unit testing. Perhaps this functionality might be implemented later, but, for now, it is good enough.

Other related posts on TSQLUnit:

It's True (And Not False): Assert Equality in TSQLUnit

Database schema changes unit tested using TSQLUnit

Sunday, November 9, 2008

Relax and "unwind" with recursion

I was recently looking to change some values I was not too fond of in test data scripted as TSQL insert statements. This data is loaded on development and test builds of a database for a system that I work on.

The data changes centered on some arbitrary ID numbers. These values are not system generated but can be considered natural primary keys, since they are defined externally to the system and are used and referenced by the end-users, carrying meaning in the business domain. However, for development and testing purposes, the values can be anything.

At first, the numbers reflected actual values that preexisted outside the testing and dev environments, but I found them too difficult to remember when testing the application manually or when defining automated unit test case scenarios. Therefore, I decided to replace the existing ID numbers with a sequence of numbers that I could recall more easily:
1234567890, 2345678901, 3456789012, ..., 0123456789
I figured this list of numbers would be easier to remember on the fly than randomly generated ones. An obvious pattern exists here, lending the numbers well to being committed to (human) memory. The initial number in the list is simply 1234567890. Each subsequent number is the preceding value with its first digit moved to the last position. For example, the '1' in 1234567890 gets relocated to the end to become 2345678901.

I decided to write a simple Python script to update all of the data sql files with these new numbers. Now, I could have quite obviously created the list manually in a very short time in my script such as:
new_id_numbers = [1234567890, 2345678901, 3456789012, ..., 0123456789]
However, I always like to find opportunities to challenge my programming abilities even in some inconsequential situations such as this one. The challenge I created for myself was to see how I can programmatically generate this exact same list. Evidently, a pattern exists with these numbers implying code can be written to produce this particular set of values without the need to manually type them out.

My first impulse was that I could write some kind of iterative loop to generate the list:
# iteration version
def get_swapped_numbers(numbers):
    first = numbers[0]
    for item in numbers:
        if len(numbers) == len(first) or len(first) == 0:
            return numbers
        last = numbers[-1]
        new = last[1:] + last[:1]
        numbers.append(new)
    return numbers

>>> initial_value_list = ['22566']
>>> print get_swapped_numbers(initial_value_list)
['22566', '25662', '56622', '66225', '62256']
It works. Nothing unusual here. However, I recognized that this particular numerical pattern is recursive, in that each value in the list is dependent on the previous value, so recursion could be used to solve this as well. Like probably most programmers, I have never used recursion in actual code, but it is one of those fundamental concepts in computer science that I felt I needed to better understand and explore. This simple pattern provided an excellent opportunity to flex my recursive muscles.

After mulling over for some time on how to implement a recursive function for this specific problem here is what I eventually churned out:
# recursion version
def get_swapped_numbers(numbers):
    first = numbers[0]
    if len(numbers) == len(first) or len(first) == 0:
        return numbers
    last = numbers[-1]
    new = last[1:] + last[:1]
    numbers.append(new)
    return get_swapped_numbers(numbers) # recursion!!

>>> initial_value_list = ['22566']
>>> print get_swapped_numbers(initial_value_list)
['22566', '25662', '56622', '66225', '62256']
To build your recursive function, the most important piece is that somewhere in the body of the function the function must call itself by name:
    return get_swapped_numbers(numbers) # recursion!!
Without it, well...you just have a plain ol' vanilla function.

The other very important piece of a recursive function is to include some kind of conditional statement that acts like a guard clause, commonly referred to as the 'base case' in recursion lingo. The base case causes the recursive calls to stop and the recursion to begin to "unwind", spiraling back up to the initial call. If a base case is not included, you run the risk of unleashing an ugly infinite loop.

The base case is really no different than the break/return point in the iteration example:
    if len(numbers) == len(first) or len(first) == 0:
        return numbers
In fact, any recursive function can be written and expressed as an iterative loop. This is probably why most programmers do not use recursion, since you can achieve the same results using more familiar, day-to-day techniques. In addition, in most languages recursion traditionally tends to perform slower than its loop counterpart.

So, why even use recursion when iteration can suffice? The real benefit is that sometimes a recursive function can end up being more readable and clearer in intent than an iterative one (although in my example it is a wash in this respect). Certain types of recursion (e.g. tail recursion) are also optimized by some compilers, so recursion can truly provide better performance with the added bonus of improved readability (though Python itself, notably, does not perform tail-call optimization).

Truthfully, writing the loop was a lot easier than the recursion version. I suspect that is because writing loops is so ingrained and second nature to me, having created so many over the years, coupled with the fact that this was my first real attempt to implement a true recursive function. Admittedly, it was a bit trippy mentally working through each function call to ensure I avoided the dreaded infinite loop. However, the same could be said of when I first learned to write loops many years ago. With more time and practice, I can probably create a recursive function without exerting any more thought than I would with a loop. All in all, even if I never again use recursion in my code, it is still an important technique for programmers to understand and recognize, especially when encountered in code you did not write.

Friday, October 17, 2008

Is JavaScript the "Next Big Language"?

I recently read an old post of Steve Yegge's from early 2007 entitled The Next Big Language. In it, he describes the fundamental characteristics that a new programming language must have if it is to become popular. Throughout his post, he hints at what he thinks the next big language ("NBL") might be but intentionally does not mention it by name for various reasons.

However, it seems quite evident that Yegge is probably referring to JavaScript (and/or the ECMAScript standard). A lot of his later posts seem to support that conjecture. One post in particular, entitled Dynamic Languages Strike Back, really does emphasize the strong possibility of JavaScript being the NBL:
"...So JavaScript. JavaScript has been really interesting to me lately, because JavaScript actually does care about performance. They're the first of the modern dynamic languages where performance has become an issue not just for the industry at large, but also increasingly for academia.
Why JavaScript? Well, it was Ajax. See, what happened was... Lemme tell ya how it was supposed to be. JavaScript was going away. It doesn't matter whether you were Sun or Microsoft or anybody, right? JavaScript was going away, and it was gonna get replaced with... heh. Whatever your favorite language was.
I mean, it wasn't actually the same for everybody. It might have been C#, it might have been Java, it might have been some new language, but it was going to be a modern language. A fast language. It was gonna be a scalable language, in the sense of large-scale engineering. Building desktop apps. That's the way it was gonna be.
The way it's really gonna be, is JavaScript is gonna become one of the smokin'-est fast languages out there. And I mean smokin' fast..."
Yegge has also mentioned using JavaScript server-side for some project he was working on at Google. Yes, server-side and not client-side. If what he is saying is true, and the JavaScript language continues to evolve into a powerful multi-paradigm language while programmers start using it not just on the client side but more on the server side as its performance improves, then it could really position itself as a first-class language on par with C# and Java. Quite a leap from JavaScript's somewhat humble beginnings.

Obviously, Yegge is just one voice (albeit a very popular one) on the topic, but the importance of JavaScript as a fundamental language for web apps is not to be taken lightly. This reminded me of a recent conversation about JavaScript with another software developer with whom I used to work. He stated that he felt he was currently doing more coding in the client-side/UI layer with JavaScript than in the middle layer with C#, even though the middle tier is where he is expected to be doing most of his development. He also observed how the web application he is working on was starting to resemble a classic client-server application despite the presence of a middle tier.

The implication is that this is an architecture more people might be gravitating towards when developing robust web applications without realizing it. It is interesting that, with the advent and surging popularity of web applications, reports of the decline of desktop applications ("fat", "thick" clients) were greatly exaggerated. They are coming back, just in an altered and less recognizable form.

The statements and experiences of the developer I mentioned earlier (combined with Yegge's writings) are extremely telling and something to which we should pay attention. The increased use of JavaScript, and more importantly how and where it is being used, is a trend that should be closely watched in the ever-changing world of modern software development.

Wednesday, September 24, 2008

Comprehending List Comprehensions

Since I began using Python to write build scripts (as well as for code generation) to support my development process, I have increasingly come to learn and appreciate the power of the Python language. A recent coding situation demonstrated to me how understanding the alternatives available in a multi-paradigm language such as Python can expose the limitations of another language such as C#, and can have a real influence on how you think about and write code.

Building working code first...

In my current project, the database for the system is maintained under source control. In the directory of my project's database files, a sub-directory exists named 'CurrentReleaseOnly' which contains database unit tests written using the TSQLUnit framework. The purpose of this folder is to segregate out tests that are really just "one time only" with no intention of retaining them once the db schema changes are implemented in production [1]. In addition, the sub-directory contains a plain old text file named 'README.TXT' which serves to explain to other developers working on the project why the folder exists.

Let's say in that folder, 'CurrentReleaseOnly', I have 4 files, three of which are unit test sprocs and one is that 'readme' file:
  1. ut_uspVerifyDroppedColumns.sql
  2. ut_uspVerifyDroppedTable.sql
  3. ut_uspVerifyArchivedData.sql
  4. README.TXT
Since the project files are maintained under Perforce (P4) [2], one of the project maintenance scripts needs to permanently delete all files within that folder from the source code repository, with the one exception of the aforementioned 'readme' file. In this example, that would imply deleting files # 1 through 3 but keeping # 4. Obviously, from one development cycle to another the number of files eligible for deletion would vary, but one would always be retained (i.e. # 4).

The command line syntax in P4 to open a file for delete is the following:
p4 delete file1.txt file2.txt file3.txt
The goal is to build and execute the above command. The following is what was originally coded to achieve this:

import os

# unit_test_dir is defined elsewhere in the script
def open_for_delete_unit_tests_from_previous_release():
    """ Open for delete in Perforce unit tests from previous release """

    # find files to open for delete
    exclude_file = 'README.TXT'
    delete_files_dir = os.path.join(unit_test_dir, 'CurrentReleaseOnly')

    # build delete command text
    all_files = os.listdir(delete_files_dir)
    cmd = ''
    for f in all_files:
        cmd = cmd + f + ' '

    cmd = 'p4 delete ' + cmd

    # execute 'open for delete' in source control
    p4 = os.popen(cmd)
    p4.read()
    p4.close()

    return True
Quite simply, a list is first populated with the names of all the files in the directory. Then a loop over each file name incrementally builds the P4 command text. Eventually, the output for the command text should look like this:
p4 delete ut_uspVerifyDroppedColumns.sql README.TXT
ut_uspVerifyDroppedTable.sql ut_uspVerifyArchivedData.sql
However, as is evident, it will also delete the 'readme' file which, if you recall, needs to remain to document the use of the directory. To prevent this, the following conditional statement was added to the loop:
...
    for f in all_files:
        if f == exclude_file:
            continue

        cmd = cmd + f + ' '

    cmd = 'p4 delete ' + cmd
...
As a result, the P4 output changes to now exclude the 'readme' file:
p4 delete ut_uspVerifyDroppedColumns.sql
ut_uspVerifyDroppedTable.sql ut_uspVerifyArchivedData.sql
We now have achieved our desired output and it actually works. All is good except...

Implementing List Comprehensions

Now, you are thinking "So what? What is the big deal? This is rudimentary programming that any four-year-old can do." Yes of course. However, I kept thinking that this was not very "Pythonic". Python is all about manipulating lists in an efficient and concise manner.

I immediately went back and re-read some more about list comprehensions. Armed with a better grasp of this style of programming, I altered the code to implement an alternative way of building the same P4 command:
import os
...
def open_for_delete_unit_tests_from_previous_release():
    """ Open for delete in Perforce unit tests from previous release """

    # find files to open for delete
    exclude_file = 'README.TXT'
    delete_files_dir = os.path.join(unit_test_dir, 'CurrentReleaseOnly')

    # build delete command text
    all_files = os.listdir(delete_files_dir)
    delete_files = [(delete_files_dir + os.sep + f) for f in all_files
                    if f != exclude_file]
    cmd = 'p4 delete ' + ' '.join(delete_files)

    # open for delete in source control
    p4 = os.popen(cmd)
    p4.read()
    p4.close()

    return True
Using Python's implementation of list comprehensions allowed building a new list by filtering out the unneeded items based on some defined criteria. After the new filtered list is created, the final p4 command text is generated using the join method of a single-space (" ") string.

List comprehensions support not just filtering but also repeatedly applying the same function to each item in a list. This is a shorthand variation on the "map" function commonly found in functional languages. Python does explicitly support the classic functional programming functions 'map', 'reduce', and 'filter', but its list comprehensions are an even more concise way of implementing map and filter [3].
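To make the comparison concrete, here is a quick sketch of the filtering step from earlier written both ways (sample file names only):

>>> all_files = ['ut_uspVerifyDroppedTable.sql', 'README.TXT']
>>> filter(lambda f: f != 'README.TXT', all_files)   # explicit built-in
['ut_uspVerifyDroppedTable.sql']
>>> [f for f in all_files if f != 'README.TXT']      # comprehension equivalent
['ut_uspVerifyDroppedTable.sql']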

If you were not impressed with the previous filtering example then here is another more trivial example of applying a 'map' function using list comprehensions. This time the file extension is stripped out from each file name contained within a given list [4]:
>>> files = ['ut_uspVerifyDroppedColumns.sql', 'ut_uspVerifyDroppedTable.sql',
'ut_uspVerifyArchivedData.sql', 'README.TXT']
>>> print [f[:-4] for f in files] # remove file extension using a list comprehension
['ut_uspVerifyDroppedColumns', 'ut_uspVerifyDroppedTable',
'ut_uspVerifyArchivedData', 'README']
Comparisons to SQL

What really struck me about list comprehensions is how much they reminded me of the ubiquitous database language, SQL. Given my long experience with querying and data manipulation against sql databases, I found Python's use and style of list comprehensions to be much more interesting and maybe even more powerful. Shortly after noting the similarities, I read that comprehensions were even considered for database querying:
"Comprehensions were proposed as a query notation for databases and were implemented in the Kleisli database query language"
I plan to write a much longer post on my opinions regarding the future of SQL as a language, but until then I will say the following. LINQ is a great attempt to bake actual data querying features into the C# language, but with one major flaw: it still adopted the SQL syntax, which really does need its own makeover (or better yet, a replacement).

I'm sure the main reason for Microsoft's decision to closely model LINQ after SQL was to give .NET developers something they were already deeply familiar with and thereby more apt to use. However, if Microsoft had used something resembling list comprehensions instead of SQLish syntax, it might have made C# an even more powerful language by baking in a more intuitive and compact syntax [5].
...


[1] If you are a TDD practitioner (like myself) you might be shouting: "How can you be throwing away unit tests after writing them? That is insanity and completely violates the very essence of TDD!!!!". Yes, but just like any other methodology, the principles of TDD should not always be followed blindly and adhered to strictly. Sometimes, exceptions have to be made.

In this particular instance, the reasons for dropping certain unit tests after a period of time were primarily due to performance. Since tests against a database tend to be slower than more traditional unit tests found in app code, I decided that after each production release any tests that are no longer valuable beyond the next release would be purged from the project's code base.

For example, unit tests that test such things as "one time only" migration of data from one database to another, or tests that simplistically check for the existence (or, conversely, the non-existence) of db objects like columns or tables, are good candidates for permanent removal from the suite of db tests. On the other hand, unit tests that assert and validate some complicated logic in stored procedures, for example, would be kept. Regardless, this is not the point of this blog post, so I digress...


[2] Admittedly, I would rather be using the significantly less intrusive Subversion.


[3] For some reason, the equivalent of 'reduce' is not supported by Python's list comprehensions. Perhaps it is because the creator of Python has not been a fan of map/reduce/filter since their inclusion in the language. He especially seems to dislike 'reduce'.


[4] For improved readability, I could have defined a separate function stating more explicitly what exactly it was doing without resorting to using comments (which tend to be a 'code smell'):

def remove_file_extension(f): return f[:-4]
print [remove_file_extension(f) for f in files]
Or I could have used a lambda function for equal effect:
remove_file_extension = lambda f : f[:-4]
print [remove_file_extension(f) for f in files]

[5] Actually, as I recently discovered, C# does support map, reduce, and filter as of version 3.0 (respectively, "Enumerable.Select", "Enumerable.Aggregate", and "Enumerable.Where"). Not quite list comprehensions, but definitely a huge lift for the language. In addition, my understanding is that F#, being a functional language, supports list comprehensions beyond the standard map/reduce/filter.

Saturday, September 13, 2008

Who needs a good text editor? I write perfect code

The first time I read Pragmatic Programmer (a future classic) it strongly emphasized that programmers should choose one good text editor and learn it well. This got me thinking about how important a good text editor is if you write code for a living. I am of the school of thought that code is design. I want to increase the speed at which I write code so as to match my thoughts. Code is the concrete extension of my thoughts on how to implement software. This means that code should be easy to manipulate, malleable and fluid in nature. Therefore, it makes perfect sense to me why the book discusses the virtues of using a good editor.

At the time of my first reading of PP, my text editors were Visual Studio (the default IDE for .NET developers such as myself) along with plain vanilla Notepad. Inspired by PP, I went on to use Notepad2 and then eventually moved on to the more robust and extensible Notepad++. However, I recently re-read PP because it is one of those books you need to keep referring back to, to make certain you are headed down the right path as a programmer. (Also, you tend to miss out on tidbits of good info due to faulty memory.) This time around I noted that one of the text editors they recommended was Emacs.

What is Emacs (and VI)?

After researching Emacs, I immediately got the impression that this is one of the text editors that serious, hardcore programmers (dare I say hackers) use. If you want to become one of those (or at least aspire to) then you might as well use what those individuals use, because they obviously must know something, right? Emacs has been around since the 1970s, making it one of the oldest text editors with active development still current as of today (2008). So, once again, something must be good about it, right?

I noticed that another editor was consistently being referenced in most things I read about Emacs. That other text editor was VI (or VIM). Not surprisingly, just like a lot of things in the world of programming, a long-running rivalry exists between Emacs and VI. Now, unlike Emacs, I was, surprisingly, already familiar with VI, since I had learned to use it in a technical class I took early in my career. Guess what? At the time, I hated it.

My dislike for VI centered on the fact that I was weaned on more modern text editors such as Notepad and other Windows applications, so using VI felt utterly foreign to me. I could not understand how anyone could even be efficient with a tool such as VI. Why couldn't I simply use the arrow keys, delete key, etc.? Why must I memorize and use some other combination of keys? The mode switching also confused me: editing text was not quite the same as reading it.

VI reminded me too much of a word processing program I had used in the late 80's on my home PC (a Tandy, if I recall). This made me view VI as a relic that should no longer be needed in the modern world. Also, at the time (and prior to that) I hated having to learn and memorize command names and was not fully enamored with text-only, console-like environments (Microsoft did a really good job of making me dependent on GUIs and my mouse).

Of course, I no longer hold what I now consider to be a quite ridiculous and silly attitude and opinion of VI. I have completely repented and now understand how deeply wrong I was. Hey, what do you expect from a newbie programmer back then?

Choosing a New Text Editor

Now, which one do I use? Emacs or VI? Truthfully, I really don't know. Just like most things, each has its pros and cons.

However, I made my choice and decided to learn Emacs. I was swayed by the fact that Emacs, as compared with VI, (1) has so many more features (although I might never use them all ;-)) and (2) is extensible. One long-running criticism of Emacs (particularly from the VI community) is that it is very slow to load and run (since it uses its own dialect of Lisp, an interpreted language, and, as we all know, interpreted languages tend to be slower than compiled ones). Fortunately, with modern systems this is no longer the case whatsoever (if anything, more modern IDEs like Visual Studio are slow in comparison to Emacs).

Probably the most difficult thing about using Emacs at first will be the same reason I originally did not like VI: learning all of its essential commands. But now it's different, because I want to learn it and I fully understand its rewards. It will indeed be tough at first but, from what I read, once you learn the commands (at least the basic ones) your productivity should start to increase. To me it will be no different than when I first started learning the fantastic Visual Studio add-in, ReSharper, a refactoring tool. With Re#, I made the very deliberate effort to learn the keyboard commands instead of relying on the mouse, in order to code faster. This is now a very common approach I take with most new development tools and applications that I start to use: learn as many key commands as possible.

One quick mention regarding setting up Emacs on Windows. If you want Emacs to be truly installed on your PC (meaning adding it to the Windows registry, adding a shortcut to your start menu, etc.) then I recommend running the file 'addpm.exe' found in the bin folder of the zipped file for Emacs. This is completely optional, and you can obviously run and use Emacs without it. However, it does help integrate it a bit more with your Windows environment.

Emacs and Visual Studio

The book "Pragmatic Programmer" seems to imply that a text editor should be your main IDE. However, primarily being a .NET developer my primary IDE is, of course, Visual Studio. Therefore, I can not truly have Emacs as my primary. If I did, I'd miss out on some the features built into VS such as Intellisense. But, my main problem is really missing out on the sheer power of ReSharper.

But wait, not all is lost. Believe it or not, as it turns out, Visual Studio actually natively supports changing your key bindings to use Emacs! Now, I potentially might have the best of all worlds: VS + Re# + Emacs. Although VS obviously does not have all of Emacs' features, at least I can continue to use and develop my Emacs-specific editing skills. Who knows? Since one of Emacs' greatest assets is extensibility, it may be possible to add some of the missing Re# features to Emacs itself. (This would imply learning ELisp, but I doubt it :-)) Perhaps some add-ins for Emacs already exist and I just have to find them.

It turns out that Emacs is not the only such editor that can be supported by VS. I recall reading earlier this year how Jean-Paul S. Boodhoo started using VI with Visual Studio. (He also has some more recent posts on his experiences, particularly VI with ReSharper.) This was an early indication to me that perhaps I was wrong in my opinion of VI, and that the real problem was a very "green" programmer like myself not yet understanding the power of a development tool and the inefficiencies of relying on a mouse. Perhaps down the road I might give VI a try as well.

I will certainly have future postings on my experiences with Emacs. In the meantime, I have to make sure to avoid "Emacs Pinky".

Tuesday, September 9, 2008

Cryptic Rhino Mock exception messages

Let me first start off by saying that Rhino Mocks is a great mock objects framework for unit testing in .NET and C#. Compared with NMock2, which was my first experience with testing using mock objects, it is far superior (the use of strongly typed method/property names instead of strings is one of its best features, especially for TDD and refactoring). However, there are some aspects of NMock2 that I do miss.

'Expect' Consistency

For starters, NMock2 was more consistent in how the 'Expect' calls are made versus the way Rhino Mocks does it. In NMock2, the use of 'Expects' are the same whether you use a void method or a method that returns a value:
Expect.Once.On(mockFoo).Method("SomeMethodThatReturnsAValue")
Expect.Once.On(mockFoo).Method("SomeVoidMethod")
That is not the case with Rhino Mocks. 'Expects' can only be used with methods that return values and not with void methods.

Recently, a new way of expressing 'Expects' with void methods was added to the Rhino Mocks framework, but it relies on 'delegates'. I am not sure I really like the solution. It trades one form of weak readability for another, albeit different, one.
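From memory, it looks something like the following sketch (treat the exact API shape as an assumption on my part):

// void method expectation expressed via an anonymous delegate
Expect.Call(delegate { mockFoo.SomeVoidMethod(); });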

This could be yet another thing that turns newbies off from testing with a mock framework such as Rhinos. It can be confusing. It is already quite a difficult endeavor to convince software developers of the virtues of unit testing. It is even more difficult to promote mock object testing, so anything that lowers the barriers is important and critical.

Understandable Exception Messages

In addition, Rhino Mock exception messages sometimes can be vague and unclear. This can be frustrating for new (and even existing) users.

For example, I was recently working on an old test fixture for a project which uses NMock2, not Rhinos, as its testing framework. To some degree, I felt a bit more productive with it, and more in control, because the error messaging is a lot more user-friendly. I could more quickly determine the cause of a problem.

Below is an actual exception I received from NMock2:

NMock2.Internal.ExpectationException: not all expected invocations
were performed
Expected:
1 time: criteria.SetFirstResult(equal to <50>) [called 0 times]
1 time: criteria.SetMaxResults(equal to <5>) [called 0 times]
1 time: criteria.List(any arguments), will return
<System.Collections.Generic.List`1[System.DateTime]> [called 0 times]

Now, here is what I might get from Rhinos:
Rhino.Mocks.Exceptions.ExpectationViolationException:
ICriteria.SetFirstResult(50); Expected #1, Actual #0.
ICriteria.SetMaxResults(5); Expected #1, Actual #0.

Honestly, I like the first one better. It reads better to me. For one thing, Rhinos provides the raw (CLR?) object definition, so that if the member is inherited from an interface or another class then it shows as defined on the interface (i.e. "ICriteria") or the base class. Meanwhile, NMock2 shows the actual local variable name used in the code you are testing (i.e. "criteria"). Much faster for pinpointing the culprit.

In fact, where this really drives me crazy is with the domain objects (i.e. POCOs, business objects, etc.) in that same project. Every domain object inherits from IDomainObject, so with Rhino Mocks I get this:
IDomainObject.Description
OK....but, which domain object is it? If I happen to have two or more domain objects being mocked/stubbed in my test it can get really hard figuring out the one it's complaining about. Instead, it would be nice if Rhino provided the following as does NMock2 using the variable name (assuming my domain object is named 'Foo'):
foo.Description
Another example of the disparity between the two frameworks is if a property related exception occurs then the Rhino message would contain this:
IFooView.set_PageSize
while NMock2 would provide this:
_view.PageSize // instance variable name
Some would say, "What's the big deal?", "Can't you figure out what it is?", "It only takes a few seconds to know what it is", etc. Well, that is the problem. If my brain has to stop to process what it is, even if it takes a few seconds, then that is slowing down my software development process. Multiply those "few" seconds by how many times you get Rhino exceptions like that, and it does eat away at your development time. It adds up. It is not unlike trying to read code that is not very readable or well factored. Sure, you'll eventually figure out what it does, but at the cost of precious dev time.

The following exception message is one I'm fairly certain I have gotten before but always forget because the message is so...well...CRYPTIC!!!!
System.InvalidOperationException: Previous method 'IView.get_ReturnSomeStringValue();' require a return value or an exception to throw.
If you specify the wrong data type in the 'Return' method of an Expect (or LastCall) then the above exception will be thrown. For example, suppose the method or property is supposed to return a 'string' value but you instead specify a 'DateTime', as shown below:

DateTime date = DateTime.Today;
Expect.Call(_view.ReturnSomeStringValue).Return(date);
// this will throw an exception
then you will receive the error message mentioned earlier.

Specifying the proper type should fix the problem as follows:

string someStringValue = "some string value";
Expect.Call(_view.ReturnSomeStringValue).Return(someStringValue);
// this is ok
The exception message should really be about the type mismatch, not the absence or lack of a return value.

Friday, August 29, 2008

"Make Something People Want"

I came across this article in the Wall Street Journal regarding issues that sellers have been having with eBay. This reminded me of a recent conversation I had with another software developer friend of mine regarding how craigslist could be improved with...something else.

If a potential competitor of eBay were to read about its users' woes then it would definitely be a nice blueprint on how to build something better. Why? Because it is "something people want" but are not getting.

I don't know if eBay's problems could necessarily be solved via technology (although some looked to be that way). But, it serves to show that just when you think something has been "solved" then think again. Has online auctioning really been "solved"? Has online "classified ads" been solved? Has "[fill in the blank]" been solved?

A classic example is Google. When they first entered the search engine market, everyone believed that search engines had been "solved" and that Google could not possibly succeed against the more "mature" search engines available at the time...well, we all know how that story ends. But even with Google being what it is today, you still cannot assume that search engines have been truly solved. For example, there are folks who feel it is not solved and are therefore attempting to take on Google. Given Google's own beginnings, that does not sound strange at all and should serve as an illustration that it can be done (or at least attempted).

If you do want to build something people actually want then keep some of following things in mind in respect to your competitor:
  • Keep your features list smaller- More features != better product. Most people think more features make for a better product, but that is a complete fallacy that has been proven wrong time and again. Keep it simple and focused on features that people would actually use. It does not matter how many features your competitor has in their product. It only makes their product worse and harder to use. Check out the book 'The Inmates Are Running the Asylum' to get a better idea as to why this is true.
  • Make it easier to use- Sounds "easy" but it is not. You really need to cut down on the friction involved in using your software as compared to your competitor's. Make it goal oriented and not task oriented. Check out the previously mentioned book as well as 'Don't Make Me Think!' for infinitely better explanations than mine.
  • Stay smaller and leaner- Your main goal is your product. Try not to worry about building an empire. Do not focus on other unrelated products that you could make. Do not aim to take the whole market, because in some markets even as low as a 5% share can be quite lucrative (you might be serving some niche that is not being adequately satisfied by the current market leader). Do not hire extraneous folks, which only serves to drive up your costs. And so forth...
  • Ignore your competitors- Seems like a contradiction to the previous ones but not really. Who cares what your competitor is doing? Care about what they are not doing and what that means for the users.
Finally, as a software developer/engineer/programmer/hacker what can you do? I suggest you start here.

Tuesday, August 26, 2008

Mocks vs Traditional Asserts

I encountered something extremely interesting regarding assertions versus mocks. Traditional assertions (e.g. the NUnit asserts) are typically for "state-based" type testing while mocks are for "interaction-based" type testing.

I find that I am using assertions less and less. I'm not sure if that is a good or bad thing. A few weeks ago, I was working on adding some functionality to one of the domain objects for the project at work. Unlike Controller classes in MVC, domain objects tend to call for state-based rather than interaction-based testing. At least that's what I thought.

Below is a simple method that I needed to test:
// domain object - TaskManager
public IList<Task> Reassign(IList<Task> tasks, string newTeam)
{
    foreach (Task task in tasks)
    {
        task.Team = newTeam;
    }

    return tasks;
}
The following test uses traditional assertions:
[Test]
public void CanReassignTasksToNewTeamWithAsserts()
{
    TaskManager manager = new TaskManager();
    IList<Task> tasks = new List<Task>();

    const string oldTeam = "Old Team";
    const string newTeam = "New Team";

    // setup test data for tasks
    Task task;
    for (int idx = 0; idx < 3; idx++)
    {
        task = new Task();
        task.Team = oldTeam;
        tasks.Add(task);
    }

    // assert the re-assignment
    IList<Task> updatedTasks = manager.Reassign(tasks, newTeam);
    foreach (Task updatedTask in updatedTasks)
    {
        Assert.That(updatedTask.Team, Is.EqualTo(newTeam), "The task's team was not re-assigned.");
    }
}

Now here is another test whose intention is to test the exact same thing but using mocks instead:


[Test]
public void CanReassignTasksToNewTeamWithMocks()
{
    TaskManager manager = new TaskManager();
    IList<Task> tasks = new List<Task>();

    const string newTeam = "New Team";

    // setup test data for tasks and set expectations
    // ('Mocks' is the fixture's MockRepository instance)
    Task task;
    for (int idx = 0; idx < 3; idx++)
    {
        task = Mocks.CreateMock<Task>();
        tasks.Add(task);

        // set expectation to assign task to new team
        task.Team = newTeam;
        LastCall.Repeat.Once();
    }

    Mocks.ReplayAll();

    manager.Reassign(tasks, newTeam);

    Mocks.VerifyAll();
}

Guess which one I wrote first? Of course, the one with mocks. Even though I started with the complete intention of doing it with state-based assertions, it quickly morphed into using mocks.

It was really interesting to produce these two tests that are functionally different but accomplish the same goal. They both "fail" if you remove the line:

task.Team = newTeam;

or if you place a different value:

task.Team = "Make this test fail.";

Since they fail by doing either of the above that means both tests are good, valid tests, right?

So, which should I use? Truthfully, the test with mocks is less brittle because you do not need a real instance of the "Task" object. But am I taking it too far? One "problem" I seem to have is that, since I have been using mocks for so long, my mind is wired to use them for everything (once again, is that a good thing or an anti-pattern?). Basically, when I think about how to test something, I immediately think in terms of expectations with dependencies.

Perhaps mocks win out in this situation, and in most situations, because true unit testing means that the only real instance of an object is the one you are trying to test; essentially everything else should be mocked and/or stubbed somehow. Perhaps assertions are best for objects that are not primarily defined by their dependencies and that simply perform complex algorithms returning value-type results (for example, static classes and methods). Of course, I could be oversimplifying, but I find it really hard to know when to use plain vanilla assertions.

Well, shortly after stumbling upon this "dilemma" on my own, I read Martin Fowler's article named Mocks Aren't Stubs, and it became much clearer to me what I was doing and why (as always seems to happen whenever I read any of Fowler's stuff). According to him, I would be classified as a "mockist TDD practitioner".

Honestly, some of the reasons he lists for choosing not to be one (as opposed to a "classical TDD practitioner") are things that I have definitely felt on my own, especially recently when I was struggling with a bunch of mock-heavy tests that started to get unwieldy and far more complex than the thing I was actually testing (let's just say it was a weird, dark period in my recent dev efforts when I really questioned the use of mocks).

The quote below from him is definitely something that I thought to myself off and on for as long as I have been doing "mock testing":

"...A mockist is constantly thinking about how the SUT ["system under test" a.k.a. the object under test] is going to be implemented in order to write the expectations. This feels really unnatural to me..."

Monday, August 25, 2008

Applying good software development practices to VBA and MS Access (and tools, too!)

After working with TSQL almost exclusively for the last six months, the time came last month when I had to make code changes to a Microsoft Access front-end application at work.

Based on what I have learned in the past several years regarding software development in .NET, I decided to arm myself for tackling legacy code in a platform and language notorious for not being ideal for software development.

IDE Tools

I accidentally came across MZ-Tools, a plugin for the VB Editor IDE, and wow! It made all the difference in the world. It was nice to find a free tool that injects a little ReSharper-ish support into the rigid VBA IDE. The product apparently started in the VB/VBA world, and when .NET came along it was ported to Visual Studio .NET. (It can actually be considered a direct competitor to Re#. Note that the .NET version is not free.)

Here are the features I was using actively during coding:
  • Replace In All Projects - This was huge in that it allowed me to actually perform the refactoring technique 'Rename' with various levels of control. What really makes it possible is the visual tree view of your code, where the text you want to change shows up along with the function/sub it can be found under. I was not afraid to change names that were goofy, confusing, or antiquated in coding style into much more meaningful ones. In the end, I had significantly less fear of breaking code. (This feature is equivalent to Re#'s 'Find Usages Advanced'.)
  • Procedure Callers - Similar to Re#'s 'Show All Usages' for a method or variable (and it is also navigable via a visual tree view).
  • Code Templates - Allowed me to create and permanently store boilerplate code that I needed for testing with VBLiteUnit (see next section). This feature opened my eyes to using and creating 'Live Templates' in Re# more actively.
  • Add Procedure - A slick and fast way to create more boilerplate code for procedures/functions.
  • Add Error Handler - This was a gift from the heavens: with a simple keystroke you can drop error-handling code into any sub/function. Error handling is horrible in VB/VBA, so this significantly reduced the pain.
  • Add Module Header and Add Procedure Header - It is just like having GhostDoc. Adding comments is very important in legacy code.
  • Sort Procedures - Helps organize your code. It is equivalent to Re#'s 'File Structure' popup, which I use all the time for the same reason: drag-and-drop your code however you see fit.
  • Private Clipboard - This can be big in legacy code and in a language where you end up doing a lot of cutting and pasting. It is actually better than Re#'s version because you can control what is stored in it and retain it for as long as you like within your session.
  • Review Source Code - An extremely limited version of the code analysis in Re#. It only tells you whether a variable, constant, or procedure is unused, but it is good nonetheless for cleaning up your code and getting rid of cruft.
The following features were interesting but less used day to day:
  • Generate XML Documentation - Produces a very nice, readable XML document about your code, comparable to the XML/CHM output in VS.
  • Statistics - You can actually see how your lines of code are distributed across your code base.
The one thing I wish MZ-Tools had, which IMHO is one of the two most important refactoring techniques, is 'Extract Method' ('Rename Method' being the other). With that, it would be quite the rock-solid refactoring tool for VBA. Nonetheless, MZ-Tools, just like Re# in Visual Studio, has given me control over my code instead of the code controlling me.

Unit Testing

Being a dedicated practitioner of Test Driven Development (TDD), this was important for me on a lot of levels. After looking at two options, I settled on VBLiteUnit because it is extremely lightweight and the ramp-up time is short, especially if you are familiar with any of the other xUnit frameworks. (That is exactly what the author intended in creating it, hence the "Lite" in its name. My belief is that the author thought the existing option, VBAUnit, was too bulky and cumbersome to use and maintain. Definitely a much more "pragmatic" and "agile" approach in his solution.)

The author decided to use the VB construct 'Select Case' (think 'switch'), with each leg of your 'Case' defining an individual test. Impressively, it works really well; an added bonus is that your test descriptions can be more descriptive and natural in tone because they are just string text (interesting to note as compared with a blog post on Unit Test Naming Conventions written by a developer I used to work with). A rough analogue of the style is sketched below.
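For flavor, a rough Python analogue of that dispatch-on-a-string style (purely illustrative; this is not VBLiteUnit itself):

def check_addition():
    assert 1 + 1 == 2

def check_upper():
    assert 'vba'.upper() == 'VBA'

# each entry plays the role of one 'Case' leg: the test name is just free-form text
tests = {
    'One plus one equals two': check_addition,
    'Upper-casing a string capitalizes every letter': check_upper,
}

for description, run in tests.items():
    run()
    print description + '... passed'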

To implement my new changes, as expected, I did have to do some refactoring of the code that needed to be touched. This generally resulted in new testable classes, but I had TDD with VBLiteUnit (and MZ-Tools) to lead the way.

All in all, it was very satisfying to actually be able to do TDD in Access/VBA. It gave me the same feeling I get when doing TDD in C#. For a moment, the platform felt much more like real software development than it ever had before.

Results

Some lessons were learned, of course ("evolve or die"). If you have no choices, resources, etc. in a situation, no matter how "trivial", you still attack the problem 100%. Why? Because you not only get to apply and transfer knowledge and techniques to another area of software development, but you might also learn some new things that you can in turn use later on. For example,
  • I started using the Immediate Window much more in Visual Studio.
  • I started using features in Re# that I had not used or had dismissed before (like live templates, the copy clipboard, etc.)
  • Design patterns can still be used even with what some might consider a "rudimentary" language. For example, I was actually able to implement a variation of the Repository pattern (a la Eric Evans's DDD) that hydrated a domain object, and made it work (see the sketch just after this list). Hence, if you understand design patterns and how to use them, it does not matter what OO language you are using.
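To illustrate the shape of that Repository variation, here is a minimal sketch in Python with hypothetical names (not the actual VBA code):

class Customer(object):
    """Domain object; knows nothing about where its data comes from."""
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name

class CustomerRepository(object):
    """Hides the data access behind a collection-like interface."""
    def __init__(self, db):
        self.db = db  # anything with an execute(sql, params) method returning rows

    def find_by_id(self, customer_id):
        rows = self.db.execute('SELECT CustomerId, Name FROM Customers WHERE CustomerId = ?',
                               (customer_id,))
        # hydrate the domain object from the raw row
        return Customer(rows[0][0], rows[0][1])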
However, I can see why VB developers don't know OOP: VB is lacking some key features of an OOP language. The biggest one, in my opinion, is that it does not support implementation inheritance. That was frustrating because I was trying to use a technique Michael Feathers describes in his book, Working Effectively with Legacy Code, to test legacy code: creating a seam in your code by inheriting from a class and then overriding an external dependency, as sketched below. Unfortunately, VB simply enforces the interface contract but does not carry over the implementation code from the base class to its derived classes.
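For the record, here is what that "subclass and override" seam looks like in a language that does support implementation inheritance; this is a minimal Python sketch with hypothetical names, not Feathers' own example:

class InvoiceProcessor(object):
    def process(self, amounts):
        total = sum(amounts)
        self.send_confirmation(total)  # the external dependency we want to dodge in tests
        return total

    def send_confirmation(self, total):
        # stand-in for a real external call (email, message queue, etc.)
        print 'Emailing confirmation for %s' % total

# the seam: inherit the behavior, override only the dependency
class TestableInvoiceProcessor(InvoiceProcessor):
    def __init__(self):
        self.confirmations = []

    def send_confirmation(self, total):
        self.confirmations.append(total)  # record instead of emailing

processor = TestableInvoiceProcessor()
assert processor.process([10, 20, 30]) == 60
assert processor.confirmations == [60]

In VBA, 'Implements' would force TestableInvoiceProcessor to re-implement process as well, which defeats the whole point of the seam.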

Nonetheless, using VBA with VBLiteUnit and MZ-Tools actually turned out to be a much more gratifying and, yes, may I even say aloud, "fun" experience than I expected. The main reason, beyond the more obvious ones such as being able to do TDD, was that I had been doing so much TSQL just prior that it was like being transported out of the stone ages. The TSQL language and Microsoft's IDE for it (SQL Server Mgt Studio) were so frustrating to use that working back in a programming language designed for applications, even one as "minor/toy" and as stigmatized as Access VBA, was a breath of fresh air and extremely invigorating (like giving a tablespoon of water to a very thirsty person).

TSQL was and still is such a struggle, even with TSQLUnit. It has so many limitations when it comes to creating flow logic, reusable code, and testability. (Granted, it is a DSL specialized in and intended for manipulating data and its storage structures, but I feel that the SQL language has not really evolved much in its 30-odd years of existence. In the areas where it has changed, it ends up resembling other non-SQL languages, so what's the point? Perhaps it's time for a better replacement language or languages?) In the future, any application developer I encounter who says "sprocs are the way to go" will get an earful and maybe more.

TSQL is so painful in its existing form that, believe it or not, I would rather work in VBA than in TSQL if those were my only options!! I know that sounds crazy and shocking, but that is how much I prefer not to have to do database development and deal with TSQL. I'll leave it to the folks who actually enjoy it. But sadly it is the bulk of my work these days. (The one thing I do like about it is its language support for "dynamic SQL", but that is not enough of a motivation for me.)

Who knows? I went from C# --> SQL --> VBA --> SQL in the last six months. If I had gone straight from C# --> VBA, I probably would have had a different opinion and experience. Hard to tell in retrospect.

Don't worry, I still do C# and have been doing it in parallel for the last six months on side projects and supporting processes at work. Unfortunately, I am back to doing TSQL for the next several months. Aargh!!!

Final Thoughts

What was the point of all of this, other than to rant about a language no one cares about if you consider yourself a "serious" programmer? Well, just like VB6, Microsoft plans on retiring VBA and possibly allowing Office apps to be scripted by any .NET language, including C#!! Therefore, the tools mentioned above are "dead" in the sense that no new development is going to be done on them, especially since their respective authors appear to have moved on from doing any new releases.

IMHO, what Microsoft really should do is allow dynamic languages to script their Office apps. That is where dynamic languages (e.g. Ruby, Python, Boo) seem like such a natural and obvious fit for the nature of that work. Just imagine: you can write code very quickly against ever-changing requirements, and it does not have to perform fast (as compared with statically typed languages). Seems like such a "no-brainer". (Maybe in the not-too-distant future, power users could even be writing MS Excel or Word macros in F#? Just imagine that!)

Sunday, July 27, 2008

Snake bitten by Python (R.I.P. NAnt)

After first experimenting last year with IronPython, the .NET port of Python, I decided to take the full plunge into Python itself, leading to another major milestone in my programming career. Why? Because I now have an incredibly handy language, Python, that can superbly manage rudimentary but necessary development tasks. Furthermore, Python has enlightened me to yet another way of thinking about how code can be written.

As a dynamic language, Python can be extremely powerful. It can be used for "glue" tasks like scripting, but its potential is even greater. Because it is interpreted, pieces of your code can be written and tested practically at the same time via its interactive console (as if the Immediate Window in Visual Studio were the means to see how your code works while you are writing it; no "compilation tax"). In addition, Python can do both OOP (classes, et al.) and functional programming (treating functions like data in lists and supporting lambdas as in Lisp); the snippet below gives a quick taste. Finally, with its leaner, less verbose syntax, less code is written as compared with static languages.
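For instance, here is the interactive console with functions treated as data (typed straight at the '>>>' prompt):

>>> funcs = [abs, lambda x: x * 2]  # functions stored in a list like any other data
>>> [f(-3) for f in funcs]
[3, -6]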

After a few days of ramping up and getting acquainted with the language, I immediately started applying Python to a few things. NAnt build scripts and Windows bat files were the main targets for conversion. I also intend to rewrite in Python a C# .NET console tool that merges the content of multiple files into a single one. It seemed more natural and sensible to use Python for these types of development tasks.

Replacing NAnt with Python is favored since NAnt is an XML-based DSL that might be doing a little too much. The problem is not that it is a DSL, which is generally a good thing particularly when done right (as the Ant/NAnt folks, to their credit, succeeded quite well in doing), but the "XML-based" part. Who really wants to program all day in XML? Apache Ant, NAnt's predecessor in the Java world, was a victim of the exploding popularity of XML during the height of the dot-com era of web applications. XML should be left to do what it does best and what it was originally intended for: basic structured data storage and configuration. Ant (and, subsequently, NAnt) should not have mixed the two concerns of (1) formatting/organizing data and (2) build flow logic. Not a far cry from violating the principle of "separation of concerns".

If I can avoid it, I am finished with NAnt (or any other equivalent build framework that relies heavily on XML for its flow logic). If given the choice in a development environment, I would probably not opt to use NAnt to handle build scripts. Not that I have anything against NAnt itself; just that better, more programmer-friendly alternatives exist. NAnt was (and still is) a great option compared, say, with the inferior Windows bat files or with dev shops that manually build their projects via Visual Studio. Instead, a dynamic language like Python (or Boo or Ruby or whatever else) is preferable for managing this type of work. (A few build automation frameworks written in Python do exist, but I would like to take a look at the promising, .NET-born Boo build system.)

The NAnt documentation mentions that it has an advantage over native OS shell commands because it is "cross-platform". That might be true, but Python easily has that area covered, specifically with its 'os' and 'shutil' modules (see the snippet below). Portability is one of Python's key features.
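For example, path handling and file copying that run unchanged on any platform (illustrative paths only):

import os
import shutil

# build the path with the platform's separator instead of hard-coding '\' or '/'
build_dir = os.path.join('out', 'build')
if not os.path.isdir(build_dir):
    os.makedirs(build_dir)

# copy a file portably; shutil papers over the platform differences
shutil.copy(os.path.join('src', 'app.config'), build_dir)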

Code generation of other programming languages is another area where I have started to use Python. Database change scripts written in TSQL that are repetitive and voluminous have benefited significantly (one example: creating structurally similar 'drop column' statements for 100+ columns, along the lines of the sketch below). In the future, for other kinds of code generation (e.g. NHibernate mapping files), I will definitely consider Python as a substitute for heavier code-gen tools such as MyGeneration.
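A sketch of that kind of generator (hypothetical table and column names):

# emit one ALTER TABLE statement per obsolete column
columns_to_drop = ['LegacyFlag', 'OldStatusCode', 'TempNotes']  # imagine 100+ entries

for column in columns_to_drop:
    print 'ALTER TABLE dbo.Orders DROP COLUMN %s' % column
    print 'GO'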

The more I use Python, the more I am convinced that it will become my general, all-purpose utility programming language. I intend to use it as a vital supporting player handling the grunt work in my development processes, no matter what the primary language happens to be, whether C#, TSQL, etc. By and large, I just like how quickly a quick-and-dirty script can be whipped up to perform some auxiliary task without having to endure the overhead of compiling, creating, and running an executable file. (Who knows? Maybe one day I will work on a major project where Python is the star of the show.) All in all, it is just such a nice, clean, readable language, making it far more enjoyable to work with than something like NAnt.

To give an idea of how visually different it is to use Python instead of NAnt, below is the same build script written in each language. This is the first script I ported over to Python. The script runs the SQL Server Database Publishing Wizard to generate a file containing the sql to create the schema of a baseline database required at the start of each development cycle. The script executes the following high-level tasks:
  • Create Schema Script- Generates the raw initial tsql schema script from target database using the DB pub wiz
  • Convert Script File Encoding- Convert file from Unicode to ASCII
  • Replace Script Values- Read from external csv file containing pairs of strings to replace in script.
  • Checkout File From Source Control - Checkout from Perforce the existing schema file that will be replaced.
  • Copy File To Build Location- Copy the sql script file to the build directory
  • Build Database And Run Unit Tests- Run another separate script (currently written in NAnt) that builds the db and runs the unit tests using TSQLUnit
NAnt Version
<?xml version="1.0"?>
<project name="Generic Database Build" default="BaselineDatabaseCreation"
         xmlns="http://nant.sf.net/release/0.85/nant.xsd">

    <property name="base.dir" value=".\" overwrite="false" readonly="false" />
    <property name="sourceDB" value="" overwrite="false"/>
    <property name="sourceServer" value=".\sqlDev2005" overwrite="false"/>
    <property name="dbmsVersion" value="2000" overwrite="false"/>
    <property name="connectionString" value="Server=${sourceServer};Database=${sourceDB};Trusted_Connection=True;"/>
    <property name="dbBuild.dir" value="" overwrite="false"/>
    <property name="targetDB" value="" overwrite="false"/>
    <property name="targetServer" value=".\sqlDev2005" overwrite="false"/>
    <property name="sqlScriptingTool.dir" value="C:\Program Files\Microsoft SQL Server\90\Tools\Publishing\" overwrite="false"/>
    <property name="sqlScript.fileName" value="CreateSchema.sql" overwrite="false"/>
    <property name="sqlScript.filePath" value="${path::combine(base.dir, sqlScript.fileName)}" overwrite="false"/>
    <property name="sourceControl.filePath" value="${dbBuild.dir}Schema\${sqlScript.fileName}" overwrite="false"/>

    <!-- replace values list variable -->
    <property name="temp.fileName" value="temp.txt"/>
    <property name="temp.filePath" value="${path::combine(base.dir, temp.fileName)}"/>
    <property name="replaceValuesList.fileName" value="" overwrite="false"/>
    <property name="replaceValuesList.filePath" value="${path::combine(base.dir, replaceValuesList.fileName)}"/>

    <target name="BaselineDatabaseCreation" description="Creates baseline database tsql script end-to-end.">
        <call target="CreateSchemaScript"/>
        <call target="ConvertScriptFileEncoding"/>
        <call target="ReplaceScriptValues" unless="${replaceValuesList.fileName==''}"/>
        <call target="CheckoutFileFromSourceControl"/>
        <call target="CopyNewScriptFileToBuildLocation"/>
        <call target="GetSeedTablesData"/>
        <call target="BuildDatabaseAndRunUnitTests" unless="${targetDB==''}"/>
    </target>

    <target name="CreateSchemaScript" description="Generates the raw initial tsql schema script from target database">
        <delete file="${sqlScript.filePath}" if="${file::exists(sqlScript.filePath)}" />
        <exec program="${sqlScriptingTool.dir}sqlpubwiz">
            <arg value="script" />
            <arg line="-C ${connectionString}" />
            <arg value="${sqlScript.filePath}" />
            <arg value="-schemaonly" />
            <arg line="-targetserver ${dbmsVersion}" />
            <!-- '-f' means overwrite existing files is true -->
            <arg value="-f" />
        </exec>
        <fail message="${sqlScript.filePath} was not created."
             unless="${file::exists(sqlScript.filePath)}" />
    </target>

    <target name="ConvertScriptFileEncoding" description="Convert file from Unicode to ASCII">
        <copy file="${sqlScript.filePath}" tofile="${temp.filePath}" outputencoding="ASCII" overwrite="true" />
        <move file="${temp.filePath}" tofile="${sqlScript.filePath}" overwrite="true"
             unless="${file::exists(replaceValuesList.filePath)}" />
    </target>

    <target name="ReplaceScriptValues"
            description="Read from external csv file containing pairs of strings to replace.">
        <fail message="${replaceValuesList.filePath} does not exist."
             unless="${file::exists(replaceValuesList.filePath)}" />
        <foreach item="Line" in="${replaceValuesList.filePath}" delim="," property="x,y">
            <echo message="Replacing '${x}' with '${y}'..." />
            <copy file="${temp.filePath}" tofile="${sqlScript.filePath}" overwrite="true">
                <filterchain>
                    <replacestring from="${x}" to="${y}" />
                </filterchain>
            </copy>
            <copy file="${sqlScript.filePath}" tofile="${temp.filePath}" overwrite="true"/>
        </foreach>
        <move file="${temp.filePath}" tofile="${sqlScript.filePath}" overwrite="true"/>
    </target>

    <target name="CheckoutFileFromSourceControl" description="Checkout from source control the schema file that will be replaced.">
        <fail message="${sourceControl.filePath} does not exist."
             unless="${file::exists(sourceControl.filePath)}" />
        <p4edit view="${sourceControl.filePath}">
            <arg line="-t"/>
            <arg line="text+k"/>
        </p4edit>
    </target>

    <target name="CopyNewScriptFileToBuildLocation">
        <copy file="${sqlScript.filePath}" tofile="${sourceControl.filePath}" overwrite="true" />
    </target>

    <target name="GetSeedTablesData">
        <!-- TODO: Create a separate NAnt build script for this -->
    </target>

    <target name="BuildDatabaseAndRunUnitTests">
        <nant buildfile="GenericDatabase.build" inheritall="false">
            <properties>
                <property name="base.dir" value="${dbBuild.dir}"/>
                <property name="server" value="${targetServer}"/>
                <property name="database" value="${targetDB}"/>
                <property name="includeUnitTesting" value="true"/>
            </properties>
        </nant>
    </target>
</project>

Python Version
import os
import csv
import shutil

# NOTE: the following configuration values were not shown in the original
# script; the placeholder defaults below simply mirror the property defaults
# in the NAnt version above. Set them for your environment before running.
base_dir = '.'
source_server = r'.\sqlDev2005'
source_db = ''
target_server = r'.\sqlDev2005'
target_db = ''
db_build_dir = ''          # path to the 'Schema' folder under source control
sqlscript_dir = base_dir
replace_values_dir = base_dir

sqlscripting_tool = r'C:\Program Files\Microsoft SQL Server\90\Tools\Publishing\sqlpubwiz.exe'
dbms_version = '2000'
connection_string = 'Server=' + source_server + ';Database=' + source_db + ';Trusted_Connection=True;'

sqlscript_filename = 'CreateSchema.sql'
sqlscript_filepath = os.path.join(sqlscript_dir, sqlscript_filename)
source_control_filepath = os.path.join(db_build_dir, sqlscript_filename)

replace_values_filepath = os.path.join(replace_values_dir, 'ReplaceList.csv')
error_found_message = 'Error found!'

def run_script():
    """Run all tasks"""

    tasks = [create_schema_script, convert_scriptfile_encoding, replace_script_values,
             checkout_file_from_source_control, copy_file_to_build_location,
             build_database_and_run_unit_tests]
    is_successful = True
    for task in tasks:
        print 'Executing \'' + task.func_name + '\'... '
        is_successful = task()
        if not is_successful:
            print 'Script Failure!'
            break

    if is_successful:
        print 'Script Success!'

def create_schema_script():
    """Generates the raw initial tsql schema script from target database"""

    if os.path.isfile(sqlscript_filepath):
        os.remove(sqlscript_filepath)

    args = ['sqlpubwiz', 'script', '-C ' + connection_string, '"' + sqlscript_filepath + '"',
            '-schemaonly', '-targetserver ' + dbms_version, '-f']
    os.spawnv(os.P_WAIT, sqlscripting_tool, args)

    if not os.path.isfile(sqlscript_filepath):
        print error_found_message
        print "File '" + sqlscript_filepath + "' was not created."
        return False

    return True

def convert_scriptfile_encoding():
    """Convert file from Unicode to ASCII"""

    # redirecting 'type' output re-encodes the Unicode file to the console code page
    cmd1 = 'type "' + sqlscript_filepath + '" > temp.txt'
    cmd2 = 'move temp.txt "' + sqlscript_filepath + '"'
    for cmd in [cmd1, cmd2]:
        dos = os.popen(cmd)
        dos.read()
        dos.close()

    return True

def replace_script_values():
    """Read from external csv file containing pairs of strings to replace values in sql script."""

    # if 'replace values' list not provided then assume not needed
    if replace_values_filepath == '':
        return True  # nothing to do; this is not a failure

    # check for 'replace values' file existence
    if not os.path.isfile(replace_values_filepath):
        print error_found_message
        print "Replace values list file '" + replace_values_filepath + "' does not exist."
        return False

    # modify file content with new values
    f = open(sqlscript_filepath, 'r')
    text = f.read()
    f.close()
    replace_values = csv.reader(open(replace_values_filepath, 'r'))
    for row in replace_values:
        find_text = row[0]
        replace_with_text = row[1]
        text = text.replace(find_text, replace_with_text)

    # write to script file with new values
    f = open(sqlscript_filepath, 'w')
    f.write(text)
    f.close()

    return True

def checkout_file_from_source_control():
    """Checkout from source control the schema file that will be replaced."""

    # look for source control file
    if not os.path.isfile(source_control_filepath):
        print error_found_message
        print "Source control file '" + source_control_filepath + "' does not exist."
        return False

    # checkout file (note: could use PyPerforce API framework instead);
    # quote the path in case it contains spaces
    cmd = 'p4 edit -t text+k "' + source_control_filepath + '"'
    p4 = os.popen(cmd)
    p4.read()
    p4.close()

    return True

def copy_file_to_build_location():
    """Copy sql script file to build directory"""

    # use the full path of the generated script, not just its bare file name
    shutil.copy(sqlscript_filepath, source_control_filepath)

    return True

def build_database_and_run_unit_tests():
    """Build database and validate schema by running unit tests"""

    nant_tool = os.path.join(base_dir, 'Tools\\NAnt\\bin\\', 'NAnt.exe')
    build_script_filepath = os.path.join(base_dir, 'Projects\\Libs\\Utils\\NAntScripts\\DatabaseBuilds\\', 'GenericDatabase.build')
    build_dir = os.path.split(os.path.normpath(db_build_dir))[0]  # hack: need to remove 'Schema' folder; todo: remove this from the generic db build script

    # todo: replace NAnt script with Python script
    args = ['NAnt', '-buildfile:' + build_script_filepath, '-D:base.dir=' + build_dir,
            '-D:server=' + target_server, '-D:database=' + target_db,
            '-D:installUnitTesting=true']
    os.spawnv(os.P_WAIT, nant_tool, args)

    return True

run_script()