Wednesday, September 24, 2008

Comprehending List Comprehensions

Since I started using Python to write build scripts (as well as for code generation) to support my development process, I have come to increasingly appreciate the power of the language. A recent coding situation showed me how understanding the alternatives available in a multi-paradigm language such as Python can highlight the limitations of another language such as C#, and can have a real influence on how you think about and write code.

Building working code first...

In my current project, the database for the system is maintained under source control. In the directory of my project's database files, a sub directory exists named 'CurrentReleaseOnly' which contains database unit tests written using the TSQLUnit framework. The purpose of this folder is to segregate tests that are really just "one time only", with no intention of retaining them once the db schema changes are implemented in production [1]. In addition, the sub directory contains a plain old text file named 'README.TXT' which explains to other developers working on the project why the folder exists.

Let's say in that folder, 'CurrentReleaseOnly', I have 4 files, three of which are unit test sprocs and one is that 'readme' file:
  1. ut_uspVerifyDroppedColumns.sql
  2. ut_uspVerifyDroppedTable.sql
  3. ut_uspVerifyArchivedData.sql
  4. README.TXT
Since the project files are maintained under Perforce (P4) [2], one of the project maintenance scripts needs to permanently delete all files within that folder from the source code repository, with the sole exception of the aforementioned 'readme' file. In this example, that means deleting files # 1 through 3 but keeping # 4. Obviously, the number of files eligible for deletion varies from one development cycle to another, but the 'readme' file (i.e. # 4) is always retained.

The command line syntax in P4 to open a file for delete is the following:
p4 delete file1.txt file2.txt file3.txt
The goal is to build and execute the above command. The following is what was originally coded to achieve this:

import os

def open_for_delete_unit_tests_from_previous_release():
    """ Open for delete in Perforce unit tests from previous release """

    # find files to open for delete
    # (unit_test_dir is assumed to be defined elsewhere in the script)
    exclude_file = 'README.TXT'
    delete_files_dir = os.path.join(unit_test_dir, 'CurrentReleaseOnly')

    # build delete command text
    all_files = os.listdir(delete_files_dir)
    cmd = ''
    for f in all_files:
        cmd = cmd + f + ' '

    cmd = 'p4 delete ' + cmd

    # execute 'open for delete' in source control
    p4 = os.popen(cmd)

    return True
Quite simply, a list object is first populated with the names of all the files in the directory. Then a loop through each file name is performed incrementally building the P4 command text. Eventually, the output for the command text should look like this:
p4 delete ut_uspVerifyDroppedColumns.sql README.TXT
ut_uspVerifyDroppedTable.sql ut_uspVerifyArchivedData.sql
However, as evident, it will also delete the 'readme' file which, if you recall, needs to remain to document the use of the directory. To make this happen, the following conditional statement was added to the loop:
for f in all_files:
    if f == exclude_file:
        continue

    cmd = cmd + f + ' '

cmd = 'p4 delete ' + cmd
As a result, the P4 output changes to now exclude the 'readme' file:
p4 delete ut_uspVerifyDroppedColumns.sql
ut_uspVerifyDroppedTable.sql ut_uspVerifyArchivedData.sql
We now have achieved our desired output and it actually works. All is good except...

Implementing List Comprehensions

Now, you are thinking "So what? What is the big deal? This is rudimentary programming that any four-year-old can do." Yes of course. However, I kept thinking that this was not very "Pythonic". Python is all about manipulating lists in an efficient and concise manner.

I immediately went back and re-read some more about list comprehensions. Armed with a better grasp of this style of programming, the code was altered to now implement this alternative way of building the same P4 command:
import os

def open_for_delete_unit_tests_from_previous_release():
    """ Open for delete in Perforce unit tests from previous release """

    # find files to open for delete
    exclude_file = 'README.TXT'
    delete_files_dir = os.path.join(unit_test_dir, 'CurrentReleaseOnly')

    # build delete command text
    all_files = os.listdir(delete_files_dir)
    delete_files = [(delete_files_dir + os.sep + f) for f in all_files
                    if f != exclude_file]
    cmd = 'p4 delete ' + ' '.join(delete_files)

    # open for delete in source control
    p4 = os.popen(cmd)

    return True
Python's list comprehension syntax allowed a new list to be built by filtering out the unneeded items based on a defined criterion. Once the filtered list is created, the final p4 command text is generated by calling the join method on a single-space (" ") string.
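As a minimal, self-contained sketch of the filter-then-join step (written for Python 3, with a hard-coded file list standing in for os.listdir, and a helper name, build_delete_command, that is hypothetical and not part of the original script):

```python
import os

def build_delete_command(directory, file_names, exclude_file='README.TXT'):
    """Return the 'p4 delete' command text for every file except exclude_file."""
    # Filter: keep only the names that differ from the excluded file.
    # Map: prefix each kept name with its directory.
    delete_files = [os.path.join(directory, f)
                    for f in file_names
                    if f != exclude_file]
    # Join the filtered paths with single spaces.
    return 'p4 delete ' + ' '.join(delete_files)

# Hard-coded stand-in for os.listdir(delete_files_dir).
file_names = ['ut_uspVerifyDroppedColumns.sql', 'README.TXT',
              'ut_uspVerifyDroppedTable.sql', 'ut_uspVerifyArchivedData.sql']
cmd = build_delete_command('CurrentReleaseOnly', file_names)
print(cmd)
```

Keeping the command building separate from the os.popen call means the filtering logic can be checked without touching Perforce at all.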

List comprehensions support not just filtering but also applying the same function to each item in a list. This is a shorthand variation of the "map" function commonly found in functional languages. Python does explicitly support the classic functional programming functions 'map', 'reduce', and 'filter', but its list comprehensions are an even more concise way of implementing map and filter [3].
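To make the equivalence concrete, here is a small sketch (assuming Python 3, where map and filter return lazy iterators) that lower-cases only the .sql file names, once with the functional built-ins and once with a comprehension. The choice of str.lower as the mapped function is purely illustrative:

```python
files = ['ut_uspVerifyDroppedColumns.sql', 'ut_uspVerifyDroppedTable.sql',
         'ut_uspVerifyArchivedData.sql', 'README.TXT']

# Classic functional style: filter first, then map.
# In Python 3 both return lazy iterators, hence the list() call.
via_functions = list(map(str.lower,
                         filter(lambda f: f.endswith('.sql'), files)))

# The same filter and map expressed as a single list comprehension.
via_comprehension = [f.lower() for f in files if f.endswith('.sql')]

print(via_functions == via_comprehension)
```

Both forms produce the same list; the comprehension simply puts the mapping expression and the filtering condition in one place.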

If you were not impressed with the previous filtering example then here is another more trivial example of applying a 'map' function using list comprehensions. This time the file extension is stripped out from each file name contained within a given list [4]:
>>> files = ['ut_uspVerifyDroppedColumns.sql', 'ut_uspVerifyDroppedTable.sql',
...          'ut_uspVerifyArchivedData.sql', 'README.TXT']
>>> print([f[:-4] for f in files])  # remove file extension using list comprehensions
['ut_uspVerifyDroppedColumns', 'ut_uspVerifyDroppedTable', 'ut_uspVerifyArchivedData', 'README']
Comparisons to SQL

What really struck me about list comprehensions is how much they reminded me of the ubiquitous database language, SQL. Given my long experience with querying and data manipulation against SQL databases, I found Python's list comprehensions to be much more interesting and maybe even more powerful. Shortly after noting the similarities, I read that list comprehensions were even considered for database querying:
"Comprehensions were proposed as a query notation for databases and were implemented in the Kleisli database query language"
I plan to write a much longer post on my opinions regarding the future of SQL as a language but until then I will say the following. LINQ is a great attempt to bake into the C# language actual data querying features but with one major flaw. It still adopted the SQL syntax in the process which really does need its own makeover (or better yet a replacement).

I'm sure the main reason for Microsoft's decision to closely model LINQ after SQL was to give .NET developers something they were already deeply familiar with and would therefore be more apt to use. However, if Microsoft had used something resembling list comprehensions instead of SQL-ish syntax, it might have made C# an even more powerful language by baking in a more intuitive and compact syntax [5].
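The resemblance between SQL and list comprehensions is easiest to see side by side. The sketch below is only an illustration: the rows, the table name unit_tests, and the one_time_only field are all made up for the example.

```python
# Hypothetical rows, standing in for a table named unit_tests.
unit_tests = [
    {'name': 'ut_uspVerifyDroppedColumns', 'one_time_only': True},
    {'name': 'ut_uspVerifyDroppedTable', 'one_time_only': True},
    {'name': 'ut_uspCalculateInvoiceTotals', 'one_time_only': False},
]

# SQL:  SELECT name FROM unit_tests WHERE one_time_only = 1
# The output expression plays the role of SELECT, the 'for' clause
# the role of FROM, and the 'if' clause the role of WHERE.
names = [row['name']                # SELECT name
         for row in unit_tests      # FROM unit_tests
         if row['one_time_only']]   # WHERE one_time_only = 1

print(names)
```

The three clauses are the same as in the query; only their order differs.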

[1] If you are a TDD practitioner (like myself) you might be shouting: "How can you be throwing away unit tests after writing them? That is insanity and completely violates the very essence of TDD!!!!". Yes, but just like any other methodology, the principles of TDD should not always be followed blindly and adhered to strictly. Sometimes, exceptions have to be made.

In this particular instance, the reasons for dropping certain unit tests after a period of time were primarily due to performance. Since tests against a database tend to be slower than more traditional unit tests found in app code, I decided that after each production release any tests that are no longer valuable beyond the next release would be purged from the project's code base.

For example, unit tests that are used to test such things as "one time only" migration of data from one database to another, or tests that simplistically check for the existence (or, conversely, the non-existence) of db objects like columns or tables, are good candidates for permanent removal from the suite of db tests. On the other hand, unit tests that assert and validate some complicated logic in stored procedures, for example, would be kept. Regardless, this is not the point of this blog post so I digress...

[2] Admittedly, I would rather be using the significantly less intrusive Subversion.

[3] For some reason, the equivalent of 'reduce' is not supported by Python's list comprehensions. Perhaps it is because the creator of Python has not been a fan of map/reduce/filter ever since their inclusion in the language. He especially seems to dislike 'reduce'.
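As a rough sketch of what a 'reduce' looks like next to the alternatives (assuming Python 3, where reduce lives in the functools module rather than being a builtin), the example below folds a list of file names down to a single number. The total-length calculation is just an illustration:

```python
from functools import reduce  # 'reduce' is a builtin in Python 2

files = ['ut_uspVerifyDroppedColumns.sql', 'ut_uspVerifyDroppedTable.sql',
         'ut_uspVerifyArchivedData.sql', 'README.TXT']

# Fold the list down to one value: the total length of all file names.
total_via_reduce = reduce(lambda acc, f: acc + len(f), files, 0)

# A comprehension-style expression cannot fold by itself, but it can feed
# a built-in aggregate such as sum().
total_via_sum = sum(len(f) for f in files)

print(total_via_reduce == total_via_sum)
```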

[4] For improved readability, I could have defined a separate function stating more explicitly what exactly it was doing without resorting to using comments (which tend to be a 'code smell'):

def remove_file_extension(f):
    return f[:-4]

print([remove_file_extension(f) for f in files])
Or I could have used a lambda function for equal effect:
remove_file_extension = lambda f: f[:-4]
print([remove_file_extension(f) for f in files])
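One caveat is that the f[:-4] slice only works when every extension is a dot followed by exactly three characters. A sketch of a more forgiving variant, using os.path.splitext from the standard library (the file name 'notes.markdown' is made up to show the difference):

```python
import os

files = ['ut_uspVerifyDroppedColumns.sql', 'README.TXT', 'notes.markdown']

def remove_file_extension(f):
    """Return the file name without its extension, whatever its length."""
    # splitext splits at the last dot and returns a (root, extension) pair.
    return os.path.splitext(f)[0]

print([remove_file_extension(f) for f in files])
```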

[5] Actually, as I recently discovered, C# does support map, reduce, and filter as of version 3.0 (respectively, "Enumerable.Select", "Enumerable.Aggregate", and "Enumerable.Where"). Not quite list comprehensions but definitely a huge lift for the language. In addition, my understanding is that F#, being a functional language, does support list comprehensions beyond the standard map/reduce/filter.

Saturday, September 13, 2008

Who needs a good text editor? I write perfect code

The first time I read Pragmatic Programmer (a future classic), I noticed how strongly it emphasized that programmers should choose one good text editor and learn it well. This got me thinking about how important a good text editor is if you write code for a living. I am of the school of thought that code is design. I want to increase the speed at which I write code so that it matches my thoughts. Code is the concrete extension of my thoughts on how to implement software. This means that code should be easy to manipulate and thereby be malleable and fluid in nature. Therefore, it makes perfect sense to me why the book discusses the virtues of using a good editor.

At the time of my first reading of PP, my text editors were Visual Studio (the default IDE for .NET developers such as myself) along with plain vanilla Notepad. Inspired by PP, I went on to use Notepad2 and then eventually moved on to the more robust and extensible Notepad++. However, I recently re-read PP because it is one of those books you need to keep referring back to in order to make certain you are headed down the right path as a programmer. (Also, you tend to miss out on tidbits of good info due to faulty memory.) This time around I noted that one of the text editors they recommended was Emacs.

What is Emacs (and VI)?

After researching Emacs, I immediately got the impression that this is one of the text editors that serious, hardcore programmers or, dare I say, hackers use. If you want to become one of those (or at least aspire to), then you might as well use what those individuals use because they obviously must know something, right? Emacs has been around since the 1970s, making it one of the oldest text editors still under active development today (2008). So, once again, something must be good about it, right?

I noticed that another editor was consistently being referenced on most things I read about Emacs. That other text editor was VI (or VIM). Not surprisingly, just like a lot of things in the world of programming a long running rivalry exists between Emacs and VI. Now, unlike Emacs, I was, surprisingly, already familiar with VI since I had learned to use it in a technical class I took early in my career. Guess what? At the time, I hated it.

My dislike for VI centered on the fact that I was so weaned on more modern text editors such as Notepad and other Windows applications that using VI felt foreign to me. I could not understand how anyone could even be efficient with a tool such as VI. Why couldn't I simply use the arrow keys, delete key, etc.? Why must I memorize and use some other combination of keys? In addition, the mode switching confused me, since it meant that editing text was not quite the same as reading it.

VI reminded me too much of a word processing program I had used in the late 80's on my home PC (a Tandy if I recall). This made me view VI as being a relic that should no longer be needed in the modern world. Also, at the time (and prior to that) I hated having to learn and memorize command names and was not fully enamored with text-only, console-like environments (Microsoft did a really good job of making me dependent on GUIs and my mouse).

Of course, I no longer hold any of what I now consider to be quite ridiculous and silly attitudes and opinions regarding VI. I have completely repented and now understand how deeply wrong I was. Hey, what do you expect from a newbie programmer back then?

Choosing a New Text Editor

Now, which one do I use? Emacs or VI? Truthfully, I really don't know. Just like most things, each has its pros and cons.

However, I made my choice and decided to learn to use Emacs. I was swayed by the fact that Emacs, as compared with VI, (1) has so many more features (although I might never use them all ;-) ) and (2) is extensible. One long-running criticism of Emacs (particularly from the VI community) was that it is very slow to load and run (because it uses its own dialect of Lisp, an interpreted language and, as we all know, interpreted languages tend to be slower than compiled ones). Well, fortunately, with modern systems this is no longer the case whatsoever (if anything, more modern IDEs like Visual Studio are slow in comparison to Emacs).

Probably the most difficult thing about using Emacs at first will be the same thing that originally put me off VI: learning all of its essential commands. But now it's different because I want to learn them and I fully understand the rewards. It will indeed be tough at first but, from what I read, once you learn the commands (at least the basic ones) your productivity should start to increase. To me it will be no different than when I first started learning the fantastic Visual Studio add-in, ReSharper, a refactoring tool. With Re#, I made the very deliberate effort to learn the keyboard commands instead of relying on the mouse in order to code faster. This is now the approach I take with most new development tools and applications that I start to use: learn as many key commands as possible.

One quick mention regarding setting up Emacs on Windows. If you want Emacs to be truly installed on your PC (meaning adding it to the Windows registry, adding a shortcut to your start menu, etc.) then I recommend running the file 'addpm.exe' found in the bin folder of the Emacs zip file. This is completely optional and you can obviously run and use Emacs without it. However, it does help to integrate it a bit more with your Windows environment.

Emacs and Visual Studio

The book "Pragmatic Programmer" seems to imply that a text editor should be your main IDE. However, since I am primarily a .NET developer, my main IDE is, of course, Visual Studio. Therefore, I cannot truly have Emacs as my primary editor. If I did, I'd miss out on some of the features built into VS such as Intellisense. But my main problem is really missing out on the sheer power of ReSharper.

But, wait, not all is lost. Believe it or not, as it turns out, Visual Studio actually natively supports changing your key bindings to use Emacs! Now, I potentially might have the best of all worlds: VS + Re# + Emacs. Although VS obviously does not have all of Emacs' features, at least I can continue to use and develop my Emacs-specific editing skills. Who knows? Since one of Emacs' greatest assets is extensibility, it may be possible to add some of the missing Re# features into Emacs itself. (This would imply my learning ELisp, but I doubt it :-)) Perhaps some add-ins for Emacs already exist and I just have to find them.

It turns out that Emacs is not the only thing that can be supported by VS. I recall reading earlier this year how Jean-Paul S. Boodhoo started using VI with Visual Studio. (He also has some more recent posts on his experiences, particularly VI with ReSharper.) This was an early indication to me that perhaps I was wrong in my opinion of VI, and that it was a case of a very "green" programmer (as I was at the time) just not understanding the power of a development tool and the inefficiencies of relying on a mouse. Perhaps down the road I might give VI a try as well.

I will certainly have future postings on my experiences with Emacs. In the meantime, I have to make sure to avoid "Emacs Pinky".

Tuesday, September 9, 2008

Cryptic Rhino Mock exception messages

Let me first start off by saying that Rhino Mocks is a great mock objects framework for unit testing in .NET and C#. As compared with NMock2, which was my first experience with testing using mock objects, it is far superior (the use of strongly typed method/property names instead of strings is one of its best features especially for TDD and refactoring.) However, there are some aspects of NMock2 that I do miss.

'Expect' Consistency

For starters, NMock2 was more consistent in how the 'Expect' calls are made versus the way Rhino Mocks does it. In NMock2, the use of 'Expects' is the same whether you use a void method or a method that returns a value.
That is not the case with Rhino Mocks. 'Expects' can only be used with methods that return values and not with void methods.

Recently, a new way of expressing 'Expects' with void methods was added to the Rhino Mocks framework but it relies on 'delegates'. Not sure if I really like the solution. It trades off one form of weak readability for another albeit different one.

This could be yet another reason for newbies to be turned off from testing with a mock framework such as Rhinos. It can be confusing. It is already quite a difficult endeavor to convince software developers of the virtues of unit testing. It is even more difficult to promote mock object testing, so anything that lowers the barriers is important and critical.

Understandable Exception Messages

In addition, Rhino Mock exception messages sometimes can be vague and unclear. This can be frustrating for new (and even existing) users.

For example, I was recently working on an old test fixture for a project which uses NMock2 and not Rhinos as its testing framework. To some degree, I felt a bit more productive with and in control of it because the error messaging is a lot more user friendly. I could more quickly determine the cause of a problem.

For example, below is an actual exception I received from NMock2:

NMock2.Internal.ExpectationException: not all expected invocations
were performed
1 time: criteria.SetFirstResult(equal to <50>) [called 0 times]
1 time: criteria.SetMaxResults(equal to <5>) [called 0 times]
1 time: criteria.List(any arguments), will return
<System.Collections.Generic.List`1[System.DateTime]> [called 0 times]

Now, here is what I might get from Rhinos:
ICriteria.SetFirstResult(50); Expected #1, Actual #0.
ICriteria.SetMaxResults(5); Expected #1, Actual #0.

Honestly, I like the first one better. It reads better to me. For one thing, Rhinos provides the raw (CLR?) object definition, so that if the member is inherited from an interface or another class then it is shown as it is defined for the interface (i.e. "ICriteria") or the base class. Meanwhile, NMock2 shows the actual local variable name used in the code you are testing (i.e. "criteria"). Much faster to pinpoint the culprit.

In fact, where this really drives me crazy is for the domain objects (i.e. POCOs, business objects, etc.) for that same project. Every domain object inherits from IDomainObject so with Rhino Mocks I get this:
OK....but, which domain object is it? If I happen to have two or more domain objects being mocked/stubbed in my test it can get really hard figuring out the one it's complaining about. Instead, it would be nice if Rhino provided the following as does NMock2 using the variable name (assuming my domain object is named 'Foo'):
Another example of the disparity between the two frameworks is if a property related exception occurs then the Rhino message would contain this:
while NMock2 would provide this:
_view.PageSize // instance variable name
Some would say, "What's the big deal?", "Can't you figure out what it is?", "It only takes a few seconds to know what it is", etc. Well, that is the problem. If my brain has to stop to process what it is, even if it takes a few seconds, then that is slowing me down during my software development process. Multiply those "few" seconds by how many times you get Rhino exceptions like that and it does eat away at your development time. It does add up over time. It is not unlike trying to read code that is not very readable or well factored. Sure, you'll eventually figure out what it does but at the cost of precious dev time.

The following exception message is one I'm fairly certain I have gotten before but always forget because the message is so...well...CRYPTIC!!!!
System.InvalidOperationException: Previous method 'IView.get_ReturnSomeStringValue();' require a return value or an exception to throw.
If you specify the wrong data type in the 'Return' method of an Expect (or LastCall) then the above exception will be thrown. For example, if the method or property is supposed to return a 'string' type value but you instead specify a 'DateTime' type as shown below:

DateTime date = DateTime.Today;
// this will throw an exception
then you will receive the error message mentioned earlier.

Specifying the proper type should fix the problem as follows:

string someStringValue = "some string value";
// this is ok
The exception message should really be about checking for strong typing and not the absence or lack of a return value.