Wednesday, September 24, 2008

Comprehending List Comprehensions

Since I started using Python to write build scripts (and to generate code) in support of my development process, I have come to learn and increasingly appreciate the power of the language. A recent coding situation showed me how understanding the alternatives available in a multi-paradigm language such as Python can highlight the limitations of another language such as C#, and can have a real influence on how you think about and write code.

Building working code first...

In my current project, the database for the system is maintained under source control. In the directory of my project's database files, a subdirectory named 'CurrentReleaseOnly' contains database unit tests written using the TSQLUnit framework. The purpose of this folder is to segregate tests that are really just "one time only", with no intention of retaining them once the db schema changes are implemented in production [1]. In addition, the subdirectory contains a plain old text file named 'README.TXT', which explains to other developers working on the project why the folder exists.

Let's say in that folder, 'CurrentReleaseOnly', I have 4 files, three of which are unit test sprocs and one is that 'readme' file:
  1. ut_uspVerifyDroppedColumns.sql
  2. ut_uspVerifyDroppedTable.sql
  3. ut_uspVerifyArchivedData.sql
  4. README.TXT
Since the project files are maintained under Perforce (P4) [2], one of the project maintenance scripts needs to permanently delete all files within that folder from the source code repository, with the one exception of the aforementioned 'readme' file. In this example, that means deleting files #1 through #3 but keeping #4. Obviously, the number of files eligible for deletion will vary from one development cycle to the next, but exactly one file (#4) will always be retained.

The command line syntax in P4 to open a file for delete is the following:
p4 delete file1.txt file2.txt file3.txt
The goal is to build and execute the above command. The following is what was originally coded to achieve this:

import os
...
def open_for_delete_unit_tests_from_previous_release():
    """ Open for delete in Perforce unit tests from previous release """

    # find files to open for delete
    # (unit_test_dir is assumed to be defined elsewhere in the script)
    exclude_file = 'README.TXT'
    delete_files_dir = os.path.join(unit_test_dir, 'CurrentReleaseOnly')

    # build delete command text
    all_files = os.listdir(delete_files_dir)
    cmd = ''
    for f in all_files:
        cmd = cmd + f + ' '

    cmd = 'p4 delete ' + cmd

    # execute 'open for delete' in source control
    p4 = os.popen(cmd)
    p4.read()
    p4.close()

    return True
Quite simply, a list is first populated with the names of all the files in the directory. A loop then walks through each file name, incrementally building the P4 command text. The resulting command text should look like this:
p4 delete ut_uspVerifyDroppedColumns.sql README.TXT
ut_uspVerifyDroppedTable.sql ut_uspVerifyArchivedData.sql
However, as is evident, this command will also delete the 'readme' file which, if you recall, needs to remain to document the purpose of the directory. To prevent this, the following conditional statement was added to the loop:
...
    for f in all_files:
        if f == exclude_file:
            continue

        cmd = cmd + f + ' '

    cmd = 'p4 delete ' + cmd
...
As a result, the generated P4 command now excludes the 'readme' file:
p4 delete ut_uspVerifyDroppedColumns.sql
ut_uspVerifyDroppedTable.sql ut_uspVerifyArchivedData.sql
We have now achieved our desired output and it actually works. All is good except...

Implementing List Comprehensions

Now, you are thinking "So what? What is the big deal? This is rudimentary programming that any four-year-old can do." Yes of course. However, I kept thinking that this was not very "Pythonic". Python is all about manipulating lists in an efficient and concise manner.

I immediately went back and re-read some more about list comprehensions. Armed with a better grasp of this style of programming, I altered the code to build the same P4 command in this alternative way:
import os
...
def open_for_delete_unit_tests_from_previous_release():
    """ Open for delete in Perforce unit tests from previous release """

    # find files to open for delete
    exclude_file = 'README.TXT'
    delete_files_dir = os.path.join(unit_test_dir, 'CurrentReleaseOnly')

    # build delete command text, keeping every file except the 'readme'
    all_files = os.listdir(delete_files_dir)
    delete_files = [os.path.join(delete_files_dir, f) for f in all_files
                    if f != exclude_file]
    cmd = 'p4 delete ' + ' '.join(delete_files)

    # open for delete in source control
    p4 = os.popen(cmd)
    p4.read()
    p4.close()

    return True
Python's list comprehension syntax made it possible to build a new list by filtering out the unneeded items according to a defined criterion. Once the filtered list is created, the final p4 command text is generated by calling the join method on a single-space string (" ").
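To see the two approaches side by side without touching Perforce or the file system, here is a minimal sketch written for Python 3. The list all_files is a hypothetical stand-in for the result of os.listdir, and the two helper functions are names I made up for illustration. Unlike the function above, it uses bare file names rather than full paths.

```python
# Hypothetical stand-in for os.listdir(delete_files_dir); names are from the post.
all_files = [
    'ut_uspVerifyDroppedColumns.sql',
    'README.TXT',
    'ut_uspVerifyDroppedTable.sql',
    'ut_uspVerifyArchivedData.sql',
]
exclude_file = 'README.TXT'


def build_with_loop(names, exclude):
    """Loop version: skip the excluded name, append the rest one by one."""
    cmd = ''
    for name in names:
        if name == exclude:
            continue
        cmd = cmd + name + ' '
    return 'p4 delete ' + cmd


def build_with_comprehension(names, exclude):
    """Comprehension version: filter first, then join with single spaces."""
    return 'p4 delete ' + ' '.join([name for name in names if name != exclude])


loop_cmd = build_with_loop(all_files, exclude_file)
comp_cmd = build_with_comprehension(all_files, exclude_file)

# repr() makes any trailing whitespace visible when comparing the two strings.
print(repr(loop_cmd))
print(repr(comp_cmd))
```

One small side benefit shows up when the two strings are compared: the loop leaves a trailing space after the last file name, while join only puts spaces between items.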

List comprehensions support not just filtering but also applying the same function to each item in a list. This is a shorthand variation of the "map" function commonly found in functional languages. Python does explicitly support the classic functional programming functions 'map', 'reduce', and 'filter', but its list comprehensions are an even more concise way of implementing map and filter [3].
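As a rough illustration of that equivalence, the sketch below runs the built-in map and filter next to their list comprehension counterparts. It assumes Python 3, where map and filter return iterators and therefore need a list() wrapper. The helper is_unit_test is a hypothetical name introduced only for this example.

```python
files = [
    'ut_uspVerifyDroppedColumns.sql',
    'ut_uspVerifyDroppedTable.sql',
    'ut_uspVerifyArchivedData.sql',
    'README.TXT',
]


def remove_file_extension(name):
    """Drop the last four characters (a dot plus a three-letter extension)."""
    return name[:-4]


def is_unit_test(name):
    """Hypothetical predicate: everything except the readme is a unit test."""
    return name != 'README.TXT'


# "map": apply one function to every item
mapped_builtin = list(map(remove_file_extension, files))
mapped_comprehension = [remove_file_extension(f) for f in files]

# "filter": keep only the items that satisfy a predicate
filtered_builtin = list(filter(is_unit_test, files))
filtered_comprehension = [f for f in files if is_unit_test(f)]

# a single comprehension can do both in one expression
combined = [remove_file_extension(f) for f in files if is_unit_test(f)]

print(mapped_builtin == mapped_comprehension)
print(filtered_builtin == filtered_comprehension)
print(combined)
```

The last line is where the comprehension starts to pay off: with the built-ins, the same result needs a filter call nested inside a map call.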

If you were not impressed with the previous filtering example then here is another more trivial example of applying a 'map' function using list comprehensions. This time the file extension is stripped out from each file name contained within a given list [4]:
>>> files = ['ut_uspVerifyDroppedColumns.sql', 'ut_uspVerifyDroppedTable.sql',
...          'ut_uspVerifyArchivedData.sql', 'README.TXT']
>>> print([f[:-4] for f in files])  # remove file extension using list comprehensions
['ut_uspVerifyDroppedColumns', 'ut_uspVerifyDroppedTable',
'ut_uspVerifyArchivedData', 'README']
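One caveat: the slice f[:-4] only works because every extension in this list happens to be three characters long. A possible variation, sketched here for Python 3, uses os.path.splitext from the standard library instead. The extra entry 'notes.md' is a hypothetical file added purely to show where the slice goes wrong.

```python
import os

files = [
    'ut_uspVerifyDroppedColumns.sql',
    'ut_uspVerifyDroppedTable.sql',
    'ut_uspVerifyArchivedData.sql',
    'README.TXT',
    'notes.md',  # hypothetical file with a two-character extension
]

# Fixed-width slice: chops four characters no matter what the extension is.
stripped_by_slice = [f[:-4] for f in files]

# os.path.splitext returns a (root, extension) pair; keep only the root.
stripped_by_splitext = [os.path.splitext(f)[0] for f in files]

print(stripped_by_slice[-1])     # the slice cuts into the file name itself
print(stripped_by_splitext[-1])  # splitext removes only the extension
```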

Comparisons to SQL

What really struck me about list comprehensions is how much they reminded me of the ubiquitous database language, SQL. Given my long experience with querying and data manipulation against SQL databases, I found Python's list comprehensions to be a much more interesting and maybe even more powerful style. Shortly after noting the similarities, I read that list comprehensions were even considered for database querying:
"Comprehensions were proposed as a query notation for databases and were implemented in the Kleisli database query language"
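To make the resemblance concrete, here is a small sketch (Python 3) that lines up a simple SQL query with the comprehension I would write for it. The table name "files" and the query itself are hypothetical and exist only for this comparison.

```python
# Hypothetical "table" of file names, reusing the names from earlier in the post.
files = [
    'ut_uspVerifyDroppedColumns.sql',
    'ut_uspVerifyDroppedTable.sql',
    'ut_uspVerifyArchivedData.sql',
    'README.TXT',
]

# SQL (illustrative only):
#   SELECT UPPER(name)
#   FROM files
#   WHERE name <> 'README.TXT'
#
# Comprehension: the output expression plays the role of SELECT,
# the "for" clause plays FROM, and the "if" clause plays WHERE.
result = [name.upper() for name in files if name != 'README.TXT']

print(result)
```

The clauses appear in a different order than in SQL, but each part of the query has a direct counterpart in the comprehension.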
I plan to write a much longer post on my opinions regarding the future of SQL as a language, but until then I will say the following. LINQ is a great attempt to bake actual data querying features into the C# language, but it has one major flaw: it adopted SQL syntax in the process, and that syntax really needs its own makeover (or, better yet, a replacement).

I'm sure the main reason for Microsoft's decision to closely model LINQ after SQL was to give .NET developers something they were already deeply familiar with and would therefore be more apt to use. However, if Microsoft had used something resembling list comprehensions instead of SQL-ish syntax, it might have made C# an even more powerful language by baking in a more intuitive and compact syntax [5].
...


[1] If you are a TDD practitioner (like myself) you might be shouting: "How can you be throwing away unit tests after writing them? That is insanity and completely violates the very essence of TDD!!!!". Yes, but just like any other methodology, the principles of TDD should not always be followed blindly and adhered to strictly. Sometimes, exceptions have to be made.

In this particular instance, the reasons for dropping certain unit tests after a period of time were primarily due to performance. Since tests against a database tend to be slower than more traditional unit tests found in app code, I decided that after each production release any tests that are no longer valuable beyond the next release would be purged from the project's code base.

For example, unit tests for a "one time only" migration of data from one database to another, or tests that simply check for the existence (or, conversely, the non-existence) of db objects like columns or tables, are good candidates for permanent removal from the suite of db tests. On the other hand, unit tests that assert and validate complicated logic in stored procedures would be kept. Regardless, this is not the point of this blog post, so I digress...


[2] Admittedly, I would rather be using the significantly less intrusive Subversion.


[3] For some reason, the equivalent of 'reduce' is not supported by Python's list comprehensions. Perhaps it is because the creator of Python has not been a fan of map/reduce/filter ever since their inclusion in the Python language. He especially seems to dislike 'reduce'.
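For what it is worth, here is a minimal sketch of what a "reduce" looks like next to the closest comprehension-style alternative. It assumes Python 3, where reduce has been moved out of the built-ins and into the functools module. The task (adding up the lengths of the file names) is a made-up example.

```python
from functools import reduce

files = [
    'ut_uspVerifyDroppedColumns.sql',
    'ut_uspVerifyDroppedTable.sql',
    'ut_uspVerifyArchivedData.sql',
    'README.TXT',
]

# Classic reduce: fold the list into one value, starting from 0.
total_with_reduce = reduce(lambda total, name: total + len(name), files, 0)

# No comprehension syntax folds a list, but a generator expression
# handed to an aggregating built-in such as sum() covers many cases.
total_with_sum = sum(len(name) for name in files)

print(total_with_reduce == total_with_sum)
```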


[4] For improved readability, I could have defined a separate function stating more explicitly what exactly it was doing without resorting to using comments (which tend to be a 'code smell'):

def remove_file_extension(f):
    return f[:-4]

print([remove_file_extension(f) for f in files])
Or I could have used a lambda function for equal effect:
remove_file_extension = lambda f: f[:-4]
print([remove_file_extension(f) for f in files])

[5] Actually, as I recently discovered, C# does support map, reduce, and filter as of version 3.0 (respectively, "Enumerable.Select", "Enumerable.Aggregate", and "Enumerable.Where"). Not quite list comprehensions, but definitely a huge lift for the language. In addition, my understanding is that F#, being a functional language, does support list comprehensions beyond the standard map/reduce/filter.
