Sunday, November 9, 2008

Relax and "unwind" with recursion

I was recently looking to change some values in test data that was scripted in TSQL insert statements that I was not too fond of. This data is loaded on development and test builds of a database for a system that I work on.

The data changes center around some arbitrary ID numbers. These values are not system generated but can be considered to be natural primary keys since they are defined externally outside the system and they are used and referenced by the end-users signifying meaning in the business domain. However, for development and testing purposes, the values can be anything.

At first, the numbers reflected actual numbers that preexisted outside the testing and dev environments but I found them too difficult to remember when testing the application manually or when defining automated unit test case scenarios. Therefore, I decided on replacing the existing ID numbers with a sequence of numbers that I can recall more easily:
1234567890, 2345678901, 3456789012, ..., 0123456789
I figured this list of numbers would be easier to remember on the fly rather than if they were randomly generated. You can see an obvious pattern here hence lending themselves well to be committed to (human) memory. The initial number in the list is a universally simple value of 1234567890. Each subsequent number is the same except that the first digit is moved from the first position to the last position as compared to the preceding value in the list. For example, the number '1' in 1234567890 gets relocated to the end to become 2345678901.

I decided to write a simple Python script to update all of the data sql files with these new numbers. Now, I could have quite obviously created the list manually in a very short time in my script such as:
new_id_numbers = [1234567890, 2345678901, 3456789012, ..., 0123456789]
However, I always like to find opportunities to challenge my programming abilities even in some inconsequential situations such as this one. The challenge I created for myself was to see how I can programmatically generate this exact same list. Evidently, a pattern exists with these numbers implying code can be written to produce this particular set of values without the need to manually type them out.

My first impulse was that I could write some kind of iterative loop to generate the list:
# iteration version
def get_swapped_numbers(numbers):
first = numbers[0]
for item in numbers:
if len(numbers) == len(first) or len(first) == 0:
return numbers
last = numbers[-1]
new = last[1:] + last[:1]
numbers.append(new)
return numbers

>> initial_value_list = ['22566']
>> print get_swapped_numbers(initial_value_list)
['22566', '25662', '56622', '66225', '62256']
It works. Nothing unusual here. However, I recognized that this particular numerical pattern is indeed recursive in that each value in the list is dependent on the previous value and that recursion could be used to solve this as well. Like probably most programmers, I have never used recursion in actual code but it has been one of those fundamental concepts in computer science that I felt that I needed to better understand and explore. As a result, this simple pattern provided an excellent opportunity to flex my recursive muscles.

After mulling over for some time on how to implement a recursive function for this specific problem here is what I eventually churned out:
# recursion version
def get_swapped_numbers(numbers):
first = numbers[0]
if len(numbers) == len(first) or len(first) == 0:
return numbers
last = numbers[-1]
new = last[1:] + last[:1]
numbers.append(new)
return get_swapped_numbers(numbers) # recursion!!

>> initial_value_list = ['22566']
>> print get_swapped_numbers(initial_value_list)
['22566', '25662', '56622', '66225', '62256']
To build your recursive function, the most important piece is that somewhere in the body of the function the function must call itself by name:
    return get_swapped_numbers(numbers) # recursion!!
Without it, well...you just have a plain ol' vanilla function.

The other very important piece of a recursive function is to include some kind of a conditional statement that acts like a guard clause which is commonly referred to as the 'base case' in recursion lingo. This base case will cause your recursive calls to stop and begin to "unwind" itself as it spirals back up to the first initial call. If a base case is not included then you run the risk of unleashing an ugly infinite loop.

The base case is really no different than the break/return point in the iteration example:
    if len(numbers) == len(first) or len(first) == 0:
return numbers
In fact, any recursive function can be written and expressed as an iterative loop. This is probably why most programmers do not use recursion since you can achieve the same results using more familiar, day-to-day techniques. In addition, traditionally in most languages, recursion tends to perform slower than their loop counterpart.

So, why even use recursion when iteration can suffice? The real benefit is that sometimes a recursive function can end up being more readable and clearer in intent than an iterative function (although in my example it is a wash in this aspect). It also seems that certain types of recursion (e.g. tail recursion) are optimized by compilers so it could truly provide better performance with the added bonus of improved readability.

Truthfully, writing the loop was a lot easier than the recursion version. I am uncertain that the reason is because writing loops are so ingrained and second nature to me having created so many over the years in code coupled with the fact that this was my real first attempt to implement a true recursive function. Admittedly, it was a bit trippy mentally working through each function call to ensure avoiding the dreaded infinite loop. However, the same could be said when I first learned writing loops many years ago. With more time and practice, I can probably create recursive function without exerting any more thought than I would with creating a loop. All in all, even if I never again use recursion in my code, it is still an important technique for programmers to understand and recognize especially if encountered in code you did not write.

2 comments:

Unknown said...

You said 'Base Case', I don't think I've heard that since college.

Reminds me of a question we used to ask candidates interviewing for C++ game dev positions, which is faster iteration or recursion?

I tend to use recursion over iteration, usually because its clearer, but more often because its shorter to write. IMO, you can always come back and optimize a recursive function and make it iterative after profiling.

Ray Vega said...

Good point about optimization after the fact.