Monday, July 26, 2010

Debugging JavaScript memory leak in IE

I was recently working on one of the most difficult bugs in all of my years of programming. It involved a memory leak in IE8 caused by a couple of ASP.NET pages on a user's machine.

The first indication of the problem was the iexplore.exe process in the Windows Task Manager. Intermittently watching the 'Mem Usage' column on the 'Processes' tab, the memory usage reached over 2 gigabytes thereby significantly slowly down the user's machine. The rate of increase was about 9 MB every 30 seconds.

As with most web app bugs in IE, I checked to see if the problem was reproducible in Firefox. No leak. Next, I used two tools to help further identify the leaks: JavaScript Memory Leak Detector (IEJSLeaksDetector) and sIEve. Surprisingly, neither one indicated any memory leaks. This is when I knew this was not going to be easy troubleshooting much less fixing.

I tried running the app under IE7 compatibility view mode in IE8 but again no leak. Being skeptical of the reliability of this mode, I decided to try a true install of IE7 itself which required removing IE8 from the user's machine. Alas no leak. I also attempted from a different machine to connect to the web server running on that same user's machine to view the app (yes, one of the web app's features is to run as a local instance). Again no leak.

The problem web pages perform Ajax callbacks. Looking at the code, they were set to trigger the calls every 30 seconds. This was consistent with what was observed earlier with the rate of increase of memory usage. Naively, I thought could the ASP.NET Ajax Timer Control be broken?

I kept suspecting Microsoft's Ajax Timer Control which was helping to display a "Processing..." progress image to the user every 30 seconds. My ill-conceived rationale was that perhaps the code was using an old version of the Timer Control since the app was running on ASP.NET 2.0 and the code was originally written around the time when the ASP.NET AJAX framework was originally released.

Unfortunately, I temporarily fell under the spell of believing "Select is broken". This disillusion became painfully clear when I later confirmed that the specific Ajax framework was indeed the latest ASP.NET 3.5 web extensions for 2.0. Just to be sure, I created a standalone dummy web app with the same library that provides the Timer Control. Once again, as has been the pattern thus far: no memory leak.

More detailed information was needed. On the suggestion from another developer on the project, configured Task Manager with:
'Processes' tab -> 'View' -> 'Select Columns'
and added the following columns:
'Handles'
    'Threads'
    'USER Objects'
    'GDI Objects'  
Again, closely monitoring any changes, I noted that the GDI objects counts were increasing by 7~8 units every 30 seconds meaning this leak must somehow be graphics display related. Perhaps some image (or images) was not being removed or repeatedly added on each Ajax server request? (Possibly the aforementioned progress image?)

To further investigate the GDI objects findings, I used a few tools. Neither GDIUsage nor GDIObj provided any useful metrics (It's possible I might not have fully understood how to best use these.) However, GDIView allowed me to do snapshots of the memory usages of the GDI objects.

I manually copied the initial output displayed in GDIView and pasted that data into a text file. I subsequently waited until the data refreshed 30 seconds later and then copied that output into a separate text file. Finally, I performed a diff on the two files and it reconfirmed what was shown in Task Manager. The following text was the diff results:
Handle     Object  Kernel
  Type    Address  Extended Information
---------------------------------------------------------------  
0x86040059 Region 0xe2e68008 
0x3a0508e1 Bitmap 0xe45af558 Width: 1042, Height: 586 , Bits/Pixel: 32
0x29040972 Region 0xe55938d8 
0x3b040bb4 Region 0xe42f9868 
0x9a040bd1 Region 0xe1cd5738 
0xb3010bec DC 0xe3f9e9c8 
0x17300c6a Pen 0xe40dce88 Color: 0x02b0b0b0, Width: 0, Style: 0x00000000
0xe2050fb3 Bitmap 0xe2268840 Width: 1046, Height: 150 , Bits/Pixel: 16
0x133011fb Pen 0xe565c868 Color: 0x02b0b0b0, Width: 0, Style: 0x00000000
0x7004123c Region 0xe5685c18 
0x4e0415eb Region 0xe3031688 
0xc504172b Region 0xe183cdb8 
0x94041733 Region 0xe362f2b8 
0xae05180e Bitmap 0xe2c44840 Width: 1046, Height: 607 , Bits/Pixel: 16
0xfe01198b DC 0xe4c22008 
0x60011b7d DC 0xe28af008 

Clearly some bitmap was indeed being added repeatedly but the questions now were which one and why? The other developer helping out noted that the width of 1042 and height of 150 match the size of the web app within the browser. An important detail.

Recalling the purpose of the Ajax timer control was to display on both web pages a processing progress message and image (specifically a gif file), I decided to delve deeper into that area of the code. A 'progress' element containing the gif image is shown to the user via JavaScript. In that same JS function, an IFrame element is created and added to the 'Content' property of the 'progress' element. I noticed that the width and height of this IFrame was set to the document's height and width:
var progressIFrame = document.createElement("IFRAME");
    // ...
    progressIFrame.style.width = document.body.clientWidth;
    progressIFrame.style.height = document.body.clientHeight;
This was consistent with the results from the previously listed diff of the memory usage since the bitmap's dimensions were about the same size as the browser's viewport. Then, the very next line in the function struck me as strange:
progress.Content = progress.parentNode.insertBefore(progressIFrame, progress);
The return value of 'insertBefore' is a reference to the 'progressIFrame' element which it is then set to the 'progress' element's 'Content' property. Not only does it do this but it also adds it as a node sibling to the very same 'progress' element.

Why add the IFrame element to the DOM in two places? It was far from clear what the intent was for using the 'insertBefore' function if the IFrame element was already being added as part of the progress element itself?

Therefore, I changed it to directly set it and not bother with 'insertBefore':
progress.Content = progressIFrame;
and that stopped the memory leak!

Now, admittedly, I am not comfortably well-versed in JavaScript and this is one of those cases that I do not fully understand why the IE browser behaved the way it did in reaction to how the JavaScript is written.

Sometimes this is the outcome of debugging code (particularly when written by someone else). Although you come away learning some things you did not know before, you also might not know how or why something was broken to begin with or why the fix itself even worked.

Like writing code, debugging code is both a skill and art. However, it is certainly a different mindset.