Debugging Automation Errors

Come join me on a journey debugging hard-to-reproduce "Automation error" bugs.

Debugging Automation Errors

I was troubleshooting an error in Access today.  It was one of the dreaded Automation error's.  These are the errors that you see when working with COM objects or ActiveX controls, for example.

Automation errors are often accompanied by an error number like -2147217406.  The actual error is not really important.  What I want to talk about in this article is how I go about debugging these types of nebulous and inscrutable errors.

First, as a point of reference, this is the line of code that was generating the above error number:

Set oReg = GetObject("winmgmts:{impersonationLevel=impersonate}!\\.\root\default:StdRegProv")

The above code returns an instance of the StdRegProv class, which can be used to read and write to the Windows Registry.  This code had been working for weeks on dozens of different computers, but it began failing consistently this morning on one specific machine.

Efficient debugging starts with good logging

Here's the original error report I got from my client's IT department:

Joe is receiving a runtime error when he tries to open Vision Viewer.

That's the whole error report.  No screenshot.  No exact error message.  No real help of any kind.

But that didn't matter to me.  Why not?  Because vbWatchdog was logging these errors in our Fogbugz bug-tracking database.  Here's an excerpt from one of the logs:

The error log includes which user and computer had the error (partially blurred out for privacy).  It includes the name of the application ("VisionViewer"), the module and routine name ("clsSQL.LatestSqlOdbcDriver"), the program version number ("20.11.11a"), and the Access version ("2013").  It also shows the error message ("Automation error"), the error number ("-2147217406"), and the line number and line of code that raised the error ([  2] Set oReg...).

Start with Google

The very first thing I always do with an error I'm seeing for the first time is to Google it.  I know that "Automation error" is too generic on its own to yield useful results.  So I combine the error number with the message and search for Automation error "-2147217406" (the quotes around the number are important because they tell Google to look for exact matches).  The very first result took me to a page with the following advice:

Screen grab from tek-tips.com

The above advice may very well have solved my problem.  But I did not go that route for a couple of reasons.  First, the other top search results did not recommend that particular solution.  The lack of consistency worried me.  Plus, I don't routinely reinstall WMI (Windows Management Instrumentation).  Maybe it's no big deal.  But maybe a failed WMI re-install will brick the machine.

In general, I try not to rely on solutions that require doing things to the target computer.  If there is a way to work around it in code, that's always the better option.  Why?  Because if this is a problem on one machine today, it could be a problem on one, two, three or more machines over the next several years.  And I don't want to start distributing instructions for how to rebuild WMI to every user that will be running this application.

Reproduce...then fix

Before fixing any error, I always try to reproduce it.  This is often impossible with these kinds of errors, though.  As I said, this piece of code had already run hundreds or thousands of times on dozens of other machines with no problem.  As I suspected, I was unable to reproduce the error on my machine.

The next thing I would try is to reproduce the error on the device where the error occurred.  Often times, these types of errors can come and go even on the machine that is having the problem.  Luckily, the error was easily reproducible on the target machine.  (Trust me, it's waaaaay easier to troubleshoot a consistent error than one that comes and goes.)  

I was not surprised that I was able to reproduce it on the target machine, though.  The Fogbugz case had already recorded nine separate occurrences of the bug, all from the same machine and all in the past 36 hours.

Did you try restarting?

Having confirmed the bug on the target machine, I restarted the machine.  I did this even though the user told me they had already tried a restart.  It's not that I didn't believe the user; rather, it's that most users equate a shutdown-power on cycle as equivalent to a restart.  This seems logical.  When you shut down the computer, it no longer has power.  Surely, this is a "harder" reset of the computer than a simple restart where the computer never really "turns off."

However, the truth is counterintuitive.  Since Windows 10 introduced the fast boot feature, shutting down the computer is more like a hibernation.  Memory is saved to disk; services are placed into a state of suspended animation.  It's as if the whole computer is flash-frozen into cryogenesic stasis.  When the computer is turned back on, the pre-shut down environment "thaws" and things pick up where they left off.  Contrast that with a restart.  When you Restart Windows, the memory is wiped clean and all the services are shut down then started anew.

Check permissions

Alas, the simple restart did not resolve the error.  The next thing I tried was to run the application as a local administrator.  Permissions problems are a common cause of cryptic error messages that affect some machines and not others.  In this case, running as an administrator made no difference.

Debugging on the target machine

If the target machine has a full version of Access on it, you can step through the code right on the machine.  

I always create shortcut icons on the desktop and/or Windows Start Menu that my users click on to start my programs.  These shortcuts include the /runtime flag as part of the command line, so I can't simply double-click an icon and start troubleshooting the program.  Instead, I start up Access itself and then load the .accdb or .mdb file.

If you distribute compiled .accde or .mde files, you will obviously need to copy the uncompiled .accdb or .mdb file to the target machine for debugging.

Oftentimes, users do not have full versions of Access installed.  Almost all of my users have Word and Excel, though.  If the problematic VBA code can be isolated, I will run it from within Word or Excel on the target machine.

Rarely, the target computer does not have any Office application besides the Access runtime installed.  In the few times that has happened, I have adapted my code to run inside a VBScript file.  This is a technique that I use especially when I'm debugging an issue related to CreateObject().

Wireshark, procmon, and dumps, oh my!

When I'm truly desperate for a solution and I think the problem is originating outside of VBA, I will reach for a couple of truly low-level debugging tools.  Procmon is short for Process Monitor.  It logs every single Windows API call.  It's an avalanche of noise, and it takes time to sort through it all to find the signal.  The key is to use good filters.  The first one I always add is: Process name : is : msaccess.exe -> Include.  This will limit the logged calls to those coming from Access.  You will need to continue filtering down from there, but that's a good place to start.

If I'm troubleshooting a particularly vexing networking problem, that's when I reach for Wireshark.  I'm terrible with it.  It takes me close to an hour to set up even simple packet capturing.  And it suffers from the same noise vs. signal problems as Procmon.  But it is a mighty beast if you can manage to tame it.  That said, I avoid using it whenever I can.  Procmon itself monitors some network activity, so depending on your needs that may suffice.

The last one is memory dumps.  I don't actually use those.  I just needed a monosyllabic low-level debugging tool to fill out my section-heading homage to the Wizard of Oz.  For you masochists out there who enjoy reading and writing assembly code in your spare time, I guess reading memory dumps could be fun?

Skinning cats

If you debug something long enough, you eventually reach the point of diminishing returns.  Another five hours of combing through memory dumps is probably not a good use of your time (think of all the assembly you could be writing!).  When I've exhausted all my troubleshooting ideas, sometimes a tactical retreat is the best strategy.  I step back from the bug, evaluate what the purpose of the code actually is, and then try to come up with another way to skin the cat.  

In the end, that's how I solved the specific bug at the top of this article.  I simply refactored my code to use the following approach to reading the registry.  This one worked like a charm:

Dim oShell As Object: Set oShell = CreateObject("WScript.Shell")
oShell.RegRead("...")

Photo by Vlad Bagacian on Unsplash

All original code samples by Mike Wolfe are licensed under CC BY 4.0