Signal vs. Noise: ShowPlan.out Edition

Colin Riddington wrote a series of articles performing "Speed Comparison Tests."

I found article #8, Optimize Queries, to be especially relevant to my upcoming presentation on Troubleshooting Query Performance.  In the article, Colin takes a moderately complex query and optimizes it through a series of incremental changes.  It's a fantastic introduction to the query optimization process.

What I was most interested in, though, was the showplan.out output.  

In my JetShowPlan primer, I wrote a section about how you can define a showplan.out "User Defined Language" in Notepad++.  This was a perfect opportunity to put that UDL to the test.  

Does my Notepad++ UDL boost the signal to noise ratio of the showplan.out file?

What's So Special About the UDL?

I wanted to make the showplan.out file easier to interpret.

Specifically, I wanted to make the wrong code look wrong and the right code look right.  This would boost the signal of the file.

I also wanted to hide text that I didn't immediately care about, thus reducing the noise of the file.

BAD Keywords Colored Red

These keywords highlight potential performance problems, so I color them red:

  • temp
  • temporary
  • scanning
  • X-Prod

The temp and temporary keywords show that the database engine is creating intermediate results, which takes time even if those temporary results only reside in memory.  These are the least BAD keywords.

The scanning keyword shows that the db engine had to perform a full table scan.  In other words, it had to examine every single row of the table.  This can happen if you try to sort by a non-indexed field or use a custom function in a WHERE clause or any number of other things.  This is the most medium BAD keyword.

The X-Prod keyword appears when the db engine had to perform a cross-product join between two or more tables.  There are rare cases where this is necessary, but most of the time it is a HUGE red flag.  The cross product of two ten-row tables is an output table with 100 rows (10 x 10).  This is the most BAD keyword.
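To see why that blows up so quickly, here is a tiny Python sketch of the underlying arithmetic (my own illustration, not anything the db engine does internally):

    from itertools import product

    # Two hypothetical ten-row "tables" (just row labels for illustration)
    customers = [f"Customer {i}" for i in range(1, 11)]
    orders = [f"Order {i}" for i in range(1, 11)]

    # A cross-product join pairs every row of one table with every row
    # of the other: 10 x 10 = 100 output rows before any filtering.
    cross_product = list(product(customers, orders))
    print(len(cross_product))  # 100

    # At 10,000 rows apiece the same join would produce 100,000,000
    # intermediate rows, which is why X-Prod is such a red flag.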

GOOD Keywords Colored Blue

The opposite of BAD is GOOD.  These keywords highlight areas where the db engine was able to optimize its query execution plan to save time:

  • index
  • rushmore

Both of these keywords refer to indexes.  And when it comes to SELECTing records, indexes are good (they're not so good when it comes to INSERT/UPDATE/DELETEing records, which is why you should not throw them on every single database column just for the heck of it).

Rushmore refers to a query optimization technology in the Jet/ACE database engine that makes especially efficient use of indexes.  In other words, rushmore is the most GOOD keyword.

"Inputs to Query" Sections Made Foldable

The other thing I find annoying when reading a showplan.out file is all the lines dedicated to listing the "inputs to the query," including:

  • Tables
  • Indexes
  • Index statistics

Don't get me wrong, this information is handy to have available, but I don't want it taking up valuable screen real estate and distracting me from the actual steps used to build and execute the query plan.

Thus, the Notepad++ UDL adds an option to collapse these sections.  You can collapse them one at a time using the plus/minus signs in the left margin of the window.  You can collapse all of them using the Notepad++ menu, View > Fold All (shortcut key [Alt] + [0]).
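If you are not a Notepad++ user, you can get a similar noise reduction by stripping those sections out with a script.  The following Python sketch is my own rough take; the exact section-marker text is an assumption on my part, so check it against your own showplan.out before relying on it.

    def strip_query_inputs(showplan_text: str) -> str:
        """Remove the "Inputs to Query" sections from showplan.out text.

        ASSUMPTION: each section is delimited by lines containing
        "Inputs to Query" and "End inputs to query".  Check your own
        showplan.out and adjust the markers if they differ.
        """
        kept_lines = []
        inside_inputs = False
        for line in showplan_text.splitlines():
            lowered = line.lower()
            if "end inputs to query" in lowered:
                inside_inputs = False   # closing marker: resume keeping lines
                continue
            if "inputs to query" in lowered:
                inside_inputs = True    # opening marker: start skipping lines
                continue
            if not inside_inputs:
                kept_lines.append(line)
        return "\n".join(kept_lines)


    if __name__ == "__main__":
        # File name and location are assumptions; JetShowPlan typically
        # writes showplan.out to the process's current directory.
        with open("showplan.out", encoding="latin-1") as f:
            print(strip_query_inputs(f.read()))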

Before and After

To really appreciate the benefits of this UDL, I want to show a before and after screenshot of the showplan.out file.  

This is output from the processing of a single query in Colin's test suite (qryTestD).

The original on the left is a giant Wall of Text™.  

The UDL-ified version on the right is faster to read and easier to scan for problems.

Putting My Keywords to the Test

I was curious about whether my BLUE and RED keywords really correlated with query performance.

I was greatly relieved to see that they do:

I ran each of Colin's queries and then grabbed screen captures of the resulting showplan.out files in Notepad++ (see below for each screenshot).  In each screenshot, I counted the number of words colored red and the number colored blue.  I entered the totals in the Red Text and Blue Text columns, respectively.

The net red/blue word count appears in the final column, Red/Blue Net.
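If you would rather not count colored words by eye, a short script can approximate the same tally.  This Python sketch is my own stand-in for that manual process; the keyword lists mirror the UDL, while the file name, the whole-word matching, and the direction of the net calculation are my assumptions.

    import re

    # Keyword lists from the UDL
    BAD_KEYWORDS = ["temp", "temporary", "scanning", "x-prod"]   # colored red
    GOOD_KEYWORDS = ["index", "rushmore"]                        # colored blue


    def count_keywords(text: str, keywords: list[str]) -> int:
        """Count whole-word, case-insensitive occurrences of each keyword."""
        total = 0
        for word in keywords:
            # Lookarounds keep "temp" from matching inside "temporary"
            pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
            total += len(re.findall(pattern, text, flags=re.IGNORECASE))
        return total


    # "showplan.out" in the current folder is an assumption; point this
    # at wherever JetShowPlan wrote the file on your machine.
    with open("showplan.out", encoding="latin-1") as f:
        plan = f.read()

    red = count_keywords(plan, BAD_KEYWORDS)
    blue = count_keywords(plan, GOOD_KEYWORDS)

    # Net is computed here as blue minus red (positive = more GOOD words);
    # flip it if you prefer the opposite convention.
    print(f"Red: {red}  Blue: {blue}  Net: {blue - red}")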

As you can see, the two slowest queries (B & C) have more red words than blue words in their showplan.out file.

The five queries in the middle (D - H) have the same number of red words as blue words.

The three fastest queries (I - K) have more blue words than red words.

The Red-Blue ShowPlan Heuristic

In psychology, heuristics are simple, efficient rules, learned or inculcated by evolutionary processes, that have been proposed to explain how people make decisions, come to judgments, and solve problems, typically when facing complex problems or incomplete information.

Based on these results, I propose the following heuristic:

  • Query plans with more red than blue are BAD.
  • Query plans with the same red as blue are OK.
  • Query plans with more blue than red are GOOD.
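Expressed as code, the heuristic is nothing more than a sign check on the two counts.  Here is a minimal sketch building on the counting idea above:

    def red_blue_heuristic(red_count: int, blue_count: int) -> str:
        """Classify a showplan.out query plan by its red/blue keyword counts.

        A rough first-pass triage only; it is no substitute for actually
        reading the plan.
        """
        if red_count > blue_count:
            return "BAD"    # more red than blue
        if red_count == blue_count:
            return "OK"     # same red as blue
        return "GOOD"       # more blue than red


    # Example: a plan with 2 red words and 5 blue words
    print(red_blue_heuristic(2, 5))  # GOOD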

I want to emphasize that this heuristic is a very simplified rule-of-thumb.  It is not a replacement for proper analysis of the showplan.out file.  It's merely a starting point.

If it makes the showplan.out file more approachable for curious Access developers, then in my mind it will have done its job.


Detailed Outputs

Here are the results of running Colin's queries on my development machine:


NOTE: I skipped over Query A as its performance was so intentionally atrocious it would have frozen my machine if I tried to run it through 100 (or even 10) loops.

Test Output

The remainder of this article is nothing more than a set of screenshots that show Colin's query SQL with its associated showplan results.  Refer to Colin's article for his excellent analysis of how each query improves upon the results of its predecessor (note that in my testing, query D is actually faster than queries E and F).

B) Outer Joins - Many => One

C) Outer Joins - One => Many

D) Inner Joins - Nz(Count)

E) IIf(Count)

F) Simple Count

G) Indexed Order By

H) Where vs Having

I) Indexed Filter fields

J) First vs GroupBy

K) Stacked Queries


External references

Speed Comparison Tests 8
This is the eighth in a series of articles discussing various tests done to compare the efficiency of different approaches to coding. Example databases are provided so the tests can be done on your own workstations.
Microsoft Access tips: Optimizing queries
How to get JET (the database engine in Microsoft Access) to execute queries quickly

Referenced articles

JetShowPlan: A Primer
You may be familiar with JetShowPlan, but I guarantee you’ve never read an article about it quite like this one.

Image by Chris Crowder from Pixabay