VIDEO: "Codifying" Long Names

Several months ago, I wrote an article about creating shortened codes from long descriptions:

Abbreviations in Lookup Tables
One way to boost the signal-to-noise ratio on your continuous forms is to use abbreviations rather than full descriptions for lookup tables.

A SQL Server Solution

To make it easier for others to implement, I posted the T-SQL to create a custom function in SQL Server named "Codify".  The function followed these rules to convert a long description into a short, 2-5 character code (the length is customizable):

  • Use the first letter from each word
  • Replace instances of 'and' with '&'
  • Don't allow a trailing '&'
  • If only one word, return the first 2-5 letters in the word
SQL Server “Codify” Function
This function will jump-start the process of converting long descriptions into meaningful abbreviations. It’s great for creating “Code” columns in lookup tables.

An MS Access Solution

The big problem with the solution I posted is that it only worked on SQL Server data.

Former longtime Access MVP, Jack Stockton, has rectified that problem by writing an equivalent function that runs in VBA.  What's more, Jack presented his solution to a meeting of the Pacific Access User Group meeting, so now there is video of the solution in action.

Point of Clarification: Uniqueness Not Guaranteed

During the presentation, one of the viewers asked a question about whether the function is guaranteed to create unique codes.

The short answer to that question is No.  

First off, it's not possible to guarantee unique codes that are created from longer strings due to the pigeonhole principle.  To make matters worse, the function has no memory.  Like a spin of the roulette wheel, it is not influenced by previous calls.

One could try to account for previously generated codes, but that would be a monumental effort.  The costs would far outweigh the benefits.

My preferred approach is to tailor the character code length based on the number of source items.  With a half-dozen items, a single-character code may suffice.  With several hundred categories, you will want a four- or five-character code to minimize collisions.

Ultimately, you should use the Codify function only as a starting point.  I mentioned this explicitly in the original article:

To be clear, this function only provides a starting point for creating lookup table codes.  It won't do anything to prevent duplicate abbreviations.  Not every abbreviation it creates will be the best fit for the description it goes with.  It's main purpose is to jump start the process.

BONUS MATERIAL: DocTests()

Jack also covered one of my favorite concepts: Documentation Tests.  He covers the concept at the 12:21 mark of his presentation.

I love the doc test concept for three reasons:

  1. They let you learn by example.
  2. The documentation doesn't go stale.
  3. Writing tests is simple and easy.
Python-inspired Doc Tests in VBA
Doc tests are not a replacement for unit or integration testing. But they do provide the best return on investment (ROI) of any type of test, mostly because the effort to write them is near zero.