SQL Server "Codify" Function

This function will jump-start the process of converting long descriptions into meaningful abbreviations. It's great for creating "Code" columns in lookup tables.

SQL Server "Codify" Function

I wrote yesterday about how I like to include a short, 2- to 5-character abbreviation code in my lookup tables.

Here's an excerpt from the article, Abbreviations in Lookup Tables:


BEFORE

Notice how much space the Industry column takes up.  If you didn't know any better, you would think that was the most important column on the form.

AFTER

The form below would be harder for a new user to understand, as the industry abbreviations will be unfamiliar to them.  But for a regular user, the condensed information would actually make this form easier to use.


Creating Abbreviations from an Existing Lookup Table

At the end of that article, I promised to provide a SQL Server function to generate these types of codes automatically.

To be clear, this function only provides a starting point for creating lookup table codes.  It won't do anything to prevent duplicate abbreviations.  Not every abbreviation it creates will be the best fit for the description it goes with.  It's main purpose is to jump start the process.

You (or your users) will still need to review the results and modify them as needed.

The Codify Function

The T-SQL Codify function that I adapted from this StackOverflow answer takes two arguments:

  • The text you want to abbreviate
  • The max number of characters you want returned

It follows a few simple rules:

  • Use the first letter from each word
  • Replace instances of 'and' with '&'
  • Don't allow a trailing '&'
  • If only one word, return first @codelen letters in the word

Here's the T-SQL:

/*****************************************************************
| This function creates an abbreviated code from a longer string
|   - Author: Mike Wolfe
|   - Source: https://nolongerset.com/sql-server-codify-function/
| 
| It follows these rules:
|     - Use the first letter from each word
|     - Replace instances of 'and' with '&'
|     - Don't allow a trailing '&'
|     - If only one word, return first @codelen letters in the word
|
| Adapted from: https://stackoverflow.com/a/9304623/154439
******************************************************************/
CREATE OR ALTER FUNCTION dbo.Codify (
    @txt nvarchar(2000),
    @codeLen int )
RETURNS nvarchar(2000) AS
        
BEGIN
    DECLARE @retval NVARCHAR(2000);
    DECLARE @str NVARCHAR(2000);

    SET @str=RTRIM(LTRIM(@txt));
    SET @retval=LEFT(@str,1);

    WHILE CHARINDEX(' ',@str,1)>0 AND LEN(@retval) < @codeLen BEGIN
        SET @str=LTRIM(RIGHT(@str,LEN(@str)-CHARINDEX(' ',@str,1)));
        IF UPPER(LEFT(@str, 3)) = 'AND'
            -- Use ampersand (&) instead of 'a' to replace 'And'
            SET @retval+='&'
        ELSE
            SET @retval+=LEFT(@str,1);

        -- Do not create a code with a trailing ampersand
        IF (LEN(@retval) = @codeLen) AND RIGHT(@retval,1) = '&'
            SET @retval = LEFT(@retval, @codeLen - 1)

    END

    IF Len(@retval) = 1
        SET @retval = LEFT(@txt,@codeLen);
    
    RETURN @retval;

END
GO 

Sample Usage

SELECT DISTINCT f.industry
 , dbo.Codify(f.industry, 2) AS [2-char Code]
 , dbo.Codify(f.industry, 3) AS [3-char Code]
 , dbo.Codify(f.industry, 4) AS [4-char Code]
 , dbo.Codify(f.industry, 5) AS [5-char Code]
 , dbo.Codify(f.industry, 6) AS [6-char Code]
FROM dbo.Fortune500 AS F
Sample data retrieved from: https://github.com/pssguy/fortune500/blob/master/fortune500.csv

And here are the results of the above query:

Industry Description 2-char Code 3-char Code 4-char Code 5-char Code 6-char Code
Advertising, marketing Am Am Am Am Am
Aerospace and Defense AD A&D A&D A&D A&D
Airlines Ai Air Airl Airli Airlin
Apparel Ap App Appa Appar Appare
Automotive Retailing, Services AR ARS ARS ARS ARS
Beverages Be Bev Beve Bever Bevera
Building Materials, Glass BM BMG BMG BMG BMG
Chemicals Ch Che Chem Chemi Chemic
Commercial Banks CB CB CB CB CB
Computer Peripherals CP CP CP CP CP
Computer Software CS CS CS CS CS
Computers, Office Equipment CO COE COE COE COE
Construction and Farm Machinery CF C&F C&FM C&FM C&FM
Diversified Financials DF DF DF DF DF
Diversified Outsourcing Services DO DOS DOS DOS DOS
Education Ed Edu Educ Educa Educat
Electronics, Electrical Equip. EE EEE EEE EEE EEE
Energy En Ene Ener Energ Energy
Engineering, Construction EC EC EC EC EC
Entertainment En Ent Ente Enter Entert
Financial Data Services FD FDS FDS FDS FDS
Food and Drug Stores FD F&D F&DS F&DS F&DS
Food Consumer Products FC FCP FCP FCP FCP
Food Production FP FP FP FP FP
Food Services FS FS FS FS FS
Forest and Paper Products FP F&P F&PP F&PP F&PP
General Merchandisers GM GM GM GM GM
Health Care: Insurance and Managed Care HC HCI HCIM HCI&M HCI&MC
Health Care: Medical Facilities HC HCM HCMF HCMF HCMF
Health Care: Pharmacy and Other Services HC HCP HCPO HCP&O HCP&OS
Home Equipment, Furnishings HE HEF HEF HEF HEF
Homebuilders Ho Hom Home Homeb Homebu
Hotels, Casinos, Resorts HC HCR HCR HCR HCR
Household and Personal Products HP H&P H&PP H&PP H&PP
Industrial Machinery IM IM IM IM IM
Information Technology Services IT ITS ITS ITS ITS
Insurance: Life, Health (Mutual) IL ILH ILH( ILH( ILH(
Insurance: Life, Health (stock) IL ILH ILH( ILH( ILH(
Insurance: Property and Casualty (Mutual) IP IPC IP&C IP&C( IP&C(
Insurance: Property and Casualty (Stock) IP IPC IP&C IP&C( IP&C(
Internet Services and Retailing IS ISR IS&R IS&R IS&R
Mail, Package, and Freight Delivery MP MPF MP&F MP&FD MP&FD
Medical Products and Equipment MP MPE MP&E MP&E MP&E
Metals Me Met Meta Metal Metals
Mining, Crude-Oil Production MC MCP MCP MCP MCP
Motor Vehicles and Parts MV MVP MV&P MV&P MV&P
Network and Other Communications Equipment NO N&O N&OC N&OCE N&OCE
Oil and Gas Equipment, Services OG O&G O&GE O&GES O&GES
Packaging, Containers PC PC PC PC PC
Petroleum Refining PR PR PR PR PR
Pharmaceuticals Ph Pha Phar Pharm Pharma
Pipelines Pi Pip Pipe Pipel Pipeli
Publishing, Printing PP PP PP PP PP
Railroads Ra Rai Rail Railr Railro
Real estate Re Re Re Re Re
Scientific,Photographic and Control Equipment SC S&C S&CE S&CE S&CE
Securities Se Sec Secu Secur Securi
Semiconductors and Other Electronic Components SO S&O S&OE S&OEC S&OEC
Specialty Retailers: Apparel SR SRA SRA SRA SRA
Specialty Retailers: Other SR SRO SRO SRO SRO
Telecommunications Te Tel Tele Telec Teleco
Temporary Help TH TH TH TH TH
Tobacco To Tob Toba Tobac Tobacc
Toys, Sporting Goods TS TSG TSG TSG TSG
Transportation and Logistics TL T&L T&L T&L T&L
Transportation Equipment TE TE TE TE TE
Trucking, Truck Leasing TT TTL TTL TTL TTL
Utilities: Gas and Electric UG UGE UG&E UG&E UG&E
Waste Management WM WM WM WM WM
Wholesalers: Diversified WD WD WD WD WD
Wholesalers: Electronics and Office Equipment WE WEO WE&O WE&OE WE&OE
Wholesalers: Food and Grocery WF WFG WF&G WF&G WF&G
Wholesalers: Health Care WH WHC WHC WHC WHC
zMiscellaneous zM zMi zMis zMisc zMisce

Referenced articles

Abbreviations in Lookup Tables
One way to boost the signal-to-noise ratio on your continuous forms is to use abbreviations rather than full descriptions for lookup tables.

Images:

All original code samples by Mike Wolfe are licensed under CC BY 4.0