Autonumber Primary Keys

They're fake ... and they're spectacular!

Autonumber Primary Keys

We use an autonumber field as the primary key for every new table we create at Grandjean & Braverman.

The value of this field is completely arbitrary (it is a surrogate key).  It has no meaning in the context of the rest of the record.  Its sole purpose is to act as a unique identifier that will never change.

They're fake ... and they're spectacular.

Naming Convention

There are two main schools of thought when it comes to naming autonumber primary keys.

The first school of thought is that the field should simply be named ID.  The name of the table would be used as a qualifier to distinguish the ID field on the Customer table from the ID field on the Invoice table (e.g., Customer.ID vs. Invoice.ID).  Foreign keys that reference the field would be named {TableName}ID.  A sample join would look like this:

FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.ID

The second school of thought is that you name the primary key and all the referencing foreign keys using the {TableName}ID format.  There are no fields named "ID" in the entire database.  A sample join would look like this:

FROM Invoice INNER JOIN Customer ON Invoice.CustomerID = Customer.CustomerID

Of course, in practice when you write SQL by hand, you should be aliasing your table names for readability.  So the two approaches would look like this:

FROM Invoice AS I INNER JOIN Customer AS C ON I.CustomerID = C.ID

-- vs. --

FROM Invoice AS I INNER JOIN Customer AS C ON I.CustomerID = C.CustomerID

Many an online flame war has been started over which school of thought is "better."  I find those arguments silly.  The second school of thought is obviously correct.

Or, as my wife tells me, "You're entitled to your opinion, so long as you acknowledge that mine is right."

There Are No Exceptions

For years I followed the rule of adding an autonumber primary key...most of the time.

Occasionally, I would think that I didn't really need a surrogate key for a particular table.  I could use a natural key instead and save myself the hassle of creating the extra column ([NARRATOR]: It's really no hassle).

I'll admit, I seldom regretted making the design choice to forego the autonumber key.  HOWEVER, on those rare occasions when I chose not to use an autonumber and later came to regret it... I really regretted it.

Once you reach the point where you realize you should have implemented an autonumber primary key, it is VERY painful to try to retrofit it into your design.  It requires updating every query where the table is involved in a join.  It's difficult to do that without breaking something in the process.

The bottom line: the downsides to including an autonumber primary key are negligible, but the potential downsides to skipping an autonumber primary key are substantial.

Many-to-many tables

It takes three tables to model a many-to-many relationship.  

For example, let's say you have a system that supports multi-client projects.  One client can be involved in many projects.  And one project can include many clients.  The relationship between clients and projects is many-to-many.

I would model this with a Project table, a Client table, and a Project_Client table.  Here's a typical query joining all three tables:

FROM Project_Client AS PC
  INNER JOIN Project AS P ON PC.ProjectID = P.ProjectID
  INNER JOIN Client AS C ON PC.ClientID = C.ClientID

You only need two fields in the Project_Client table:

  • ProjectID
  • ClientID

Those two fields can be combined to create a natural, composite key.  

If the table is only ever used as the glue that holds together the Project and Client tables, that approach is probably adequate.

But what if one day you decide you want to track additional information about the Project-Client relationship?  For example, let's say you want to track which days a client joined and left the project.  You might want to add a couple of fields to do this:

  • JoinedOn
  • RemovedOn

Now all of a sudden your table is more than just glue.  It has its own information independent of the many-to-many tables which it holds together.  At this point, it would be handy if there was a single number you could use to uniquely identify the table.  

Perhaps an automatically incrementing number stored in a field named Project_ClientID?

Lookup tables

Another place where I have gotten myself into trouble is with lookup tables.

Instead of an autonumber field, I have often used single-character text fields as primary keys.  For example, here are some rows in a luPmtType table:

  • C: Cash
  • D: Debit Card
  • K: Check
  • R: Credit Card
  • Y: Crypto

The appeal of using characters instead of numbers is that the letters provide a useful mnemonic.  It is easier to remember that checks have a PmtTypeCode of "K" than it is to remember that they have a luPmtTypeID of 3.

One problem with this approach is that the mnemonic gets less obvious as you continue to add new rows to the lookup table.

Honestly, though, I'm a bit conflicted on this one.  I think you could make a rather compelling signal vs. noise argument that letters carry significantly more semantic meaning than arbitrary numbers.  And that this benefit outweighs the potential downsides of using a natural key.

For the sake of consistency, though, I ultimately feel that autonumber primary keys should also be used for lookup tables.

Exception #1: One-to-One Tables

One-to-one relationships are relatively rare in most relational database systems, but they have their uses:

  • Reducing conflicts in merge replication
  • Reducing disk usage for tables with sparsely populated columns
  • Mitigating performance hit from binary large object fields (BLOBs)
  • Creating logical groups of required fields that are collectively optional
  • Reducing row locking for tables with dozens of fields

A one-to-one relationship requires that there be one (and only one) record in table A for each corresponding record in table B.  Thus, whichever table creates its records first needs to have an autonumber primary key.  The other table(s) would simply use the number generated and assigned to the original table.

Exception #2: Merge Replication

With SQL Server merge replication, two different devices ("subscribers") could each be creating new records in the same table while disconnected from the network.

In this scenario, the autonumber primary key field by itself is not enough to ensure uniqueness.  

There are a number of ways to deal with this problem: GUID primary keys, identity ranges, etc.  My preferred solution is to include a device source column as part of a surrogate, composite key.  The device source column's default value is the name of the device (server, local laptop, etc.) where the INSERT occurs.  As long as the devices all have unique names, this approach eliminates conflicts.  

It's still not a great solution, because now you're stuck with two fields you need to include as foreign keys on related tables.  That said, I still prefer it to working with GUIDs.

External references

Surrogate key - Wikipedia
Natural key - Wikipedia
Composite key - Wikipedia
Merge Replication - SQL Server
Merge replication uses a snapshot of the publication database objects and data, and then tracks modifications at the Publisher and Subscribers with triggers.

Image by Uwe Baumann from Pixabay