When Should You Include an Autonumber Column in a Table?
Surrogate vs. Natural Keys. GUIDs vs. Autonumbers. What factors dictate whether to include an autonumber column in your table? When should you do it?
Always.
It is awfully tempting to just end the article here, but I suppose I should explain myself and respond to a few common counterarguments.
Why Do I Say You Should ALWAYS Include an Autonumber Column?
Let me count the ways.
- Convenience. Autonumber columns (IDENTITY in SQL SERVER) contain whole numbers. They are also the primary key column in most tables (see limited exceptions below).
- Low cost. Data storage costs are so low that adding an extra number field to a table is negligible. Over the ten-year lifespan of a project, the cost of an autonumber field is almost certainly less than even half-an-hour of development time. In my experience, autonumber fields help minimize development and support costs, so they more than pay for themselves.
- Debugging aid. The ever-increasing nature of autonumber fields make them an invaluable debugging aid. Being able to see the sequence that records were added to a database relative to other records has saved me more times than I can
countremember (I can count really high). - Future proof. I have often wished I used an autonumber field as the primary key on a table. A common example is the junction table that sits in the middle of a many-to-many relationship. In the past, I often created a primary key from the two foreign key fields on the table. Inevitably, I would realize long after the fact that I needed additional fields on the junction table, and that it would be nice to be able to refer to those records with a single identifier.
- Consistency. Being able to use the same data type to identify a unique record for any table makes it easier to rely on consistent design patterns when building your application. This may take the form of generic DeleteRecord methods, auto-generated ORM-style classes, or auto-generated CRUD (CREATE/READ/UPDATE/DELETE) stored procedures.
But What About...
...Using Natural Keys Instead?
In the theoretical world of computer science and database normalization, there used to be a strong preference for natural keys. According to the theory, you should always be able to identify one (or more) fields that can be combined to produce a unique value.
This sounds nice in theory, but in practice it has several problems:
- Compound keys (a primary key composed of multiple fields) are super-annoying when it comes time to establishing table relationships. If your table has a three-column primary key, then every related foreign table must also include the same three columns. Madness I tell you!
- Reality has a tendency to not cooperate with theory. Pick your adage/quote:
• "The best laid plans of mice and men often go awry." -Robert Burns
• "No plan survives contact with the enemy." -Moltke the Elder
• "Everyone has a plan until they get punched in the mouth." -Mike Tyson
• "Anything that can go wrong will go wrong." -Murphy's Law
Too often the assumption that a natural key will be unique always and forever turns out not to be the case. And if and when that happens, you've got a real mess on your hands because your entire database may need to be redesigned. - Natural keys–especially those that include string fields or multiple columns–perform poorly compared to an autonumber primary key. While academia and computer science theory may not care about the performance of a relational database, your clients certainly do.
Unique Natural Keys are Still Important
You should absolutely identify the natural keys in your tables and create unique indexes to enforce this uniqueness.
However, natural keys and unique indexes should be in addition to a surrogate autonumber primary key.
When Unique Natural Keys Turn Out Not to be Unique
If it turns out that what you thought was a natural key can have duplicate values after all, you will need to deal with that.
You will likely need to add a new column to the key (and possibly the table) to maintain this uniqueness. As long as you weren't using the natural key as the table's primary key, then you simply update the unique index to include the additional column.
The rest of your database design is not affected.
Contrast that with what would happen if the natural key you identified was also the table's primary key. Now you have to add the additional column to:
- the primary key
- any tables with foreign key relationships
- every query, form, report, view, database trigger, and stored procedure that relies on that natural key
- which may require doing things in a certain order, temporarily disabling replication, and/or dropping and recreating constraints and indexes
In other words, you would have one monumental mess on your hands.
...Using GUIDs Instead?
Globally unique identifiers (GUIDs) have an important place in database design when it comes to replication and similar synchronization scenarios (see below). But unless you have that particular use case, autonumbers are far superior to GUIDs as surrogate keys.
- Footprint. GUIDs are 128 bits which is four times the size of a typical autonumber (32-bit
int
). Keep in mind that every index includes the primary key, so this size difference is multiplied by the number of indexes on the table. - Readability. While you should (almost) never expose an autonumber field to a user, it's a whole lot nicer as a developer to work with whole numbers than with monstrosities like this:
0E984725-C51C-4BF4-9960-E1C80E27ABA0
. Sure, you can copy and paste GUIDs, but with autonumbers you don't have to do that. - Brain-friendliness. The typical human brain can hold about seven items in short-term memory. That means you can easily remember autonumber values up to ten million. But it is impossible–unless you're a savant–to hold an entire GUID in your short-term memory.
- Performance. While the exact conditions matter, autonumber columns are generally about 20% more efficient than GUIDs when joining tables. Also, the fact that GUIDs are four times larger than 32-bit autonumber integers means that index lookups require additional hops (on average).
When Autonumbers Are Not Enough
While I always recommend including an autonumber column, sometimes they are insufficient on their own as a primary key.
Replication and Synchronization
The most common scenario where autonumbers are not sufficient to act as primary keys for a table is when records may be created within different databases and need to be synchronized at a later time. This is typical in replication scenarios (especially merge replication).
Even in situations where autonumbers are insufficient on their own, I still like to include them in my tables because they are so handy for debugging and troubleshooting.
As far as what to use as the primary key for such tables, I would recommend one of two options:
- Autonumber + DeviceSource (assuming devices have unique names)
- GUID
The pros and cons of each approach are a topic unto themselves. For the purposes of our discussion here, though, the key takeaway is that I recommend including autonumbers even if they are insufficient to serve as primary keys on their own.
OK, Two Exceptions
Single-Row Configuration Tables
For configuration tables that–by design–will only ever hold a single row, then you don't need an autonumber column. However, I strongly recommend that you use constraints to enforce the single-row nature of the table.
Tables Whose Records are Tied to Application Logic
Occasionally you need to alter application behavior based on the value in a database field (such as a product category).
To manage this scenario, I will create a read-only lookup table (what I refer to as a system table) with hard-coded primary key values. I still use an int
as the primary key, but I don't make it an autonumber. Within my code, I declare an enum with explicit values to match the primary key of the lookup table.
The main purpose of the lookup table itself is to enforce referential integrity with any related tables so that users can only choose one of the values we've accounted for within our code.
...but it still doesn't hurt to include one
Even though an autonumber field is not necessary for either of these scenarios, it also won't hurt you much (if at all) to include it. Which brings me full circle to my original answer.
If you want a simple guideline when designing tables in a relational database, "always include an autonumber column" is a pretty good one.
UPDATE [2023-06-17]: Added final section "...but it still doesn't hurt to include one" to emphasize that you simply cannot go wrong including an autonumber column in your table.