Did you ever have to delete a few million records from a SQL Server database?
If you ever find yourself in such a situation, there are some things you should know about what you should–and should not–do.
Let's dive right in.
How NOT to Do It
BAD OPTION #1- Deleting From a Linked Table in MS Access
This is the worst way to delete a large number of records.
If you execute a
DELETE * FROM MyLinkedSqlServerTable query in Access, Access will:
- Run a select query to return all the primary key values in the table
- Begin a transaction with SQL Server
- Send a separate DELETE command for every single row in the table
- Ask if you are sure you want to delete "xxx number of rows"
- If you click OK, it will commit the transaction, otherwise it will roll it back
This takes forever. Plus, once you exceed a certain number of rows, Access will warn you that if you keep going you won't have the option of canceling the operation.
BAD OPTION #2- Using the T-SQL DELETE Statement to Remove Millions of Rows at Once
Using SSMS or a pass-through query in Access are both much better options than deleting a large number of records from a linked SQL Server table.
However, simply executing the statement on the server is not a panacea. To ensure durability in case you decide to cancel your DELETE, SQL Server first writes all the operations to the log before committing the operation. This takes time.
It also puts relatively strong locks on the records being deleted, which can affect other processes that may be trying to read records in the table being deleted. Thus, it can negatively affect performance of the server overall.
How to DO It
GOOD OPTION #1: TRUNCATE table
By far, the fastest way to delete a bunch of records is to use the
TRUNCATE TABLE statement.
This is much faster than the
DELETE statement because it does not log any of the row-level delete operations. However, you can only use
- To delete ALL the records in the table
- On tables that have no referential integrity relationships with other tables
- On tables that are not involved in replication
If that's your situation, then
TRUNCATE TABLE is the way to go.
If it's not, read on for the next best option.
GOOD OPTION #2: DELETE Records in Batches
If you can't use TRUNCATE TABLE, your best option is to delete the records in batches.
With this approach, you execute several DELETE statements. Each statement removes a subset of the total number of records you want to remove. For example, instead of removing one million records with a single DELETE statement, you would delete 100,000 records ten times.
Unfortunately, this is a bit tougher than it sounds, as you can't use the TOP clause directly within a DELETE statement:
Limitations and Restrictions
TOPis used with
DELETE, the referenced rows are not arranged in any order and the
ORDER BYclause can not be directly specified in this statement. If you need to use TOP to delete rows in a meaningful chronological order, you must use
TOPtogether with an
ORDER BYclause in a subselect statement.
I'll cover in detail how you can "use
TOP together with an
ORDER BY clause in a subselect statement" in a future article.
UPDATE [2022-08-31]: The "future article" can be found here, HOW TO: Automate Batch Deletion of SQL Server Records.
A Parting Note
Deleting millions of records is not something you should routinely be doing.
In the past five years, I've had to do it exactly twice on a production database. Both cases dealt with automated logging applications that ended up in a state where they were producing excessive log data. One was a network monitoring app. The other was a Windows Event Log monitoring app.
So, while deleting such a large number of rows at once should be relatively rare, there are valid reasons you may need to do it.