3 Techniques for Programming Reversible Processes
How would you recover from a large accidental UPDATE operation? You'd better be able to answer that question *before* it happens.
Whatever can go wrong, will.
Set-based operations are one of the great advantages of relational database systems.
The ability to insert, update, or delete millions of rows of data with a single SQL statement is very powerful. But, to quote another adage, "with great power comes great responsibility." A mistake that affects millions of rows of data can be very difficult to roll back.
As programmers, we naturally spend the most time worrying about our own mistakes. I've written an entire series of articles on defensive programming practices all in the pursuit of eliminating or mitigating programmer error. But it turns out programmers are not the only people who can make mistakes.
You know who else occasionally makes mistakes? Users.
So, for every set-based operation you program into your applications, take at least a moment to answer the question,
"How would I undo this process if a user came to me and said, 'I made a mistake; can you fix it?'"
Techniques for Programming Reversible Processes
- Include a foreign key to a process table
- INSERT instead of UPDATE
- Restore from backups
Include a Foreign Key to a Process Table
If you have a set-based operation that runs a large INSERT query, it's not always possible to identify which records were added via a particular process.
If the table has an autonumber ID, you may be able to identify a range of IDs that represent the process you want to reverse. You could use that range to run a DELETE query to remove the newly added records.
But what if a second user manually created a record in the target table while the large INSERT process was running? In such a situation, you could end up with an unrelated record being mixed in with the ones you want to delete.
A better option would be to have a standalone Process table with a ProcessID primary key column. You would also create a ProcessID field in your target table.
Now, whenever you run your large INSERT operation, you first create a record in the process table. This will generate a new ProcessID value. When you run your INSERT operation, assign this value to the ProcessID field in the target table. Then, if you need to delete all the records you added as part of executing the process, it's easy to do:
DELETE * FROM LateCharge WHERE ProcessID=1234;
You can include other information about the operation in additional fields of your Process table, such as:
- Login of the user executing the process
- Date/time the user started the process
- A description of the process
INSERT Instead of UPDATE
It's much easier to recover from a large INSERT operation than it is to recover from a large UPDATE operation.
Undoing an INSERT operation is a simple matter of deleting all the records you just added. If you include a ProcessID field in the target table as shown above, this becomes dead simple.
Recovering from an UPDATE operation is way trickier. Depending on the nature of the problem, it might not even be possible. This is a great example of how asking the question, "How would I undo this process?" during the initial design of your application could save you a lot of headaches down the road.
For example, let's imagine you have an invoicing feature in your application. Customers have 30 days to pay their bill before a ten-dollar late charge gets applied. You could do this by directly updating an AccountBalance field on a Customer record for all overdue balances. Something like this (air code):
UPDATE Customer SET AccountBalance = AccountBalance + 10 WHERE BalanceDueOn < (Date() - 30)
Good luck sorting that out when Bob and Alice from accounting sheepishly tell you that they both ran the process at the same time this month and doubled up on their customers' late charges.
Compare that to this roughly equivalent INSERT option (again, air code):
INSERT INTO AccountTransaction (CustomerID, AppliedOn, Description, Amount, ProcessID) SELECT CustomerID, Date(), 'Late charge', 10, 1234 FROM Customer WHERE BalanceDueOn < (Date() - 30)
Now when Bob and Alice come crawling to you, you can check the Process table, see that Bob's process has an ID of 1234 and Alice's process has an ID of 1235 and fix the problem quite easily with the following query:
DELETE * FROM AccountTransaction WHERE ProcessID = 1235
Restore from Backups
This is your fallback solution.
I'll just assume you're taking regular backups of your backend data. If not, go do that right now. If your backend data is in Access files, set a few scheduled tasks to run batch files that create daily and/or weekly and/or monthly backups. If your backend data is in SQL Server, follow my step-by-step instructions for Creating Scheduled Backups Using SQL Server Express.
This is the simplest technique during development because it requires zero additional effort. However, if you need to rely on a backup to restore data as the result of an errant user action, the recovery process will prove tedious, error-prone, and unlikely to result in a complete rollback.
But, hey, it's better than nothing.
By the way, if a user tells you that they've made a mistake that's affected a large swath of data, your very first step in troubleshooting should be to make a copy of the most recent backup from before their mistake. In many backup scenarios, the most recent backups are overwritten regularly, so make sure you grab that sucker before it's gone.
Cover image created with Microsoft Designer