I am a father of four great kids. My kids have three sets of grandparents. Every year between Christmas and New Year's, those four kids receive four sets of presents. Let me tell you, that's A LOT of cardboard and wrapping paper.
As such, my annual New Year's Day ritual is burning all that cardboard and paper. When I first started doing this, I would make a big pile of all the cardboard and all the paper. I would fill a bucket of water and run a hose out to the edge of my driveway where I had gathered my cornucopia of combustibles. I would strike a match, light a few pieces of wrapping paper, then step back for those few harrowing minutes it would take for the conflagration to consume the cardboard.
The heat was brief but intense. Occasionally an errant gust of wind would blow through and make me double-check that I had enough slack in the hose. Luckily, nothing ever went wrong, aside from some singed eyebrows and arm hair. But most years the fire got big enough that it drew my wife out on the porch to make sure I wasn't about to burn down the whole forest.
I still burn all that cardboard each year. But I no longer make one big pile. I burn a few boxes at a time. It takes two to three times longer now, but I never worry about it getting out of control any more.
Burning cardboard is a lot like a large data migration. You can do it all at once, or slowly over time. When I was younger, I migrated my data like I burned my cardboard: all at once in a single high-stakes operation. Perhaps my risk tolerance has decreased with age. I like to think I've gotten wiser.
My company wrote a custom computer aided mass appraisal (CAMA) system nearly twenty years ago. The program included a photo storage component. We stored the photos on the file system and not in the database. The application used information in the database records to build the photo names that were used to store and retrieve the images.
There were only a few hundred photos when the system was first put in place. Those photos were all stored in a single folder, with no subfolder structure. Fifteen years and 400,000 photos later, that was a decision we wished we could have back. Once a Windows directory holds roughly 300,000 files (regardless of their size), file system performance falls off a cliff. The time to retrieve a single file jumps from well under half a second to 2-4 seconds per file.
We had also allowed users to manually re-order photos on each parcel. This came with another regrettable design decision: the image file name reflected the sort order on the parcel. We did this intentionally so that users could view the images in a third-party image viewer and have the pictures appear in order. Unfortunately, the mass renaming process started breaking when we hit the same critical file count threshold. At about 300K files in a single directory, the file system abstraction began leaking uncontrollably.
Our fix was to subdivide all those files into two new folder levels. In other words, we would add up to 1000 subfolders and each of those subfolders would have up to 1000 sub-subfolders. We would also decouple the photo numbering from the photo file name.
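As a minimal sketch of that two-level scheme, assuming each photo gets a numeric ID (the real system's naming rules aren't shown here), the two folder names can be derived from the ID itself so that no leaf folder ever holds more than about a thousand files:

```python
# Hypothetical sketch of the two-level folder layout described above.
# Assumes a numeric photo ID; folder names are derived from the ID so
# each leaf directory holds at most ~1,000 files.

def sharded_path(photo_id: int, root: str = "photos") -> str:
    """Build a path spread across up to 1000 x 1000 subfolders."""
    level1 = (photo_id // 1_000_000) % 1000  # top-level folder, 000-999
    level2 = (photo_id // 1_000) % 1000      # second-level folder, 000-999
    return f"{root}/{level1:03d}/{level2:03d}/{photo_id}.jpg"

print(sharded_path(400_123))  # photos/000/400/400123.jpg
```

Because the path comes from the ID alone, the file name no longer has to encode the sort order on the parcel, which is exactly the decoupling described above.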
My naïve approach
My first thought was the way I would have handled this in the past. This is the high-stakes approach.
First, I need to point out that there are two parts to this kind of migration. One part is changing the program logic to construct the new file name and the other part is the renaming of the files themselves. With an all or nothing approach, the timing is critical. We need to change the program to use the new file names. Then, during off hours, we have to rename every single file. We then have to distribute the new version of the program to every single user and make sure that they have this version before they start working again.
For a migration of this magnitude, I would have started on the Friday evening of a long weekend. Prior to migration day, I would have written a script to automate the actual file renaming. The first thing I would do on Friday would be to kick off that script. Who knows how long it would run. While the script was running, I would send out the new version of the application.
When users came in on Tuesday morning, everything would be working. And, honestly, things always did work on Tuesday morning. I never had a complete migration failure. But I have spent many long days and nights working over a holiday weekend to ensure the success of large migrations. All things being equal, I'd rather not be working every waking hour of a three-day weekend.
My more better approach
Frankly, I'm getting too old for those high-stakes migrations. Nowadays, I apply the hard lessons I've learned over the years to avoid the all-or-nothing migration approach. Instead, I "burn my cardboard" one box at a time. Here's what that looks like in practice.
Rather than script the migration of all those files during dedicated downtime, we updated the application to rename the photos one parcel at a time. There are about 40,000 parcels with images, so each parcel has about ten photos on average.
There is one key that makes the whole thing work. The system needs to be able to retrieve images regardless of whether they are using old style or new style filenames. This simple trick means that our script/deployment choreography is unnecessary. We can deploy our new front-end before renaming a single file.
We just finished this exact data migration project. By migrating slowly, we were able to catch bugs we had not even considered. Because only a few images had been renamed when a bug surfaced, it caused only a couple of problems. If that same bug had shipped after every photo was renamed, the results would have been disastrous.
Another advantage is that we could target specific types of records when rolling out our migration. For instance, we started by moving photos for the inactive parcels only.
Once we worked out the bugs in that part of the migration, we began targeting parcels with no dwellings. We continued our methodical approach until we finished renaming every file.
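The incremental pass described above can be sketched as a loop that migrates one parcel at a time, filtered to whatever target subset we choose (inactive parcels, parcels with no dwellings, and so on). The record lookups and path builders here are hypothetical stand-ins:

```python
import os

# Hypothetical sketch of the incremental migration pass. The callables
# (photos_for, new_path_for, should_migrate) stand in for the real
# database queries and naming logic.

def migrate_batch(parcels, photos_for, new_path_for, should_migrate):
    """Move each targeted parcel's photos from the old flat layout."""
    for parcel in filter(should_migrate, parcels):
        for old_path in photos_for(parcel):
            new_path = new_path_for(old_path)
            # Skip files already migrated (or never present).
            if os.path.exists(old_path) and not os.path.exists(new_path):
                os.makedirs(os.path.dirname(new_path), exist_ok=True)
                os.replace(old_path, new_path)  # rename on the same volume
```

Changing the rollout target is then just a matter of swapping the `should_migrate` predicate, which is how a run can start with inactive parcels and widen from there.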
Slow and steady
Like the proverbial tortoise, my measured approach took much longer than my harebrained original idea. But the reduction in risk (and stress!) was well worth the cost of a slightly longer migration.