Tables to Text: Do it for the DVCS!
Anything that can lead to a bug in your software belongs in version control. That includes local tables with design-time data.
In previous posts, I talked about the how and why of custom database properties. Now, let's dive into an actual example of a custom database property I use in all of my Microsoft Access projects.
As I've mentioned in the past, I'm a big believer in version control for Microsoft Access. Unfortunately, Access does not lend itself to easy version control because all of its code and user interface objects are saved in a single binary file. The key to effective version control with Access, then, is to get as much information as possible converted from that binary file format into text files that we can easily compare from one version to the next.
Local Table Data
When writing a Microsoft Access application, it is critical that you "split" your program into a front-end and back-end. The front-end contains the user interface and business logic. The back-end is for data storage. The back-end may consist of one or more Access databases or a client-server database like SQL Server.
While all persistent user data must be saved in the back-end, you can still create and populate local tables in your front-end.
One common reason to do this is performance. You can often speed up a complex report by first populating a local temporary table with the data on which your report is based. This allows you to run multiple queries to populate the table, rather than construct some monstrous query with layers upon layers of subqueries.
Design-Time Data
Another common reason to use local tables is to store "design-time" data that you will use in your user interface or within your business logic. By "design-time," I am referring to data that you define while designing your application and that does not change at run time when the user is using it.
The most common example that I use is a series of local tables that includes information for re-linking tables among different environments (production vs. development vs. testing) or among different clients. One table includes database connection information, such as database name, server name, etc. Another table includes a list of tables that get relinked when switching locations. The actual schema is a bit more involved, but that's the general idea.
The key point, though, is that this is the kind of data that is perfectly suited for storing in local tables of the front-end database file. This data is also a potential source of bugs. If I misspell the name of the production database server, then my application will raise an error when I try to relink tables to that environment.
This brings me to one of my core philosophies of version control:
Anything that can lead to a bug in your software belongs in version control.
Signal vs. noise
Since changes to design-time data stored in local tables may lead to bugs, the content of the local tables must be included in version control. However, not every local table in the front-end database contains design-time data. As I mentioned earlier, a common use for local tables is as temporary storage for (often) denormalized data.
Obviously, we do not want to store temporary data in version control. By its nature, temporary data will always be changing. As a corollary to my earlier statement, anything that cannot lead to a bug in your software should likely be kept out of version control.
The main reason is a concept known as signal vs. noise. In radio communications, a strong signal does you no good if it is overwhelmed by too much noise. So it is with version control. If too many of the changes in your code from one version to the next are inconsequential, then they will drown out all the important bits. This is where the hg_csv database property comes in.
The hg_csv custom database property
Wanting to include anything that could lead to a bug in my version control system while keeping out anything that could not, I needed a way to export design-time data from my local tables without also exporting temporary data from other local tables. My solution was to store a list of the tables that contain design-time data in a custom database property named hg_csv.
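If the property does not exist yet, creating it is a one-time step. Here is a minimal sketch you could run from a standalone script; the database path and table names are placeholders for whatever design-time tables you want to track:
Const dbText = 10  'DAO data type for a text property
Dim dbe, db, prp
Set dbe = CreateObject("DAO.DBEngine.120")
Set db = dbe.OpenDatabase("C:\Dev\MyFrontEnd.accdb")  'placeholder path to the front-end
On Error Resume Next
db.Properties("hg_csv") = "tblDbConnections,tblTablesToRelink"  'placeholder table names
If Err.Number = 3270 Then  'error 3270: "Property not found"
    Err.Clear
    Set prp = db.CreateProperty("hg_csv", dbText, "tblDbConnections,tblTablesToRelink")
    db.Properties.Append prp
End If
On Error GoTo 0
db.Close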
I then modified my decompose.vbs script to look for this custom database property, parse it into individual table names, and export the data contained in each listed table to a text-based CSV file.
Repeatability
When exporting data from a binary file for the purposes of source control, it is vitally important that the export be consistently repeatable. The technical term for this is idempotence: given a list of tables with unchanging data, you should be able to reproduce an identical set of CSV files every time you run your export.
The problem with this is that rows stored in a relational database table have no inherent order. They are stored in some physical order on the disk, but any number of operations, such as a compact and repair, could change that physical order. So, if you simply export the table without specifying an explicit sort order, Access will export the rows in some undefined order. And while the data itself may not change from one run to the next, the order of the export might.
For this reason, you want to dictate the sort order when exporting to text files. Since table data is inherently unordered, it does not matter what sorting you use in your export. But it must be repeatable.
For simplicity, I create an ORDER BY clause that is nothing more than a listing of the table's columns, in the order they appear in the table. You don't need to worry about performance, because you're usually dealing with no more than a few dozen records in these kinds of tables.
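For example, given a hypothetical design-time table named tblDbConnections with fields ConnectionID, ServerName, and DatabaseName, the query string the script builds would simply be:
'Hypothetical table and field names for illustration only
SQL = "SELECT * FROM tblDbConnections ORDER BY ConnectionID, ServerName, DatabaseName"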
Sample code
Here is an excerpt from my own decompose.vbs script:
'db: an open DAO.Database object
'ExportPath: folder path (without trailing slash) where .csv files will be created
'oApp: an Access application instance
Public Sub ExportCSVs(db, ExportPath, oApp)
    Const acExportDelim = 2
    Dim Tbls, TblName, QDef, QDefName, ColumnNames
    On Error Resume Next
    QDefName = "___TempCsvExportQDef___"

    'Parse the comma-delimited list of design-time tables from the hg_csv property
    Tbls = Split(db.Properties("hg_csv"), ",")
    If Err = 0 Then
        For Each TblName In Tbls
            WScript.Echo "CSV: " & Trim(TblName)
            ColumnNames = ColNames(db, Trim(TblName), True)
            'Build a temporary QueryDef with an explicit ORDER BY so the export is repeatable
            Set QDef = db.CreateQueryDef(QDefName, "SELECT * FROM " & Trim(TblName) & " ORDER BY " & ColumnNames)
            oApp.DoCmd.TransferText acExportDelim, , QDef.Name, ExportPath & "\" & Trim(TblName) & ".csv", True
            db.QueryDefs.Delete QDef.Name
            Set QDef = Nothing
            If Err <> 0 Then
                WScript.Echo " error " & Err & ": " & Err.Description & " {" & ColumnNames & "}"
                Err.Clear
            End If
        Next
    Else
        'No hg_csv property in this front-end; nothing to export
        Err.Clear
    End If

    'Always export the import/export specification system tables
    For Each TblName In Array("MSysIMEXSpecs", "MSysIMEXColumns")
        WScript.Echo "CSV: " & Trim(TblName)
        Set QDef = db.CreateQueryDef(QDefName, "SELECT * FROM " & Trim(TblName) & " ORDER BY " & ColNames(db, Trim(TblName), True))
        oApp.DoCmd.TransferText acExportDelim, , QDef.Name, ExportPath & "\" & Trim(TblName) & ".csv", True
        db.QueryDefs.Delete QDef.Name
        Set QDef = Nothing
    Next
End Sub
Public Function ColNames(db, TblName, SortableOnly)
    Dim i, td, isSortable
    On Error Resume Next
    Set td = Nothing  'initialize so the QueryDefs fallback below works if the TableDefs lookup fails
    Set td = db.TableDefs(TblName)
    If td Is Nothing Then Set td = db.QueryDefs(TblName)
    ColNames = td.Fields(0).Name
    For i = 1 To td.Fields.Count - 1
        Select Case td.Fields(i).Type
            Case 11, 12: isSortable = False 'OLE Object (11) and Memo/Hyperlink (12) fields cannot be sorted
            Case Else: isSortable = True
        End Select
        If isSortable Or Not SortableOnly Then
            ColNames = ColNames & ", " & td.Fields(i).Name
        End If
    Next
End Function
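If you are wiring this into your own decompose.vbs, the call might look something like the sketch below; the file paths are placeholders, and the script assumes you already have an Access automation instance available:
Dim oAccess, oDatabase
Set oAccess = CreateObject("Access.Application")
oAccess.OpenCurrentDatabase "C:\Dev\MyFrontEnd.accdb"
Set oDatabase = oAccess.CurrentDb
ExportCSVs oDatabase, "C:\Dev\MyFrontEnd.src", oAccess  'export folder path with no trailing slash
oAccess.CloseCurrentDatabase
oAccess.Quit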