Tables to Text: Do it for the DVCS!
Anything that can lead to a bug in your software belongs in version control. That includes local tables with design-time data.
In previous posts, I talked about the how and why of custom database properties. Now, let's dive into an actual example of a custom database property I use in all of my Microsoft Access projects.
As I've mentioned in the past, I'm a big believer in version control for Microsoft Access. Unfortunately, Access does not lend itself to easy version control because all of its code and user interface objects are saved in a single binary file. The key to effective version control with Access, then, is to get as much information as possible converted from that binary file format into text files that we can easily compare from one version to the next.
Local Table Data
When writing a Microsoft Access application, it is critical that you "split" your program into a front-end and back-end. The front-end contains the user interface and business logic. The back-end is for data storage. The back-end may consist of one or more Access databases or a client-server database like SQL Server.
While all persistent user data must be saved in the back-end, you can still create and populate local tables in your front-end.
One common reason to do this is performance. You can often speed up a complex report by first populating a local temporary table with the data on which your report is based. This allows you to run multiple queries to populate the table, rather than construct some monstrous query with layers upon layers of subqueries.
Design-Time Data
Another common reason to use local tables is to store "design-time" data that you will use in your user interface or within your business logic. By "design-time," I am referring to data that you define while designing your application and that does not change at run time when the user is using it.
The most common example that I use is a series of local tables that includes information for re-linking tables among different environments (production vs. development vs. testing) or among different clients. One table includes database connection information, such as database name, server name, etc. Another table includes a list of tables that get relinked when switching locations. The actual schema is a bit more involved, but that's the general idea.
The key point, though, is that this is the kind of data that is perfectly suited for storing in local tables of the front-end database file. This data is also a potential source of bugs. If I misspell the name of the production database server, then my application will raise an error when I try to relink tables to that environment.
This brings me to one of my core philosophies of version control:
Anything that can lead to a bug in your software belongs in version control.
Signal vs. noise
Since changes to design-time data stored in local tables may lead to bugs, the content of the local tables must be included in version control. However, not every local table in the front-end database contains design-time data. As I mentioned earlier, a common use for local tables is as temporary storage for (often) denormalized data.
Obviously, we do not want to store temporary data in version control. By its nature, temporary data will always be changing. As a corollary to my earlier statement, anything that cannot lead to a bug in your software should likely be kept out of version control.
The main reason is a concept known as signal vs. noise. In radio communications, a strong signal does you no good if it is overwhelmed by too much noise. So it is with version control. If too many of the changes in your code from one version to the next are inconsequential, then they will drown out all the important bits. This is where the hg_csv database property comes in.
The hg_csv custom database property
Wanting to include anything that could lead to a bug in my version control system while keeping out anything that could not, I needed a way to export design-time data from my local tables without also exporting temporary data from other local tables. My solution was to store a list of the tables that contain design-time data in a custom database property named hg_csv.
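If the property does not exist yet, creating it is a one-time step. Here is a minimal sketch you could run from a standalone script; the database path and table names are placeholders for whatever design-time tables you want to track:
Const dbText = 10  'DAO data type for a text property
Dim dbe, db, prp
Set dbe = CreateObject("DAO.DBEngine.120")
Set db = dbe.OpenDatabase("C:\Dev\MyFrontEnd.accdb")  'placeholder path to the front-end
On Error Resume Next
db.Properties("hg_csv") = "tblDbConnections,tblTablesToRelink"  'placeholder table names
If Err.Number = 3270 Then  'error 3270: "Property not found"
    Err.Clear
    Set prp = db.CreateProperty("hg_csv", dbText, "tblDbConnections,tblTablesToRelink")
    db.Properties.Append prp
End If
On Error GoTo 0
db.Close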
I then modified my decompose.vbs script to look for this custom database property, parse it into individual table names, and export the data contained in each listed table to a text-based CSV file.
Repeatability
When exporting data from a binary file for the purposes of source control, it is vitally important that the export be consistently repeatable. The technical term for this is idempotence: given a list of tables with unchanging data, you should be able to reproduce an identical set of CSV files every time you run your export.
The problem with this is that rows stored in a relational database table have no inherent order. They are stored in some physical order on the disk, but any number of operations, such as a compact and repair, could change that physical order. So, if you simply export the table without specifying an explicit sort order, Access will export the rows in some undefined order. And while the data itself may not change from one run to the next, the order of the export might.
For this reason, you want to dictate the sort order when exporting to text files. Since table data is inherently unordered, it does not matter what sorting you use in your export. But it must be repeatable.
For simplicity, I create an ORDER BY clause that is nothing more than a listing of the table's columns, in the order they appear in the table. You don't need to worry about performance, because you're usually dealing with no more than a few dozen records in these kinds of tables.
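For example, given a hypothetical design-time table named tblDbConnections with fields ConnectionID, ServerName, and DatabaseName, the query string the script builds would simply be:
'Hypothetical table and field names for illustration only
SQL = "SELECT * FROM tblDbConnections ORDER BY ConnectionID, ServerName, DatabaseName"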
Sample code
Here is an excerpt from my own decompose.vbs script:
'db: an open DAO.Database object
'ExportPath: folder path (without trailing slash) where .csv files will be created
'oApp: an Access application instance
Public Sub ExportCSVs(db, ExportPath, oApp)
    Const acExportDelim = 2
    Dim Tbls, TblName, QDef, QDefName, ColumnNames
    On Error Resume Next
    QDefName = "___TempCsvExportQDef___"

    'Parse the comma-delimited list of design-time tables from the hg_csv property
    Tbls = Split(db.Properties("hg_csv"), ",")
    If Err = 0 Then
        For Each TblName In Tbls
            WScript.Echo "CSV: " & Trim(TblName)
            ColumnNames = ColNames(db, Trim(TblName), True)
            'Build a temporary QueryDef with an explicit ORDER BY so the export is repeatable
            Set QDef = db.CreateQueryDef(QDefName, "SELECT * FROM " & Trim(TblName) & " ORDER BY " & ColumnNames)
            oApp.DoCmd.TransferText acExportDelim, , QDef.Name, ExportPath & "\" & Trim(TblName) & ".csv", True
            db.QueryDefs.Delete QDef.Name
            Set QDef = Nothing
            If Err <> 0 Then
                WScript.Echo " error " & Err & ": " & Err.Description & " {" & ColumnNames & "}"
                Err.Clear
            End If
        Next
    Else
        'No hg_csv property in this front-end; nothing to export
        Err.Clear
    End If

    'Always export the import/export specification system tables
    For Each TblName In Array("MSysIMEXSpecs", "MSysIMEXColumns")
        WScript.Echo "CSV: " & Trim(TblName)
        Set QDef = db.CreateQueryDef(QDefName, "SELECT * FROM " & Trim(TblName) & " ORDER BY " & ColNames(db, Trim(TblName), True))
        oApp.DoCmd.TransferText acExportDelim, , QDef.Name, ExportPath & "\" & Trim(TblName) & ".csv", True
        db.QueryDefs.Delete QDef.Name
        Set QDef = Nothing
    Next
End Sub
Public Function ColNames(db, TblName, SortableOnly)
    Dim i, td, isSortable
    On Error Resume Next
    Set td = Nothing  'initialize so the QueryDefs fallback below works if the TableDefs lookup fails
    Set td = db.TableDefs(TblName)
    If td Is Nothing Then Set td = db.QueryDefs(TblName)
    ColNames = td.Fields(0).Name
    For i = 1 To td.Fields.Count - 1
        Select Case td.Fields(i).Type
            Case 11, 12: isSortable = False 'OLE Object (11) and Memo/Hyperlink (12) fields cannot be sorted
            Case Else: isSortable = True
        End Select
        If isSortable Or Not SortableOnly Then
            ColNames = ColNames & ", " & td.Fields(i).Name
        End If
    Next
End Function
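If you are wiring this into your own decompose.vbs, the call might look something like the sketch below; the file paths are placeholders, and the script assumes you already have an Access automation instance available:
Dim oAccess, oDatabase
Set oAccess = CreateObject("Access.Application")
oAccess.OpenCurrentDatabase "C:\Dev\MyFrontEnd.accdb"
Set oDatabase = oAccess.CurrentDb
ExportCSVs oDatabase, "C:\Dev\MyFrontEnd.src", oAccess  'export folder path with no trailing slash
oAccess.CloseCurrentDatabase
oAccess.Quit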