clsConcat: Blazing-Fast String Building Performance in VBA
You don't need to understand how to pre-allocate a memory buffer in VBA to take advantage of this blazing-fast string builder class.
Building strings the standard way in VBA is slow and inefficient:
Of course, in programming, slow is a relative term. And, in most of the code you write, you need not worry about how long it takes VBA to build strings. It's far more important that your code be correct and human-readable the first time you write it.
But, there will be some situations where your string-building code will be noticeably slow to a human observer. Typically, those are scenarios where you are performing lots of string concatenations in a tight loop or (at least) one of the strings you are combining is very large.
So, what do you do when you need a more efficient alternative to VBA's standard string concatenation?
Concatenating Strings "In Place"
The thing that makes standard VBA string concatenation slow is that string variables are not modified "in-place" within memory. With every change to a string variable's contents, the entire string is copied from one place in memory to another.
To quote from the above article:
The strings are read from two different memory locations and then copied to a third memory location, even if you are concatenating literal strings to a single variable.
The key to massively increasing the performance of string concatenation in VBA is to avoid these excessive memory operations. We could write such code ourselves...but we don't have to.
clsConcat: Adam Waller's String Builder Class
One of my favorite open source Access projects right now is Adam Waller's Version Control Add-in (msaccess-vcs-addin).
I was stepping through the code in the add-in recently, when I came across his clsConcat class module. The class module provides a much more efficient way to build strings in VBA than the standard approach.
Here's the header comment from the class module, which provides a pretty good intro to the class and its raison d'être:
' String concatenation (joining strings together) can have a significant
' performance impact when you are using the ampersand character to join
' strings together. While negligible in occasional use, if you start
' running tens of thousands of these in a loop, it can really bog
' down the processing due to the memory reallocations happening behind
' the scenes. In those cases it is better to use the Mid$() function to
' change an existing buffer to build the return string.
' Special thanks to Nir Sofer - http://www.nirsoft.net/vb/strclass.html
' and Chris Lucas - http://www.planetsourcecode.com/vb/scripts/ShowCode.asp?txtCodeId=37141&lngWId=1
' for their inspiration with these concepts.
Documenting clsConcat
Now we know why the class module exists, but we don't know how to use it.
The class module is nothing more than an implementation detail of the Version Control project, so it had no dedicated documentation of its own aside from the embedded code comments (which are quite good, by the way).
This being the year 2023, though, I realized I didn't need to waste any of my time trying to figure out how his code worked. I could just ask AI! This is the sort of thing that current LLM's are very good at, as I've written about in the past:
ChatGPT is better at reading code than writing it.
Claude vs. ChatGPT
Instead of using ChatGPT to document this class module, I used Claude-2-100k via poe.com.
Claude is a competing large language model (LLM) from Anthropic. One of the big advantages it has over ChatGPT is that it has a larger "context window", meaning you can feed it more text to analyze. In fact, it supports up to 100,000 tokens (hence the name) which is roughly equivalent to 75,000 words.
The implementation at poe.com lets you upload text files as part of the prompt process. What follows below is the conversation I had with Claude-2-100k about the clsConcat class module.
A Conversation with Claude-2-100k
[Mike]: Summarize the attached class module, clsConcat
Here is a summary of the clsConcat class module:
Overall Purpose:
Provides efficient string concatenation functionality by using a paged buffer approach rather than repeated ampersand joins.
Procedures:
Add - Adds one or more strings to the internal buffer.
Purpose: Efficiently append strings while avoiding repeated memory allocations.
Remove - Removes characters from the end of the internal buffer.
Purpose: Allow trimming characters off the end of the concatenated string.
GetStr - Returns the full concatenated string.
Purpose: Retrieve the final joined string from the internal buffer.
Return Type: String
MidStr - Returns a substring from a specified position.
Purpose: Retrieve part of the concatenated string.
Return Type: String
RightStr - Returns the rightmost characters from the string.
Purpose: Retrieve ending substring based on length.
Return Type: String
Length - Returns the current length of the internal string.
Purpose: Quickly check length without building the full string.
Return Type: Double
Clear - Resets the internal buffer pages.
Purpose: Start fresh with a cleared internal buffer.
SetPageSize - Sets the page size used for internal buffer.
Purpose: Customize page size if needed.
SelfTest - Runs test cases on the class methods.
Purpose: Validate class is working as expected.
[Mike]: Provide a sample VBA procedure that shows how to work with this class module by including a call to every method and property at least one time.
Here is an example VBA procedure that demonstrates using the clsConcat class module:
Sub TestConcat()
Dim myConcat As New clsConcat
' Set page size
myConcat.SetPageSize 2000, 10
' Add strings
myConcat.Add "Hello"
myConcat.Add "World!"
' Add more strings
myConcat.Add "This is a test.", "Here is another sentence.", "And one more."
' Check length
Debug.Print myConcat.Length
' Get full string
Debug.Print myConcat.GetStr
' Get substring
Debug.Print myConcat.MidStr(10, 5)
' Get ending
Debug.Print myConcat.RightStr(10)
' Remove characters
myConcat.Remove 10
' Clear buffer
myConcat.Clear
' Add more strings
myConcat.Add "New string"
' Get string again
Debug.Print myConcat.GetStr
End Sub
This shows creating an instance, setting the page size, adding strings, checking the length, getting the full string and substrings, removing characters, clearing the buffer, and adding new strings.
[Mike]: Tell me more about the SetPageSize method:
• Why would I use it?
• What's the effect of assigning a small page size versus a large page size?
• Do I need to use it every time or only if I need to tweak performance?
Here are some additional details on the SetPageSize method:
Why use it:
- To customize the page size used for the internal string buffer. The default is 4096.
Effect of page size:
- Smaller page size = more memory overhead for managing pages but less wasted space per page
- Larger page size = less overhead but potential for more unused space per page
When to use:
- Only need to call it if you want to override the default page size
- Typically used to optimize performance for a specific usage pattern
- Smaller pages are better for concatenating many small strings
- Larger pages reduce overhead if concatenating longer strings
So in summary:
- The default page size is usually good enough for most cases
- Only call SetPageSize if you need to optimize for a certain workflow
- Balance memory overhead vs potential wasted space per page
[Mike]: What about the lngNewInitialPages argument of SetPageSize?
• Why would I use it?
• What's the effect of assigning a small page count versus a large page count?
Good question! Here are some details on the lngNewInitialPages argument:
Why use it:
- To set the initial number of pages allocated for the buffer array. Default is 100.
Effect of initial page count:
- Smaller count = less memory allocated upfront but more reallocations needed if more pages used
- Larger count = more memory allocated initially but less reallocations later
So in summary:
- Controls the initial array size for the internal buffer pages
- Default of 100 is usually sufficient for most cases
- Can tweak to optimize memory usage if you know approx string size ahead of time
- Smaller count uses less memory initially but can cause more overhead from reallocations
- Larger count uses more upfront memory but reduces reallocation overhead
- Balance memory usage vs potential for frequent reallocations
So lngNewInitialPages allows further tuning of performance, but the default is typically adequate. Only tweak if optimizing for a specific use case.
Cover image created with Microsoft Designer