Standard String Concatenation in VBA is Slow and Inefficient
A deep dive into the inner workings of string concatenation in VBA and why appending a single character to a long string can be so expensive.
Standard string concatenation in VBA–where you combine two strings using the ampersand character–is inefficient.
The strings are read from two different memory locations and then copied to a third memory location, even if you are concatenating literal strings to a single variable.
The Proof is in the Pointers
To demonstrate the issue, we need to get into the weeds a bit. We will use two hidden and undocumented VBA functions to do this:
- VarPtr: returns the address to where a variable is stored in memory
- StrPtr: returns the address to where a string's contents are stored in memory
You can find these functions in VBA's _HiddenModule. Be sure to check the box to "Show Hidden Members" in the Object Browser:
How VBA Stores Strings
The first thing to understand is that a String variable is stored in a different memory location than the String variable's contents. In other words, a VBA String variable is a pointer to a pointer.
This extra level of indirection may seem odd, but there is a very good reason for it. By storing the contents of the string in a separate location from the string variable, the string variable takes up a fixed amount of memory. This means that string variables can be used in fixed memory data structures, such as VBA arrays.
One of the important performance characteristics of a VBA array is that the location of the n-th member of the array can be found by multiplying the index n by the memory width of a single item. (Incidentally, this is why arrays are indexed starting at 0 rather than 1; the first item begins at an offset of 0 from the start of the array.) Of course, this approach only works if each item occupies the same amount of space in memory.
The String Variable
Dim s As String
s = "Hello, world"
In the above code example, s
is the string variable, while "Hello, world"
are the string's contents. The variable s
is stored in a location in memory. VBA maintains a pointer to the location of this variable in memory. The pointer is a 32-bit or 64-bit number, depending on whether VBA is running in a 32-bit or 64-bit development environment.
We can get the value of that pointer–the "variable pointer"–using VarPtr: Debug.Print VarPtr(s)
.
What a String variable pointer points to is the first bit of a 32-bit or 64-bit block of memory (again, depending on the VBA environment). The value in that 32-bit or 64-bit block of memory is another pointer–known as the "string pointer"–that points to the first character of the string's contents.
Note that for an unitialized String variable, the value of that 32-bit or 64-bit block of memory is zero. In other words, it points nowhere. This is the so-called vbNullString
, which I wrote about recently:
The String Pointer
You may recall that you can efficiently check the length of a string without reading the whole string. That's because the string's length is stored in a 4-byte (32-bit) length prefix.
Here's how a literal VBA string is stored in memory:
- 4-byte length prefix: stores the length of the string
- Variable number of two-byte Unicode characters: the string's actual value
- 2-byte NULL termination character: two 0 bytes (
vbNullChar
)
The string pointer points to the first Unicode character of the string's actual value. To get the string's length, you can offset backwards 32 bits from the string pointer and read those 32 bits. The Len()
and LenB()
functions do this for you in VBA.
We can get the value of this pointer–the "string pointer"–by using StrPtr: Debug.Print StrPtr(s)
.
By the way, Len()
[string length] is simply LenB()
[byte length] divided by two because VBA strings comprise 2-byte Unicode characters. That's why it's ever so slightly faster to check for an empty string by using LenB(s) = 0
rather than Len(s) = 0
; you save VBA from having to perform the divide by two operation.
Sample Code for the Skeptics
This all sounds nice, and maybe even plausible, but how do you know I'm not just making this stuff up? It's a healthy question to ask, especially since anyone can write anything on the internet. (It's true!)
Here's some sample code you can run in your own VBA environment to prove to yourself this works the way I've described:
Sub TestStringCopy()
Debug.Print "Value of `s`", "VarPtr(s)", "StrPtr(s)", "VarPtr(""A|B|C"")"; " StrPtr(""A|B|C"")"
Debug.Print String(9, "-"), String(9, "-"), String(9, "-"), String(9, "-"), String(9, "-")
Dim s As String
Debug.Print s, VarPtr(s), StrPtr(s)
s = ""
Debug.Print s, VarPtr(s), StrPtr(s), VarPtr(""), StrPtr("")
s = s & "A"
Debug.Print s, VarPtr(s), StrPtr(s), VarPtr("A"), StrPtr("A")
s = s & "B"
Debug.Print s, VarPtr(s), StrPtr(s), VarPtr("B"), StrPtr("B")
s = s & "C"
Debug.Print s, VarPtr(s), StrPtr(s), VarPtr("C"), StrPtr("C")
End Sub
Sample Output
Explanation
The first two lines of the procedure simply print the header rows for our output.
Next, we print the unitialized value of s
, the variable pointer to s
, and the string pointer of s
. Notice that the string pointer is zero, indicating that an unitialized string in VBA is set to vbNullString
. This makes sense as it prevents VBA from having to reserve a second area in memory for no reason.
After printing these initial values, we set our string variable to an empty string literal, s = ""
. We again print the variable pointer to s
, which has not changed. We also print the string pointer of s
. Notice this time that instead of zero, it points to a reserved memory address even though there is nothing being stored there (besides a 4-byte string length of zero and a two-byte null terminator, vbNullChar
).
For the remainder of the procedure, we concatenate the contents of s
with the literal strings A
, B
, and C
.
Notice that the variable pointer to s
never changes (26208928
). However, the string pointer changes every time. That's because the contents of s
keep getting moved around in memory with each concatenation.
Interestingly, the variable pointers for the literal strings never change either (26208916
). That's partly because we're only accessing one literal string per code statement. VBA apparently creates temporary variables for literal values, because if we run something like Debug.Print VarPtr("X"), VarPtr("Y"), VarPtr("Z"), VarPtr("X")
we get four values (the variable pointer for "X" does not get reused, even though it's the same literal value):
Step-by-Step Breakdown
When we run the code s = s & "B"
, the following memory operations take place (for readability, I'll use only the last 3 digits of the memory addresses):
- VBA follows the variable pointer for
s
to928
- VBA follows the string pointer of
s
to764
- VBA follows the variable pointer for the literal string
"B"
to916
- VBA follows the string pointer of
"B"
to060
- VBA copies the strings from
764
and060
and "pastes" them to364
For those keeping score at home, that's four memory reads and one memory write for a simple concatenation of strings.
Performance Implications
The most important thing to note is that the string contents of s
were read completely from one location in memory (764
) and transferred to a different location in memory (364
) simply to append a single character.
This is no big deal when s
is only a few characters long. But what if s
is 20,000 characters long and we want to append a single character? We would need to read and write 20,001 characters in memory. As you can see, this inefficiency can cause serious performance problems when you perform string concatenation in a tight loop or when s
grows too big.
Dealing with this performance problem is a topic for another day.