Writing regular expressions is hard–especially when you are just getting started. And, while writing regular expressions is hard, reading them is nigh impossible.
It's worth the effort to learn them, though, because regular expressions are powerful. Their signal-to-noise ratio is off the charts. For the right kind of problem, a single regular expression can take the place of dozens of lines of standard VBA code.
Note: Regular expressions are not built in to the VBA language, nor are they a part of the standard VBA libraries. The easiest way to use regular expressions in VBA is via the VBScript regular expression library. I published some regex convenience functions in a previous article.
Every single character in a regular expression means something. And trying to keep track of what each thing means in your head is difficult, even when you've been using them for a while.
I use the website regex101.com whenever I have to write a regular expression.
When you are just getting started, the best part of regex101.com is the "Explanation" area on the right side of the screen. It breaks down each piece of your regular expression so that you know exactly what it is doing.
For this article, I'm going to use the following regular expression:
If you are not familiar with regular expressions, the pattern above probably looks like complete gibberish. There is no semantic highlighting, which makes it hard to group the characters logically.
If we type the same regular expression into the "Regular Expression" box in regex101.com, we instantly get a detailed explanation of what each character represents:
If you have not figured it out yet, the regular expression pattern above is used to identify the individual components of a date-time literal in VBA.
Let's try testing this regular expression. We will use the following Test String:
#3/23/2021 2:14 PM#
Here's what it looks like on regex101.com:
You will notice that the test string is highlighted with a variety of different background colors. Those colors correspond to matched groups in the regular expression. Regex101.com helpfully shows a "Match Information" pane on the right side of the window, under the detailed explanation I showed earlier:
Regular expressions have a lot of token characters that have special meaning. Keeping track of all of them can be difficult, especially if you don't write regexes every day. Luckily, regex101.com includes a handy cheat sheet in the bottom-right corner of the page:
So far, all we've done is check to see if our regular expression matches the test string. If you were using my
Regex() function, that's all you would need. But what if you wanted to replace a part of the test string? This is where regexes really start to show their power.
Let's say we wanted to take a VBA date-time literal and convert it into a T-SQL date literal with no time portion? We could use the pattern above and combine it with this substitution string:
This will replace the pound signs (
#) with single apostrophes (
') and change the date order from "m/d/yyyy" to "yyyy-m-d".
If we click on the "Substitution" option under the "FUNCTION" heading on the left side of the page, we can test our substitution string to make sure it works:
Translating to VBA
Here are the equivalent terms to translate from RegEx101.com to the arguments of my RegExReplace function:
- REGULAR EXPRESSION ->
- TEST STRING ->
- SUBSTITUTION ->
I won't get into how the feature works here, but regex101.com also allows you to create multiple unit tests. This lets you ensure that your regex is doing what you want in every situation, especially those tricky corner cases.
[UPDATED 2022-01-25]: Added proper image attribution following DMCA notification.