Now you have two problems

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. --Jamie Zawinski

Now you have two problems

Years ago, Jamie Zawinski wrote the seminal thesis on regular expressions:

Some people, when confronted with a problem, think "I know, I'll use regular expressions."  Now they have two problems.

Regular expressions are kind of like recursion for most programmers.  

Mind-boggling at first.  Then a period of, "I sort of get it, but when would I ever actually use this?"  Followed by the first time you stumble upon an otherwise intractable problem where it offers an elegant solution.  After which, you hoist your shiny new hammer above your head, bellow to the heavens, "THERE CAN BE ONLY ONE!" and stride off in search of more nails (or anything that even vaguely resembles a nail).  Until finally, with a cascade of snapped screws and bent bolts in your wake, you place the now-rusting hammer in your toolbox where it belongs.  

Well, no matter where you are on your allegorical regular expression journey, I have a bit of help for you.

First, the absolute best resource I have found for writing, testing, and understanding regular expressions is the website regex101.com.  It uses color-coding to break down each step of any regular expression pattern you feed it.  You can test your regex against sample text.  You can even create multiple unit tests to help you tweak your regex to meet your exact needs.

Next, I offer up a few of my own VBA regex functions.  These are nothing more than wrappers around the VBScript regular expression library.

NOTE: The '>>> lines in the code below are Python-inspired doc tests.

'---------------------------------------------------------------------------------------
' Procedure : RegEx
' Author    : Mike Wolfe <mike@nolongerset.com>
' Date      : 9/1/2010
' Purpose   : Perform a regular expression search on a string and return the first match
'               or the null string if no matches are found.
' Usage     : If Len(RegEx("\d{1,2}[/-]\d{1,2}[/-]\d{2,4}", txt)) = 0 Then MsgBox "No date in " & txt
'           : TheDate = RegEx("\d{1,2}[/-]\d{1,2}[/-]\d{2,4}", txt)
'           : CUSIP = Regex("[A-Za-z0-9]{8}[0-9]",txt)
'---------------------------------------------------------------------------------------
'
Function RegEx(Pattern As String, TextToSearch As String, Optional IgnoreCase As Boolean = False) As String 
    Dim RE As Object, REMatches As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = False
        .Global = False
        .IgnoreCase = IgnoreCase
        .Pattern = Pattern
    End With
    
    Set REMatches = RE.Execute(TextToSearch)
    If REMatches.Count > 0 Then
        RegEx = REMatches(0)
    Else
        RegEx = vbNullString
    End If
End Function

'---------------------------------------------------------------------------------------
' Procedure : RegExReplace
' Author    : Mike Wolfe <mike@nolongerset.com>
' Date      : 11/4/2010
' Purpose   : Attempts to replace text in the TextToSearch with text and back references
'               from the ReplacePattern for any matches found using SearchPattern.
' Notes     - If no matches are found, TextToSearch is returned unaltered.  To get
'               specific info from a string, use RegExExtract instead.
'>>> RegExReplace("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My phone # is 570.555.1234.", "$1($2)$3-$4$5")
'My phone # is (570)555-1234.
'---------------------------------------------------------------------------------------
'
Function RegExReplace(SearchPattern As String, TextToSearch As String, ReplacePattern As String, _
                      Optional GlobalReplace As Boolean = True, _
                      Optional IgnoreCase As Boolean = False, _
                      Optional MultiLine As Boolean = False) As String
Dim RE As Object

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .MultiLine = MultiLine
        .Global = GlobalReplace
        .IgnoreCase = IgnoreCase
        .Pattern = SearchPattern
    End With
    
    RegExReplace = RE.Replace(TextToSearch, ReplacePattern)
End Function

'---------------------------------------------------------------------------------------
' Procedure : RegExExtract
' Author    : Mike Wolfe <mike@nolongerset.com>
' Date      : 11/4/2010
' Purpose   : Extracts specific information from a string.  Returns empty string if not found.
'>>> RegExExtract("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My phone # is 570.555.1234.", "$2$3$4")
' 5705551234
'>>> RegExExtract("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My name is Mike.", "$2$3$4")
'
'>>> RegExReplace("(.*)(\d{3})[\)\s.-](\d{3})[\s.-](\d{4})(.*)", "My name is Mike.", "$2$3$4")
' My name is Mike.
'---------------------------------------------------------------------------------------
'
Function RegExExtract(SearchPattern As String, TextToSearch As String, PatternToExtract As String, _
                      Optional GlobalReplace As Boolean = True, _
                      Optional IgnoreCase As Boolean = False, _
                      Optional MultiLine As Boolean = False) As String
Dim MatchFound As Boolean

    MatchFound = Len(RegEx(SearchPattern, TextToSearch, IgnoreCase)) > 0
    If MatchFound Then
        RegExExtract = RegExReplace(SearchPattern, TextToSearch, PatternToExtract, _
                                    GlobalReplace, IgnoreCase, MultiLine)
    Else
        RegExExtract = vbNullString
    End If
End Function

Image by Andy Gries from Pixabay

Comments

Sign in or become a No Longer Set member to join the conversation.
Just enter your email below to get a log in link. (This will also subscribe you to my weekly newsletter.)