3 Gotchas of the SQL BETWEEN Statement
Let's talk about the SQL BETWEEN statement.
The BETWEEN statement is used when you want to filter the results of a query between a minimum and a maximum value. Seems easy enough. But, if you're not careful, you can get yourself into trouble with this statement.
Let's dive in.
1. BETWEEN is Inclusive
The first gotcha is a result of the ambiguity of the plain English meaning of the word "between."
Sometimes, the context implies that the two ends are included in the group we are describing. For example, if a business is open "between Monday and Friday," the general understanding is that it is open 5 days per week.
However, in other contexts, the use of the word "between" implies that the two ends are not included in the group. For example, if you are taking a driver's test and the instructor tells you to park "between the lines," don't be surprised if you fail for parking on one of the lines.
In the context of numbers, though, we almost always expect that the use of the word between is inclusive. In other words, if someone asks you to pick a number "between one and five," it's implied that you have five options and not three.
So, for an integer field, the following condition will evaluate to true if MyValue
is 1, 2, 3, 4, or 5:
WHERE MyValue BETWEEN 1 AND 5
This is a pretty basic concept in SQL, so this gotcha mainly impacts new users.
2. Be Careful with DateTime Fields
The next gotcha happens when the developer forgets that DateTime fields include a time component.
Let's say you are running a report on records that were created in the month of November 2021. You might use the following condition:
WHERE RecordCreated BETWEEN #11/1/2021# AND #11/30/2021#
Whether the above condition does what you expect depends entirely on the contents of the data in the RecordCreated field.
If the default value of the RecordCreated field is =Date()
then you are fine.
If the default value of the RecordCreated field is =Now()
then you are in trouble.
Every record created from 12:00:01 AM on 11/30/2021 through 11:59:59 PM will be excluded from your report. This is almost certainly not what you want.
Make wrong code look wrong
One thing you can do to reduce the likelihood of these kinds of errors is to use a consistent naming convention that makes it clear when a DateTime field is expected to contain a time component.
We would name a field with a default value of =Date()
, "CreatedOn."
We would name a field with a default value of =Now()
, "CreatedAt."
This would make it more obvious when we were doing something that did not make sense:
This is a mistake that pretty much every database developer has to learn on their own the hard way the first time. As such, this gotcha mainly affects intermediate users.
For Time Fields, Use Greater / Less Than
The most reliable way to write a query condition that depends on DateTime fields that include a time component in their data is to use greater than and less than signs instead.
Notice that we use greater than or equal to on the lower end of the range so that we include midnight on 11/1/2021.
Notice on the upper end that we use less than (without the equal sign) because we want everything up to (but not including!) the moment that the clock strikes midnight and the calendar rolls over into December.
3. Different Behavior Among SQL Dialects
Jet/ACE allows the first value to be larger than the second. In other words, this is valid Jet/ACE SQL:
T-SQL will not return any records under these same conditions. That's because it is replacing the BETWEEN with greater than / less than signs behind the scenes. So, the above condition in T-SQL is translated internally as:
This translation is made clear by the official documentation:
BETWEEN returns TRUE if the value of test_expression is greater than or equal to the value of begin_expression and less than or equal to the value of end_expression.
NOT BETWEEN returns TRUE if the value of test_expression is less than the value of begin_expression or greater than the value of end_expression.
SQL Server Linked Tables in Access
So, how are SQL Server linked tables treated in Access?
Actually, it depends on whether the query is evaluated using the Jet/ACE engine or the SQL Server engine.
Consider a simple five-row SQL Server table. If we filter it using the expression MyTableID BETWEEN 2 AND 4
, then the query returns three values (2, 3, and 4). But, if we switch the order of the values in the between statement, MyTableID BETWEEN 4 AND 2
, it returns no records:
Now, let's see how the same queries perform if we link to the table from Access:
For Query1 and Query2, the WHERE condition is applied using the rules of the SQL Server engine because that is where it is being evaluated.
However, notice that in Query3, we have forced the query to be evaluated using the Jet/ACE engine. How did we do that? By applying a VBA function–in this case Round
–to the field value on the left side of the BETWEEN statement, we are forcing Access to transfer the entire contents of the table from SQL Server so that the VBA function can be applied before the WHERE condition gets evaluated.
At that point, the Jet/ACE rules apply. This means that we can swap the high and low values and the BETWEEN statement will still output three records.
Note that if we apply the Round() function to the literal values 4 and 2 that it would not be enough to force the query to be evaluated within the Jet/ACE engine.
I want to emphasize that I am not advocating that you use the technique in Query3 to get the Jet/ACE BETWEEN behavior. That approach will be horribly inefficient, especially if you are dealing with a large production table.
This is a gotcha that could easily be missed by the most advanced Access developers. In fact, until about two days ago, I was not aware this difference in behavior between Jet/ACE and SQL Server even existed. (Thanks for the heads up, Gustav!)
This is an important gotcha to be aware of, especially if you are migrating your project from an Access backend to a SQL Server backend.
Why This Matters in Two Words: Logic Errors
These probably seem like simple little errors that are easy to fix.
And, in truth, they are easy to fix...IF YOU KNOW THAT THEY EXIST AS ERRORS. The problem is that all the "bad code" samples from above are logic errors.
And there is a reason that I list logic errors as the most expensive types of errors:
- Syntax errors
- Compile errors
- Misunderstood requirements (before you start writing code)
- Automated test errors (i.e., failing tests)
- Runtime errors
- Misunderstood requirements (after you've written the code)
- Logic errors
Why are logic errors so expensive? It's because they can go undetected for so long.
Take this sample from above:
WHERE CreatedAt BETWEEN #11/1/2021# AND #11/30/2021#
How long do you suppose your users could run a monthly change report before they realized that they were always missing the final day of the month? What if it's a summary report and the user is only checking total record counts? That sort of error could go undetected for months or even years.
So, take these gotchas seriously. If you get this stuff wrong at the beginning, you might never realize your mistake.