Choosing the Proper Lifetime of Variables in VBA
Some values should be calculated every time you need them. Others should be stored for performance reasons. But how do you decide which way to go?
For more information about the "Lifetime (a.k.a., 'Extent') of Variables," check out my earlier article: Scope vs. Extent in VBA.
There are four primary considerations:
- How long it takes to calculate or retrieve the value
- How often the value will be referenced
- How often the calculated or underlying value changes
- How many values need to be stored
Deciding the Proper Extent/Lifetime
Based on the above considerations, I created a list of questions you can ask yourself when trying to decide how long to store a value in memory:
- Does the value take a long time to calculate or retrieve?
- Will the value be referenced regularly?
- Will the value rarely change?
- Do only a few values need to be stored?
Each question you can answer YES to indicates that you should favor a longer extent.
If you answer YES to all four questions, consider Project or Module extent.
If you answer NO to all four questions, use Procedure extent.
For these examples, imagine an application that provides time tracking and payroll for a service business.
- The business charges by the hour, but charges different hourly rates depending on the client and/or project.
- It has dozens of employees–some paid hourly and others paid an annual salary.
- Each employee generates 5-10 hourly detail records each day.
Complex Subquery Returning Historical Values
Let's say you are calculating a net hourly profit for the entire business by year.
This is a complex calculation that has to take into account the hourly rate charged for each detail record, compare it to the effective hourly cost of each employee, summarize all the records, subtract the total cost from the total revenue and divide by the total number of hours. My math might not be quite right there, but that's not the point. The point is that you have a calculation that:
- Could easily take a few seconds per year to calculate
- Will only generate one value per year
- Might be called several times during the life of the program as it is one of the business's KPIs
Let's run through our heuristic above:
- Long time to calculate? YES
- Referenced regularly? YES
- Changes rarely? YES
- Generates few values? YES
Based on these answers, it makes sense to store the result of this calculation for the lifetime of the program. You could do this using a variable defined in the header of a standard code module, a local procedure variable declared as
Static, or in the TempVars object.
Hours Logged by a Single Employee Today
Let's say you want to know how many hours EmployeeID 21 logged so far today.
This is a simple SUM query involving only one table. Assuming we have an index on the EmployeeID column, the query should be very efficient.
- Should run in a fraction of a second
- The value will change many times throughout the day
- Each employee may return a different value
- The application has a dashboard that shows aggregate hours; employee-level detail is only available via drill-down
Let's run through our heuristic above:
- Long time to calculate? NO
- Referenced regularly? NO
- Changes rarely? NO
- Generates few values? NO
Based on the answers above, it makes sense to store the value of this variable for as short a time as possible. You would want to create a function that retrieved the value and returned it (without caching it for later).
Gross Revenue By Client and Month
Let's say one of our application's most used features is a report that shows the gross revenue by client and month. We need to join a few tables, restrict the records based on date, group them based on client, multiply the hourly rate by the number of hours, and then return the sum of that value.
- This is more complex than the last example, but not as complex as the net profit calculation from the first example
- The report might get opened several times if the user is running a month-end process; other times it might not be opened at all
- The values won't change for past months, but the report can be run for the current month to show month-to-date values
- The report will return at least one result for each client, but the total number would depend on how many months were included in the report
This is the Goldilocks example. Not too hot and not too cold. There's no clear correct answer here. We're missing too much context to know what would be best. Real-world usage would dictate whether some or all of these values should be cached to improve performance.
- Long time to calculate? SORT OF
- Referenced regularly? YES and NO
- Changes rarely? YES and NO
- Generates few values? DEPENDS
Based on the answers above, it makes sense to store the value of this variable for as short a time as possible, at least initially. That will always be the most reliable approach. Based on real-world usage, you can then choose to cache certain data if performance becomes an issue.
The Reliability - Performance Spectrum
Another way to think of program extents is in terms of reliability versus performance.
Shorter extents are more reliable because the calculations are always being done "just in time." Longer extents tend to improve performance because the total number of calculations performed is reduced over the lifetime of the program.
If you're now more confused than when you started, let me simplify things for you:
- Try to avoid declaring variables in the module header section
- Avoid the
Statickeyword inside functions
When should you worry about this stuff?
Honestly, not until AFTER you've learned how to create a solid relational design (i.e., your tables and fields), understand when/where/why to use indexes, and know how to debug and test the performance of your queries.
After all, 99% of performance problems in Access originate with how the database was designed and/or how the queries that access it have been written.
Cover image created with Microsoft Designer; photo by Paul Henri Degrande from Pixabay