Hiding Duplicate Values in Continuous Forms via Left Joins

In this installment, we use Bill Karwin's inequality LEFT JOIN technique to handle multi-column sorting without grinding everything to a halt.

Hiding Duplicate Values in Continuous Forms via Left Joins

This article is Part 3 of my "Hiding Duplicate Values" Mini-Series.

To follow along with the screenshots and examples, download the sample data from here: Sample Data for the "Hiding Duplicate Values" Series.


Starting Point

Unlike reports, controls on Access forms do not have a "Hide Duplicates" property.

The simplest way to hide duplicate values on a continuous form is by way of the underlying record source.  We'll start by creating a record source that includes all the columns and tables we need, before looking at a couple of ways to hide the individual duplicate values.

Basic Query

As in Part 2, we'll start with a basic query.  

Unlike Part 2, though, we will sort the results on user-facing fields, rather than autonumber primary keys.  This presents a different kind of challenge, as we can't simply apply the MIN() function to the InvItemID field to get the value we need.

We will sort the data using the following sort order:

  • Customer.FirstName
  • Customer.LastName
  • Customer.CustomerID
  • Invoice.InvoiceDate DESC
  • Invoice.InvoiceID DESC
  • InvItem.Description
  • InvItem.InvItemID

SIDE NOTE: I always like to include a unique field when sorting so that sorted results are reproducible.  For example, if a customer has two invoices with the same date, there's no guarantee those invoices would appear in the same order each time we ran the query if we only sorted by the InvoiceDate field.

If you compare the above results to Part 2's article, you will see that the data appears in a different order.  Notably, the order above is one that we can explain to the users because the data used for sorting is both visible and meaningful to the user (e.g., customer name vs. customer ID).

As before, our goal is to generate a query with the same columns, but where the duplicate values are hidden.  Here's the above query with the values we want to hide shown in gray:

Method 2: LEFT JOIN Queries

In Part 2, we were able to get away with a single level of GROUP BY subqueries because we were (implicitly) sorting our records on the primary key autonumber fields of our tables.

In that situation, you just need to return the minimum value of the primary key field of your detail table, InvItemID in this case. Sorting by user-facing fields adds complexity to the challenge of identifying the "header" values for the parent tables, but it's also a more realistic scenario.

You can still rise to this challenge using the GROUP BY subqueries approach we covered in Part 2, but you end up needing to nest another level of subqueries to do it.  In practice, this is likely to be a dog performance-wise on any reasonably sized data set.

As an alternative, we will use Bill Karwin's LEFT JOIN technique to fetch the max rows without incurring the same performance hit that we would get with GROUP BY subqueries.

Creating the Left Join Queries

The first step is to create one or more LEFT JOIN queries that we will later join to our Starting Point query to build the final record source for our form.

I wrote about Bill Karwin's technique here:

Efficient Querying: Finding Maximum Values for Each Distinct Value in a Column
A high-performance alternative to subqueries for solving a tricky query problem.

It's a bit hard to wrap your head around it, so I built a ChatGPT prompt to make it easier to create the counterintuitive LEFT JOINs here:

Get Max Row by Group with ChatGPT
This ChatGPT prompt lets you easily implement Bill Karwin’s clever LEFT JOIN solution to a common data selection problem in SQL.

Goal of the Left Join Queries

The goal with creating these queries is to return the InvItemID that corresponds with each row that we want to preserve for our group headings.  

If we add the InvItemID field to our existing StartingPoint query from above, it will be easier to visualize:

The InvItemID's in yellow represent the records where we want to show both the customer name and the invoice dates (5011, 5006, 5003, 5002).  

The InvItemID in magenta represents the one record where we want to show the invoice dates but hide the customer name (5008).

TopItemByInvoice SQL

Here's the SQL for the TopItemByInvoice query.  

Since it uses a non-equality join, it cannot be visualized with Access's QBE window:

SELECT t1.InvoiceID, t1.InvItemID
FROM InvItem AS t1 LEFT JOIN InvItem AS t2 ON t1.InvoiceID = t2.InvoiceID AND 
    (t1.Description > t2.Description OR 
     (t1.Description = t2.Description AND t1.InvItemID > t2.InvItemID))
WHERE t2.InvoiceID IS NULL;

I passed the following data to my ChatGPT prompt to build this SQL:

  • Table Name: InvItem
  • Column(s) to Return: InvoiceID, InvItemID
  • Column(s) to Group By: InvoiceID
  • Column(s) to Sort By: Description, InvItemID

NOTE: ChatGPT will frequently get the direction of the inequality operator backwards when building the LEFT JOIN SQL (i.e., > instead of < or vice versa).  Be sure to test your query to make sure it's identifying the correct top record.

TopInvoiceByCustomer SQL

Ultimately, we need to return the top InvItemID value for each customer.  However, we can't get at that value without first going through the Customer table.  Thus, we start there by identifying the top InvoiceID based on our sorting criteria from above, namely:

  • InvoiceDate DESC
  • InvoiceID DESC
SELECT i1.CustomerID, i1.InvoiceID
FROM Invoice AS i1 LEFT JOIN Invoice AS i2 ON i1.CustomerID = i2.CustomerID AND 
   (i1.InvoiceDate < i2.InvoiceDate OR 
   (i1.InvoiceDate = i2.InvoiceDate AND i1.InvoiceID < i2.InvoiceID))
WHERE i2.CustomerID IS NULL;

I passed the following data to my ChatGPT prompt to build this SQL:

  • Table Name: Invoice
  • Column(s) to Return: CustomerID, InvoiceID
  • Column(s) to Group By: CustomerID
  • Column(s) to Sort By: InvoiceDate DESC, InvoiceID DESC

TopItemByCustomer SQL

Now we can get to the top InvItemID for each customer, using the intermediate query we defined above.  This query is just a simple INNER JOIN, so we can use the QBE window:

Creating the Form Record Source

Now that we have our underlying queries, we can build the form record source.

First, we start by creating a copy of the StartingPointPart3 query and rename it ViaLeftJoins.

Next, we add the TopItemByCustomer and TopItemByInvoice queries to the existing query via LEFT JOINs on the InvItem.InvItemID field.  (This is very similar to this part of the process as outlined in Part 2.)

The next step is to replace the definitions of the four group header columns with IIf() expressions that test the InvItemID fields of the two TopItem queries to determine whether to output the values for a particular row.

Here are the new field definitions:

  • FirstName: IIf([TopItemByCustomer].[InvItemID] Is Null,Null,[Customer].[FirstName])
  • LastName: IIf([TopItemByCustomer].[InvItemID] Is Null,Null,[Customer].[LastName])
  • InvoiceDate: IIf([TopItemByInvoice].[InvItemID] Is Null,Null,[Invoice].[InvoiceDate])
  • DueDate: IIf([TopItemByInvoice].[InvItemID] Is Null,Null,[Invoice].[DueDate])

We can't sort by these updated column expressions any more (since most of the values are NULL), so we move the sorting to the end of the QBE window by including all the sorting columns with their "Show" checkboxes unchecked:

Here's the full SQL statement:

SELECT IIf([TopItemByCustomer].[InvItemID] Is Null,Null,[Customer].[FirstName]) AS FirstName, 
       IIf([TopItemByCustomer].[InvItemID] Is Null,Null,[Customer].[LastName]) AS LastName,
       IIf([TopItemByInvoice].[InvItemID] Is Null,Null,[Invoice].[InvoiceDate]) AS InvoiceDate, 
       IIf([TopItemByInvoice].[InvItemID] Is Null,Null,[Invoice].[DueDate]) AS DueDate, 
       InvItem.Description, 
       InvItem.Quantity, 
       InvItem.UnitPrice
FROM Customer 
  INNER JOIN (Invoice 
    INNER JOIN ((InvItem 
      LEFT JOIN TopItemByInvoice ON InvItem.InvItemID = TopItemByInvoice.InvItemID) 
      LEFT JOIN TopItemByCustomer ON InvItem.InvItemID = TopItemByCustomer.InvItemID) 
    ON Invoice.InvoiceID = InvItem.InvoiceID) 
  ON Customer.CustomerID = Invoice.CustomerID
ORDER BY Customer.FirstName, 
         Customer.LastName, 
         Customer.CustomerID, 
         Invoice.InvoiceDate DESC , 
         Invoice.InvoiceID DESC , 
         InvItem.Description, 
         InvItem.InvItemID;

This is the final query that we can use as a RecordSource for our continuous form:

All original code samples by Mike Wolfe are licensed under CC BY 4.0