Advanced Data Generator 4: First Look

Looking to generate lots of test data in your backend Access databases? Advanced Data Generator 4 may be exactly what you are looking for.

The Advanced Data Generator (ADG) is a great tool to add to your Microsoft Access toolbox.

I finally got a chance to test this out for myself after watching the 15-minute presentation on the tool during Access DevCon Vienna.  Please keep in mind that my observations in this article are from spending just over an hour working in the tool.  There may be more efficient ways to do things that I did not discover in this short time.

Here are my initial thoughts working with Advanced Data Generator 4.

Licensing Options

I downloaded the trial for the premium version that includes ADO/ODBC access.

This version of the program lets you connect to any data source that supports ADO or ODBC, including SQL Server, Microsoft Access, MySQL, PostgreSQL, and SQLite. If you only need to connect to one kind of data source, there are lower-priced, engine-specific versions of the program available. The engine-specific versions currently sell for $115 (€99), while the premium version sells for $250 (€219).

Since you can evaluate the premium version for a full 30 days before committing to a purchase, I think it's the way to go for most professional Access developers. Let's say you work exclusively with Access and SQL Server databases. Buying those two engine-specific versions of ADG would set you back $230. For an extra $20, the premium version gives you the flexibility to support ANY future type of database. That's a no-brainer in my book.

Getting Started

I restored a backup copy of a SQL Server database I had lying around.

I connected to the database from within ADG, then started a new "Data Project." I found the process of generating sample data to be both powerful and straightforward.  My biggest complaint is that it was just so tedious.

You have to go table by table and field by field to set the data generation options. If this were a greenfield project and I were doing this as I built the database schema, I don't think this would be too bad. In fact, it might help clarify my thinking about how I was putting everything together.

On an existing database, though, it just took a lot of time.

Amazing Power and Flexibility

I'm convinced you can generate almost any sort of data that you want using this tool.

It comes preloaded with a bunch of data libraries, such as Belgian Street Names, French First Names, Spanish City Names, and Italian Family Names.  There is a "macro" feature that allows you to build custom data by combining information from these libraries.  

So, for instance, you could generate band names by combining a random digit, an American Male First Name, and a Spanish city name:

"Hey, who wants to go with me to the 7 Waylons from CARCABOSO concert this Friday night?"
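ADG macros are configured inside the tool, not written in code, so the following is only a hypothetical Python sketch of the same idea: draw one value from each "library" and stitch them together. The two short lists stand in for ADG's much larger built-in libraries, and the plural "s", the word "from", and the uppercase city are formatting assumptions made here to mimic the example above, not ADG settings.

```python
import random

# Hypothetical stand-ins for two of ADG's built-in data libraries.
# The real libraries ship with the tool and are much larger.
AMERICAN_MALE_FIRST_NAMES = ["Waylon", "Dale", "Chester", "Otis"]
SPANISH_CITY_NAMES = ["Carcaboso", "Albacete", "Teruel", "Zamora"]


def band_name(rng: random.Random) -> str:
    """Combine a random digit, a first name, and a city into one value."""
    digit = rng.randint(0, 9)  # random single digit, 0-9 inclusive
    first_name = rng.choice(AMERICAN_MALE_FIRST_NAMES)
    city = rng.choice(SPANISH_CITY_NAMES).upper()
    # The plural "s", the word "from", and the uppercase city are formatting
    # choices made here to mimic the example; they are not ADG settings.
    return f"{digit} {first_name}s from {city}"


if __name__ == "__main__":
    rng = random.Random(2021)  # fixed seed so repeated runs print the same names
    for _ in range(3):
        print(band_name(rng))
```

Seeding the random generator is a deliberate choice: repeatable test data makes it much easier to reproduce a bug that only shows up with a particular set of values.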

While it's more work, you can also point the data generator at any number of custom data sources that you provide, such as a database table or a CSV file. It can pull data from that source randomly or sequentially, and you can have it avoid pulling duplicates. When you factor in this ability, you have total control over what data gets generated.
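To make the three options concrete (random, sequential, and no duplicates), here is a minimal Python sketch of how values might be drawn from a custom CSV source. This is not ADG's implementation; `load_column` and `pick_values` are hypothetical helpers, and the inline CSV text stands in for a real file.

```python
import csv
import io
import random
from typing import Iterator, List, Optional


def load_column(csv_text: str, column: str) -> List[str]:
    """Read one column from CSV text (a stand-in for a custom CSV source)."""
    return [row[column] for row in csv.DictReader(io.StringIO(csv_text))]


def pick_values(values: List[str], mode: str = "random", unique: bool = False,
                seed: Optional[int] = None) -> Iterator[str]:
    """Yield values from a custom source randomly or sequentially.

    With unique=True each source value is used at most once, so the
    generator stops after len(values) items instead of repeating.
    """
    # Because this is a generator, errors surface on the first next() call.
    if not values:
        raise ValueError("custom data source is empty")
    rng = random.Random(seed)
    if mode == "sequential":
        if unique:
            yield from values  # each value once, in source order
        else:
            while True:
                yield from values  # wrap around to the first value
    elif mode == "random":
        if unique:
            shuffled = list(values)
            rng.shuffle(shuffled)  # random order, no repeats
            yield from shuffled
        else:
            while True:
                yield rng.choice(values)  # repeats are allowed
    else:
        raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    cities = load_column("city\nLinz\nGraz\nVienna\n", "city")
    seq = pick_values(cities, mode="sequential")
    print([next(seq) for _ in range(5)])  # ['Linz', 'Graz', 'Vienna', 'Linz', 'Graz']
```

Note the trade-off in the "no duplicates" option: a source can only supply as many unique values as it contains, so a unique field needs a source at least as large as the number of rows you plan to generate.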

My Feature Requests

These feature requests all revolve around one use case: generating test data for a database that already exists.

While ADG is great for generating data in a greenfield project, setting up data generation when you already have production data is very labor-intensive. I would love to see ADG improve its support for this situation.

I have a few ideas for how it could do that.

"Anonymizer" Mode

With this feature, you would start with a fully populated database (such as a restored production backup) and overwrite certain key fields with random data.  For example, you could replace the name, social security number, and date of birth fields in an employee table with randomly generated data, while leaving the other non-personal information intact.  

The current process is overly tedious, as it requires you to generate sensible data for every single required field in the database.  If, instead, you only had to anonymize sensitive data (the user would decide what data is sensitive), then the process would be much shorter.

I believe this would be a relatively easy feature to implement.
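As a rough illustration of why this seems easy, here is a minimal Python sketch of an "Anonymizer" pass against an in-memory SQLite table. It is not an ADG feature; the `employee` table, its column names, and the fake-value generators are all hypothetical. Only the columns the user flags as sensitive get overwritten, and every other field keeps its production value.

```python
import random
import sqlite3
from datetime import date, timedelta
from typing import Callable, Dict, Optional

# Hypothetical replacement generators for columns flagged as sensitive.
FIRST_NAMES = ["Alex", "Jordan", "Casey", "Morgan"]
LAST_NAMES = ["Smith", "Garcia", "Novak", "Okafor"]


def fake_name(rng: random.Random) -> str:
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def fake_ssn(rng: random.Random) -> str:
    # SSN-shaped string only; no attempt to follow real SSN allocation rules.
    area = rng.randint(100, 899)
    group = rng.randint(1, 99)
    serial = rng.randint(1, 9999)
    return f"{area:03d}-{group:02d}-{serial:04d}"


def fake_birth_date(rng: random.Random) -> str:
    # Random date between 1950-01-01 and roughly the end of 1999.
    return (date(1950, 1, 1) + timedelta(days=rng.randint(0, 365 * 50))).isoformat()


SENSITIVE: Dict[str, Callable[[random.Random], str]] = {
    "full_name": fake_name, "ssn": fake_ssn, "birth_date": fake_birth_date}


def anonymize(conn: sqlite3.Connection, table: str, key: str,
              generators: Dict[str, Callable[[random.Random], str]],
              seed: Optional[int] = None) -> None:
    """Overwrite only the flagged columns of every row with random values.

    Table and column names are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    """
    rng = random.Random(seed)
    columns = list(generators)
    set_clause = ", ".join(f"{col} = ?" for col in columns)
    # Collect the keys first so we are not updating rows mid-iteration.
    keys = [row[0] for row in conn.execute(f"SELECT {key} FROM {table}")]
    for row_key in keys:
        new_values = [generators[col](rng) for col in columns]
        conn.execute(f"UPDATE {table} SET {set_clause} WHERE {key} = ?",
                     new_values + [row_key])
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, full_name TEXT, "
                 "ssn TEXT, birth_date TEXT, department TEXT)")
    conn.execute("INSERT INTO employee VALUES "
                 "(1, 'Jane Doe', '123-45-6789', '1980-05-17', 'Payroll')")
    anonymize(conn, "employee", "id", SENSITIVE, seed=7)
    print(conn.execute("SELECT * FROM employee").fetchall())  # department unchanged
```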

Existing Data Analysis

This feature would auto-suggest certain kinds of random data based on an analysis of a fully populated database.

Whereas "Anonymizer" Mode could leave behind enough hints in unchanged fields to reveal a record's original identity, this approach would still generate a full set of data from scratch. The existing data would only be used to initially populate the random data generation settings.

This would not be as easy a feature to implement as "Anonymizer" Mode, but it would be exceptionally powerful.
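Here is a minimal Python sketch of what that analysis might look like, run against an in-memory SQLite table. The rules ("integer range", "date range", "pick from list", "free text"), the `max_list_size` threshold, and the `employee` table are hypothetical heuristics chosen for illustration, not anything ADG offers today. The output only suggests settings; the data itself would still be generated from scratch.

```python
import sqlite3
from datetime import date


def _is_iso_date(text: str) -> bool:
    """True if the string parses as an ISO date such as 2020-07-15."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def suggest_settings(conn: sqlite3.Connection, table: str,
                     max_list_size: int = 10) -> dict:
    """Profile each column of a populated table and suggest a generation rule.

    Table and column names are interpolated into the SQL, so they must come
    from a trusted source.
    """
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    suggestions = {}
    for col in columns:
        values = [row[0] for row in conn.execute(f"SELECT {col} FROM {table}")]
        non_null = [v for v in values if v is not None]
        null_ratio = 1 - len(non_null) / len(values) if values else 0.0
        distinct = set(non_null)
        if non_null and all(isinstance(v, int) for v in non_null):
            rule = ("integer range", min(non_null), max(non_null))
        elif non_null and all(isinstance(v, str) and _is_iso_date(v) for v in non_null):
            # ISO date strings sort chronologically, so min/max work directly.
            rule = ("date range", min(non_null), max(non_null))
        elif len(distinct) <= max_list_size:
            # Few distinct values: treat the column as a lookup list.
            rule = ("pick from list", sorted(distinct, key=str))
        else:
            # Otherwise suggest free text up to the longest observed length.
            rule = ("free text", max(len(str(v)) for v in non_null))
        suggestions[col] = {"rule": rule, "null_ratio": round(null_ratio, 2)}
    return suggestions


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, department TEXT, hired TEXT)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                     [(1, "Payroll", "2019-03-01"), (2, "Sales", "2020-07-15"),
                      (3, "Sales", None)])
    for column, setting in suggest_settings(conn, "employee").items():
        print(column, setting)
```

Recording the null ratio matters as much as the value rule: a generator that never produces NULLs will miss the code paths your application takes when optional fields are empty.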

Final Thoughts and Recommendations

The Advanced Data Generator is a mighty fine tool.

And when you compare its cost to the equivalent tool from SQL Server tool titan Redgate, it's a great value: about $115 (€99) for the SQL Server-only version compared to $405 for Redgate's SQL Data Generator.

I would highly recommend it for new projects. Its ability to generate loads of sensible records makes it easy to test whether your blazing-fast application will slow to a crawl once it's facing real-world row counts.

Unfortunately, there is a lot of tedious work involved in setting up the field generation on a full-fledged production database.  Expect to spend a minimum of 30 seconds setting up each field (that's assuming you know exactly what kind of data belongs in the field).  One minute per field is probably a more accurate estimate, especially when you factor in the time you will spend verifying what sort of data belongs in each field.  There will also be some upfront work to create custom field templates so that you're not wasting time manually creating the same field generation settings for similar fields.

Assuming a database of 500 fields (50 tables times 10 fields per table), you should expect to spend at least 8 hours setting up a full data generation (500 fields times 1 minute per field, divided by 60 minutes per hour, is about 8.3 hours). Throw in troubleshooting, data validation, and testing, and the full length of the project is probably more like 2 to 3 working days.
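If you want to plug in your own schema size, the setup estimate above reduces to a one-line formula. This hypothetical `setup_hours` helper covers only per-field configuration; it leaves out custom template creation, troubleshooting, validation, and testing, which are estimated separately above.

```python
def setup_hours(tables: int, fields_per_table: int, minutes_per_field: float) -> float:
    """Rough hours needed to configure data generation for every field."""
    total_fields = tables * fields_per_table
    return total_fields * minutes_per_field / 60  # 60 minutes per hour


if __name__ == "__main__":
    # 50 tables x 10 fields, at the 30-second floor and the 1-minute estimate
    for minutes in (0.5, 1.0):
        print(f"{minutes} min/field: {setup_hours(50, 10, minutes):.1f} hours")
```

For the 500-field example this prints 4.2 hours at the 30-second floor and 8.3 hours at one minute per field.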

Is that worth it?  It very well could be.  The answer, of course, is "it depends on the project."

If the question is, "Should I generate test data for this database or not?" then the answer will depend on what you plan to do with the test data.

However, if the question is, "Should I roll my own data generation tool or buy Advanced Data Generator 4?" then the answer is clear:  save yourself a bunch of time and buy Advanced Data Generator 4.


External references

Test Data Generator / Test Data Generation / Advanced Data Generator @ Upscene Productions
SQL Data Generator - Data Generator For MS SQL Server Databases

Referenced articles

Access DevCon 2021: Day 2

All original code samples by Mike Wolfe are licensed under CC BY 4.0