blog.metters.dev

Using LLMs to generate test data quickly



NOTE Do not use LLMs at work unless you have explicit permission from your employer and customer!


LLMs have their limits, but they can prove handy in a supporting role for some programming tasks. The use case I came across was generating test data, e.g., for a prototype. It works well, and there is little risk because only (test) data is generated, not the business logic.

Test data generation

When I first created the product list for the product catalogue of my father-in-law’s company website, I knew next to nothing about the domain. There was no real product catalogue yet, but I needed to display something. The idea was to store all catalogue data as JSON, so I could iterate over the list and display each entry with its data. Each test entry should differ slightly from the others and be close enough to reality that I could properly test the search function. To display something, I needed a small number of entries.

First, I asked an LLM about the relevant key data of engine starters. After that, I explained the constraints for some fields, e.g., which values or lists may sometimes be empty, and then let it generate several dozen entries with example data.

Content of first generated products.json
[{
  "id": "some-unique-id",
  "imageUrl": "/products/product-91.jpg",
  "name": "Honda Starter 1337",
  "compatibility": ["Honda Civic", "Honda Accord", "Acura Integra"],
  "powerOutput": "1.4 kW",
  "starterType": "Gear-reduction",
  "warranty": "3 years",
  "compatibleEngines": ["1.8L L4 GAS DOHC Naturally Aspirated", "2.0L L4 GAS DOHC Naturally Aspirated"],
  "mountingStyle": "Offset Starter (Straight Across)"
}]

The example above is the actual result. Only the technical fields id and imageUrl were not generated. Some entries were generated to represent edge cases.
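To illustrate how such generated entries can be used to exercise a search function, here is a minimal sketch in Python. It is not the code of the actual website. The function name search_products, the choice of searchable fields, and the second entry (with empty lists as an edge case) are assumptions made for the example.

```python
import json

# Sample data in the shape of the generated products.json above.
# The second entry is a hypothetical edge case with empty lists.
PRODUCTS_JSON = """
[
  {
    "id": "some-unique-id",
    "name": "Honda Starter 1337",
    "compatibility": ["Honda Civic", "Honda Accord", "Acura Integra"],
    "powerOutput": "1.4 kW",
    "compatibleEngines": ["1.8L L4 GAS DOHC Naturally Aspirated"]
  },
  {
    "id": "another-unique-id",
    "name": "Generic Starter 42",
    "compatibility": [],
    "powerOutput": "1.0 kW",
    "compatibleEngines": []
  }
]
"""


def search_products(products, term):
    """Return entries whose name or compatibility list contains the term."""
    needle = term.lower()
    matches = []
    for product in products:
        # Lists may be empty or missing in edge-case entries, so fall back to [].
        haystack = [product.get("name", "")] + product.get("compatibility", [])
        if any(needle in text.lower() for text in haystack):
            matches.append(product)
    return matches


products = json.loads(PRODUCTS_JSON)
for product in search_products(products, "civic"):
    print(product["name"], "-", product["powerOutput"])
```

Entries with empty lists pass through the search without errors, which is the kind of edge case the generated data is meant to cover.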

Even though the actual structure of the product data turned out quite different from what I (or the LLM) expected, it still enabled fast progress.

Actual structure of my product data (engine starters)
[{
  "Model": "BS001",
  "Voltage": "12",
  "Power": "1.1",
  "Manufacturer": "Bosch",
  "ManufacturerIds": ["0001107043", "0001107087"],
  "Lester": ["31167N"],
  "Application": ["Ford Focus 1.4L 1.6L 2004-2006"]
}]
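Since the real data stores numbers as strings, one way to work with it is to convert each entry into a typed object when loading. The following is only a sketch: the Starter class and starter_from_dict are hypothetical names, and the units (volts and kW) are assumptions, as the data itself does not state them.

```python
from dataclasses import dataclass, field


@dataclass
class Starter:
    model: str
    voltage: float  # assumed to be volts
    power: float  # assumed to be kW
    manufacturer: str
    manufacturer_ids: list = field(default_factory=list)
    lester: list = field(default_factory=list)
    application: list = field(default_factory=list)


def starter_from_dict(raw):
    """Convert one raw JSON entry into a Starter, parsing numeric strings."""
    return Starter(
        model=raw["Model"],
        voltage=float(raw["Voltage"]),
        power=float(raw["Power"]),
        manufacturer=raw["Manufacturer"],
        # List fields are treated as optional and default to empty lists.
        manufacturer_ids=list(raw.get("ManufacturerIds", [])),
        lester=list(raw.get("Lester", [])),
        application=list(raw.get("Application", [])),
    )


entry = {
    "Model": "BS001",
    "Voltage": "12",
    "Power": "1.1",
    "Manufacturer": "Bosch",
    "ManufacturerIds": ["0001107043", "0001107087"],
    "Lester": ["31167N"],
    "Application": ["Ford Focus 1.4L 1.6L 2004-2006"],
}

starter = starter_from_dict(entry)
print(starter.model, starter.voltage, starter.power)
```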

Queries for database initialisation

The same approach works for generating test data for (local) databases. I needed some data in my database on startup, so I gave a query to an LLM and let it create a dozen similar ones. Here’s an example of a statement that an LLM can easily produce many variations of:

Inserting mock values into my database
INSERT INTO users (first_name, last_name, date_of_birth, city)
VALUES ('Dennis', 'Ritchie', '1941-09-09', 'Bronxville');
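As a rough sketch of how such statements could be run on startup, the following Python snippet seeds an in-memory SQLite database. The table schema, the function name init_database, and all rows other than the first are assumptions for illustration. A real setup would use its own database and the rows the LLM generated.

```python
import sqlite3

# The first row comes from the statement above; the others are
# illustrative stand-ins for what an LLM might generate.
SEED_USERS = [
    ("Dennis", "Ritchie", "1941-09-09", "Bronxville"),
    ("Grace", "Hopper", "1906-12-09", "New York City"),
    ("Alan", "Turing", "1912-06-23", "London"),
]


def init_database(connection):
    """Create the users table (assumed schema) and insert the seed rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "first_name TEXT, last_name TEXT, date_of_birth TEXT, city TEXT)"
    )
    # Parameterised inserts avoid quoting problems in generated values.
    connection.executemany(
        "INSERT INTO users (first_name, last_name, date_of_birth, city) "
        "VALUES (?, ?, ?, ?)",
        SEED_USERS,
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
init_database(connection)
count = connection.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)
```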