Using LLMs to generate test data quickly
NOTE Do not use LLMs at work unless you have explicit permission from your employer and your customer!
LLMs have their limits, but they can prove handy in a supporting role for some programming tasks. One use case I came across was generating test data, e.g., for a prototype. It works well, and the risk is low because the LLM generates (test) data rather than business logic.
Test data generation
When I first created the product list for the product catalogue of my father-in-law's company website, I knew next to nothing about the domain. There was no real product catalogue yet, but I needed to display something, and a small number of entries was enough for that. The idea was to store the whole catalogue as JSON so that I could iterate over the list and display each entry with its data. The test data should vary slightly from entry to entry and still be close enough to reality that I could properly test the search function.
First, I asked an LLM about the relevant key data of engine starters. After that, I explained constraints for some of the fields, e.g., which values or lists may sometimes be empty, and then let it generate several dozen entries with example data.
The example above is the actual result.
Only the technical fields id and imageUrl were not generated.
Some entries were generated to represent edge cases.
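Since the technical fields were not generated, they have to be added afterwards. The following is a minimal Python sketch of that post-processing step, assuming the LLM returned a JSON array: it checks that only fields declared as optional are empty and then attaches id and imageUrl. The field names other than id and imageUrl, the set of optional fields, and the placeholder image path are all hypothetical.

```python
import json

# Hypothetical constraint: only these fields may be empty.
OPTIONAL_FIELDS = {"oemNumbers", "notes"}

# Hypothetical LLM output (field names and values are made up for illustration).
LLM_OUTPUT = """
[
  {"name": "Starter 12V 2.2kW", "voltage": 12, "oemNumbers": ["OEM-1001"], "notes": ""},
  {"name": "Starter 24V 5.5kW", "voltage": 24, "oemNumbers": [], "notes": "no OEM numbers"}
]
"""


def is_empty(value):
    """Treat None, empty strings and empty lists as empty."""
    return value is None or value == "" or value == []


def prepare_entries(raw_json):
    """Parse the LLM output, check the emptiness constraints, add technical fields."""
    entries = json.loads(raw_json)
    prepared = []
    for index, entry in enumerate(entries, start=1):
        for field, value in entry.items():
            if is_empty(value) and field not in OPTIONAL_FIELDS:
                raise ValueError(f"entry {index}: field {field!r} must not be empty")
        # id and imageUrl are technical fields, so they are set here, not by the LLM.
        # The placeholder path scheme is a hypothetical choice.
        prepared.append({"id": index, "imageUrl": f"/images/placeholder-{index}.jpg", **entry})
    return prepared


catalogue = prepare_entries(LLM_OUTPUT)
for entry in catalogue:
    print(entry["id"], entry["name"], entry["imageUrl"])
```

Failing fast on an unexpectedly empty required field is a cheap way to catch entries where the LLM ignored one of the stated constraints.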
Even though the actual structure of the product data turned out to be quite different from what I (or the LLM) expected, the generated data still enabled fast progress.
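To illustrate why slightly varied entries and edge cases help, here is a minimal Python sketch of iterating over a JSON catalogue to display each entry, plus a naive case-insensitive search. The entries, the field names besides id and imageUrl, the text rendering (a stand-in for the website's HTML) and the matching rule are hypothetical, not the site's actual implementation.

```python
import json

# Hypothetical catalogue; id and imageUrl are the technical fields.
CATALOGUE_JSON = """
[
  {"id": 1, "imageUrl": "/images/1.jpg", "name": "Starter 12V 2.2kW",
   "voltage": 12, "oemNumbers": ["OEM-1001", "OEM-1002"]},
  {"id": 2, "imageUrl": "/images/2.jpg", "name": "Starter 24V 5.5kW",
   "voltage": 24, "oemNumbers": []}
]
"""


def render(entry):
    """Format one entry as a single line of text (stand-in for the HTML view)."""
    oem = ", ".join(entry["oemNumbers"]) or "none"
    return f"#{entry['id']} {entry['name']} ({entry['voltage']} V), OEM: {oem}"


def search(entries, query):
    """Return entries whose name or any OEM number contains the query, ignoring case."""
    q = query.lower()
    return [
        e for e in entries
        if q in e["name"].lower() or any(q in n.lower() for n in e["oemNumbers"])
    ]


catalogue = json.loads(CATALOGUE_JSON)
for entry in catalogue:
    print(render(entry))

print([e["id"] for e in search(catalogue, "24v")])
```

The second entry, with an empty oemNumbers list, is the kind of edge case that checks that both the display and the search cope with missing values.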
Queries for database initialisation
The same approach works for generating test data for (local) databases. I needed some data in my database on startup, so I gave an LLM one query and let it create a dozen similar ones. Here's an example of a statement an LLM can easily create many variations of: