Q&A
How do you test AI/ML features in your product — what assertions even make sense?
Ajitesh Mohanta (Ambassador)
2w ago
Our product has a few LLM-powered features (a summarisation tool, a smart search). I'm trying to figure out how to test them.
The challenge: LLM outputs are non-deterministic, so the same input can come back worded differently on each run and traditional exact-match assertions don't work.
Approaches I'm exploring:
1. **Structural assertions** — assert the output is a non-empty string, contains required fields, is below a length limit. Easy but low signal.
2. **LLM-as-judge** — use a second LLM call to evaluate the output. Meta, but apparently effective.
3. **Golden set evaluation** — curate 50 test inputs with "acceptable" output ranges and measure drift over time.
4. **Contract testing for prompts** — assert that prompts sent to the LLM match a template, not that outputs are correct.
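
To make approaches 1, 3 and 4 concrete, here is a rough Python sketch of what I have in mind. Everything in it is hypothetical: the template wording, the function names (`build_prompt`, `check_structure`, `check_prompt_contract`, `golden_pass_rate`), the 400-character limit, and the keyword-based definition of "acceptable" are placeholders, not our real setup. The `summarise` argument stands in for the actual LLM call.

```python
import re
import string

# Hypothetical prompt template for the summarisation feature (an assumption).
SUMMARY_TEMPLATE = string.Template(
    "Summarise the following text in at most $max_words words:\n\n$text"
)


def build_prompt(text: str, max_words: int = 50) -> str:
    """Fill the template; this is the string that would be sent to the LLM."""
    return SUMMARY_TEMPLATE.substitute(text=text, max_words=max_words)


def check_structure(output, max_chars: int = 400) -> list:
    """Approach 1: return a list of structural problems (empty list = pass)."""
    problems = []
    if not isinstance(output, str) or not output.strip():
        problems.append("empty output")
    elif len(output) > max_chars:
        problems.append(f"too long: {len(output)} > {max_chars}")
    return problems


def check_prompt_contract(prompt: str) -> bool:
    """Approach 4: assert the prompt matches the template shape, not the output."""
    pattern = r"Summarise the following text in at most \d+ words:\n\n.+"
    return re.fullmatch(pattern, prompt, flags=re.DOTALL) is not None


def golden_pass_rate(golden_set, summarise) -> float:
    """Approach 3: fraction of golden cases whose output counts as acceptable.

    "Acceptable" here is an assumed stand-in: the output is structurally
    valid and mentions every required keyword (case-insensitive).
    """
    passed = 0
    for case in golden_set:
        output = summarise(case["input"])
        ok = not check_structure(output) and all(
            kw.lower() in output.lower() for kw in case["must_mention"]
        )
        passed += ok
    return passed / len(golden_set)
```

The idea would be to run `golden_pass_rate` against the curated inputs on every prompt or model change and watch the number over time, rather than asserting on any single output. Keyword matching is a deliberately crude acceptance rule; it could be swapped for an LLM-as-judge call (approach 2) without changing the rest.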
Has anyone shipped a production QA process for LLM features? What actually stuck?