
How do you test AI/ML features in your product — what assertions even make sense?

Ajitesh Mohanta (Ambassador)
2w ago 2,349 0
Our product has a few LLM-powered features (a summarisation tool, a smart search). I'm trying to figure out how to test them. The challenge: LLM outputs are non-deterministic. Traditional assertions don't work.

Approaches I'm exploring:

1. **Structural assertions**: assert the output is a non-empty string, contains required fields, and is below a length limit. Easy but low signal.
2. **LLM-as-judge**: use a second LLM call to evaluate the output. Meta, but apparently effective.
3. **Golden set evaluation**: curate 50 test inputs with "acceptable" output ranges and measure drift over time.
4. **Contract testing for prompts**: assert that prompts sent to the LLM match a template, not that outputs are correct.

Has anyone shipped a production QA process for LLM features? What actually stuck?
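For concreteness, here is a rough sketch of what approaches 1, 3 and 4 might look like in Python. Everything named here is an assumption for illustration: the `summary` and `source_id` fields, the 500-character limit, the prompt template, the 0.6 threshold, and the `generate` callable standing in for the real LLM call. `difflib.SequenceMatcher` is only a crude stand-in for a proper similarity measure (embedding distance or an LLM judge would likely be used instead).

```python
import re
from difflib import SequenceMatcher
from string import Formatter

MAX_SUMMARY_CHARS = 500  # hypothetical length limit
# Hypothetical prompt template used by the summarisation feature.
PROMPT_TEMPLATE = "Summarise the following text in at most {max_words} words:\n{text}"


def check_structure(output: dict) -> list[str]:
    """Approach 1: return a list of structural violations (empty list = pass)."""
    problems = []
    summary = output.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        problems.append("summary missing or empty")
    elif len(summary) > MAX_SUMMARY_CHARS:
        problems.append("summary too long")
    if "source_id" not in output:
        problems.append("source_id missing")
    return problems


def prompt_matches_template(prompt: str) -> bool:
    """Approach 4: check the rendered prompt against the template, ignoring slot values."""
    pattern = ""
    for literal, field, _spec, _conv in Formatter().parse(PROMPT_TEMPLATE):
        pattern += re.escape(literal)   # fixed text must match exactly
        if field is not None:
            pattern += "(.+)"           # each {slot} may hold any non-empty text
    return re.fullmatch(pattern, prompt, flags=re.DOTALL) is not None


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a placeholder for a semantic metric."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def golden_set_pass_rate(cases, generate, threshold=0.6) -> float:
    """Approach 3: fraction of golden inputs whose output is close to any acceptable reference."""
    passed = 0
    for case in cases:
        output = generate(case["input"])  # one model call per golden input
        if any(similarity(output, ref) >= threshold for ref in case["acceptable"]):
            passed += 1
    return passed / len(cases)
```

The pass rate from `golden_set_pass_rate` could be logged per build and compared against a baseline, so that a drop flags drift without requiring any single output to match exactly.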
