Q&A

How do you test AI/ML features in your product — what assertions even make sense?

May 28, 2026

Our product has a few LLM-powered features (a summarisation tool, a smart search). I'm trying to figure out how to test them. The challenge: LLM outputs are non-deterministic. Traditional assertions don't work.

Approaches I'm exploring:

1. **Structural assertions**: assert the output is a non-empty string, contains required fields, is below a length limit. Easy but low signal.
2. **LLM-as-judge**: use a second LLM call to evaluate the output. Meta, but apparently effective.
3. **Golden set evaluation**: curate 50 test inputs with "acceptable" output ranges and measure drift over time.
4. **Contract testing for prompts**: assert that prompts sent to the LLM match a template, not that outputs are correct.

Has anyone shipped a production QA process for LLM features? What actually stuck?
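To make approaches 1 and 3 concrete, here is a rough Python sketch of what they might look like. Everything named in it is an assumption for illustration, not our real code: the `summary` and `source_id` fields, the 600-character limit, the `GoldenCase` shape, and the 5-point drift tolerance are all hypothetical. "Acceptable output range" is approximated here by required and forbidden terms, which is only one of several ways to define it.

```python
from dataclasses import dataclass
from typing import Callable

MAX_SUMMARY_CHARS = 600                    # assumed product limit
REQUIRED_FIELDS = ("summary", "source_id")  # hypothetical response fields


def check_structure(output: dict) -> list[str]:
    """Approach 1: return structural violations (empty list means pass)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in output]
    summary = output.get("summary")
    if summary is not None:
        if not isinstance(summary, str) or not summary.strip():
            problems.append("summary is empty or not a string")
        elif len(summary) > MAX_SUMMARY_CHARS:
            problems.append("summary exceeds length limit")
    return problems


@dataclass
class GoldenCase:
    """One curated input plus a loose definition of an acceptable output."""
    text: str
    required_terms: list[str]   # facts an acceptable summary must mention
    forbidden_terms: list[str]  # e.g. previously observed hallucinations


def case_passes(case: GoldenCase, summary: str) -> bool:
    """Case-insensitive substring checks; crude, but deterministic."""
    lowered = summary.lower()
    has_required = all(t.lower() in lowered for t in case.required_terms)
    has_forbidden = any(t.lower() in lowered for t in case.forbidden_terms)
    return has_required and not has_forbidden


def pass_rate(cases: list[GoldenCase], summarise: Callable[[str], str]) -> float:
    """Approach 3: fraction of golden cases whose output is acceptable."""
    passed = sum(case_passes(c, summarise(c.text)) for c in cases)
    return passed / len(cases)


def has_drifted(current: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a regression only when the pass rate falls below baseline - tolerance."""
    return current < baseline - tolerance
```

The idea is that the test suite asserts on the aggregate pass rate against a stored baseline rather than on any single output, so one unlucky sample does not fail the build. `summarise` is whatever function wraps the real LLM call; in a unit test it can be swapped for a stub.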
