# Evaluators
Evaluators score how well LLM outputs match expected outputs during compilation.
## Built-in Evaluators
### exactMatch
Returns 1.0 if outputs are deeply equal, 0.0 otherwise:
```typescript
import { exactMatch } from "@mzhub/promptc";

const evaluator = exactMatch();
evaluator({ name: "Alice" }, { name: "Alice" }); // 1.0
evaluator({ name: "Alice" }, { name: "Bob" });   // 0.0
```

### partialMatch
Returns the fraction of fields that match:
```typescript
import { partialMatch } from "@mzhub/promptc";

const evaluator = partialMatch();
evaluator(
  { a: 1, b: 2, c: 3 },
  { a: 1, b: 2, c: 4 }
); // 0.666 (2 out of 3 fields match)
```

### arrayOverlap
Computes Jaccard similarity for arrays:
```typescript
import { arrayOverlap } from "@mzhub/promptc";

const evaluator = arrayOverlap();
evaluator(["a", "b", "c"], ["a", "b", "c"]); // 1.0 (identical)
evaluator(["a", "b"], ["b", "c"]);           // 0.33 (1/3 overlap)
evaluator(["a", "b"], ["c", "d"]);           // 0.0 (no overlap)
```

### llmJudge
Uses an LLM to score output quality:
```typescript
import { llmJudge, createProvider } from "@mzhub/promptc";

const provider = createProvider("openai", {
  apiKey: process.env.OPENAI_API_KEY
});

const evaluator = llmJudge({
  provider,
  criteria: "accuracy and completeness" // Optional
});

// Returns a score between 0 and 1
const score = await evaluator(prediction, groundTruth);
```

**Cost Consideration**
llmJudge makes an API call for each evaluation. Use it sparingly during compilation, or combine it with cheaper evaluators.
## Evaluator Interface
All evaluators follow this signature:
```typescript
type Evaluator<O> = (
  prediction: O,  // LLM output
  groundTruth: O  // Expected output
) => number | Promise<number>; // Score between 0 and 1
```

## Custom Evaluators
Create your own evaluator for domain-specific scoring:
```typescript
// Simple custom evaluator
const containsKeywords = (prediction, groundTruth) => {
  const keywords = groundTruth.keywords || [];
  if (keywords.length === 0) return 1; // avoid dividing by zero
  const text = prediction.text?.toLowerCase() || "";
  const found = keywords.filter((k) => text.includes(k.toLowerCase()));
  return found.length / keywords.length;
};

// Use with the compiler
const compiler = new BootstrapFewShot(containsKeywords);
```

## Combining Evaluators
Combine multiple evaluators with weighted averaging:
```typescript
const combinedEvaluator = async (prediction, groundTruth) => {
  const exactScore = exactMatch()(prediction, groundTruth);
  const overlapScore = arrayOverlap()(
    prediction.items || [],
    groundTruth.items || []
  );
  // Weighted average: 60% exact, 40% overlap
  return exactScore * 0.6 + overlapScore * 0.4;
};
```

## Choosing an Evaluator
| Use Case | Recommended Evaluator |
|---|---|
| Exact answers (classification, extraction) | exactMatch |
| Partial correctness allowed | partialMatch |
| List/set outputs | arrayOverlap |
| Subjective quality (summaries, creative) | llmJudge |
| Domain-specific | Custom evaluator |
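When a task straddles rows in the table, evaluators can also be layered. Below is a hedged, self-contained sketch (not library code; `exactThenPartial` and `deepEqual` are illustrative names) that tries strict equality first and falls back to per-field partial credit, mirroring the exactMatch/partialMatch semantics:

```typescript
// Illustrative sketch, not @mzhub/promptc code: strict match with
// a partial-credit fallback over top-level fields.
const deepEqual = (a: unknown, b: unknown): boolean =>
  JSON.stringify(a) === JSON.stringify(b); // crude, key-order-sensitive equality

const exactThenPartial = (
  pred: Record<string, unknown>,
  truth: Record<string, unknown>
): number => {
  if (deepEqual(pred, truth)) return 1; // exact match: full score
  const keys = Object.keys(truth);
  if (keys.length === 0) return 0;
  const hits = keys.filter((k) => deepEqual(pred[k], truth[k]));
  return hits.length / keys.length; // partial credit per matching field
};
```

Identical objects score 1.0, while an object matching 2 of 3 fields scores about 0.667, so a single evaluator can serve both the strict and partial rows of the table.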
Next: Compilers →