# Cost Optimization

Strategies to minimize API costs during compilation and in production.

## Compilation Costs

Compilation is the most expensive phase because you're evaluating many candidate prompts. Here's how to control costs:

### 1. Use Token Budgets

Set a hard token cap so a single compile run can't overspend:
```ts
const result = await compiler.compile(program, trainset, {
  candidates: 50,
  budget: {
    maxTokens: 100000, // Hard limit
    onBudgetWarning: (used, max) => {
      console.warn(`⚠️ Token usage: ${used}/${max}`);
    }
  }
});
```

### 2. Early Stopping
Skip poorly performing candidates early:
```ts
const result = await compiler.compile(program, trainset, {
  candidates: 50,
  earlyStopThreshold: 0.5, // Skip if score < 50%
});
```

### 3. Smaller Training Sets
Start with 5-10 high-quality examples:
```ts
// Quality > Quantity
const trainset = [
  // Choose diverse, representative examples
  { input: {...}, output: {...} },
  { input: {...}, output: {...} },
  // 5-10 total is often enough
];
```
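For illustration, here's what a small, diverse trainset might look like for a name-extraction task (the field names `text` and `names` are borrowed from the batching example below; yours must match your schema):

```ts
// Illustrative trainset: mix typical cases with edge cases
const trainset = [
  { input: { text: "Alice met Bob in Paris." }, output: { names: ["Alice", "Bob"] } },
  { input: { text: "The meeting was postponed." }, output: { names: [] } }, // no names present
  { input: { text: "Dr. Maria Chen and her colleague Tom arrived." }, output: { names: ["Maria Chen", "Tom"] } },
];
```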
### 4. Use Cheaper Models

```ts
// Use fast/cheap model for compilation
const compileProvider = createProvider("openai", {
  defaultModel: "gpt-4o-mini" // Cheaper than gpt-4o
});

// Use better model for production
const prodProvider = createProvider("openai", {
  defaultModel: "gpt-4o"
});
```
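How these two providers get wired in depends on the API; the sketch below assumes a `provider` option on both `compile` and `run`, which is an assumption to verify against your promptc version:

```ts
// Hypothetical wiring: the `provider` options here are assumptions
const result = await compiler.compile(program, trainset, {
  candidates: 20,
  provider: compileProvider // cheap model drives the candidate search
});

const output = await compiled.run(input, {
  provider: prodProvider // stronger model serves real traffic
});
```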
## Cost Tracking

Track token usage during compilation:
```ts
import { CostTracker, estimateCost } from "@mzhub/promptc";

// Estimate before running
const estimate = compiler.estimateCost(trainset.length, {
  candidates: 20
});
console.log("Estimated tokens:", estimate.estimatedTokens);

// Actual usage after compilation
console.log("Actual tokens:", result.meta.tokenUsage.totalTokens);
console.log("API calls:", result.meta.tokenUsage.calls);

// Cost calculation (example rates, USD per token)
const rates = {
  "gpt-4o-mini": { input: 0.15 / 1000000, output: 0.60 / 1000000 },
  "gpt-4o": { input: 2.50 / 1000000, output: 10.00 / 1000000 }
};

const cost =
  result.meta.tokenUsage.inputTokens * rates["gpt-4o-mini"].input +
  result.meta.tokenUsage.outputTokens * rates["gpt-4o-mini"].output;

console.log(`Cost: $${cost.toFixed(4)}`);
```

## Production Costs
### 1. Caching
Cache responses for repeated inputs:
```ts
import { PromptCache } from "@mzhub/promptc";

const cache = new PromptCache({
  maxSize: 1000,
  ttlMs: 60 * 60 * 1000 // 1 hour
});

async function runWithCache(input) {
  const cached = cache.get("extract", input);
  if (cached) return cached;

  const result = await compiled.run(input);
  cache.set("extract", input, result);
  return result;
}
```

### 2. Model Selection
| Task Type | Recommended Model | Why |
|---|---|---|
| Simple extraction | gpt-4o-mini | Fast, cheap, accurate for structured tasks |
| Complex reasoning | gpt-4o | Better at multi-step logic |
| High volume | groq (llama) | Fastest inference, lowest latency |
| Development | ollama | Free, runs locally |
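One pattern that falls out of this table is selecting the provider per environment. A sketch, where the "ollama" provider id and the model names are assumptions:

```ts
// Free local model in development, cheap hosted model in production
const provider =
  process.env.NODE_ENV === "production"
    ? createProvider("openai", { defaultModel: "gpt-4o-mini" })
    : createProvider("ollama", { defaultModel: "llama3.1" });
```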
### 3. Batching
Process multiple items in a single call when possible; a usage sketch follows the schema below:

```ts
// Instead of N separate calls...
const results = await Promise.all(
  items.map(item => program.run({ text: item }))
);

// ...design the schema to handle batches in one call
const BatchSchema = defineSchema({
  description: "Extract names from multiple texts",
  inputs: {
    texts: z.array(z.string()) // Process multiple at once
  },
  outputs: {
    results: z.array(z.object({
      text: z.string(),
      names: z.array(z.string())
    }))
  }
});
```
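Calling the batched version then looks something like this (a sketch where `batchProgram` stands in for a program compiled from `BatchSchema`):

```ts
// One API call for the whole batch instead of items.length calls
const { results } = await batchProgram.run({ texts: items });

for (const r of results) {
  console.log(r.text, "→", r.names);
}
```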
## Cost vs Quality

Always test cheaper options first. For structured extraction tasks, gpt-4o-mini often performs as well as gpt-4o at a small fraction of the cost (roughly 1/16th at the example rates above).
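A quick way to test that claim on your own data is to score each candidate model against the trainset before committing. A minimal sketch, assuming exact-match scoring fits your task (swap in your own metric otherwise):

```ts
// Score a run function against the trainset with exact-match accuracy
async function exactMatchScore(
  run: (input: unknown) => Promise<unknown>
): Promise<number> {
  let correct = 0;
  for (const ex of trainset) {
    const out = await run(ex.input);
    if (JSON.stringify(out) === JSON.stringify(ex.output)) correct++;
  }
  return correct / trainset.length;
}
```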
Next: Production Deployment →