QA System Example

Build a question-answering system that shows its reasoning before answering. This example uses Chain of Thought to elicit step-by-step thinking.

What You'll Learn

  • How to build context-aware QA with multiple inputs
  • How Chain of Thought exposes the model's reasoning
  • How to use partialMatch for flexible evaluation
  • How to output confidence scores

1. Define the Schema

Our QA system takes a context paragraph and a question, then outputs an answer with a confidence score.

import {
  defineSchema,
  ChainOfThought,
  BootstrapFewShot,
  partialMatch,
  createProvider,
  z,
} from "@mzhub/promptc";

const QASchema = defineSchema({
  description: "Answer questions based on provided context. Think step by step.",
  inputs: {
    context: z.string().describe("The text containing information"),
    question: z.string().describe("The question to answer"),
  },
  outputs: {
    answer: z.string().describe("The answer extracted from context"),
    confidence: z.number().min(0).max(1).describe("Confidence in the answer"),
  },
});

Field Descriptions
Adding .describe() to your Zod fields helps the LLM understand what each field represents, leading to better outputs.
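
Descriptions also pay off as a schema grows. Below is a hypothetical variant of the schema (the supportingText field is illustrative only and is not used elsewhere in this example) that asks the model to quote its evidence:

const QASchemaWithEvidence = defineSchema({
  description: "Answer questions based on provided context. Think step by step.",
  inputs: {
    context: z.string().describe("The text containing information"),
    question: z.string().describe("The question to answer"),
  },
  outputs: {
    answer: z.string().describe("The answer extracted from context"),
    confidence: z.number().min(0).max(1).describe("Confidence in the answer"),
    // Illustrative extra field: the sentence the answer was drawn from
    supportingText: z.string().describe("The context sentence that supports the answer"),
  },
});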

2. Create the Program

ChainOfThought adds a reasoning step, which is especially useful for QA tasks that require understanding context.

const provider = createProvider("openai", {
  apiKey: process.env.OPENAI_API_KEY,
});

const qaProgram = new ChainOfThought(QASchema, provider);
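
Since the key is read from the environment, a small guard placed before createProvider fails fast when it is missing. This is plain TypeScript with no library API involved:

if (!process.env.OPENAI_API_KEY) {
  throw new Error("OPENAI_API_KEY is not set; export it before running this example.");
}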

3. Run a Single Inference

Before compiling, let's run a single inference to see the reasoning trace:

const result = await qaProgram.run({
  context: "Python was created by Guido van Rossum and first released in 1991. It emphasizes code readability.",
  question: "Who created Python?",
});

console.log("Question: Who created Python?");
console.log("Reasoning:", result.trace.reasoning);
console.log("Answer:", result.result.answer);
console.log("Confidence:", `${(result.result.confidence * 100).toFixed(0)}%`);

Example output:

Question: Who created Python?
Reasoning: The context states that "Python was created by Guido van Rossum". 
           This directly answers the question about who created Python.
Answer: Guido van Rossum
Confidence: 95%
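
A run can also fail for mundane reasons: a network error, or model output that doesn't parse into the schema. How such failures surface is library-dependent, so treat this as a sketch that assumes run() rejects its promise on error:

try {
  const { result, trace } = await qaProgram.run({
    context: "Python was created by Guido van Rossum and first released in 1991.",
    question: "Who created Python?",
  });
  console.log(trace.reasoning, result.answer);
} catch (err) {
  // Assumption: run() rejects on provider or schema-validation errors.
  console.error("QA run failed:", err);
}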

4. Prepare Training Data

Create examples with varying contexts and questions:

const trainset = [
  {
    input: {
      context: "The Eiffel Tower was completed in 1889. It stands 330 meters tall and is located in Paris, France.",
      question: "When was the Eiffel Tower completed?",
    },
    output: { answer: "1889", confidence: 0.95 },
  },
  {
    input: {
      context: "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976.",
      question: "Who founded Apple?",
    },
    output: {
      answer: "Steve Jobs, Steve Wozniak, and Ronald Wayne",
      confidence: 0.95,
    },
  },
  {
    input: {
      context: "The speed of light is approximately 299,792 kilometers per second in a vacuum.",
      question: "What is the speed of light?",
    },
    output: {
      answer: "approximately 299,792 kilometers per second",
      confidence: 0.9,
    },
  },
];
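
Inline literals keep the tutorial short, but a larger trainset usually lives in a file. A minimal loader, assuming a local qa-trainset.jsonl with one { input, output } object per line (the filename and format are this example's choice, not a library convention):

import { readFileSync } from "node:fs";

const fileTrainset = readFileSync("qa-trainset.jsonl", "utf8")
  .split("\n")
  .filter((line) => line.trim() !== "")
  .map((line) => JSON.parse(line) as (typeof trainset)[number]);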

5. Compile for Better Accuracy

Use partialMatch for QA: answers don't need to match exactly, and similar answers should still score well.

const compiler = new BootstrapFewShot(partialMatch());

const compiled = await compiler.compile(qaProgram, trainset, {
  candidates: 5,
  concurrency: 2,
});

console.log(`Compilation score: ${(compiled.meta.score * 100).toFixed(1)}%`);

Choosing an Evaluator
  • exactMatch: Strict equality (best for classification)
  • partialMatch: Substring matching (good for QA)
  • arrayOverlap: Jaccard similarity for arrays
  • llmJudge: Use another LLM to evaluate (most flexible)
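
Swapping evaluators only changes the argument passed to BootstrapFewShot. The sketch below assumes exactMatch and llmJudge are exported from @mzhub/promptc alongside partialMatch, and that llmJudge accepts a provider for the judging model; verify both against the API reference before relying on them:

import { exactMatch, llmJudge } from "@mzhub/promptc";

// Strict equality: suited to classification-style outputs
const strictCompiler = new BootstrapFewShot(exactMatch());

// LLM-as-judge: the provider argument here is an assumption
const judgedCompiler = new BootstrapFewShot(llmJudge(provider));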

Full Example

qa-system.ts
import {
  defineSchema,
  ChainOfThought,
  BootstrapFewShot,
  partialMatch,
  createProvider,
  z,
} from "@mzhub/promptc";

const QASchema = defineSchema({
  description: "Answer questions based on provided context. Think step by step.",
  inputs: {
    context: z.string().describe("The text containing information"),
    question: z.string().describe("The question to answer"),
  },
  outputs: {
    answer: z.string().describe("The answer extracted from context"),
    confidence: z.number().min(0).max(1).describe("Confidence in the answer"),
  },
});

const provider = createProvider("openai", { apiKey: process.env.OPENAI_API_KEY });
const qaProgram = new ChainOfThought(QASchema, provider);

const trainset = [
  {
    input: { context: "The Eiffel Tower was completed in 1889...", question: "When was the Eiffel Tower completed?" },
    output: { answer: "1889", confidence: 0.95 },
  },
  // ... more examples
];

async function main() {
  // Run single inference to see reasoning
  const result = await qaProgram.run({
    context: "Python was created by Guido van Rossum and first released in 1991.",
    question: "Who created Python?",
  });

  console.log("Reasoning:", result.trace.reasoning);
  console.log("Answer:", result.result.answer);

  // Compile for better accuracy
  const compiler = new BootstrapFewShot(partialMatch());
  const compiled = await compiler.compile(qaProgram, trainset, {
    candidates: 5,
    concurrency: 2,
  });

  console.log(`Score: ${(compiled.meta.score * 100).toFixed(1)}%`);
}

main().catch(console.error);
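
One way to run the file, assuming the tsx runner (any TypeScript runner such as ts-node works the same way):

OPENAI_API_KEY=... npx tsx qa-system.ts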