ACE Interactive Demo

Single Demo

One question at a time. See each agent step-by-step.

Batch Demo

Training Mode

Process many samples. Watch the Playbook learn!

What is ACE?

Agentic Context Engineering

ACE is a self-improving AI framework in which LLMs learn from their mistakes without changing their weights. Instead of retraining, ACE builds an evolving playbook: a living document of strategies, insights, and lessons learned that is refined with each problem solved.

Key Innovation

Adapts the context instead of updating weights, so earlier knowledge is preserved without catastrophic forgetting.

Performance

The paper reports gains of +10.6% on agent tasks and +8.6% on finance tasks, matching production agents while using smaller models.

Self-Supervised

Learns from execution feedback—no labeled data required for continuous improvement.

Research Paper

arXiv:2510.04618

Zhang et al., 2025

The Three-Role Agentic Architecture

Generator

Step 1

Reads the Playbook and applies its strategies to answer the question. Reports which bullet points it used in its reasoning.

Reflector

Step 2

Analyzes the Generator's output. Tags each bullet as helpful, harmful, or neutral. Identifies root causes of errors.

Curator

Step 3

Updates the Playbook with new insights from the Reflector's analysis. Uses incremental delta updates (small changes to individual bullets rather than a rewrite of the whole Playbook) to prevent context collapse.
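As a rough illustration of incremental delta updates, the sketch below applies a short list of operations to a Playbook instead of regenerating it. The `Bullet` type, the operation names (`ADD`, `TAG`), and the dictionary layout are assumptions made for this example; they are not the format used by the demo or the paper.

```python
from dataclasses import dataclass


@dataclass
class Bullet:
    """One Playbook entry (hypothetical structure, for illustration only)."""
    text: str
    helpful: int = 0
    harmful: int = 0


def apply_delta(playbook: dict[str, Bullet], delta: list[dict]) -> dict[str, Bullet]:
    """Apply small, local operations; bullets not named in the delta stay untouched."""
    updated = dict(playbook)  # shallow copy so the caller's dict is not mutated
    for op in delta:
        if op["op"] == "ADD":
            # setdefault never overwrites an existing bullet with the same ID.
            updated.setdefault(op["id"], Bullet(text=op["text"]))
        elif op["op"] == "TAG":
            old = updated[op["id"]]
            # A "neutral" tag changes neither counter.
            updated[op["id"]] = Bullet(
                text=old.text,
                helpful=old.helpful + (op["tag"] == "helpful"),
                harmful=old.harmful + (op["tag"] == "harmful"),
            )
        else:
            raise ValueError(f"unknown op: {op['op']}")
    return updated


playbook = {"str-00001": Bullet("Look for capitalized words to identify proper nouns", helpful=3)}
delta = [
    {"op": "TAG", "id": "str-00001", "tag": "helpful"},
    {"op": "ADD", "id": "str-00002", "text": "Check for suffixes such as 'Inc.' or 'LLC'"},
]
new_playbook = apply_delta(playbook, delta)
```

Because each operation touches a single bullet, everything already in the Playbook survives an update unless an operation explicitly names it.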

What is the Playbook?

Key Concept

The Playbook is ACE's secret weapon — it's not just a prompt. It's a living knowledge base that grows and improves with every problem the system solves.

Regular Prompt

• Static — never changes
• Written by developers
• Same for every task

ACE Playbook

• Dynamic — evolves over time
• Updated by the Curator agent
• Tracks what works (helpful/harmful counts)

Example Playbook Entry

[str-00001] helpful=3 harmful=0 :: Look for capitalized words to identify proper nouns

ID: str-00001

Score: +3 helpful

Strategy: the actual insight
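The entry format shown above can be read back into its parts with a small parser. This is a minimal sketch: the regular expression is inferred from the example entries on this page, and the `Entry` type and `parse_entry` function are hypothetical names, not part of ACE.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    entry_id: str
    helpful: int
    harmful: int
    strategy: str


# Whitespace around the fields is optional in this pattern.
ENTRY_RE = re.compile(
    r"^\[(?P<id>[a-z]+-\d+)\]\s*helpful=(?P<helpful>\d+)\s+"
    r"harmful=(?P<harmful>\d+)\s*::\s*(?P<strategy>.+)$"
)


def parse_entry(line: str) -> Entry:
    """Split one Playbook line into ID, counters, and strategy text."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a Playbook entry: {line!r}")
    return Entry(match["id"], int(match["helpful"]), int(match["harmful"]), match["strategy"])


entry = parse_entry(
    "[str-00001] helpful=3 harmful=0 :: Look for capitalized words to identify proper nouns"
)
net_score = entry.helpful - entry.harmful
```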

💡 Think of it like: A prompt is instructions for a new employee. The Playbook is the knowledge base they build up over months of experience.

Behind the Scenes: How ACE Calls the LLM

Technical

The three "agents" (Generator, Reflector, Curator) are not separate servers. They're just 3 different prompts sent to the same LLM.

Each Problem = 3 LLM API Calls

CALL #1

Generator

"Answer this question"

CALL #2

Reflector

"What went wrong?"

CALL #3

Curator

"Update playbook"

Same LLM API (DeepSeek-V3.1 via SCX.ai)
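To make the "three prompts, one model" idea concrete, here is a minimal sketch. `fake_llm` is a stand-in that returns canned text so the example runs offline; a real deployment would wrap the provider's API in a function with the same shape, which is not shown here. The prompt wording and all function names are illustrative assumptions.

```python
from typing import Callable

LLM = Callable[[str], str]  # one function stands for one stateless LLM endpoint


def solve_with_ace(question: str, playbook: str, call_llm: LLM) -> dict[str, str]:
    """Run one problem through three prompts sent to the same model."""
    # Call #1: the Generator answers using the Playbook as context.
    answer = call_llm(f"ROLE: Generator\nPLAYBOOK:\n{playbook}\nQUESTION: {question}")
    # Call #2: the Reflector critiques the Generator's output.
    reflection = call_llm(f"ROLE: Reflector\nPLAYBOOK:\n{playbook}\nANSWER: {answer}")
    # Call #3: the Curator proposes Playbook updates from the reflection.
    update = call_llm(f"ROLE: Curator\nPLAYBOOK:\n{playbook}\nREFLECTION: {reflection}")
    return {"answer": answer, "reflection": reflection, "update": update}


calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real provider call; records which role was prompted."""
    role = prompt.splitlines()[0].removeprefix("ROLE: ")
    calls.append(role)
    return f"<{role} output>"


result = solve_with_ace(
    "Which words are company names?",
    "[str-00001] helpful=3 harmful=0 :: Look for capitalized words to identify proper nouns",
    fake_llm,
)
```

Only the prompt changes between calls; the Playbook is included every time because the model keeps no memory of earlier calls.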

Training Mode

3-7 LLM calls per problem (three when each role runs once). Expensive, but builds up the Playbook with knowledge.

Production Mode

1 LLM call per problem. Just the Generator with a pre-built Playbook. Fast & cheap!
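The difference between the two modes can be sketched as a single flag. This is an assumption-laden illustration: `run_problem` and `CountingLLM` are hypothetical names, the prompts are placeholders, and the training branch shows only the minimum of three calls.

```python
from typing import Callable


def run_problem(
    question: str,
    playbook: list[str],
    call_llm: Callable[[str], str],
    training: bool,
) -> tuple[str, list[str]]:
    """Return the answer and the (possibly extended) Playbook."""
    context = "\n".join(playbook)
    answer = call_llm(f"Generator\n{context}\nQuestion: {question}")
    if not training:
        return answer, playbook  # production: one call, Playbook is read-only
    reflection = call_llm(f"Reflector\n{context}\nAnswer: {answer}")
    new_bullet = call_llm(f"Curator\n{context}\nReflection: {reflection}")
    return answer, playbook + [new_bullet]


class CountingLLM:
    """Offline stand-in for a provider call that counts how often it is used."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"reply {self.calls}"


prebuilt = ["[str-00001] helpful=3 harmful=0 :: Look for capitalized words to identify proper nouns"]

production_llm = CountingLLM()
_, production_playbook = run_problem("Find the proper nouns.", prebuilt, production_llm, training=False)

training_llm = CountingLLM()
_, training_playbook = run_problem("Find the proper nouns.", prebuilt, training_llm, training=True)
```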

Where Does Everything Run?

On YOUR Server:

• Generator, Reflector, Curator code
• Playbook storage (database/file)
• Orchestration logic

On LLM Provider (SCX.ai):

• The actual AI model
• Stateless — no memory between calls
• Just processes prompts

⚡ Key insight: The "agents" are just different prompts. All the intelligence comes from how you orchestrate the calls and persist the Playbook.
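Since the provider is stateless, the Playbook has to be stored on your side and sent along with each call. Below is a minimal sketch of file-based storage using JSON; the file name, the record fields, and the helper names are assumptions for illustration, and a database would serve equally well.

```python
import json
import tempfile
from pathlib import Path


def load_playbook(path: Path) -> list[dict]:
    """Return the stored Playbook, or an empty one on first run."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_playbook(path: Path, playbook: list[dict]) -> None:
    """Write to a temporary file first, then rename it over the target."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(playbook, indent=2), encoding="utf-8")
    tmp.replace(path)


with tempfile.TemporaryDirectory() as workdir:
    store = Path(workdir) / "playbook.json"
    first_load = load_playbook(store)  # nothing saved yet
    save_playbook(
        store,
        [{"id": "str-00001", "helpful": 2, "harmful": 0,
          "text": "Look for capitalized words that may indicate proper nouns"}],
    )
    reloaded = load_playbook(store)
```

Writing to a temporary file and renaming it means an interrupted save leaves the previous Playbook in place rather than a half-written file.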

Current Playbook

What is a Playbook?

The Playbook is ACE's "living memory." It contains strategies and lessons learned from past tasks.

The Generator uses these bullets to avoid mistakes and follow best practices. The Reflector evaluates whether they helped or hurt. The Curator adds new insights to this list in real time.

Living Knowledge: This list is passed to the AI as context. Watch it grow! The Curator agent will add new rules here based on what it learns from the Reflector's feedback.

[str-00001] helpful=2 harmful=0 :: Look for capitalized words that may indicate proper nouns

[str-00002] helpful=3 harmful=0 :: Consider context clues like 'Inc.', 'Corp.', 'LLC' for organizations

[mis-00001] helpful=1 harmful=2 :: Don't confuse product names with company names
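The sketch below shows one way the three entries above could be turned into the context text passed to the Generator, and how bullets with more harmful than helpful tags could be flagged for review. The record layout and the functions `render` and `net_negative` are hypothetical; this page does not say what ACE itself does with net-negative bullets.

```python
playbook = [
    {"id": "str-00001", "helpful": 2, "harmful": 0,
     "text": "Look for capitalized words that may indicate proper nouns"},
    {"id": "str-00002", "helpful": 3, "harmful": 0,
     "text": "Consider context clues like 'Inc.', 'Corp.', 'LLC' for organizations"},
    {"id": "mis-00001", "helpful": 1, "harmful": 2,
     "text": "Don't confuse product names with company names"},
]


def render(entries: list[dict]) -> str:
    """Format entries as one line each, ready to include in a prompt."""
    return "\n".join(
        f"[{e['id']}] helpful={e['helpful']} harmful={e['harmful']} :: {e['text']}"
        for e in entries
    )


def net_negative(entries: list[dict]) -> list[str]:
    """IDs of bullets tagged harmful more often than helpful."""
    return [e["id"] for e in entries if e["harmful"] > e["helpful"]]


context = render(playbook)
flagged = net_negative(playbook)
```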

ACE Agent Pipeline

Generator

Reads the Playbook and uses its strategies to answer the question. Reports which bullet points it referenced.

Reflector

Analyzes the Generator's reasoning. Tags bullets as helpful/harmful. Identifies the root cause of any errors.

Curator

Updates the Playbook with new insights. Uses incremental delta updates to prevent context collapse.