What is a DOE system?

AI is powerful. But unreliable.

Ask it the same question twice. Get different answers.

That's fine for brainstorming. Terrible for business operations.

Then I found out about Nick Saraev's DOE systems.

Directive-Orchestration-Execution.


The Problem With AI Today

AI models are probabilistic.

Every response has variance. Even with temperature at 0.

This creates a compound problem:

  • 90% accuracy per step
  • 5 steps in a workflow
  • 0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 59% success rate

That's why AI "agents" fail on complex tasks.

Each step introduces error. Errors compound. By step 5, you're flipping a coin.


The Solution: Separate Thinking From Doing

Here's the insight:

AI is good at decisions. Code is good at execution.

So don't make AI do both.

Split them.

THINKING (AI)          DOING (Code)
──────────────         ──────────────
Understand intent      API calls
Route to right tool    Data processing
Handle edge cases      File operations
Learn from errors      Consistent output

AI decides what to do. Code does it.

Now your success rate is:

  • AI routing: 95% accurate
  • Code execution: 99.9% reliable
  • Combined: 95% success rate

Much better than 59%.
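Both numbers are easy to sanity-check. A quick sketch, using the illustrative rates from this post:

```python
# Chained probabilistic steps: accuracy compounds multiplicatively.
def chained(step_accuracy: float, steps: int) -> float:
    return step_accuracy ** steps

# Five 90%-accurate AI steps in a row:
print(f"{chained(0.90, 5):.0%}")  # → 59%

# DOE: one AI routing decision, then deterministic code.
print(f"{chained(0.95, 1) * 0.999:.1%}")  # → 94.9%
```

The exact percentages depend on your real per-step accuracy. The shape of the math doesn't: every extra AI step in the chain multiplies in more error.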


The 3-Layer DOE Architecture

Layer 1: Directives (What To Do)

SOPs written in plain language. Markdown files.

They tell AI:

  • What the goal is
  • What inputs to expect
  • Which tools/scripts to use
  • What output to produce
  • How to handle edge cases

Think of it like training a new employee. You write down the process. They follow it.

Example directive:

# Audit GTM Container

## Goal
Analyze a GTM container and report issues.

## Steps
1. List all tags using gtm_list_tags
2. Check each tag for:
   - Missing triggers
   - Consent Mode compliance
   - Naming conventions
3. Generate report using audit_report.py

## Edge Cases
- If container has 100+ tags, paginate
- If API rate limited, wait 60 seconds

Layer 2: Orchestration (Decision Making)

This is the AI layer. Claude Code in my case.

It reads directives. Makes decisions. Routes tasks.

What it does:

  • Understands user intent
  • Picks the right directive
  • Calls scripts in the right order
  • Handles errors gracefully
  • Updates directives with learnings

What it doesn't do:

  • API calls directly (uses scripts)
  • Data processing (uses scripts)
  • File manipulation (uses scripts)

AI is the manager. Not the worker.
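In my setup the AI does this routing itself, but the boundary is easy to sketch in plain code. The directive path and script names below are made up for illustration:

```python
import subprocess
import sys
from pathlib import Path

def run_script(script: str, *args: str) -> str:
    """Execution layer: a deterministic subprocess. No LLM involved."""
    result = subprocess.run(
        [sys.executable, script, *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def orchestrate(task: str) -> str:
    """Orchestration layer: read the matching directive, then dispatch."""
    directive = Path("directives") / f"{task}.md"
    steps = directive.read_text()  # the AI reads these steps...
    # ...then decides which execution scripts to call, in what order:
    return run_script("execution/monitor/data_flow_validator.py", "acme-corp")
```

The point of the split: everything inside run_script is testable and repeatable. Only orchestrate involves judgment.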

Layer 3: Execution (Doing The Work)

Deterministic Python scripts.

They do one thing. They do it reliably. Every time.

Characteristics:

  • Single responsibility
  • Clear inputs and outputs
  • Error handling built in
  • Testable
  • Fast

Example:

# execution/monitor/data_flow_validator.py

def validate_client(client_slug):
    """
    Validate tracking for a single client.
    Returns structured result. Always.
    """
    try:
        config = load_config(client_slug)
        results = run_playwright_checks(config)
        return format_results(results)
    except Exception as exc:
        # Even failures come back as structured data, never a crash.
        return {"client": client_slug, "status": "error", "detail": str(exc)}

No AI in here. Just code that works.


Why This Architecture Wins

1. Reliability

AI errors don't cascade. If the AI picks the wrong script, that script still runs correctly, so the mistake is easy to spot and debug.

2. Speed

Scripts are fast. No LLM inference during execution. The AI is only involved in routing decisions.

3. Learning

When something breaks:

  1. Fix the script
  2. Update the directive
  3. System is now stronger

Each error makes the system better. Compound improvement.

4. Transparency

Directives are readable. Anyone can see what the system does. No black box.

5. Portability

Switch AI models anytime. Directives work with Claude, GPT, Gemini. The logic lives in markdown, not prompts.


The Self-Annealing Loop

This is where it gets interesting.

Every error is a learning opportunity.

Error occurs
    ↓
Fix the script
    ↓
Update the directive
    ↓
Log the learning
    ↓
System is stronger
    ↓
Error never happens again

After 6 months, the system has solved problems you didn't know existed.

It's like compound interest for reliability.
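The "log the learning" step can be as simple as appending to a markdown file. A minimal sketch; the file name and format here are placeholders, not part of any specific tool:

```python
from datetime import date
from pathlib import Path

def log_learning(error: str, fix: str, log_file: str = "learnings.md") -> None:
    """Append one error/fix pair so a directive can absorb it later."""
    entry = f"\n## {date.today()}\n- Error: {error}\n- Fix: {fix}\n"
    with Path(log_file).open("a") as f:  # append-only: the history is the asset
        f.write(entry)
```

Usage: log_learning("GTM API rate limit on large containers", "wait 60 seconds and retry"). The log then feeds the next directive update.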


Real-World Example: Tracking DOE

I built Tracking DOE using this architecture.

What it does:

  • Monitors marketing tracking (GA4, Google Ads, Meta, etc.)
  • Audits GTM containers
  • Sends alerts when tracking breaks
  • Learns from every issue

The layers:

Layer            Implementation
──────────────   ──────────────────────────────────────────────────
Directives       directives/gtm-audit.md, directives/troubleshooting.md
Orchestration    Claude Code with Stape GTM MCP
Execution        Python scripts in execution/monitor/

When I ask "audit my GTM container":

  1. Claude reads gtm-audit.md directive
  2. Decides which scripts to call
  3. Scripts fetch data via GTM API
  4. Scripts analyze and format results
  5. Claude presents findings

AI thinks. Code works. Results are consistent.


How To Build Your Own DOE System

Step 1: Identify Repetitive Tasks

What do you do repeatedly that could be automated?

  • Data processing
  • Report generation
  • Monitoring
  • Auditing
  • Onboarding

Step 2: Write Directives

For each task, document:

  • Goal
  • Steps
  • Tools needed
  • Edge cases
  • Expected output

Keep them in markdown. Simple and readable.

Step 3: Build Execution Scripts

One script per atomic action.

  • fetch_data.py
  • process_data.py
  • generate_report.py

Make them deterministic. Test them.
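Here's what one such atomic script can look like. A sketch with a made-up task (summarizing records), but the shape is the point: clear input, structured output, deterministic result:

```python
import json
import sys

def process_data(records: list[dict]) -> dict:
    """One atomic action: summarize records. Same input, same output, every time."""
    valid = [r for r in records if "value" in r]
    return {
        "status": "ok",
        "count": len(valid),
        "total": sum(r["value"] for r in valid),
        "skipped": len(records) - len(valid),
    }

if __name__ == "__main__":
    # Clear contract: JSON records in on stdin, JSON summary out on stdout.
    print(json.dumps(process_data(json.load(sys.stdin))))
```

Because nothing in it is probabilistic, a unit test pins its behavior forever.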

Step 4: Connect With AI

Use Claude Code, Cursor, or any AI coding assistant.

Point it at your directives. Let it orchestrate.

Step 5: Iterate

Every error → update directive → stronger system.


DOE vs Traditional Automation

Aspect         Traditional          DOE System
─────────────  ───────────────────  ──────────────────────────
Flexibility    Rigid flows          AI adapts to context
Reliability    Script-dependent     Code execution is reliable
Learning       Manual updates       Self-improving
Edge cases     Break the flow       AI handles them gracefully
Maintenance    High                 Low (self-documenting)

DOE gives you the best of both worlds.

AI flexibility. Code reliability.


When NOT To Use DOE

DOE is overkill for:

  • One-off tasks
  • Simple automations (use Zapier)
  • Tasks with no edge cases
  • Low-stakes operations

Use DOE when:

  • Task has many variations
  • Errors are costly
  • You want compound improvement
  • Multiple people need to use it

Getting Started

Want to see DOE in action?

Check out Tracking DOE, my implementation for marketing tracking.

It's open source. Fork it. Adapt it to your use case.

The architecture transfers to any domain:

  • Sales operations
  • Customer support
  • Content creation
  • Data pipelines
  • DevOps

Same pattern. Different directives and scripts.


Summary

DOE = Directive-Orchestration-Execution

  • Directives: What to do (markdown SOPs)
  • Orchestration: AI decides and routes
  • Execution: Code does the work

AI thinks. Code works. System learns.

That's how you make AI reliable.

