What Is a DOE System?
AI is powerful. But unreliable.
Ask it the same question twice. Get different answers.
That's fine for brainstorming. Terrible for business operations.
Then I found Nick Saraev's DOE systems.
Directive-Orchestration-Execution.
The Problem With AI Today
AI models are probabilistic.
Every response has variance. Even with temperature at 0.
This creates a compound problem:
- 90% accuracy per step
- 5 steps in a workflow
- 0.9 × 0.9 × 0.9 × 0.9 × 0.9 ≈ 59% success rate
That's why AI "agents" fail on complex tasks.
Each step introduces error. Errors compound. By step 5, you're flipping a coin.
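Quick check in Python:

```python
# Per-step accuracy compounds across probabilistic steps.
p, steps = 0.90, 5
print(f"Success rate: {p ** steps:.0%}")  # 59%: a coin flip by step 5
```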
The Solution: Separate Thinking From Doing
Here's the insight:
AI is good at decisions. Code is good at execution.
So don't make AI do both.
Split them.
| Thinking (AI) | Doing (Code) |
|---|---|
| Understand intent | API calls |
| Route to the right tool | Data processing |
| Handle edge cases | File operations |
| Learn from errors | Consistent output |
AI decides what to do. Code does it.
Now your success rate is:
- AI routing: 95% accurate
- Code execution: 99.9% reliable per step
- Combined: ~95% success rate
Much better than 59%.
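Here's the same math, assuming one routing decision followed by five deterministic steps:

```python
# One probabilistic routing decision, then deterministic code.
routing, per_step, steps = 0.95, 0.999, 5
print(f"Success rate: {routing * per_step ** steps:.0%}")  # ~95%
```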
The 3-Layer DOE Architecture
Layer 1: Directives (What To Do)
SOPs written in plain language. Markdown files.
They tell AI:
- What the goal is
- What inputs to expect
- Which tools/scripts to use
- What output to produce
- How to handle edge cases
Think of it like training a new employee. You write down the process. They follow it.
Example directive:
```markdown
# Audit GTM Container

## Goal
Analyze a GTM container and report issues.

## Steps
1. List all tags using gtm_list_tags
2. Check each tag for:
   - Missing triggers
   - Consent Mode compliance
   - Naming conventions
3. Generate report using audit_report.py

## Edge Cases
- If container has 100+ tags, paginate
- If API rate limited, wait 60 seconds
```
Layer 2: Orchestration (Decision Making)
This is the AI layer. Claude Code in my case.
It reads directives. Makes decisions. Routes tasks.
What it does:
- Understands user intent
- Picks the right directive
- Calls scripts in the right order
- Handles errors gracefully
- Updates directives with learnings
What it doesn't do:
- Make API calls (uses scripts)
- Process data (uses scripts)
- Manipulate files (uses scripts)
AI is the manager. Not the worker.
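Claude Code handles this for me, but the contract is simple enough to sketch yourself. A minimal version, with hypothetical paths and script names:

```python
# A sketch of the orchestration contract. Not Claude Code itself;
# directory layout and script names are assumptions.
import subprocess
from pathlib import Path

def load_directive(name: str) -> str:
    """Hand the AI layer a markdown SOP to reason over."""
    return Path("directives", f"{name}.md").read_text()

def run_script(script: str, *args: str) -> str:
    """Execution layer: deterministic scripts, invoked as subprocesses."""
    result = subprocess.run(
        ["python", f"execution/{script}", *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# The AI decides WHICH script to call and in what order.
# The calls themselves are plain, testable code.
```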
Layer 3: Execution (Doing The Work)
Deterministic Python scripts.
They do one thing. They do it reliably. Every time.
Characteristics:
- Single responsibility
- Clear inputs and outputs
- Error handling built in
- Testable
- Fast
Example:
```python
# execution/monitor/data_flow_validator.py

def validate_client(client_slug):
    """
    Validate tracking for a single client.
    Returns a structured result. Always.
    """
    config = load_config(client_slug)        # deterministic config load
    results = run_playwright_checks(config)  # browser checks, no LLM
    return format_results(results)           # consistent output shape
```
No AI in here. Just code that works.
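And because it's deterministic, it tests like normal code. A pytest-style sketch, monkeypatching the helper names from the example above:

```python
# Sketch: module path matches the example above.
import execution.monitor.data_flow_validator as validator

def test_validate_client(monkeypatch):
    monkeypatch.setattr(validator, "load_config", lambda slug: {"slug": slug})
    monkeypatch.setattr(validator, "run_playwright_checks", lambda cfg: ["ok"])
    monkeypatch.setattr(validator, "format_results", lambda r: {"checks": r})
    assert validator.validate_client("acme") == {"checks": ["ok"]}
```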
Why This Architecture Wins
1. Reliability
AI errors don't cascade. If the AI picks the wrong script, that script still runs correctly. Easy to debug.
2. Speed
Scripts are fast. No LLM inference for execution. AI only involved in routing decisions.
3. Learning
When something breaks:
- Fix the script
- Update the directive
- System is now stronger
Each error makes the system better. Compound improvement.
4. Transparency
Directives are readable. Anyone can see what the system does. No black box.
5. Portability
Switch AI models anytime. Directives work with Claude, GPT, Gemini. The logic lives in markdown, not prompts.
The Self-Annealing Loop
This is where it gets interesting.
Every error is a learning opportunity.
```
Error occurs
   ↓
Fix the script
   ↓
Update the directive
   ↓
Log the learning
   ↓
System is stronger
   ↓
Error never happens again
```
After 6 months, the system has solved problems you didn't know existed.
It's like compound interest for reliability.
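In practice, a "learning" is often just one line appended to a directive. A hypothetical example:

```markdown
## Edge Cases
- If container has 100+ tags, paginate
- If API rate limited, wait 60 seconds
- 2025-01-14: API returns 404 for deleted workspaces. Skip and log instead of failing.
  <!-- hypothetical entry, logged by the self-annealing loop -->
```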
Real-World Example: Tracking DOE
I built Tracking DOE using this architecture.
What it does:
- Monitors marketing tracking (GA4, Google Ads, Meta, etc.)
- Audits GTM containers
- Sends alerts when tracking breaks
- Learns from every issue
The layers:
| Layer | Implementation |
|---|---|
| Directives | `directives/gtm-audit.md`, `directives/troubleshooting.md` |
| Orchestration | Claude Code with Stape GTM MCP |
| Execution | Python scripts in `execution/monitor/` |
When I ask "audit my GTM container":
- Claude reads the `gtm-audit.md` directive
- Decides which scripts to call
- Scripts fetch data via GTM API
- Scripts analyze and format results
- Claude presents findings
AI thinks. Code works. Results are consistent.
How To Build Your Own DOE System
Step 1: Identify Repetitive Tasks
What do you do repeatedly that could be automated?
- Data processing
- Report generation
- Monitoring
- Auditing
- Onboarding
Step 2: Write Directives
For each task, document:
- Goal
- Steps
- Tools needed
- Edge cases
- Expected output
Keep them in markdown. Simple and readable.
Step 3: Build Execution Scripts
One script per atomic action.
- `fetch_data.py`
- `process_data.py`
- `generate_report.py`
Make them deterministic. Test them.
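Here's what an atomic script might look like. The CLI shape and field names are my assumptions, not a prescription:

```python
# fetch_data.py: single responsibility, structured output, built-in error handling.
import argparse
import json
import sys

def fetch_data(source: str) -> dict:
    """Fetch one dataset and return it in a structured shape."""
    # ...real fetch logic goes here...
    return {"source": source, "rows": []}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Fetch one dataset.")
    parser.add_argument("source", help="identifier of the data source")
    args = parser.parse_args()
    try:
        print(json.dumps(fetch_data(args.source)))
    except Exception as exc:
        # Structured errors so the orchestration layer can react.
        print(json.dumps({"error": str(exc)}), file=sys.stderr)
        sys.exit(1)
```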
Step 4: Connect With AI
Use Claude Code, Cursor, or any AI coding assistant.
Point it at your directives. Let it orchestrate.
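With Claude Code, one way is a CLAUDE.md file it reads on startup. A hypothetical sketch:

```markdown
<!-- CLAUDE.md (sketch) -->
Before acting on any request:
1. Find the matching SOP in directives/
2. Follow its steps exactly
3. Call scripts in execution/; never call APIs directly
4. On errors: fix the script, then update the directive
```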
Step 5: Iterate
Every error → update directive → stronger system.
DOE vs Traditional Automation
| Aspect | Traditional | DOE System |
|---|---|---|
| Flexibility | Rigid flows | AI adapts to context |
| Reliability | Script-dependent | Code execution reliable |
| Learning | Manual updates | Self-improving |
| Edge cases | Break the flow | AI handles gracefully |
| Maintenance | High | Low (self-documents) |
DOE gives you the best of both worlds.
AI flexibility. Code reliability.
When NOT To Use DOE
DOE is overkill for:
- One-off tasks
- Simple automations (use Zapier)
- Tasks with no edge cases
- Low-stakes operations
Use DOE when:
- Task has many variations
- Errors are costly
- You want compound improvement
- Multiple people need to use it
Getting Started
Want to see DOE in action?
Check out Tracking DOE, my implementation for marketing tracking.
It's open source. Fork it. Adapt it to your use case.
The architecture transfers to any domain:
- Sales operations
- Customer support
- Content creation
- Data pipelines
- DevOps
Same pattern. Different directives and scripts.
Summary
DOE = Directive-Orchestration-Execution
- Directives: What to do (markdown SOPs)
- Orchestration: AI decides and routes
- Execution: Code does the work
AI thinks. Code works. System learns.
That's how you make AI reliable.