AI-Centered Engineering

FPT Cloud · BSS Compute Team · Dev Lead: Nguyen Anh Duc · v2.0 — April 2026

Spec Kit Pipeline · Execution tool: TBD · Brownfield-safe

The Problem

79 modules across 3 repos (Flask backend, React console, React admin). Dual-site VN/JP. Years of brownfield code, 38 audited tech debt items. Small team, biweekly releases, manual QC.

97.8% — Greenfield accuracy
50–76% — Brownfield accuracy

Without quality gates → prompt drift accumulates silently → new technical debt stacks on old.

Design Principles
🧠 AI Has No Memory

Every session starts blank. No context fed = AI guesses. Guessing on brownfield = wrong. Encode conventions into files the AI reads automatically.

⚖️ 80% vs 100%

AGENTS.md = suggestion (~80%). Hooks = law (100%). Suggestions for style, laws for critical rules.
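The suggestion/law split can be made mechanical. A minimal sketch of the "law" side, assuming a hook that gates file writes — the FROZEN globs here are illustrative, not the team's real zone map, and the wiring (how the tool invokes the check and honors a block) is execution-tool-specific:

```python
# Sketch: a write-gate check a pre-write hook could call.
# FROZEN_GLOBS are hypothetical examples, not the actual zone map.
from fnmatch import fnmatch

FROZEN_GLOBS = [
    "app/billing/*",      # hypothetical FROZEN zone
    "app/auth/legacy/*",  # hypothetical FROZEN zone
]

def check_write(path: str) -> tuple[bool, str]:
    """Return (allowed, reason). A hook turns False into a hard block —
    100% enforcement, unlike an AGENTS.md suggestion the model may skip."""
    for pattern in FROZEN_GLOBS:
        if fnmatch(path, pattern):  # note: * in fnmatch also crosses "/"
            return False, f"FROZEN zone ({pattern}): route through Dev Lead review"
    return True, "ok"
```

The same check can back a pre-commit hook, so the rule holds even when a human writes the code.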

📦 Context Budget

System prompt uses ~50 slots. Budget ~150–200 total. Every unnecessary line dilutes the important ones.
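One way to keep the budget honest, as a sketch: fail CI when AGENTS.md plus the system prompt exceeds the budget. The figures (~50 system slots, ~200 total) are the deck's; counting each non-empty, non-heading line as one slot is a simplifying assumption of this sketch:

```python
# Rough context-budget audit for AGENTS.md.
# Assumption: one non-empty, non-heading line = one instruction slot.
SYSTEM_PROMPT_SLOTS = 50   # from the deck
TOTAL_BUDGET = 200         # from the deck (upper end of ~150–200)

def audit_budget(agents_md: str) -> dict:
    slots = [ln for ln in agents_md.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    used = SYSTEM_PROMPT_SLOTS + len(slots)
    return {"used": used, "remaining": TOTAL_BUDGET - used,
            "over": used > TOTAL_BUDGET}
```

Run it in CI so an over-budget AGENTS.md is caught at review time, not discovered as diluted AI behavior.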

📄 Docs = Runtime Context

Documentation is no longer just for humans — it's the runtime context that determines AI performance. 1 doc, 2 audiences.

🎨 Prototype → Spec → Code

Stakeholders need to see UI first. Prototype to align, spec to define, code last.

🔀 Pipeline ≠ Tool

Spec Kit = stable pipeline (define what). Execution tool = swappable (build how). Swap the execution tool without changing the process.

Compliance Levels

Hooks — FROZEN zone · Lint · Security · Auto-format — 100%
.claude/rules/ — Layer rules · Naming · API · DB · Testing — ~80%
AGENTS.md — Context · Commands · Zone map · Skill table — ~80%
Skills — Gold standard · Code review · Module arch — On-demand

3-Layer Documentation Architecture

Layer 1 — AGENTS.md + docs/
Permanent · Tool-agnostic
What the repo IS and HOW to work with it. AGENTS.md (conventions, commands). SPEC.md + DESIGN.md per feature (both humans and AI read them). CONSTITUTION.md (cross-repo contract). 60,000+ repos have adopted the AGENTS.md standard.
Layer 2 — .claude/
Tool-specific · Amplification
How Claude specifically BEHAVES. Rules (8 files, auto-load). Skills (on-demand). Hooks (100% enforcement). CLAUDE.md → symlink to AGENTS.md.
Layer 3 — .planning/
Ephemeral · Working memory
What we're BUILDING right now. Phase-specific task state, execution logs. Cleared after shipping. Managed by execution tool.

Pipeline — Spec Kit Backbone

Stable process. Execution tool is swappable.

Phases: 0 Foundation → 1 Visual → 2 Spec (Spec Kit) → 3 Build (TBD) → 4 Quality → 5 Feedback
Phase 0 — Foundation
AGENTS.md · CONSTITUTION · .claude/ · Gold standard · Zone map
~1 week · One-time
Why: Without Phase 0, AI generates code but nobody can tell if it's right — no convention to compare against.
AGENTS.md hand-written by Dev Lead (CLAUDE.md symlinks to it). CONSTITUTION.md — cross-repo contract. .claude/ — 8 rule files + 2 skills + 4 hooks + agents. Gold standard module — the team is refactoring one now. Zone map — FROZEN vs GROWTH, enforced by hook (100%). /speckit.constitution assists if needed.
Phase 1 — Visual Alignment
Intent + Negative scope → Prototype → Demo → Lock decisions
2–3 days
Why: Stakeholders need to see UI before they know what they want. Negative scope written upfront = shield against scope creep.
Claude Artifacts for the prototype (React, ~30s iteration loop). Not Lovable (different stack, wrong data model). Output: Visual Decision Record + Component Map → MUI + Interaction Spec. No prototype code enters the codebase.
Phase 2 — Spec Extraction (Spec Kit)
specify → SPEC.md + DESIGN.md → analyze → checklist → approve
1–2 days
Why: Prototype is visual only. Need formal specs for AI to generate correct code + QC to test. Spec Kit is team-designed with PR-centric review gates.
/speckit.specify → extract from prototype + intent. SPEC.md (PRD+SRS merged: stories, AC in WHEN/THEN/SHALL, scope boundaries, NFR). DESIGN.md (architecture + technical merged: components, data model, API contract, data flow, implementation order). /speckit.analyze checks consistency. /speckit.checklist quality gate. docsmith generates formal docs for leadership if needed.
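The WHEN/THEN/SHALL phrasing is what makes the AC machine-checkable: each criterion maps almost one-to-one onto a test. A sketch with a hypothetical AC and a stand-in validator — neither comes from a real SPEC.md:

```python
# Hypothetical AC from a SPEC.md: "WHEN a user submits feedback with an
# empty message, THEN the API SHALL reject it with a validation error."
# validate_feedback is an illustrative stand-in for real endpoint logic.
def validate_feedback(site: str, message: str) -> dict:
    if not message.strip():
        return {"ok": False, "error": "message_required"}
    if site not in ("VN", "JP"):
        return {"ok": False, "error": "unknown_site"}
    return {"ok": True}

def test_empty_message_is_rejected():
    # WHEN empty message → THEN SHALL reject with validation error
    assert validate_feedback("VN", "   ") == {"ok": False, "error": "message_required"}

def test_valid_feedback_is_accepted():
    assert validate_feedback("JP", "Console is slow") == {"ok": True}
```

QC can run the same tests against staging, so the spec, the AI's verify step, and human QC all reference one AC.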
Phase 3 — Implementation
Execution tool: TBD — plan → exec → verify → fix loop
3–5 days
Why execution tool: Manual Claude Code = hand-feed context every session + manually commit. Execution tool automates: task distribution, parallel execution, verify/fix loop. Candidates: OMC (14.8k★, plugin), ruflo (28.1k★, orchestration platform), Agent Teams (native), GSD-2 (CLI), or manual Claude Code.
Execution tool reads AGENTS.md + .claude/rules/ + hooks automatically. Backend first, FE second. Per-repo workflows. Atomic commit per task. Spec Kit /speckit.tasks provides task breakdown regardless of execution tool chosen.
Osmani's rules (apply to any tool): WIP max 3–5 agents. Kill after 3+ stuck iterations. One file, one owner. Verification is the bottleneck, not generation.
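The first two rules can be encoded directly in whatever dispatch loop the chosen execution tool provides (or a thin wrapper around manual Claude Code). A sketch, where run_iteration is a stand-in for one agent work/verify cycle:

```python
# Sketch of Osmani's limits as a dispatch loop.
# run_iteration(task) is a hypothetical callback returning one of
# "done", "progress", or "stuck" for a single agent cycle.
from collections import deque
from typing import Callable

MAX_WIP = 3    # WIP max 3–5 agents; 3 keeps coordination overhead low
MAX_STUCK = 3  # kill after 3+ stuck iterations

def drive(tasks: list[str], run_iteration: Callable[[str], str],
          max_rounds: int = 100) -> dict:
    queue, stuck_counts = deque(tasks), {}
    active, done, killed = [], [], []
    for _ in range(max_rounds):
        while queue and len(active) < MAX_WIP:       # enforce WIP cap
            t = queue.popleft()
            active.append(t)
            stuck_counts[t] = 0
        if not active:
            break
        for t in list(active):
            outcome = run_iteration(t)
            if outcome == "done":
                active.remove(t); done.append(t)
            elif outcome == "stuck":
                stuck_counts[t] += 1
                if stuck_counts[t] >= MAX_STUCK:     # kill the stuck agent
                    active.remove(t); killed.append(t)
            else:
                stuck_counts[t] = 0                  # progress resets counter
    return {"done": done, "killed": killed}
```

Progress resets the stuck counter; three consecutive stuck iterations kill the task instead of letting it burn tokens.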
Phase 4 — Quality Gates
AI verify → Human MR → QC staging → E2E auto
1–2 days
Why both AI + human: AI catches convention violations fast. Human catches logic, architecture, business context. Two layers complement each other.
Execution tool verify step or /speckit.analyze checks code against spec. Human MR review mandatory (GitLab). QC tests on staging against AC. Playwright E2E for critical flows (AI-generated, auto-run in CI). Hooks enforce at 100%: lint gate, FROZEN zone, security.
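Part of the verify gate can be a dumb script rather than a model call: cross-check SPEC.md acceptance-criteria IDs against test names before the human MR review. The AC-n ID style and the test_..._acn naming convention are assumptions of this sketch:

```python
# Sketch: report acceptance criteria with no matching test.
# Assumes SPEC.md labels criteria "AC-1", "AC-2", ... and tests embed
# the ID as a lowercase "ac1" suffix — both illustrative conventions.
import re

def uncovered_acs(spec_text: str, test_names: list[str]) -> list[str]:
    ac_ids = re.findall(r"\bAC-(\d+)\b", spec_text)
    covered = {m for name in test_names
               for m in re.findall(r"ac(\d+)", name)}
    return [f"AC-{i}" for i in ac_ids if i not in covered]
```

Anything this misses still hits the mandatory human MR review, so the two layers stay complementary.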
Phase 5 — Feedback Loop
Update conventions → Skill learning → 3-question retro
0.5 days
Why: Whatever AI got wrong → update conventions so it doesn't happen again. Conventions are living documents.
Update AGENTS.md + rules + CONSTITUTION. If execution tool supports skill learning → extract debug patterns for brownfield knowledge. Retro: What did AI get wrong? What convention was missing? Where did scope creep happen?

Key Decisions

Pipeline: Spec Kit — team-designed, PR-centric, review gates. Stable backbone: swap the execution tool without changing the process.
Execution: TBD — evaluate after POC setup. Candidates: OMC (14.8k★), ruflo (28.1k★), Agent Teams (native), GSD-2, manual CC.
Config validation: agnix — linter for AGENTS.md, SKILL.md, hooks, MCP. Autofixes. Validate before runtime testing.
Convention file: AGENTS.md — single source, tool-agnostic, 60k+ repos. CLAUDE.md symlinks to it.
Spec format: 2 files/feature — SPEC.md (PRD+SRS). DESIGN.md (Arch+Tech). 1 doc, 2 audiences. Markdown = source, docx = output.
Enforcement: Hooks + Rules — 100% hooks for critical rules (FROZEN zone, lint, security). ~80% rules for conventions.
Prototype: Claude Artifacts — React in chat, ~30s iteration. ✗ Lovable: wrong stack, data-model drift.
Testing: 4 layers — Unit (AI) + CI (auto) + QC (human) + E2E (Playwright). Each catches different bugs.
Brownfield: .claude/skills/ — gold standard + module architecture skills. Execution tool skill learning as a bonus if available.

Multi-Agent Rules (Osmani)

1. WIP max 3–5 agents. Beyond this, coordination overhead exceeds gains.
2. Kill after 3+ stuck iterations. Don't let stuck agents burn tokens.
3. One file, one owner. The most important rule to prevent overwrites.
4. Verification is the bottleneck, not generation. Invest in test quality.
5. Human-written AGENTS.md > LLM-generated. Dev-written: 4% improvement; LLM-generated: 3% at 20% more cost.
6. Human judgment is non-negotiable for architecture, feature prioritization, system review.

Risk Assessment

AI forgets conventions between sessions → AGENTS.md + rules auto-load + hooks at 100%
Scope creep during prototype demo → negative scope written BEFORE the demo; check every request against it
Execution tool not yet chosen → pipeline (Spec Kit) is stable; the execution tool is swappable
Token cost exceeds budget → WIP ≤ 3 agents, kill stuck iterations, use eco modes if available
Gold standard not representative enough → start with a simple CRUD module, iterate after POC
Team resistance to new workflow → POC one feature, demonstrate value, scale later

🚀 POC — Customer Feedback Widget

O2 Q2/2026 roadmap · Right-sized scope · 2 repos · No cross-team dependencies

Week 1 — Gold standard done · AGENTS.md + .claude/ · install Spec Kit
Week 2–3 — Phase 1–2: Prototype → Spec · Phase 3: Execute (tool TBD) · backend + FE
Week 3–4 — Phase 4: Review + QC · Phase 5: Retro · measure metrics
Setup: ? days (one-time cost)
Implement: ? days (vs manual coding)
Effort saved: ?% (vs full manual)
MR rejects: ? (violations caught)
Bugs in QC: ? (vs team avg)
Proto → Spec: ? days (vs manual spec)

Agent Readiness Level

1 — Functional: README, linting, type checking, tests ⚠️
2 — Documented: AGENTS.md, devcontainers, pre-commit hooks
3 — Standardized: Integration tests, secret scanning, observability ⚠️ ⚠️
4 — Optimized: Fast CI, deployment metrics
5 — Autonomous: Self-improving, complex automated decomposition

Now → After columns show current state vs post-POC target