AI-Centered Engineering

FPT Cloud · BSS Compute Team · Dev Lead: Nguyen Anh Duc · v2.0 — April 2026

Spec Kit Pipeline · Execution tool: TBD · Brownfield-safe

The Problem

79 modules across 3 repos (Flask backend, React console, React admin). Dual-site VN/JP. Years of brownfield code, 38 audited tech debt items. Small team, biweekly releases, manual QC.

97.8% — Greenfield accuracy
50–76% — Brownfield accuracy

Without quality gates → prompt drift accumulates silently → new technical debt stacks on old.

Design Principles
🧠 AI Has No Memory

Every session starts blank. No context fed = AI guesses. Guessing on brownfield = wrong. Encode conventions into files the AI reads automatically.

⚖️ 80% vs 100%

AGENTS.md = suggestion (~80%). Hooks = law (100%). Suggestions for style, laws for critical rules.
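The suggestion/law split can be made mechanical. A minimal sketch of the "law" side, assuming a hook that gates file writes — the FROZEN globs here are illustrative, not the team's real zone map, and the wiring (how the tool invokes the check and honors a block) is execution-tool-specific:

```python
# Sketch: a write-gate check a pre-write hook could call.
# FROZEN_GLOBS are hypothetical examples, not the actual zone map.
from fnmatch import fnmatch

FROZEN_GLOBS = [
    "app/billing/*",      # hypothetical FROZEN zone
    "app/auth/legacy/*",  # hypothetical FROZEN zone
]

def check_write(path: str) -> tuple[bool, str]:
    """Return (allowed, reason). A hook turns False into a hard block —
    100% enforcement, unlike an AGENTS.md suggestion the model may skip."""
    for pattern in FROZEN_GLOBS:
        if fnmatch(path, pattern):  # note: * in fnmatch also crosses "/"
            return False, f"FROZEN zone ({pattern}): route through Dev Lead review"
    return True, "ok"
```

The same check can back a pre-commit hook, so the rule holds even when a human writes the code.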

📦 Context Budget

System prompt uses ~50 slots. Budget ~150–200 total. Every unnecessary line dilutes the important ones.
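One way to keep the budget honest, as a sketch: fail CI when AGENTS.md plus the system prompt exceeds the budget. The figures (~50 system slots, ~200 total) are the deck's; counting each non-empty, non-heading line as one slot is a simplifying assumption of this sketch:

```python
# Rough context-budget audit for AGENTS.md.
# Assumption: one non-empty, non-heading line = one instruction slot.
SYSTEM_PROMPT_SLOTS = 50   # from the deck
TOTAL_BUDGET = 200         # from the deck (upper end of ~150–200)

def audit_budget(agents_md: str) -> dict:
    slots = [ln for ln in agents_md.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    used = SYSTEM_PROMPT_SLOTS + len(slots)
    return {"used": used, "remaining": TOTAL_BUDGET - used,
            "over": used > TOTAL_BUDGET}
```

Run it in CI so an over-budget AGENTS.md is caught at review time, not discovered as diluted AI behavior.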

📄 Docs = Runtime Context

Documentation is no longer just for humans — it's the runtime context that determines AI performance. 1 doc, 2 audiences.

🎨 Prototype → Spec → Code

Stakeholders need to see UI first. Prototype to align, spec to define, code last.

🔀 Pipeline ≠ Tool

Spec Kit = stable pipeline (define what). Execution tool = swappable (build how). Swap the execution tool without changing the process.

Compliance Levels

Hooks — FROZEN zone · Lint · Security · Auto-format — 100%
.claude/rules/ — Layer rules · Naming · API · DB · Testing — ~80%
AGENTS.md — Context · Commands · Zone map · Skill table — ~80%
Skills — Gold standard · Code review · Module arch — On-demand

3-Layer Documentation Architecture

Layer 1 — AGENTS.md + docs/
Permanent · Tool-agnostic
What the repo IS and HOW to work with it. AGENTS.md (conventions, commands). SPEC.md + DESIGN.md per feature (both humans and AI read them). CONSTITUTION.md (cross-repo contract). 60,000+ repos have adopted the AGENTS.md standard.
Layer 2 — .claude/
Tool-specific · Amplification
How Claude specifically BEHAVES. Rules (8 files, auto-load). Skills (on-demand). Hooks (100% enforcement). CLAUDE.md → symlink to AGENTS.md.
Layer 3 — .planning/
Ephemeral · Working memory
What we're BUILDING right now. Phase-specific task state, execution logs. Cleared after shipping. Managed by execution tool.

Pipeline — Spec Kit Backbone

Stable process. Execution tool is swappable.

Phases: 0 Foundation → 1 Visual → 2 Spec (Spec Kit) → 3 Build (TBD) → 4 Quality → 5 Feedback
Phase 0 — Foundation
AGENTS.md · CONSTITUTION · .claude/ · Gold standard · Zone map
~1 week · One-time
Why: Without Phase 0, AI generates code but nobody can tell if it's right — no convention to compare against.
AGENTS.md hand-written by Dev Lead (CLAUDE.md symlinks to it). CONSTITUTION.md — cross-repo contract. .claude/ — 8 rule files + 2 skills + 4 hooks + agents. Gold standard module — the team is refactoring one now. Zone map — FROZEN vs GROWTH, enforced by hook (100%). /speckit.constitution assists if needed.
Phase 1 — Visual Alignment
Intent + Negative scope → Prototype → Demo → Lock decisions
2–3 days
Why: Stakeholders need to see UI before they know what they want. Negative scope written upfront = shield against scope creep.
Claude Artifacts for the prototype (React, ~30s iteration loop). Not Lovable (different stack, wrong data model). Output: Visual Decision Record + Component Map → MUI + Interaction Spec. No prototype code enters the codebase.
Phase 2 — Spec Extraction (Spec Kit)
specify → SPEC.md + DESIGN.md → analyze → checklist → approve
1–2 days
Why: Prototype is visual only. Need formal specs for AI to generate correct code + QC to test. Spec Kit is team-designed with PR-centric review gates.
/speckit.specify → extract from prototype + intent. SPEC.md (PRD+SRS merged: stories, AC in WHEN/THEN/SHALL, scope boundaries, NFR). DESIGN.md (architecture + technical merged: components, data model, API contract, data flow, implementation order). /speckit.analyze checks consistency. /speckit.checklist quality gate. docsmith generates formal docs for leadership if needed.
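The WHEN/THEN/SHALL phrasing is what makes the AC machine-checkable: each criterion maps almost one-to-one onto a test. A sketch with a hypothetical AC and a stand-in validator — neither comes from a real SPEC.md:

```python
# Hypothetical AC from a SPEC.md: "WHEN a user submits feedback with an
# empty message, THEN the API SHALL reject it with a validation error."
# validate_feedback is an illustrative stand-in for real endpoint logic.
def validate_feedback(site: str, message: str) -> dict:
    if not message.strip():
        return {"ok": False, "error": "message_required"}
    if site not in ("VN", "JP"):
        return {"ok": False, "error": "unknown_site"}
    return {"ok": True}

def test_empty_message_is_rejected():
    # WHEN empty message → THEN SHALL reject with validation error
    assert validate_feedback("VN", "   ") == {"ok": False, "error": "message_required"}

def test_valid_feedback_is_accepted():
    assert validate_feedback("JP", "Console is slow") == {"ok": True}
```

QC can run the same tests against staging, so the spec, the AI's verify step, and human QC all reference one AC.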
Phase 3 — Implementation
Execution tool: TBD — plan → exec → verify → fix loop
3–5 days
Why execution tool: Manual Claude Code = hand-feed context every session + manually commit. Execution tool automates: task distribution, parallel execution, verify/fix loop. Candidates: OMC (14.8k★, plugin), ruflo (28.1k★, orchestration platform), Agent Teams (native), GSD-2 (CLI), or manual Claude Code.
Execution tool reads AGENTS.md + .claude/rules/ + hooks automatically. Backend first, FE second. Per-repo workflows. Atomic commit per task. Spec Kit /speckit.tasks provides task breakdown regardless of execution tool chosen.
Osmani's rules (apply to any tool): WIP max 3–5 agents. Kill after 3+ stuck iterations. One file, one owner. Verification is the bottleneck, not generation.
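The first two rules can be encoded directly in whatever dispatch loop the chosen execution tool provides (or a thin wrapper around manual Claude Code). A sketch, where run_iteration is a stand-in for one agent work/verify cycle:

```python
# Sketch of Osmani's limits as a dispatch loop.
# run_iteration(task) is a hypothetical callback returning one of
# "done", "progress", or "stuck" for a single agent cycle.
from collections import deque
from typing import Callable

MAX_WIP = 3    # WIP max 3–5 agents; 3 keeps coordination overhead low
MAX_STUCK = 3  # kill after 3+ stuck iterations

def drive(tasks: list[str], run_iteration: Callable[[str], str],
          max_rounds: int = 100) -> dict:
    queue, stuck_counts = deque(tasks), {}
    active, done, killed = [], [], []
    for _ in range(max_rounds):
        while queue and len(active) < MAX_WIP:       # enforce WIP cap
            t = queue.popleft()
            active.append(t)
            stuck_counts[t] = 0
        if not active:
            break
        for t in list(active):
            outcome = run_iteration(t)
            if outcome == "done":
                active.remove(t); done.append(t)
            elif outcome == "stuck":
                stuck_counts[t] += 1
                if stuck_counts[t] >= MAX_STUCK:     # kill the stuck agent
                    active.remove(t); killed.append(t)
            else:
                stuck_counts[t] = 0                  # progress resets counter
    return {"done": done, "killed": killed}
```

Progress resets the stuck counter; three consecutive stuck iterations kill the task instead of letting it burn tokens.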
Phase 4 — Quality Gates
AI verify → Human MR → QC staging → E2E auto
1–2 days
Why both AI + human: AI catches convention violations fast. Human catches logic, architecture, business context. Two layers complement each other.
Execution tool verify step or /speckit.analyze checks code against spec. Human MR review mandatory (GitLab). QC tests on staging against AC. Playwright E2E for critical flows (AI-generated, auto-run in CI). Hooks enforce at 100%: lint gate, FROZEN zone, security.
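Part of the verify gate can be a dumb script rather than a model call: cross-check SPEC.md acceptance-criteria IDs against test names before the human MR review. The AC-n ID style and the test_..._acn naming convention are assumptions of this sketch:

```python
# Sketch: report acceptance criteria with no matching test.
# Assumes SPEC.md labels criteria "AC-1", "AC-2", ... and tests embed
# the ID as a lowercase "ac1" suffix — both illustrative conventions.
import re

def uncovered_acs(spec_text: str, test_names: list[str]) -> list[str]:
    ac_ids = re.findall(r"\bAC-(\d+)\b", spec_text)
    covered = {m for name in test_names
               for m in re.findall(r"ac(\d+)", name)}
    return [f"AC-{i}" for i in ac_ids if i not in covered]
```

Anything this misses still hits the mandatory human MR review, so the two layers stay complementary.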
Phase 5 — Feedback Loop
Update conventions → Skill learning → 3-question retro
0.5 days
Why: Whatever AI got wrong → update conventions so it doesn't happen again. Conventions are living documents.
Update AGENTS.md + rules + CONSTITUTION. If execution tool supports skill learning → extract debug patterns for brownfield knowledge. Retro: What did AI get wrong? What convention was missing? Where did scope creep happen?

Key Decisions

Pipeline: Spec Kit — team-designed, PR-centric, review gates. Stable backbone: swap the execution tool without changing the process.
Execution: TBD — evaluate after POC setup. Candidates: OMC (14.8k★), ruflo (28.1k★), Agent Teams (native), GSD-2, manual CC.
Config validation: agnix — linter for AGENTS.md, SKILL.md, hooks, MCP. Autofixes. Validate before runtime testing.
Convention file: AGENTS.md — single source, tool-agnostic, 60k+ repos. CLAUDE.md symlinks to it.
Spec format: 2 files/feature — SPEC.md (PRD+SRS). DESIGN.md (Arch+Tech). 1 doc, 2 audiences. Markdown = source, docx = output.
Enforcement: Hooks + Rules — 100% hooks for critical rules (FROZEN zone, lint, security). ~80% rules for conventions.
Prototype: Claude Artifacts — React in chat, ~30s iteration. ✗ Lovable: wrong stack, data-model drift.
Testing: 4 layers — Unit (AI) + CI (auto) + QC (human) + E2E (Playwright). Each catches different bugs.
Brownfield: .claude/skills/ — gold standard + module architecture skills. Execution tool skill learning as a bonus if available.

Multi-Agent Rules (Osmani)

1. WIP max 3–5 agents. Beyond this, coordination overhead exceeds gains.
2. Kill after 3+ stuck iterations. Don't let stuck agents burn tokens.
3. One file, one owner. The most important rule to prevent overwrites.
4. Verification is the bottleneck, not generation. Invest in test quality.
5. Human-written AGENTS.md > LLM-generated. Dev-written: 4% improvement; LLM-generated: 3% at 20% more cost.
6. Human judgment is non-negotiable for architecture, feature prioritization, system review.

Risk Assessment

AI forgets conventions between sessions → AGENTS.md + rules auto-load + hooks at 100%
Scope creep during prototype demo → negative scope written BEFORE the demo; check every request against it
Execution tool not yet chosen → pipeline (Spec Kit) is stable; the execution tool is swappable
Token cost exceeds budget → WIP ≤ 3 agents, kill stuck iterations, use eco modes if available
Gold standard not representative enough → start with a simple CRUD module, iterate after POC
Team resistance to new workflow → POC one feature, demonstrate value, scale later

🚀 POC — Customer Feedback Widget

O2 Q2/2026 roadmap · Right-sized scope · 2 repos · No cross-team dependencies

Week 1 — Gold standard done · AGENTS.md + .claude/ · install Spec Kit
Week 2–3 — Phase 1–2: Prototype → Spec · Phase 3: Execute (tool TBD) · backend + FE
Week 3–4 — Phase 4: Review + QC · Phase 5: Retro · measure metrics
Setup: ? days (one-time cost)
Implement: ? days (vs manual coding)
Effort saved: ?% (vs full manual)
MR rejects: ? (violations caught)
Bugs in QC: ? (vs team avg)
Proto → Spec: ? days (vs manual spec)

Agent Readiness Level

1 — Functional: README, linting, type checking, tests ⚠️
2 — Documented: AGENTS.md, devcontainers, pre-commit hooks
3 — Standardized: Integration tests, secret scanning, observability ⚠️ ⚠️
4 — Optimized: Fast CI, deployment metrics
5 — Autonomous: Self-improving, complex automated decomposition

Now → After columns show current state vs post-POC target