#003 · March 19, 2026

Building the Kodulabor Website — Why AI Needs Rigor and Taste

  • Total time: ~4 hours
  • Acceleration: 12–18×
  • Lines generated: 6,381
  • Rework ratio: 25.2%
  • Corrections: 5 rounds
  • AI review rounds: 2
From brief to bilingual production website
6,381 lines · 9 commits · 5 correction rounds
01 Source
Project brief + case study documents
Positioning, naming, framework definition
Two existing case studies as content
Resume, photo, internal announcement
02 AI Processing
[AI]
Subagents scaffold site + generate content
Next.js + i18n + middleware in ~30 min
Translation files, page components, data layer
Technical infrastructure: 0 errors
03 Human Review
[HUMAN]
Caught 4 major fabrication incidents
Case studies: entirely invented content
About page: wrong voice, fabricated details
Methodology: wrong framework, positioning contradiction
04 AI Review
[AI]
Separate model critiques strategically
Fresh context, no accumulated blind spots
Identified positioning self-sabotage
2 rounds of structured feedback
05 Human Decision
[HUMAN]
Taste as final filter
Accepted, modified, or rejected each suggestion
Chose compression level, tone, framing
25.2% of initial output replaced
Impact
  • Time: ~4 hours vs 1–2 weeks
  • Cost: API costs vs €3,000+
  • Effort: 30 min build + 2.5 h corrections

Problem

Kodulabor needed a website. Not a placeholder — a credible launch vehicle for a new business line, timed to a public LinkedIn announcement. The site needed to establish positioning, present the assessment framework, host two published case studies in full, support English and Estonian localization across two domains (kodulabor.ai and kodulabor.ee), and be deployable on Vercel from a GitHub repository.

The scope was clear: home page, about page, methodology page, case studies section with detail pages, contact page. Bilingual. Markdown rendering for full-length case studies. Domain-based locale routing middleware. All in a single working session.
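Domain-based locale routing of this kind reduces to a host-to-locale mapping applied in middleware. A minimal sketch, assuming the .ee domain serves Estonian and the .ai domain English (the function name and the mapping are illustrative, not taken from the actual repository):

```typescript
// Hypothetical host → locale mapping, as Next.js middleware might use it.
// Assumption: kodulabor.ee serves Estonian, anything else defaults to English.
type Locale = "en" | "et";

export function localeForHost(host: string): Locale {
  // Strip an optional port (e.g. localhost-style hosts) before matching.
  const bareHost = host.replace(/:\d+$/, "");
  return bareHost.endsWith("kodulabor.ee") ? "et" : "en";
}

// In middleware.ts this could drive a rewrite such as /about → /et/about:
// export function middleware(req: NextRequest) {
//   const locale = localeForHost(req.headers.get("host") ?? "");
//   return NextResponse.rewrite(
//     new URL(`/${locale}${req.nextUrl.pathname}`, req.url)
//   );
// }
```

Keeping the mapping a pure function makes it testable without spinning up the middleware runtime.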

This sounds like a straightforward AI-assisted web development project. It was not. The technical build was fast. The content quality was a disaster that required five correction rounds.


AI Approach

The website was built using Claude Cowork — Anthropic's desktop AI assistant — running on Claude Opus. The development method combined direct prompting for architecture decisions with delegated subagents for bulk content generation and code scaffolding.

Technical stack: Next.js 16 (App Router), TypeScript, Tailwind v4, react-markdown, deployed to Vercel.

AI architecture — and the root cause of problems:

The session used a parent-child agent pattern. I (the parent Claude session) handled architecture, decisions, and quality review. Subagents were delegated to handle:

  • Creating all translation files (English and Estonian)
  • Building all Next.js page components
  • Generating case study data

The subagents worked fast. They also worked without the full context. The parent session had the brief, the case study documents, the resume, the positioning decisions from the conversation — but the subagents received only their task descriptions. They filled in gaps by fabricating.


Human Effort

Session duration: ~4 hours (13:00–17:00 UTC, March 19, 2026)

Commits: 9

Total lines of code generated: 6,381

Prompt count: ~25 human messages

Effort breakdown by phase:

Phase | Time | Activity
Concept and brief | ~45 min | Naming, positioning, brief document, planning
Initial build | ~30 min | Next.js scaffolding, all pages, i18n, middleware
First correction: case study content | ~20 min | Discovered fabricated content, replaced with originals
Second correction: About page | ~15 min | Third-person voice, fabricated career details
Third correction: Methodology | ~20 min | Wrong framework (process steps, not dimensions)
Fourth correction: Full rebuild | ~40 min | Repositioning based on independent feedback
Fifth correction: Sharpening | ~30 min | Consulting contradiction, bio compression, pain-led copy
Total | ~4 hours |

The revealing ratio: Of the ~4 hours, roughly 30 minutes was productive AI-assisted building. Roughly 2.5 hours was catching and correcting AI-generated content that was plausible but wrong.


Traditional Benchmark

Building this website without AI:

Item | Estimate
Design & frontend (Next.js, Tailwind, responsive) | 20–30 hours
Content writing (5 pages × 2 languages) | 15–20 hours
Case study integration (markdown rendering, data layer) | 8–12 hours
i18n & middleware (locale routing, domain mapping) | 5–8 hours
Total | 48–70 hours
Calendar time | 1–2 weeks

Acceleration Factor

Metric | Traditional | AI-assisted | Factor
Wall clock time | 1–2 weeks | 1 afternoon (~4 hours) | ~20×
Human effort (total) | 48–70 hours | ~4 hours | 12–18×
Human effort (productive) | 48–70 hours | ~1.5 hours | 32–47×
Human effort (corrections) | 0 hours | ~2.5 hours | N/A

The acceleration is real but misleading if you only count productive time. The actual experience was: 30 minutes of impressive generation, then 2.5 hours of quality control. The acceleration factor on the technical build is extraordinary (the Next.js scaffolding, middleware, and page structure appeared in minutes). The acceleration factor on content that required judgment was much lower — and in some cases negative, because fixing plausible-but-wrong content is harder than writing it from scratch.


Quality Assessment

What was generated correctly on first attempt

The technical infrastructure was flawless:

  • Next.js project structure with App Router, TypeScript, proper configs
  • i18n system with [locale] dynamic segments and translation loading
  • Domain-based locale routing middleware
  • Markdown rendering component with styled table, code, and heading support
  • Static generation with generateStaticParams for all routes
  • Build passed on every attempt

Technical verdict: 9/10. The AI excels at well-trodden infrastructure patterns.
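The static-generation piece follows the standard App Router pattern: enumerate every locale × slug pair up front so each case study page can be pre-rendered. A minimal sketch under assumed names (the slugs and file path are illustrative, not from the actual repo):

```typescript
// Sketch of the generateStaticParams pattern: pre-compute every
// { locale, slug } pair so case-study routes render statically.
// The locale list matches the site (en/et); the slugs are placeholders.
const locales = ["en", "et"] as const;

export function staticParamsFor(slugs: string[]) {
  return locales.flatMap((locale) => slugs.map((slug) => ({ locale, slug })));
}

// In app/[locale]/case-studies/[slug]/page.tsx this would be wired as:
// export function generateStaticParams() {
//   return staticParamsFor(["revalia-homes", "automated-case-studies"]);
// }
```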

What was fabricated and had to be replaced

The content was a catastrophe:

1. Case study content — entirely fabricated (Commit 4)

The subagent was told to create case study data. Instead of using the actual case study documents we had already written, it invented completely different content. The Revalia Homes study became a fictional story about "copywriting automation" with made-up metrics ("83% time reduction across 12 sessions, $0.24 per session"). The automated case studies became a story about "CSV upload pipelines." Both were plausible, well-structured, and entirely wrong.

Lines replaced: 423

2. About page — wrong voice, fabricated details (Commit 5)

The subagent wrote the About page in third person ("He") despite the instruction to use first person. It also fabricated career details. "Skype (7 years)" became "seven years" (it was 5). "Joined at 20 employees" was correct but surrounded by invented narrative. The claimed "400+ engineers in Estonia known by first name" appeared despite not being in the brief or resume.

Lines replaced: 38

3. Methodology — wrong framework entirely (Commit 7)

The brief defined a 9-dimension assessment framework (Problem, AI Approach, Human Effort, Traditional Benchmark, Acceleration Factor, Quality Assessment, Gotchas, Replicability, Verdict). The subagent invented a 9-step process (Project intake, Baseline measurement, AI integration design...) that sounded professional but didn't match anything in the brief or the actual case studies. The methodology page and the case studies described completely different things.

Lines replaced: 65

4. Positioning — "Not a consultancy" self-sabotage (Commits 8–9)

The initial content included the line "Not a consultancy. Not an agency." — memorable copy that actively contradicted the business model. The contact page described paid consulting engagements while the home page rejected the category. It took external feedback to identify this as a structural problem, not a style issue.

Lines replaced: 206 across two correction rounds

Content verdict: 3/10. The AI produces content that reads well but says the wrong things. It is fluent without being accurate. It fills gaps in its context by pattern-matching against similar content it has seen, producing plausible fabrications that require expertise to catch.


Gotchas & Limitations

1. Subagent context loss is the root cause

The parent session had 25+ messages of accumulated context: the brief, the positioning decisions ("kitchen table, not government"), the naming ontology, the case study documents, the resume, the photo, the internal Bolt announcement. When a subagent was delegated a task like "create the translation files," it received a summary of this context — not the full context. Every fabrication traces back to the subagent filling in what it didn't know.

Lesson: Delegating to subagents without passing the source documents is like briefing a junior copywriter verbally and expecting them to get the details right. They won't. They'll write something that sounds right.
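One way to make that lesson mechanical is to refuse delegation unless the source documents travel with the task. A hypothetical sketch — the task shape and names are mine, not Cowork's API:

```typescript
// Hypothetical subagent task payload. The point: attach the actual
// source documents verbatim, never just a summary of them.
interface SourceDocument {
  name: string;
  content: string; // full text, not a paraphrase
}

interface SubagentTask {
  instruction: string;
  sourceDocuments: SourceDocument[];
}

export function makeTask(
  instruction: string,
  docs: SourceDocument[]
): SubagentTask {
  // Guard rail: content work with zero attached sources invites fabrication.
  if (docs.length === 0) {
    throw new Error("Refusing to delegate content work without source documents");
  }
  return { instruction, sourceDocuments: docs };
}
```

The guard does not prevent fabrication, but it removes the most common cause: a subagent that never saw the originals.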

2. Plausible fabrication is worse than obvious failure

When AI generates code that doesn't compile, you catch it immediately. When AI generates content that is well-written, properly formatted, internally consistent, and factually wrong — you might not catch it until someone else reads it. The fabricated case study content could have been published. It read fine. It just wasn't true.

Lesson: Content review cannot be skipped, even when the output looks professional. Especially when it looks professional.

3. Positioning requires taste, not generation

The AI was asked to write website copy. It produced copy that was clear, well-structured, and strategically incoherent. "Not a consultancy" on one page, "paid consulting engagements" on another. No amount of prompt engineering fixes this, because the problem isn't generation quality — it's strategic judgment. The AI doesn't know whether you should lean into or away from the consulting category. That's a human decision.

Lesson: AI can draft positioning. It cannot decide positioning. The human must own the strategic frame.

4. The most valuable feedback came from another AI model — used differently

The session quality improved dramatically after I fed the site content and the original brief into a separate AI model (outside this Cowork session) and asked it to critique the site as a positioning reviewer. That model, working with fresh context and no accumulated blind spots, produced two rounds of structured feedback that identified the positioning contradiction, the biography-heavy About page, the methodology mismatch, the missing conversion path, and the consulting self-sabotage.

This is the most instructive part of the whole project. The AI that built the site could not see its own strategic errors. A different AI instance, given the right framing ("critique this as a launch vehicle for a new business line"), caught them immediately. The problem was never AI capability — it was context contamination. The building session had accumulated so many incremental decisions that it lost the ability to evaluate the whole.

Lesson: AI reviewing AI works — but only when the reviewer has clean context and a different role. Using the same session to both build and critique produces blind spots. The review model saw the "Not a consultancy" contradiction instantly because it wasn't the one who wrote it. Separation of concerns applies to AI workflows, not just code architecture.

5. Human taste was still the final filter

Even with AI-on-AI review, the human made the final calls. Which feedback to accept ("remove 'Not a consultancy'" — yes), which to modify ("shorten the bio by 30–40%" — yes, but I chose the compression level), and which to reject or defer. The AI reviewer suggested specific copy; I used the direction but not the exact words. Taste — knowing what sounds like you, what matches the tone, what your audience will believe — is still a human function.

Lesson: The best workflow was: AI builds fast → different AI critiques strategically → human decides what's true. Three layers, not two.

6. The 25% rework ratio

Of the 2,898 lines in the initial build, 732 lines (25.2%) were deleted in subsequent correction commits. That means one in four lines generated by subagents was wrong enough to require replacement. On a small project, this is manageable. On a large project, a 25% fabrication rate would be disastrous.
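The ratio is plain commit arithmetic; the per-correction deletion counts come from the commit history described in this case study:

```typescript
// Rework ratio: lines deleted across the five correction commits,
// divided by the size of the initial build.
const initialBuildLines = 2898;
const deletedPerCorrection = [423, 38, 65, 161, 45]; // corrections 1–5

const deleted = deletedPerCorrection.reduce((a, b) => a + b, 0); // 732
export const reworkRatio = deleted / initialBuildLines; // ≈ 0.2526
```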


Replicability Score

3 out of 5

The technical pattern (Next.js + i18n + Tailwind + Vercel) is highly replicable. The content failures would reproduce identically for anyone using the same subagent delegation pattern without full context passing. The correction process required:

  • An experienced engineer who noticed the fabrications
  • Actual source documents to compare against
  • A separate AI model used as a strategic reviewer (not the same session that built the site)
  • Human taste as the final filter on what feedback to accept

The three-layer workflow (AI builds → different AI critiques → human decides) is replicable. The specific judgment calls are not. Someone without the experience to evaluate the AI reviewer's suggestions would either accept everything (overcorrection) or nothing (wasted feedback).


Verdict

This project is the most instructive Kodulabor case study so far, because it shows where AI assistance fails — and the failure mode is insidious rather than obvious.

The technical build was genuinely impressive. A complete bilingual Next.js website with 17 pre-rendered routes, domain-based middleware, markdown rendering, and proper static generation — scaffolded in under 30 minutes. No human developer matches that speed on infrastructure.

The content was genuinely bad. Not obviously bad — subtly bad. Fabricated metrics, wrong voice, mismatched framework, contradictory positioning. Each piece of content read well in isolation. The problems only emerged when you compared the output against source documents, checked it against the brief, or asked someone with strategic judgment to review it.

The key insight: AI acceleration is real on the structural and technical layers. It is dangerous on the content and strategic layers — not because the AI is slow, but because it is confidently wrong. The 12–18x overall acceleration is real, but it hides a split: infrastructure was ~50x faster, content was ~2x faster after corrections, and strategic positioning required purely human judgment.

The most surprising finding: the best reviewer was also an AI — just a different one. Feeding the site content and the original brief into a separate AI model for critique produced sharper, more actionable feedback than the building session could generate internally. The building session had context blindness; the review session had fresh eyes. This suggests that AI-assisted projects should build separation of concerns into their workflow: one AI builds, a different AI reviews, and a human makes the final calls.

The practical takeaways for Kodulabor projects: never delegate content to subagents without passing the full source documents. Always review content against originals. Use a separate AI instance for strategic review. And keep the human in the loop as the taste layer — the one who decides what's true, what sounds right, and what the audience will actually believe.

The AI builds the house fast. A different AI checks if it's the right house. The human decides whether to live in it.


This case study was written during the same Cowork session it describes. The data comes from git commit history, line counts, and timestamps. The strategic feedback came from a separate AI model given the brief and site content for independent review. The irony of the whole thing is not lost on me.


Data Appendix

Metric | Value
Session date | March 19, 2026
Session duration | ~4 hours
Total commits | 9
Total lines generated | 6,381
Total lines deleted (corrections) | 732
Rework ratio | 25.2%
Correction rounds | 5
Human prompts | ~25
Subagent delegations | 6
Fabrication incidents | 4 (case studies, about, methodology, positioning)
External AI review rounds | 2 (separate model, fresh context)
Build failures | 0
Technical infrastructure errors | 0
Content/strategic errors | 4 major, multiple minor
Final build | 17 pre-rendered routes, 2 languages
Stack | Next.js 16, TypeScript, Tailwind v4, react-markdown
Deployment target | Vercel (kodulabor.ai / kodulabor.ee)

Correction Timeline

13:51  Initial commit — 2,898 lines, all pages, both languages
  ✓ Infrastructure: perfect
  ✗ Case study content: fabricated
  ✗ About page: third-person, fabricated details
  ✗ Methodology: wrong framework
  ✗ Positioning: "Not a consultancy"
13:54  Middleware — working correctly
13:59  README/CLAUDE.md — working correctly
14:12  CORRECTION 1: Case study content replaced (423 lines deleted, real documents inserted)
14:22  CORRECTION 2: About page rewritten (38 lines deleted, first-person voice, real resume data)
14:52  CORRECTION 3: Methodology rewritten (65 lines deleted, framework dimensions replace process steps)
15:06  CORRECTION 4: Major rebuild from external feedback (161 lines deleted, positioning + structure overhaul)
15:46  CORRECTION 5: Sharpening from second feedback round (45 lines deleted, contradiction fixed, copy tightened)