The Kodulabor Assessment Framework
Every Kodulabor project is assessed using nine dimensions. This isn't a step-by-step process — it's a measurement framework applied to real work under controlled conditions. We examine the problem, the AI approach, the human effort involved, how it compares to traditional methods, and whether it could work for someone else too. Each dimension tells part of the story. Together, they show whether AI actually helped.
Problem
What needed to be solved, for whom, and why it mattered. The specific workflow, the pain point, and what success would look like.
AI Approach
Which tools were selected, why those tools, and exactly how they were applied. Model choice, prompt engineering, integration points, and where AI stops and human judgment begins.
Human Effort
Actual hours spent on the project, broken down between AI-assisted work and manual work. Precision here matters.
Traditional Benchmark
What this task would have cost or taken without AI, drawn either from historical data or from careful estimation grounded in how the work is actually done.
Acceleration Factor
The ratio of traditional effort to AI-assisted effort. If traditional work took 100 hours and AI-assisted took 30, the acceleration factor is 3.3x.
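As a worked example, the computation is just a ratio. The sketch below is purely illustrative; the function name and the guard against zero hours are ours, not part of any Kodulabor tooling.

```python
def acceleration_factor(traditional_hours: float, ai_assisted_hours: float) -> float:
    """Ratio of traditional effort to AI-assisted effort."""
    if ai_assisted_hours <= 0:
        raise ValueError("AI-assisted hours must be a positive number")
    return traditional_hours / ai_assisted_hours

# 100 traditional hours vs. 30 AI-assisted hours -> 3.3x
print(f"{acceleration_factor(100, 30):.1f}x")
```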
Quality Assessment
How does the AI-assisted output compare to traditional quality standards? Is it equivalent, better in some ways, worse in others? What had to be fixed?
Gotchas & Limitations
What didn't work. Edge cases where the AI failed, moments where human correction was needed, constraints we hit, and things that surprised us.
Replicability Score
Could someone else reproduce this result with similar tools and similar work? Rated 1–5. A 5 means the approach is straightforward and generalizes; a 1 means it was custom-built and specific to this one project.
Verdict
Overall assessment. Did AI help here? By how much? What would we do differently next time? Who should and shouldn't try this approach?
Why this framework exists
AI hype is everywhere. Claims are easy. Data is rare. This framework exists because there's a difference between a tool that feels helpful in the moment and a tool that measurably improves how work actually gets done. We measure nine dimensions — problem, approach, effort, benchmark, acceleration, quality, gotchas, replicability, and verdict — because that's the only way to know if the AI really helped or just felt impressive. We document what broke. We share numbers. That's how the field advances.
A note on data sources
All data comes from real work. Timing logs from actual sessions, token counts from API calls, quality assessments from the people doing the work, and documented moments where the AI needed human correction. No proprietary models, no synthetic benchmarks. Every dimension — effort hours, acceleration factor, replicability score — comes from measurement, not intuition. If you want to replicate our results, you have the numbers.
Case Study Template
Every case study follows this structure. You can use it to document your own AI projects. The prompt next to each heading is a guide — answer it honestly, include numbers, and don't hide the hard parts.
Problem
What was the specific task or workflow? Who was doing it, and what made it difficult? What did success look like?
AI Approach
Which AI tools were used? Why those tools specifically? How were they set up — what prompts, what parameters, what integration points? Where was automation deliberately avoided?
Human Effort
How many hours did this take in total? Break it down: how much was AI-assisted work versus manual work? Include all overhead — setup, iteration, fixes, reviews.
Traditional Benchmark
Without AI — no tools, just human effort — how long would it have taken? Use actual data if available, or careful estimation based on similar work done before.
Acceleration Factor
Divide traditional effort by AI-assisted effort. If traditional would have taken 40 hours and AI-assisted took 12 hours, the factor is 3.3x. Include any work that wouldn't have happened at all without AI.
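One way to fold in work that only happened because AI made it cheap is to credit its traditional-equivalent hours to the benchmark side of the ratio. This is our interpretation, not a rule the framework prescribes, and the figures below are hypothetical:

```python
# Hypothetical figures: a core task, plus extra work (tests, docs)
# that would have been skipped entirely without AI.
traditional_hours = 40        # benchmark for the core task alone
ai_assisted_hours = 12        # total AI-assisted hours, extra work included
extra_traditional_equiv = 8   # what the extra work would have cost by hand

factor = (traditional_hours + extra_traditional_equiv) / ai_assisted_hours
print(f"{factor:.1f}x")  # 4.0x
```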
Quality Assessment
Is the output as good as it would have been without AI? Better in some ways? Worse? What had to be manually corrected? Are there quality trade-offs?
Gotchas & Limitations
What went wrong? When did the AI fail? What edge cases came up? What needed human intervention? What would you warn someone else about?
Replicability Score
Could someone else do exactly this with the same tools and get similar results? Rate 1–5. Explain the rating.
Verdict
Overall: did AI help? By how much? Should other people try this approach? What would be done differently? Who is this for, and who should avoid it?
Data Appendix
List the tools used, token counts, costs, timing breakdowns, prompts (if shareable), and any other raw numbers or data that support the case study. This is what lets readers verify and replicate.
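One possible shape for that raw data, as a minimal sketch. Every field name and number here is made up for illustration; the framework does not prescribe a schema, only that the numbers be real and verifiable.

```python
import json

# Hypothetical data appendix for a single case study.
data_appendix = {
    "tools": ["model/version used", "editor integration"],
    "token_counts": {"input": 1_250_000, "output": 310_000},
    "cost_usd": 14.72,
    "timing_hours": {"ai_assisted": 12.0, "manual": 3.5, "review": 1.5},
    "prompts": "linked or inlined where shareable",
    "acceleration_factor": 3.3,
    "replicability_score": 4,
}
print(json.dumps(data_appendix, indent=2))
```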