Methodology

The Kodulabor Assessment Framework

Every Kodulabor project is assessed using nine dimensions. This isn't a step-by-step process—it's a framework I apply to measure what actually happened in real work. I look at the problem, the AI approach, the human effort involved, how it compares to traditional methods, and whether it could work for someone else too. Each dimension tells part of the story. Together, they show whether AI actually helped.

1. Problem

What needed to be solved, for whom, and why it mattered. The specific workflow, the pain point, and what success would look like.

2. AI Approach

Which tools I selected, why those tools, and exactly how I applied them. Model choice, prompt engineering, integration points, and where AI stops and human judgment begins.

3. Human Effort

Actual hours spent on the project, broken down into AI-assisted work (where I worked with the AI) and manual work (where the AI couldn't help). Precision here matters.

4. Traditional Benchmark

What this task would have cost or taken without AI. Either from historical data or from careful estimation based on how the work is actually done.

5. Acceleration Factor

The ratio of traditional effort to AI-assisted effort. If traditional work took 100 hours and AI-assisted took 30, the acceleration factor is 3.3x.
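As a minimal illustration of the arithmetic (the function name and hour values below are my own, not taken from any case study):

```python
def acceleration_factor(traditional_hours: float, ai_assisted_hours: float) -> float:
    """Ratio of traditional effort to AI-assisted effort."""
    if ai_assisted_hours <= 0:
        raise ValueError("AI-assisted hours must be positive")
    return traditional_hours / ai_assisted_hours

# The example above: 100 hours traditional vs. 30 hours AI-assisted.
print(f"{acceleration_factor(100, 30):.1f}x")  # prints "3.3x"
```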

6. Quality Assessment

How does the AI-assisted output compare to traditional quality standards? Is it equivalent, better in some ways, worse in others? What did I have to fix?

7. Gotchas & Limitations

What didn't work. Edge cases where the AI failed, moments where human correction was needed, constraints I hit, and things that surprised me.

8. Replicability Score

Could someone else reproduce this result with similar tools and similar work? I rate it 1–5. A 5 means the approach is straightforward and general. A 1 means it was custom and specific.

9. Verdict

Overall recommendation. Did AI help here? By how much? What would I do differently next time? Who should and shouldn't try this approach?

Why this framework exists

AI hype is everywhere. Claims are easy. Data is rare. I built this framework because there's a difference between a tool that feels helpful in the moment and a tool that measurably improves how I actually work. I measure the nine dimensions—problem, approach, effort, benchmark, acceleration, quality, gotchas, replicability, and verdict—because that's the only way to know if the AI really helped or just felt impressive. I document what broke. I share numbers. That's how we learn.

A note on data sources

All data comes from real work. Timing logs from actual sessions, token counts from API calls, quality assessments from the people doing the work, and documented moments where the AI needed human correction. No proprietary models, no synthetic benchmarks. Every dimension—effort hours, acceleration factor, replicability score—comes from measurement, not intuition. If you want to replicate my results, you have the numbers.

Case Study Template

When I write a case study, I answer each of these questions. You can use this same structure to document your own AI projects. The prompt next to each heading is a guide—answer it honestly, include numbers, and don't hide the hard parts.

Problem

What was the specific task or workflow? Who was doing it, and what made it difficult? What did success look like?

AI Approach

Which AI tools did I use? Why those tools specifically? How did I set them up—what prompts, what parameters, what integration points? Where did I decide not to automate?

Human Effort

How many hours did this take in total? Break it down: how much was AI-assisted work (working with the AI) versus manual work (where the AI couldn't help)? Include all overhead: setup, iteration, fixes, reviews.
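A minimal sketch of one way such a breakdown could be recorded; the categories and hour values are hypothetical, not from any real project:

```python
# Hypothetical effort log: hours split between AI-assisted and manual work,
# with overhead (setup, iteration, fixes, reviews) counted explicitly.
effort_hours = {
    "ai_assisted": {"drafting": 6.0, "iteration": 3.5},
    "manual": {"setup": 1.0, "fixes": 2.0, "review": 1.5},
}

ai_assisted = sum(effort_hours["ai_assisted"].values())
manual = sum(effort_hours["manual"].values())
print(f"Total: {ai_assisted + manual} h (AI-assisted: {ai_assisted} h, manual: {manual} h)")
```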

Traditional Benchmark

If I had done this without AI—no tools, just human effort—how long would it have taken? Use actual data if available, or careful estimation based on similar work done before.

Acceleration Factor

Divide traditional effort by AI-assisted effort. If traditional would have taken 40 hours and AI-assisted took 12 hours, the factor is 3.3x. Include any work that wouldn't have happened at all without AI.

Quality Assessment

Is the output as good as it would have been without AI? Better in some ways? Worse? What had to be manually corrected? Are there quality trade-offs I made?

Gotchas & Limitations

What went wrong? When did the AI fail? What edge cases did I hit? What needed human intervention? What would I warn someone else about?

Replicability Score

Could someone else do exactly what I did with the same tools and get similar results? Rate 1–5 (1 = only works for this specific case, 5 = widely applicable). Explain the rating.

Verdict

Overall: did AI help? By how much? Should other people try this approach? What would I do differently? Who is this for, and who should avoid it?

Data Appendix

List the tools used, token counts, costs, timing breakdowns, prompts (if shareable), and any other raw numbers or data that support the case study. This is what lets readers verify and replicate.
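One possible shape for that appendix as machine-readable data; every tool name, count, and cost below is a placeholder I made up for illustration:

```python
# Hypothetical data appendix record. All values are placeholders; a real case
# study would substitute measured token counts, costs, and timings.
data_appendix = {
    "tools": ["example-llm-api", "example-editor-plugin"],
    "token_counts": {"input": 152_000, "output": 48_000},
    "cost_usd": 4.75,
    "timing_hours": {"ai_assisted": 9.5, "manual": 4.5},
    "prompts_shared": True,  # whether prompts are included verbatim
}

for field, value in data_appendix.items():
    print(f"{field}: {value}")
```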