Kodulabor — Applied AI research with measured outcomes

Problem

A parent with two children in competitive junior tennis wanted two things that, until very recently, only a professional analytics department could provide. First, a real analysis of a match his daughter had already played: not the broadcast scoreline, but why she lost it (where she stood, how she struck the ball, when the momentum actually turned). Second, and more urgently, a scouting report on tomorrow's opponent, prepared from a single YouTube video of that opponent playing someone else.

Companies already sell exactly this. Video-based tennis analytics is an established service: a player or academy sends match footage, and an analyst returns heatmaps, shot patterns, serve placement, and a tactical breakdown. The service is real, it is good, and for a junior-tennis family it is priced as an occasional luxury, not a per-match habit. The question this project set out to answer was concrete: can one person, with AI assistance and a consumer laptop, produce a genuinely useful version of that analysis — for a match already played and for an opponent scouting report — at a cost that makes it repeatable for every match?

Success was not "match a professional Hawk-Eye installation." Success was a defensible, honest analysis a coach could act on: court coverage, where she contacts the ball, serve speed by set, who won which points, winners versus errors, a momentum curve — and for the opponent, a one-page tactical brief with a game plan. The footage available was ordinary: a broadcast stream of one match, and a fixed-camera YouTube upload of another. No sensors, no chips in the balls, no clean data feed. Just video and a burned-in scoreboard.

AI Approach

The entire pipeline was built in a single extended Claude Code session, starting from a literal blank slate — the operator had never written computer-vision code and at one point asked what a "notebook" was. The work proceeded as a guided build, with the operator (a tennis parent, not an engineer) supplying domain judgment and the AI supplying the engineering.

Models and tools, and why each:

—Ultralytics YOLO (yolo11x, yolo11m, yolo11x-pose) — player detection and pose. Pose was the key: a serve is "wrist far above the head," a contact is a body in a strike position. Runs on the Mac's Apple-Silicon GPU (MPS).
—TrackNet (a trained tennis-ball tracker) — the decisive component. Classical ball detection (colour + motion) failed completely; the ball is six pixels, motion-blurred, and lost against the crowd. TrackNet recovered the ball trajectory in wide rally shots at ~80% of frames.
—Modal serverless GPU (cloud T4) — TrackNet over a full match is ~4–5 hours on the laptop GPU but minutes on cloud T4s fanned out in parallel, for a few cents. The split that made the project practical: heavy ball-tracking on cloud, everything else local.
—Tesseract OCR — reading the burned-in scoreboard: serve speed, points, games, and the serving indicator. This is what unlocked point-by-point analysis.
—Homography (OpenCV) — mapping the angled camera view to a top-down court, so foot and ball positions become real court coordinates.
—yt-dlp + Deno — downloading the source videos locally (YouTube blocks cloud datacenter IPs, so download had to happen on the residential connection).
—Next.js / static HTML + Vercel — two interactive, password-protected report sites.
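The homography entry in the list above is, in OpenCV terms, a call to `cv2.findHomography` (or `cv2.getPerspectiveTransform` for exactly four points) followed by `cv2.perspectiveTransform`. The stdlib-only sketch below spells out the same four-point fit (a direct linear transform with the bottom-right matrix entry fixed to 1) so the arithmetic is visible. It is an illustration, not the project's calibration code: the pixel corners are made-up placeholders, and the court is assumed to be a singles court of 8.23 m by 23.77 m with the origin at a near-baseline corner.

```python
"""Four-point homography fit (DLT with h33 = 1), stdlib only."""
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b (square system) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences (e.g. three collinear points)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H mapping four image points src onto four court points dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def to_court(h: List[List[float]], x: float, y: float) -> Point:
    """Project one image point (e.g. a player's foot position) into court metres."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Placeholder pixel corners of the singles court in a 1280-px-wide frame,
# listed near-left, near-right, far-right, far-left.
IMAGE_CORNERS = [(210, 650), (1070, 650), (800, 250), (480, 250)]
COURT_CORNERS = [(0.0, 0.0), (8.23, 0.0), (8.23, 23.77), (0.0, 23.77)]  # metres

H = fit_homography(IMAGE_CORNERS, COURT_CORNERS)
feet = to_court(H, 640, 600)  # a player on the centre line, just inside the near baseline
```

Because the corners are hand-picked per camera, any change of camera angle means refitting `H`, which is the re-calibration cost the document describes for the second match.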

How the analysis was assembled, in layers:

—Position — detect both players, tell them apart by outfit colour (one wore white, the other a black skirt; later, an opponent in pink vs. teal), map feet to court coordinates, build coverage heatmaps per set.
—Ball — TrackNet trajectory; a contact is a sharp vertical reversal in the ball's path at a player's position (a reversal with no player nearby is a bounce).
—Outcome — read the scoreboard to cut the match into 199 individual points; the last contact of a point versus the point winner gives winners and errors without needing to see where the ball landed.
—Synthesis — momentum (cumulative point differential), serve speed by set, contact-height and depth, and a tactical scouting profile.
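The contact rule in the Ball layer and the last-contact rule in the Outcome layer are simple enough to sketch directly. This is a minimal illustration under stated assumptions, not the project's code: the ball track is assumed to be a list of per-frame image positions with `None` where TrackNet missed, player positions are assumed to arrive per frame from the detector, and the 60-pixel radius and 4-pixel "sharpness" threshold are hypothetical values.

```python
"""Contact = sharp vertical reversal near a player; outcome = last hitter vs point winner."""
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def find_events(track: Sequence[Optional[Point]],
                players: Sequence[Dict[str, Point]],
                radius: float = 60.0,
                min_jump: float = 4.0) -> List[Tuple[int, str]]:
    """Return (frame, label) for each vertical reversal; label is a player name or 'bounce'.

    track[i]: ball image position in frame i, or None where the tracker missed it.
    players[i]: player name -> image position in frame i.
    """
    events = []
    for i in range(1, len(track) - 1):
        prev, cur, nxt = track[i - 1], track[i], track[i + 1]
        if prev is None or cur is None or nxt is None:
            continue  # a reversal needs three consecutive detections
        dy_in, dy_out = cur[1] - prev[1], nxt[1] - cur[1]
        if dy_in * dy_out >= 0 or abs(dy_out - dy_in) < min_jump:
            continue  # vertical direction did not flip sharply
        nearest = sorted((math.dist(cur, pos), name) for name, pos in players[i].items())
        hit = nearest and nearest[0][0] <= radius
        events.append((i, nearest[0][1] if hit else "bounce"))
    return events


def classify_point(events: Sequence[Tuple[int, str]], point_winner: str) -> Optional[str]:
    """'winner' if the last player to strike the ball won the point, else 'error'."""
    hitters = [label for _, label in events if label != "bounce"]
    if not hitters:
        return None  # no confirmed contact: leave the point unclassified
    return "winner" if hitters[-1] == point_winner else "error"


# Ball drops toward player A (image y grows downward), is struck, then rises away.
track = [(100, 370), (100, 380), (100, 390), (100, 400), (100, 390), (100, 380)]
frames = [{"A": (100, 410), "B": (500, 100)}] * len(track)
events = find_events(track, frames)       # one contact by A at frame 3
outcome = classify_point(events, "B")     # A hit last and lost the point: 'error'
```

Note what the last-contact rule does and does not give: an ace or unreturned serve counts as a winner for the server, and every lost final shot counts as an error, forced or not, which is why the report stops short of a formal forced/unforced split.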

Where automation was deliberately not used: every analytical claim was validated visually before it was trusted — detection montages, ball-track overlays, identity checks. The operator's tennis knowledge was load-bearing, not decorative: he corrected the player-identity colour mapping (twice), flagged that "high contact" does not mean "under pressure," reframed what actually changed in the third set, and supplied the single most important fact in the opponent scout — that she is left-handed — which inverted the entire forehand/backhand read.
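The outfit-colour identity rule, including the later "must wear a white top" filter, can be sketched as a small colour test on two crops of each person detection. Everything here is an assumption for illustration: the crops are taken to be extracted upstream from the detection box, the HSV thresholds are hypothetical, and the labels `"white"` and `"black_skirt"` stand in for the two players. Swapping the two return labels is exactly the kind of silent identity flip the operator had to catch.

```python
"""Outfit-colour identity: white top required, skirt colour picks the player."""
import colorsys
from typing import Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def mean_hsv(pixels: Sequence[RGB]) -> Tuple[float, float, float]:
    """HSV of the mean RGB colour of a crop (channels 0-255 in, 0-1 out)."""
    n = len(pixels)
    r, g, b = (sum(p[k] for p in pixels) / (255.0 * n) for k in range(3))
    return colorsys.rgb_to_hsv(r, g, b)


def is_white(pixels: Sequence[RGB], min_v: float = 0.75, max_s: float = 0.20) -> bool:
    _, s, v = mean_hsv(pixels)
    return v >= min_v and s <= max_s  # bright and nearly colourless


def is_dark(pixels: Sequence[RGB], max_v: float = 0.30) -> bool:
    return mean_hsv(pixels)[2] <= max_v


def identify(top: Sequence[RGB], skirt: Sequence[RGB]) -> Optional[str]:
    """Label a person detection from its torso and skirt crops, or None to discard it."""
    if not is_white(top):
        return None  # ball kid, official, or spectator in dark clothing
    if is_dark(skirt):
        return "black_skirt"
    if is_white(skirt):
        return "white"
    return None  # ambiguous colours: better dropped than mislabelled


white_px = [(240, 240, 235)] * 10
dark_px = [(20, 20, 25)] * 10
label = identify(white_px, dark_px)  # 'black_skirt'
```

Averaging a crop to one colour is a simplification; a per-pixel mask or a median would be more robust to racket and skin pixels inside the box.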

Human Effort

This was one continuous session rather than a multi-week product, which makes the accounting cleaner in one sense (one sitting) and softer in another (the session was not instrumented with per-task timers, so the hours below are an honest reconstruction, not a stopwatch log).

Two deliverables were produced: a full match analysis + interactive report for a completed match, and an opponent scouting report for an upcoming one.

Estimated active engagement breakdown:

Phase	Active time	What happened
Footage triage + court/homography calibration (match 1)	~1.5 h	Characterising broadcast footage, finding wide-rally shots
Coverage + outfit-identity pipeline, de-noising	~1.5 h	Background subtraction to drop static officials; ball-kid filtering
Serve speed + scoreboard OCR (shifting layouts)	~1 h	The OCR region moves as the scoreboard widens per set
TrackNet integration, cloud (Modal) setup, ball validation	~1.5 h	Including the device/memory debugging and cloud fan-out
Point segmentation + winners/errors + momentum	~1.5 h	Reading 199 points from the score; outcome logic
Interactive match report (two players, momentum, deploy)	~2 h	Design, charts, password-gated Vercel deploy
Opponent scout: re-calibration (green court, pink identity)	~1 h	Everything re-derived for a different broadcast
Scout: cloud TrackNet, contact/serve analysis, tactical brief + deploy	~2 h	Full second run on fresh footage

Estimated total active human time: ~12 hours, in one session. A large amount of additional wall-clock time was unattended compute — TrackNet and pose runs the operator waited on, not worked through.

A defining characteristic: the density of correction. The operator caught and reversed several confident-but-wrong outputs — identity flips, a misread of the third-set decline, an unfolded contact map that put strikes "on the wrong side of the net," and the handedness of the scouted opponent. This is not overhead to be minimised; it is the mechanism that kept the analysis honest.

Traditional Benchmark

This case has two legitimate "without AI" comparisons, and the operator specifically asked for both. They answer different questions.

A. Build it as software (the one-time cost)

Producing this capability as a conventional software project — a small sports-analytics tool a developer could re-run on any match — is a serious computer-vision build:

Component	Estimate
Court detection, homography, per-broadcast calibration tooling	20–40 h
Player detection, tracking, outfit-based identity, background de-noise	40–60 h
Pose-based serve and contact detection	30–50 h
Ball tracking: integrating a model, the cloud-GPU pipeline	30–50 h
Scoreboard OCR (serve speed, points, games, server; shifting layouts)	40–70 h
Point segmentation + winner/error outcome logic	30–50 h
Coverage / contact / momentum analytics + visualisation	30–50 h
Two polished, interactive, deployed report sites	40–80 h
Integration, debugging, validation	40–80 h
Total	~300–530 h
Cost (€60–100/hr)	€18,000–53,000
Calendar time	2–3 months, small team

Mid-range planning value: ~400 hours.
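The build estimate can be checked by re-adding the table. The sketch below uses only the figures already in the table; the short component labels are abbreviations, not new categories.

```python
"""Re-derive the build-estimate totals from the component ranges in the table."""
COMPONENT_HOURS = {
    "court detection, homography, calibration tooling": (20, 40),
    "player detection, tracking, identity, de-noise": (40, 60),
    "pose-based serve and contact detection": (30, 50),
    "ball tracking and cloud-GPU pipeline": (30, 50),
    "scoreboard OCR": (40, 70),
    "point segmentation and outcome logic": (30, 50),
    "coverage / contact / momentum analytics": (30, 50),
    "two interactive report sites": (40, 80),
    "integration, debugging, validation": (40, 80),
}
RATE_EUR = (60, 100)  # hourly rate range used in the table

low = sum(lo for lo, _ in COMPONENT_HOURS.values())
high = sum(hi for _, hi in COMPONENT_HOURS.values())
cost = (low * RATE_EUR[0], high * RATE_EUR[1])  # cheapest and dearest case
midpoint = (low + high) / 2
acceleration = 400 / 12  # ~400 h planning value vs ~12 h AI-assisted
```

The ranges sum to 300 to 530 h and €18,000 to €53,000, matching the stated totals; the arithmetic midpoint is 415 h, consistent with the ~400 h planning value, and 400 / 12 gives the ~33× build factor.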

B. Produce the same outputs by hand, per match (the recurring cost)

This is the comparison that matters more for a tennis family, because the alternative to a pipeline is not "build software" — it is "pay an analyst, every time." To produce, by hand, the equivalent of one match report or one scouting brief:

Manual task	Per match
Shot-by-shot charting (every point, contact, serve, outcome) — the Match Charting Project standard runs ~2–3× match duration	6–9 h
Court-position / coverage tagging from video	4–8 h
Serve, contact-zone, winner/error compilation	3–6 h
Momentum, per-set splits, summary statistics	1–2 h
Writing + designing the report	6–12 h
Total skilled analyst + design time	~20–37 h per match
Cost at analyst rates	€1,500–3,000 per match
And it recurs — every new match starts from zero.

A commercial video-analysis service compresses this with its own (often partly automated) tooling, but the family still pays per match, and turnaround is days, not the same evening.

Acceleration Factor

Because the project is both a build and a repeatable capability, the honest answer is two numbers.

Comparison	Traditional	AI-assisted	Factor
One-time build vs. software project	~400 h	~12 h	~33×
Per subsequent match vs. manual analyst	~30 h / match	~1 h human + ~30 min cloud / match	~25–30× per match, recurring
Direct cost, per match	€1,500–3,000	~€0.50 cloud + time	>1,000×

The build acceleration (~33×) is in line with the other Kodulabor software case studies. But the load-bearing number here is the marginal cost per match: once the pipeline exists, a new match is roughly an hour of human calibration-and-review plus thirty minutes of cloud compute that costs about fifty cents. The fully-human alternative is twenty-to-thirty skilled hours, every single time. For a family that might want this for ten matches a season, the recurring comparison dwarfs the one-time one.
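To make the season comparison concrete, here is the arithmetic for the ten-match season mentioned above, charging the ~12 h build once and counting only the ~1 h of human time per match (the ~30 min of cloud compute runs unattended). All inputs are the document's own round figures.

```python
"""Season-level comparison using the per-match figures quoted in the text."""
MATCHES = 10                    # the illustrative "ten matches a season"
MANUAL_H_PER_MATCH = 30         # ~30 skilled hours per match, as in the acceleration table
PIPELINE_BUILD_H = 12           # one-time AI-assisted build
PIPELINE_H_PER_MATCH = 1        # human calibration and review; cloud time is unattended
MANUAL_EUR_PER_MATCH = (1500, 3000)
CLOUD_EUR_PER_MATCH = 0.50

manual_hours = MATCHES * MANUAL_H_PER_MATCH                        # 300
pipeline_hours = PIPELINE_BUILD_H + MATCHES * PIPELINE_H_PER_MATCH  # 22
manual_eur = tuple(MATCHES * c for c in MANUAL_EUR_PER_MATCH)       # (15000, 30000)
cloud_eur = MATCHES * CLOUD_EUR_PER_MATCH                           # 5.0
hours_ratio = manual_hours / pipeline_hours                         # about 13.6
```

Even with the whole build charged to a single season, the human-hours gap is roughly 14×, and it approaches the ~30× per-match factor as the build is spread over more matches.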

And a category the factor cannot capture: most of this analysis would simply never have been produced. No tennis parent is going to hand-chart 199 points or build a momentum curve for a junior match. The realistic counterfactual is not "the same thing, slower" — it is "nothing," or "an occasional paid report for the biggest matches only." The pipeline changes what is possible to ask for, not just how fast the answer arrives.

Quality Assessment

The output is genuinely useful and genuinely imperfect, and the framework rewards saying both clearly.

What met a real standard:

—Point segmentation validated against the known result. The per-set point counts independently reproduced the actual match outcome (Anikina won Set 1, lost Sets 2 and 3) — strong evidence the score-reading and point logic were correct.
—Winners/errors revealed the true story. The headline finding — she won more total points than her opponent and lost the match, and her terminal-shot error rate jumped from ~41% to 68% in the third set — is a real, coach-grade insight that the scoreline alone hides.
—TrackNet ball tracking worked where classical methods failed entirely, and ball-confirmed contacts were verified frame by frame.
—The opponent scout produced an actionable game plan — deep counter-puncher, never comes to net, forehand-dominant left-hander — every point of which a coach can use.
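The point segmentation that reproduced the set results can be sketched as change detection on a stream of scoreboard readings, followed by a rule for who won each change; the momentum curve is then a running sum. This is a hedged sketch, not the project's parser: the `Score` tuple layout, the `hold=3` debounce, and the player labels `"A"`/`"B"` are assumptions, and tiebreak scoring is deliberately left out.

```python
"""Cut a match into points from scoreboard readings, then build a momentum curve."""
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

# Hypothetical parsed reading: (sets_a, sets_b, games_a, games_b, points_a, points_b),
# points as shown on the board: "0", "15", "30", "40", "AD". Tiebreaks are not handled.
Score = Tuple[int, int, int, int, str, str]
ORD = {"0": 0, "15": 1, "30": 2, "40": 3, "AD": 4}


def stable_scores(readings: Sequence[Optional[Score]], hold: int = 3) -> List[Score]:
    """Keep each new score once it has been read `hold` times in a row (drops OCR flicker)."""
    stable: List[Score] = []
    last, run = None, 0
    for r in readings:
        if r is None:
            continue  # unreadable frame
        run = run + 1 if r == last else 1
        last = r
        if run == hold and (not stable or stable[-1] != r):
            stable.append(r)
    return stable


def point_winner(prev: Score, cur: Score) -> str:
    """Who won the point that moved the board from prev to cur ('A' or 'B')."""
    sa, sb, ga, gb, pa, pb = prev
    sa2, sb2, ga2, gb2, pa2, pb2 = cur
    if (sa2, sb2) != (sa, sb):              # set-ending point
        return "A" if sa2 > sa else "B"
    if (ga2, gb2) != (ga, gb):              # game-ending point
        return "A" if ga2 > ga else "B"
    if pa == "AD" and (pa2, pb2) == ("40", "40"):
        return "B"                          # A's advantage wiped out: back to deuce
    if pb == "AD" and (pa2, pb2) == ("40", "40"):
        return "A"
    return "A" if ORD[pa2] > ORD[pa] else "B"


def momentum(winners: Sequence[str]) -> List[int]:
    """Cumulative point differential from A's side: +1 per A point, -1 per B point."""
    return list(accumulate(1 if w == "A" else -1 for w in winners))


r0, r1 = (0, 0, 0, 0, "0", "0"), (0, 0, 0, 0, "15", "0")
r2, r3 = (0, 0, 0, 0, "15", "15"), (0, 0, 0, 0, "30", "15")
misread = (0, 0, 0, 0, "35", "15")  # a one-frame OCR error
readings = [r0] * 3 + [None] + [r1] * 3 + [r2] * 3 + [misread] + [r3] * 3

board = stable_scores(readings)                                    # [r0, r1, r2, r3]
winners = [point_winner(p, c) for p, c in zip(board, board[1:])]  # ['A', 'B', 'A']
curve = momentum(winners)                                          # [1, 0, 1]
```

Splitting `winners` by set gives the per-set point counts used as the validation check, and a curve that ends positive for a player who lost the match is the "won more points, lost the match" pattern.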

What a professional system would do better:

—True ball-in/out and bounce mapping (this needed calibrated multi-camera or Hawk-Eye-grade tracking; here it was inferred).
—Serve placement — the pose-based serve detector caught only 6 serves cleanly in the scout match; far-baseline serves are too small. No serve-placement read was possible.
—Spin, exact shot type, and formal forced/unforced error classification — beyond what one broadcast camera supports.
—Calibrated absolute court depth — the homography was hand-fitted per camera, so depth comparisons are relative, not metric.
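The serve-placement gap follows from how a pose rule like "wrist far above the head" behaves on small figures. The sketch below assumes the COCO-17 keypoint layout that Ultralytics pose models output; the size gate, confidence floor, and `lift` factor are hypothetical values, and returning `None` for far-baseline players makes the "too small to judge" case explicit instead of silently reporting "no serve".

```python
"""Pose heuristic for a serve: a wrist well above the head, gated on player size."""
from typing import Optional, Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (x, y, confidence); image y grows downward

# COCO-17 keypoint indices, the layout used by Ultralytics pose models.
NOSE, L_SHO, R_SHO, L_WRI, R_WRI, L_HIP, R_HIP = 0, 5, 6, 9, 10, 11, 12


def looks_like_serve(kps: Sequence[Keypoint], box_h: float, min_box_h: float = 80.0,
                     min_conf: float = 0.5, lift: float = 0.5) -> Optional[bool]:
    """True if a wrist is above the nose by > lift * torso length; None if not judgeable."""
    if box_h < min_box_h:
        return None  # far-baseline player: too few pixels for trustworthy keypoints
    if any(kps[i][2] < min_conf for i in (NOSE, L_SHO, R_SHO, L_HIP, R_HIP)):
        return None  # head or torso keypoints missing
    torso = ((kps[L_HIP][1] + kps[R_HIP][1]) - (kps[L_SHO][1] + kps[R_SHO][1])) / 2
    for i in (L_WRI, R_WRI):
        _, y, conf = kps[i]
        if conf >= min_conf and kps[NOSE][1] - y > lift * torso:
            return True
    return False


kps = [(0.0, 0.0, 0.0)] * 17
kps[NOSE] = (100, 100, 0.9)
kps[L_SHO], kps[R_SHO] = (90, 130, 0.9), (110, 130, 0.9)
kps[L_HIP], kps[R_HIP] = (92, 190, 0.9), (108, 190, 0.9)
kps[R_WRI] = (115, 40, 0.9)  # racket arm extended overhead

near = looks_like_serve(kps, box_h=200)  # True
far = looks_like_serve(kps, box_h=50)    # None: too small to judge
```

Scaling the threshold by torso length rather than using a fixed pixel offset keeps the rule roughly consistent between near and mid-court players, but no threshold rescues keypoints that the model cannot resolve in the first place.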

What had to be corrected by human judgment: player identity (twice), the interpretation of the third-set decline (it was execution, not court position or contact height — both of which were flat), a contact map that needed folding across ends, and the opponent's handedness. None of these were caught by the AI on its own.
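Folding a contact map across ends can be done with a 180-degree rotation about the court centre, so every contact by a player plots on one half regardless of which end she was playing from. Whether the project rotated or mirrored is not stated; this is one reasonable choice, sketched under the assumption of court coordinates in metres with the origin at a near-baseline corner and y running along the court's length.

```python
"""Fold contacts from both ends onto one half so a player's map reads from one side."""
from typing import List, Tuple

COURT_W, COURT_L = 8.23, 23.77  # singles court in metres; origin at a near-baseline corner
NET_Y = COURT_L / 2


def fold_to_near_end(x: float, y: float) -> Tuple[float, float]:
    """Rotate far-half points 180 degrees about the court centre.

    A rotation rather than a plain y-mirror keeps the player's own left/right
    (and so forehand/backhand) side consistent after the change of ends.
    """
    if y <= NET_Y:
        return x, y
    return COURT_W - x, COURT_L - y


def fold_all(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    return [fold_to_near_end(x, y) for x, y in points]


# Same player, same wing, half a metre behind the baseline at each end.
contacts = [(1.0, -0.5), (7.23, 24.27)]
folded = fold_all(contacts)  # both land at about (1.0, -0.5)
```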

Gotchas & Limitations

This project broke in more places than any other in the Kodulabor library, and that is the most instructive thing about it.

1. Classical ball detection failed entirely, twice. Colour-plus-motion detection could not tell the ball apart from shoes, skin, and the "TARBES" court text. The honest test (gated to wide rally shots only) returned one real detection in five minutes. Only a trained model (TrackNet) broke through. The lesson: the riskiest assumption, "can we even see the ball?", has to be tested before any analysis is built on top of it.

2. The AI confidently told three wrong stories, and the data killed all three. An "aggression arc" (flat), a "serve speed collapse" (flat), and "more high balls in Set 3" (flat) were each plausible narratives the metrics refused to support. Forcing a metric to fit a story is the cardinal sin; the discipline of separating what the data shows from what we expected was the difference between a real analysis and a flattering one.

3. Position is not intent, and height is not pressure. Foot position couldn't see "aggression"; contact height couldn't see "discomfort." The operator's correction — that the third set was lost to execution errors, not body position — redirected the whole analysis toward outcome metrics (winners/errors), which is where the truth actually lived.

4. Player identity flipped twice. The outfit-colour rule was applied backwards (the operator knows his own daughter), and far-court ball kids in dark clothing leaked in as the dark-skirted player until a "must wear a white top" filter was added.

5. The scoreboard moved. Serve speed and points are burned in at fixed positions, except that the box slides right as it widens to show more sets, so a fixed OCR crop silently read only the first set. Every reader had to OCR the whole band and parse the number out of it.

6. Every new match is a re-calibration. The opponent scout required re-deriving the court mask (green court, not orange clay), the homography (a different fixed camera), and the identity colour (pink, not white) from scratch. The methods transfer; the pixel coordinates and colour thresholds do not. This is the single biggest limit on turnkey reuse.

7. Handedness needed a human. The forehand/backhand split assumed right-handed and reported 64% one wing. The operator's single observation — "she is lefty based on footage" — inverted it: she is 64% forehand-dominant, which changed the game plan from "probe both wings" to "hunt the backhand."

8. Operational friction. Two steps failed in instructive ways: one deploy shipped an empty folder (a silent copy failure that produced a live 404), and YouTube blocked the download from the cloud entirely, forcing it onto the residential connection. Both are the kind of mundane breakage that no demo shows.
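The moving-scoreboard fix (OCR the whole band, then parse) reduces to a pattern search over text. With pytesseract the band text would come from something like `pytesseract.image_to_string(band_image)`; the sketch below only handles the parsing step. The on-screen "KM/H" format, the 60 to 250 plausibility bounds, and the sample strings are all assumptions for illustration.

```python
"""Parse a serve speed from OCR text of the whole scoreboard band, not a fixed crop."""
import re
from typing import Optional

# Assumed on-screen format: digits followed by "KM/H" (spacing and slash optional).
SPEED_RE = re.compile(r"\b(\d{2,3})\s*KM\s*/?\s*H", re.IGNORECASE)


def parse_serve_speed(band_text: str, lo: int = 60, hi: int = 250) -> Optional[int]:
    """Return the first plausible speed found anywhere in the band, else None."""
    for m in SPEED_RE.finditer(band_text):
        speed = int(m.group(1))
        if lo <= speed <= hi:  # discard OCR noise that happens to sit next to "KM/H"
            return speed
    return None


# Illustrative OCR strings: the speed sits further right once more sets are shown,
# but scanning the whole band still finds it.
set1 = parse_serve_speed("ANIKINA 3 15  SERVE 172 KM/H")
set3 = parse_serve_speed("ANIKINA 6 4 2 40   172km/h")
```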

Replicability Score

3 out of 5.

The architecture is reusable and the playbook from this build was written down explicitly. Another person could lift the whole approach: YOLO + pose for players, TrackNet on Modal for the ball, Tesseract for the scoreboard, homography for the court, and the contact-from-trajectory-reversal logic.

What blocks a higher score is real and was lived twice in one session: every new broadcast needs re-calibration — court colour, scoreboard pixel regions, outfit thresholds, camera homography — and the analysis needs domain expertise to validate and interpret. The operator's tennis knowledge caught errors the AI could not. A developer without a tennis-literate partner would ship confident, wrong conclusions; a tennis coach without the engineering would not get past the first script. The result needs both, which is exactly why it is a 3 and not a 5.

Verdict

For decades, "professional video analysis of your match" meant a department, a service contract, or nothing. This project demonstrates a third option: a domain expert and an AI, in a single session, producing a defensible match report and an actionable opponent scout — for the price of a coffee in cloud compute, repeatable every match.

The honest framing is a partnership of two scarce things. The AI supplied capability that no individual could assemble alone — a trained ball-tracking model, cloud GPUs, pose estimation, OCR — wired into a working pipeline by someone who had never written computer-vision code. The human supplied the judgment that kept it true: catching identity flips, refusing flattering narratives, and contributing the one fact (left-handed) that the footage alone could not yield. Neither half produces this result. Together they replace, for one family, a capability that used to require a company.

Who should try this: a parent, coach, or player with genuine domain knowledge and the patience to validate every claim. The domain expertise is not optional — it is half the system.

Who should not: anyone hoping for a turnkey "upload video, get analysis" button. This is not that, and the re-calibration tax is real. It is a capability you operate, not a product you buy — and operating it well requires knowing the sport well enough to tell the AI when it is wrong.

The deepest lesson echoes the fitness-coach study from a different angle: AI collapses the cost of building and running the analysis, but it does not collapse the cost of judgment. The pipeline is cheap. Knowing whether to believe it is the work.

This case study was produced using the Kodulabor Assessment Framework. Methodology and findings published openly at kodulabor.ai.

Data Appendix

Metric	Value
Deliverables	1 match analysis report + 1 opponent scouting report, both deployed
Matches analysed	2 (one full broadcast match, one fixed-camera scouting match)
Estimated active human time	~12 hours (single session; not stopwatch-logged)
Build acceleration vs. software project	~33× (~12 h vs. ~400 h)
Marginal acceleration vs. manual analyst	~25–30× per match, recurring
Points segmented (match 1)	199, from scoreboard OCR
Ball-confirmed contacts (match 1)	1,086 (866 + 220 by player)
Tracked positions / contacts (scout)	3,899 positions, 898 contacts
Wide-rally segments tracked on cloud	374 (match 1) + 115 chunks (scout)
TrackNet frames processed (cloud)	~96,000 (match 1) + ~171,000 (scout)
Ball detection rate (in-rally)	~80% (match 1), ~60% (scout, harder camera)
Local GPU speedup (TrackNet, MPS vs CPU)	~9× (138 ms/frame vs 1,279 ms/frame)
Cloud GPU	Modal, T4, parallel fan-out, full match in minutes
Estimated cloud cost	a few cents to ~€1 per match
Models used	yolo11x / yolo11m / yolo11x-pose, TrackNet (yastrebksv), Tesseract OCR
Reports deployed	2 (Vercel, password-protected, edge-middleware auth)
Confirmed analytical corrections by the human operator	4+ (identity ×2, third-set interpretation, contact-map fold, handedness)
Re-calibration required for the second match	court mask, homography, identity colour, scoreboard regions
Operator's prior computer-vision experience	none (asked "what is a notebook?" at the start)

Tennis Match Analysis — Pro-Style Scouting from Broadcast Video