Finding the right starting point in a library of 31 tailored CVs.
After applying to thirty-plus roles across data science, energy, finance, and policy, my workflow had quietly become a problem in itself. Each new application meant opening a folder of past CVs, scanning filenames, half-remembering which version had the right framing, pasting it into Claude.ai, and only then starting the actual tailoring work.
Thirty-one CVs sounds manageable until you're staring at a job description at 11 p.m. and trying to remember whether the Intraday Trader version or the Quantitative Analyst version had the better Python-on-time-series framing. Manual filtering scaled badly, and human memory scaled worse.
CV Matcher exists to remove that step. It takes a new job description as input and returns the three CVs that need the least rewriting to become a strong submission — along with the exact bullets to keep, the skills to add, and the industry markers to scrub out.
The matching pipeline is intentionally split into two stages with very different cost profiles.
Stage 1 runs locally in the browser: a cosine similarity over term frequencies — a lightweight TF-without-IDF variant, chosen for zero dependencies and deterministic behaviour. It compares the new JD against each library entry along three weighted axes (JD↔JD: 0.50, JD↔CV: 0.35, JD↔CL: 0.15, where the first JD is always the new job description and CL is the cover letter) and produces a fast pre-ranking in milliseconds.
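The production version of this stage is a few dozen lines of vanilla JS, but the math is small enough to sketch in Python; the tokeniser, field names, and helper names below are illustrative, while the axis weights are the ones just listed.

```python
import re
from collections import Counter
from math import sqrt

def term_freq(text: str) -> Counter:
    """Lowercase, tokenise on alphanumerics, count term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (no IDF)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The three axes and their weights, as described above.
WEIGHTS = {"jd": 0.50, "cv": 0.35, "cl": 0.15}

def prefilter_score(new_jd: str, entry: dict) -> float:
    """Weighted pre-ranking score for one library entry.
    `entry` holds the stored JD, CV text, and cover letter text."""
    q = term_freq(new_jd)
    return sum(w * cosine(q, term_freq(entry[field])) for field, w in WEIGHTS.items())
```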
Stage 2 runs on Claude Sonnet via a Vercel serverless function. The top 8 candidates from Stage 1 are sent for a deeper semantic analysis: which bullets actually transfer, which skills are genuinely missing, what industry markers work against the candidate, and what surgical edits would close the gap.
The split exists because semantic reasoning is expensive and term-frequency math is free. Use each where it earns its keep.
The stack is small on purpose. A single HTML file holds the frontend; one Python serverless function at /api/match holds the AI layer; Supabase holds anything bigger than a few kilobytes.
Frontend (single-page HTML + vanilla JS). PDF text extraction happens client-side via pdf.js, so the server never receives binary files. The library and the user's preferences live in localStorage. PDFs themselves — too large for localStorage — get pushed straight to a Supabase Storage bucket, served back through time-limited signed URLs rather than public links.
Backend (Python on Vercel). A single function receives the new JD plus the top-8 pre-filtered candidates, calls Claude Sonnet 4 with a carefully constrained prompt, parses the structured JSON response, and returns ranked results. Truncations are calibrated per field (JD: 1200 chars, CV: 1500 chars, CL: 800 chars) to stay well within the 10-second Vercel timeout while preserving the signal Claude needs.
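Condensed, the function looks something like the sketch below, using the Anthropic Python SDK; the prompt wording, candidate field names, and model string are illustrative stand-ins rather than the production values.

```python
import json
import anthropic

# Per-field truncation limits; keeps the prompt small enough to stay
# inside Vercel's 10-second window while preserving the signal.
LIMITS = {"jd": 1200, "cv": 1500, "cl": 800}

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def match(new_jd: str, candidates: list[dict]) -> list[dict]:
    """Send the new JD plus the top-8 pre-filtered candidates to Claude
    and return its ranked, structured verdicts."""
    blocks = [
        f"### {c['title']}\n"
        f"JD: {c['jd'][:LIMITS['jd']]}\n"
        f"CV: {c['cv'][:LIMITS['cv']]}\n"
        f"Cover letter: {c['cl'][:LIMITS['cl']]}"
        for c in candidates
    ]
    message = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model ID
        max_tokens=2000,
        system="You are a recruiter choosing the best starting point for tailoring, "
               "not the best abstract match. Respond with JSON only.",
        messages=[{
            "role": "user",
            "content": f"New job description:\n{new_jd[:LIMITS['jd']]}\n\n"
                       "Candidate CVs:\n" + "\n\n".join(blocks),
        }],
    )
    # Assumes the model respected the JSON-only instruction.
    return json.loads(message.content[0].text)
```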
Storage (Supabase). A three-tier strategy — localStorage for metadata that doesn't need syncing, Supabase Storage for binary PDFs, Supabase Postgres for match history that should follow me across devices.
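The app talks to Supabase from the browser via supabase-js; the supabase-py sketch below shows the same three-tier shape (localStorage metadata stays client-side and isn't shown), with bucket, path, and table names as illustrative placeholders.

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

# Tier 2: binary PDFs go to a Storage bucket, not localStorage.
with open("cv_intraday_trader.pdf", "rb") as f:
    supabase.storage.from_("cv-pdfs").upload("cv_intraday_trader.pdf", f.read())

# Served back through a time-limited signed URL rather than a public link.
signed = supabase.storage.from_("cv-pdfs").create_signed_url("cv_intraday_trader.pdf", 3600)

# Tier 3: match history in Postgres so it follows me across devices.
supabase.table("match_history").insert({
    "jd_title": "Quantitative Analyst",
    "top_matches": ["intraday_trader_v3", "quant_analyst_v2", "energy_ds_v1"],
}).execute()
```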
A naive prompt — "Which of these CVs best matches this job?" — produces confident-sounding garbage. Every CV scores 85%. The reasoning is vague. The output format drifts. After enough iterations, three principles became load-bearing.
The recruiter inside the prompt isn't looking for the best match — they're looking for the best starting point for tailoring. That distinction does real work:
The model now optimises for leverage (how little rewriting is needed) instead of an abstract similarity score. The outputs immediately became more actionable.
An earlier version returned scores on four axes — hard skills, experience, language, cover letter — each with weights and percentages. It looked rigorous and was almost useless. Knowing a CV scored 73% on hard skills doesn't tell you what to change.
The current schema is the opposite: every field is something I can act on.
The recruiter persona stops grading and starts editing.
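A single result entry comes back shaped roughly like this; the field names are an illustrative reconstruction from the description above, not the verbatim schema.

```python
# Illustrative shape of one ranked result, not the exact production schema.
example_result = {
    "cv": "Intraday Trader v3",
    "verdict": "Best starting point: closest domain framing, smallest rewrite.",
    "bullets_to_keep": [
        "Built intraday price-forecasting models in Python (pandas, scikit-learn)",
    ],
    "skills_to_add": ["Airflow", "dbt"],
    "markers_to_remove": ["energy-trading jargon that reads off-domain here"],
    "edits": [
        "Reframe the backtesting bullet around model monitoring in production",
    ],
}
```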
LLMs pad to fill templates. They give generic advice. They hedge. The prompt now names these bad habits explicitly as anti-patterns.
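An illustrative reconstruction of those rules, embedded in the prompt-building code; this is not the verbatim wording.

```python
# Illustrative anti-pattern rules appended to the system prompt.
ANTI_PATTERNS = """
Rules:
- Do not pad: if a field has nothing useful to say, return an empty list.
- No generic advice ("quantify your achievements"); every suggestion must
  point at a specific bullet or skill in the CV or the job description.
- Do not hedge with "consider" or "might": state the edit or omit it.
- Scores must differentiate candidates; giving every CV the same score is a failure.
"""
```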
This isn't elegant prompt writing — it's defensive prompt writing. The model is a collaborator with predictable bad habits, and the prompt is where you name them out loud.
"Engineering is knowing when not to build. The 10-second timeout could have been a constraint to fight — instead, it became a forcing function for cleaner architecture. The matching belongs in the app; the deep tailoring belongs in Claude.ai. Two different tools, two different jobs."
Why two stages instead of one Claude call?
Sending all 31 CVs to Claude per query would be slow, expensive, and would burn context budget on candidates that are obviously wrong. Term-frequency cosine similarity is free, deterministic, and good enough to eliminate the worst two-thirds of the library in milliseconds.
Why TF without IDF?
With a corpus of ~31 documents, IDF's benefit (down-weighting common terms) is small. The simpler version stays auditable and dependency-free in the browser. If the library grows past a few hundred entries, this is the first thing I'd revisit.
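For reference, the standard IDF weighting the browser version skips; multiplying each term's count by this factor is the usual TF-IDF upgrade, and the sketch below is what I'd bolt on first if the library grew.

```python
from math import log

def idf(term: str, docs: list[set[str]]) -> float:
    """Standard inverse document frequency: log(N / document frequency)."""
    df = sum(1 for d in docs if term in d)
    return log(len(docs) / df) if df else 0.0
```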
Why Claude Sonnet rather than Opus?
Sonnet is the right balance: strong enough to follow a constrained schema and reason about CV-to-JD overlap, cheap enough that running 30+ matches a month is negligible. Opus would be over-spec for this task.
Why keep deep CV optimisation in Claude.ai instead of the app?
Two reasons. First, Vercel's 10-second serverless timeout is hard, and a thorough rewrite often needs longer reasoning. Second, Claude.ai gives me a longer context window and an iterative interface that an API call doesn't. The match step belongs in the app; the rewrite belongs in a conversation.
Why store PDFs in Supabase rather than as base64 in localStorage?
localStorage caps at ~5 MB per origin. A library of 31 tailored CV PDFs would exceed that quickly. Supabase Storage handles the binary load; localStorage holds only the lightweight metadata.
What surprised me. How quickly the Claude API turned a repetitive judgment task into something automatable. The matching logic I was running in my head — "this CV has the trading framing but lacks the Python angle; that one has the Python but reads too policy-heavy" — turned out to be exactly the kind of structured reasoning that an LLM, given the right prompt, does well and does consistently. The second surprise was how much of the work was prompt engineering, not coding. The Python serverless function is 130 lines. The prompt evolved across dozens of revisions.
What I'd do differently. Honestly, nothing I'd flag as a mistake — but the version online today won't be the one in a month. Software like this doesn't get finished; it gets iterated. The TF-IDF question, the prompt format, the scoring schema — each will sharpen as I use the tool more.
What's next. A conversational layer on top of CV Matcher. Right now it's request-response: paste a JD, get a ranked Top 3, end. The next version turns it into a dialogue — "why did you rank #2 over #3?", "can you draft the rewritten Python bullet?", "compare this version of my CV to the suggestion." The match becomes the start of the conversation, not the end of the task.