Case Study · Personal Project · 2026

CV Matcher

Finding the right starting point in a library of 31 tailored CVs.

31 tailored CVs in the library · 4 target domains · 2-stage retrieval pipeline

The bottleneck wasn't the writing — it was the picking.

After applying to thirty-plus roles across data science, energy, finance, and policy, my workflow had quietly become a problem in itself. Each new application meant opening a folder of past CVs, scanning filenames, half-remembering which version had the right framing, pasting it into Claude.ai, and only then starting the actual tailoring work.

Thirty-one CVs sounds manageable until you're staring at a job description at 11 p.m. and trying to remember whether the Intraday Trader version or the Quantitative Analyst version had the better Python-on-time-series framing. Manual filtering scaled badly, and human memory scaled worse.

CV Matcher exists to remove that step. It takes a new job description as input and returns the three CVs that need the least rewriting to become a strong submission — along with the exact bullets to keep, the skills to add, and the industry markers to scrub out.

Two stages, two cost profiles.

The matching pipeline is intentionally split into two stages with very different cost profiles.

Stage 1 runs locally in the browser: a cosine similarity over term frequencies — a lightweight TF-without-IDF variant, chosen for zero dependencies and deterministic behaviour. It compares the new JD against each library entry along three weighted axes (JD↔JD: 0.50, JD↔CV: 0.35, JD↔CL: 0.15) and produces a fast pre-ranking in milliseconds.
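The production Stage 1 is vanilla JavaScript in the browser; the Python sketch below shows the same math, with hypothetical names (`tf_vector`, `prefilter_score`, and the `jd`/`cv`/`cl` entry keys are illustrative, not the app's actual identifiers):

```python
import math
import re
from collections import Counter

def tf_vector(text: str) -> Counter:
    """Term-frequency vector: raw token counts, no IDF weighting."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def prefilter_score(new_jd: str, entry: dict) -> float:
    """Weighted blend of the three axes: JD-JD 0.50, JD-CV 0.35, JD-CL 0.15."""
    jd_vec = tf_vector(new_jd)
    return (0.50 * cosine(jd_vec, tf_vector(entry["jd"]))
            + 0.35 * cosine(jd_vec, tf_vector(entry["cv"]))
            + 0.15 * cosine(jd_vec, tf_vector(entry["cl"])))

def top_candidates(new_jd: str, library: list[dict], k: int = 8) -> list[dict]:
    """Rank the library and keep the top k entries for Stage 2."""
    return sorted(library, key=lambda e: prefilter_score(new_jd, e), reverse=True)[:k]
```

Because every step is plain arithmetic over token counts, the pre-ranking is deterministic: the same JD against the same library always produces the same order.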

Stage 2 runs on Claude Sonnet via a Vercel serverless function. The top 8 candidates from Stage 1 are sent for a deeper semantic analysis: which bullets actually transfer, which skills are genuinely missing, what industry markers work against the candidate, and what surgical edits would close the gap.

The split exists because semantic reasoning is expensive and term-frequency math is free. Use each where it earns its keep.

New JD (pasted or uploaded) → Stage 1, browser: TF cosine similarity over JD↔JD, JD↔CV, and JD↔CL, weighted 0.50 / 0.35 / 0.15, narrowing 31 → 8 candidates → Stage 2, serverless: Claude Sonnet semantic re-ranking with structured JSON output, narrowing 8 → 3 → Top 3 results plus suggested edits.
Two-stage retrieval pipeline: cheap pre-filter, expensive re-rank.

A small stack, deliberately.

The stack is small on purpose. A single HTML file holds the frontend; one Python serverless function at /api/match holds the AI layer; Supabase holds anything bigger than a few kilobytes.

Frontend (single-page HTML + vanilla JS). PDF text extraction happens client-side via pdf.js, so the server never receives binary files. The library and the user's preferences live in localStorage. PDFs themselves — too large for localStorage — get pushed straight to a Supabase Storage bucket, served back through time-limited signed URLs rather than public links.

Backend (Python on Vercel). A single function receives the new JD plus the top-8 pre-filtered candidates, calls Claude Sonnet 4 with a carefully constrained prompt, parses the structured JSON response, and returns ranked results. Truncations are calibrated per field (JD: 1200 chars, CV: 1500 chars, CL: 800 chars) to stay well within the 10-second Vercel timeout while preserving the signal Claude needs.
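A minimal sketch of that function's core, using the Anthropic Python SDK. The truncation budgets come from the case study; the model id, prompt text, and field names are placeholders rather than the production values:

```python
import json

# Per-field truncation budgets: enough signal for Claude,
# small enough to stay inside the 10-second Vercel timeout.
LIMITS = {"jd": 1200, "cv": 1500, "cl": 800}

def truncate_fields(entry: dict) -> dict:
    """Clip each candidate's text fields to its calibrated budget."""
    return {k: entry.get(k, "")[:LIMITS[k]] for k in ("jd", "cv", "cl")}

def match(new_jd: str, candidates: list[dict]) -> list[dict]:
    """Send the JD plus pre-filtered candidates to Claude; return ranked results."""
    import anthropic  # lazy import; requires the anthropic package at runtime
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    payload = {
        "new_jd": new_jd[:LIMITS["jd"]],
        "candidates": [truncate_fields(c) for c in candidates],
    }
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumed model id
        max_tokens=2000,
        messages=[{"role": "user",
                   "content": "Rank these CVs as tailoring starting points. "
                              "Return JSON only.\n" + json.dumps(payload)}],
    )
    return json.loads(response.content[0].text)  # parsed, ranked results
```

Clipping happens before the API call, so the token cost per match is bounded regardless of how long the original documents are.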

Storage (Supabase). A three-tier strategy — localStorage for metadata that doesn't need syncing, Supabase Storage for binary PDFs, Supabase Postgres for match history that should follow me across devices.

The hardest part wasn't the architecture; it was the prompt.

A naive prompt — "Which of these CVs best matches this job?" — produces confident-sounding garbage. Every CV scores 85%. The reasoning is vague. The output format drifts. After enough iterations, three principles became load-bearing.

01.

Reframe the question before the model can answer it wrong.

The recruiter inside the prompt isn't looking for the best match — they're looking for the best starting point for tailoring. That distinction does real work:

"Identify which existing CV is the best STARTING POINT for tailoring — not necessarily a perfect fit, but the one with the most transferable bullets, relevant domain exposure, and reusable cover letter framing."

The model now optimises for leverage (how little rewriting is needed) instead of an abstract similarity score. The outputs immediately became more actionable.

02.

Replace evaluative fields with operational ones.

An earlier version returned scores on four axes — hard skills, experience, language, cover letter — each with weights and percentages. It looked rigorous and was almost useless. Knowing a CV scored 73% on hard skills doesn't tell you what to change.

The current schema is the opposite: every field is something I can act on.

transferable_bullets
Specific bullets from the CV that map directly to the new JD.
skills_present
Skills from the new JD that already appear in the CV.
skills_missing
Skills from the new JD that would require genuinely new content.
wrong_industry_markers
Phrases or keywords that work against the new role.
cv_tweaks
3–5 surgical edits: quote current phrase, propose replacement.
cl_reusability
"Reusable with minor edits" / "partial rewrite" / "from scratch".

The recruiter persona stops grading and starts editing.
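For concreteness, here is what one candidate in the structured response might look like. The field names follow the schema above; the values, the `cv_name` key, and the tweak sub-structure are made up for illustration:

```python
import json

# Illustrative example of one candidate in the structured JSON response.
example = json.loads("""
{
  "cv_name": "Quantitative Analyst",
  "transferable_bullets": ["Built Python backtests for intraday strategies"],
  "skills_present": ["Python", "pandas"],
  "skills_missing": ["Kubernetes"],
  "wrong_industry_markers": ["energy-trading desk"],
  "cv_tweaks": [{"current": "energy price forecasting",
                 "proposed": "financial time-series forecasting"}],
  "cl_reusability": "reusable with minor edits"
}
""")

# Cheap schema check before rendering: every operational field must be present.
REQUIRED = {"transferable_bullets", "skills_present", "skills_missing",
            "wrong_industry_markers", "cv_tweaks", "cl_reusability"}
assert REQUIRED <= example.keys()
```

Each field maps one-to-one to an edit action, which is what makes the output usable without further interpretation.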

03.

Pre-empt the model's known failure modes.

LLMs pad to fill templates. They give generic advice. They hedge. The prompt now contains explicit anti-patterns:

# Anti-padding
"Return between 1 and 3 candidates. Include a candidate ONLY if it has meaningful transferable content. Do NOT pad to 3."

# Anti-generic-advice
"Be specific in all fields. No generic advice like 'emphasize your skills' — quote actual bullets or skills."

This isn't elegant prompt writing — it's defensive prompt writing. The model is a collaborator with predictable bad habits, and the prompt is where you name them out loud.

"Engineering is knowing when not to build. The 10-second timeout could have been a constraint to fight — instead, it became a forcing function for cleaner architecture. The matching belongs in the app; the deep tailoring belongs in Claude.ai. Two different tools, two different jobs."

Trade-offs, made on purpose.

Why two stages instead of one Claude call?

Sending all 31 CVs to Claude per query would be slow, expensive, and would burn context budget on candidates that are obviously wrong. Term-frequency cosine similarity is free, deterministic, and good enough to eliminate the worst two-thirds of the library in milliseconds.

Why TF without IDF?

With a corpus of ~31 documents, IDF's benefit (down-weighting common terms) is small. The simpler version stays auditable and dependency-free in the browser. If the library grows past a few hundred entries, this is the first thing I'd revisit.
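If that revisit ever happens, the change is small. A sketch of the IDF extension, assuming the same tokenizer as the Stage-1 math (the function names here are illustrative, not the app's):

```python
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def idf_weights(corpus: list[str]) -> dict[str, float]:
    """Smoothed IDF: terms that appear in every document shrink toward weight 1."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokens(doc)))  # document frequency, not term frequency
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf_vector(text: str, idf: dict[str, float]) -> dict[str, float]:
    """TF-IDF vector: raw counts scaled by corpus-level rarity."""
    tf = Counter(tokens(text))
    return {t: c * idf.get(t, 1.0) for t, c in tf.items()}
```

With 31 near-identical CVs, almost every term is common, so the IDF weights stay close to flat, which is exactly why skipping the step costs little today.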

Why Claude Sonnet rather than Opus?

Sonnet is the right balance: strong enough to follow a constrained schema and reason about CV-to-JD overlap, cheap enough that running 30+ matches a month is negligible. Opus would be over-spec for this task.

Why keep deep CV optimisation in Claude.ai instead of the app?

Two reasons. First, Vercel's 10-second serverless timeout is hard, and a thorough rewrite often needs longer reasoning. Second, Claude.ai gives me a longer context window and an iterative interface that an API call doesn't. The match step belongs in the app; the rewrite belongs in a conversation.

Why store PDFs in Supabase rather than as base64 in localStorage?

localStorage caps at ~5 MB per origin. A library of 31 tailored CV PDFs would exceed that quickly. Supabase Storage handles the binary load; localStorage holds only the lightweight metadata.
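A back-of-envelope check makes the ceiling concrete. The average PDF size is an assumption here (a few hundred KB per tailored CV is typical); the base64 overhead is not:

```python
# Would 31 base64-encoded CV PDFs fit in localStorage's ~5 MB quota?
pdf_count = 31
avg_pdf_kb = 300            # assumed average size per CV PDF
base64_overhead = 4 / 3     # base64 encodes every 3 bytes as 4 characters
total_mb = pdf_count * avg_pdf_kb * base64_overhead / 1024
print(round(total_mb, 1))   # ≈ 12.1 MB, more than double the quota
```

Even halving the assumed PDF size leaves the library over the limit, so the binary tier was never optional.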

Software like this doesn't get finished. It gets iterated.

What surprised me. How quickly the Claude API turned a repetitive judgment task into something automatable. The matching logic I was running in my head — "this CV has the trading framing but lacks the Python angle; that one has the Python but reads too policy-heavy" — turned out to be exactly the kind of structured reasoning that an LLM, given the right prompt, does well and does consistently. The second surprise was how much of the work was prompt engineering, not coding. The Python serverless function is 130 lines. The prompt evolved across dozens of revisions.

What I'd do differently. Honestly, nothing I'd flag as a mistake — but the version online today won't be the one in a month. Software like this doesn't get finished; it gets iterated. The TF-IDF question, the prompt format, the scoring schema — each will sharpen as I use the tool more.

What's next. A conversational layer on top of CV Matcher. Right now it's request-response: paste a JD, get a ranked Top 3, end. The next version turns it into a dialogue — "why did you rank #2 over #3?", "can you draft the rewritten Python bullet?", "compare this version of my CV to the suggestion." The match becomes the start of the conversation, not the end of the task.

Built by Aurelian-Andrei Panait, MSc Energy student at MINES Paris – PSL, while applying to roles where the boundary between energy systems and applied AI is the most interesting place to work. Built with the help of Claude — a fitting choice for a tool that itself runs on Claude.