Building an identity resolution system for AI-generated briefings

Published on April 27, 2026 by AlamedaDev Team

Search "Maria Garcia, software engineer" on LinkedIn. You'll get over 400 matches. Now imagine your AI-powered briefing system picks the wrong one — and writes a detailed, internally coherent profile about a completely different person.

That's the problem we ran into building Reedy Meet's automatic briefing pipeline. LLMs are exceptional at synthesizing information. They are not, by design, identity-verification tools. They have no native mechanism to detect when results for "Michael Smith" are pulling from three different people who happen to share that name. Solving that requires a fundamentally different approach — one that quantifies certainty rather than inferring it.

Overview of the meeting detail page.

The core insight: identity on the web is not binary

There's no clean threshold that separates "this is the right person" from "this isn't." The available evidence varies too much. Sometimes you have five independent signals converging on the same profile. Sometimes you have a single mention in an outdated directory from 2019.

Modeling that uncertainty as a continuous value between 0 and 1 isn't just an elegant design choice — it's what allows the downstream system to be calibrated about what it knows and honest about what it doesn't. The confidence used to assert something should be proportional to the evidence supporting it.

That score maps to four operational levels that directly control how the briefing is written:

| Level | Range | What the system does |
|---|---|---|
| HIGH | ≥ 0.90 | Facts stated directly: "Elena is an ML engineer at Nexus AI." |
| MEDIUM | 0.60–0.89 | Hedged language: "According to web sources, Elena works at Nexus AI." |
| LOW | 0.30–0.59 | Only high-confidence fields included; the rest are omitted. |
| FAILED | < 0.30 | No briefing generated. Silence is safer than noise. |

The distinction between HIGH and MEDIUM is not cosmetic. How a fact is framed changes how much weight the reader gives it — and how much trust they place in the system that generated it.
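
As a minimal sketch of that mapping (in Python, with illustrative names rather than the production API), the thresholds from the table translate directly into a small helper:

```python
# Minimal sketch of the score-to-level mapping described above.
# Enum and function names are illustrative, not the production API.
from enum import Enum


class ConfidenceLevel(Enum):
    HIGH = "high"      # facts stated directly
    MEDIUM = "medium"  # hedged language
    LOW = "low"        # only high-confidence fields included
    FAILED = "failed"  # no briefing generated


def score_to_level(score: float) -> ConfidenceLevel:
    """Map a clamped identity score in [0, 1] to an operational level."""
    if score >= 0.90:
        return ConfidenceLevel.HIGH
    if score >= 0.60:
        return ConfidenceLevel.MEDIUM
    if score >= 0.30:
        return ConfidenceLevel.LOW
    return ConfidenceLevel.FAILED
```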

Context: a guardrail inside a larger pipeline

The briefing generation system works as an agent that chains multiple tools: search, extraction, verification, synthesis. This post covers only the verification step — the guardrail that sits between raw search results and the final report.

The scoring model runs for every external participant in a meeting. Internal participants (those sharing an email domain with the meeting host) are excluded entirely. Identity resolution is only meaningful for people outside the organization — so rather than applying a score penalty, the system uses a context-based exclusion before any signals are evaluated.
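
A minimal sketch of that exclusion, assuming participant and host email addresses are available up front (the helper name and the plain domain comparison are illustrative):

```python
# Sketch of the context-based exclusion: anyone sharing the host's email
# domain is treated as internal and never enters the scoring model.
def is_internal(participant_email: str, host_email: str) -> bool:
    """Return True when both addresses share the same email domain."""
    participant_domain = participant_email.rsplit("@", 1)[-1].lower()
    host_domain = host_email.rsplit("@", 1)[-1].lower()
    return participant_domain == host_domain


participants = ["ana@acme.com", "carlos.lopez@gmail.com"]
external = [p for p in participants if not is_internal(p, "host@acme.com")]
# external == ["carlos.lopez@gmail.com"]
```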

Ten signals, each with a weight

The premise is simple: a single signal can fail or mislead. Ten independent signals pointing in the same direction are much harder to refute.

Each signal is a binary question — it fires or it doesn't — and each carries a weight that reflects its real-world discriminative power. Positive signals add to the score; negative signals subtract from it.

Positive signals

| Signal | Weight | What it checks |
|---|---|---|
| Email exact match | +0.25 | Does the person's email appear literally in any indexed result? Email addresses are unique identifiers; finding one in a conference bio or LinkedIn snippet is direct evidence. |
| Domain name consistency | +0.30 | In what percentage of results does the person's name co-occur with their expected company? A ratio above 70% is a strong signal. Works with Gmail too — the company is inferred from context, not from the email domain. |
| Corporate email domain match | +0.20 | Does john@acme.com semantically match a company named "Acme Corp" in the results? This is different from domain consistency: that one measures statistical frequency; this one measures semantic coherence. They don't always fire together. |
| LinkedIn verified | +0.25 | Not just "a LinkedIn profile exists." The profile must explicitly show name and expected company together. A profile for "Elena Rodriguez" that says "Product Manager at TechCorp" doesn't verify our Elena from Nexus AI. |
| Name + title in result heading | +0.15 | Do name and company appear together in a page title or headline? Titles are curated editorial content — they carry more semantic weight than body text. |
| Role consistency | +0.10 | Is the current role mentioned in two or more independent sources? A single mention may be a one-off error; two independent mentions are corroboration. |
| Recent activity | +0.05 | Do results include recency markers like "2025", "2026", or "recently"? Recent information is more reliable than old. Low weight, but useful as a tiebreaker. |

Negative signals

| Signal | Penalty | What it checks |
|---|---|---|
| Conflicting company | −0.20 | Results associate the person with a large, well-known company that differs from the expected one. "Michael Smith, Senior Engineer at Google" when we expected a small startup is a clear sign a more prominent namesake is drowning out the right person. |
| Generic name without disambiguators | −0.15 | Very common name with no clear LinkedIn, no company, no specific role in any result. The penalty doesn't apply if positive signals are present — it only fires when identifiers are genuinely missing. |
| Stale information | −0.10 | Most results contain indicators of outdated data: "former," "previously," "ex-," or years like 2020–2022. |
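
Collecting both tables into one mapping makes the next step easier to follow. A sketch, with illustrative signal identifiers (the production names may differ):

```python
# The ten signal weights from the two tables above, in a single mapping.
# Positive signals add to the score; negative signals subtract from it.
SIGNAL_WEIGHTS = {
    "email_exact_match": +0.25,
    "domain_name_consistency": +0.30,
    "corporate_email_domain_match": +0.20,
    "linkedin_verified": +0.25,
    "name_and_title_in_heading": +0.15,
    "role_consistency": +0.10,
    "recent_activity": +0.05,
    "conflicting_company": -0.20,
    "generic_name_without_disambiguators": -0.15,
    "stale_information": -0.10,
}
```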

The calculation

Score = sum of the weights of all fired signals
Final score = clamp(Score, 0, 1)
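
In code, the whole calculation is a few lines. A sketch that reuses the SIGNAL_WEIGHTS mapping above and takes the set of signals that fired:

```python
# Sum the weights of fired signals and clamp the result to [0, 1].
def compute_identity_score(fired_signals: set[str]) -> float:
    raw = sum(SIGNAL_WEIGHTS[name] for name in fired_signals)
    return max(0.0, min(1.0, raw))
```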

Two cases make this concrete.

Elena Rodriguez (elena.rodriguez@nexusai.com)

| Signal | Fires? | Contribution |
|---|---|---|
| Email exact match | ✓ | +0.25 |
| Domain name consistency | ✓ (9/10 results mention nexusai) | +0.30 |
| Corporate email domain match | ✓ | +0.20 |
| LinkedIn verified | ✓ | +0.25 |
| Name + title in heading | ✗ | 0 |
| Role consistency | ✓ ("ML Engineer" in 3 sources) | +0.10 |
| Recent activity | ✓ | +0.05 |
| Conflicting company | ✗ | 0 |
| Generic name | ✗ | 0 |
| Stale information | ✗ | 0 |

Raw score: 1.15 → clamped to 1.00 → HIGH

Carlos Lopez (carlos.lopez@gmail.com)

| Signal | Fires? | Contribution |
|---|---|---|
| Email exact match | ✗ (not indexed) | 0 |
| Domain name consistency | ✗ (results scattered across 5+ people) | 0 |
| Corporate email domain match | ✗ (generic email) | 0 |
| LinkedIn verified | ✗ (multiple profiles, no clear company) | 0 |
| Name + title in heading | ✗ | 0 |
| Role consistency | ✗ (contradictory roles across results) | 0 |
| Recent activity | ✓ | +0.05 |
| Conflicting company | ✗ | 0 |
| Generic name | ✓ | −0.15 |
| Stale information | ✓ | −0.10 |

Score: −0.20 → clamped to 0.00 → FAILED

Gmail alone isn't the problem. The problem is the total absence of positive signals combined with a very common name. A Carlos Lopez with Gmail but an active LinkedIn, a clear company, and a consistent role would score very differently.

When the first pass isn't enough: adaptive disambiguation

Scores below 0.7 don't go straight to failure. Instead, the system activates a second, more targeted search using all available identifiers combined:

"Carlos Lopez" "carlos.lopez@gmail.com" "DataCorp" "LinkedIn profile"

Each result from this secondary search is treated as a candidate and scored on a different scale:

| Condition | Points |
|---|---|
| Literal email in result content | +50 |
| Corporate domain in URL | +30 |
| Expected company in content | +20 |
| Name in result title | +15 |
| LinkedIn URL | +10 |
| Recent information | +10 |

The top candidate wins only if it scores above 40 points and leads the second-ranked candidate by more than 20 points. A narrower margin is flagged as "too close to call" — and no briefing is generated.
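
That decision rule is easy to express directly. A sketch, assuming each candidate has already been scored against the point table above:

```python
# Pick a winner only if the top candidate clears 40 points and leads the
# runner-up by more than 20; anything narrower is "too close to call".
def pick_candidate(scored: list[tuple[str, int]]) -> str | None:
    ranked = sorted(scored, key=lambda c: c[1], reverse=True)
    if not ranked or ranked[0][1] <= 40:
        return None
    if len(ranked) > 1 and ranked[0][1] - ranked[1][1] <= 20:
        return None  # too close to call: no briefing is generated
    return ranked[0][0]
```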

In production, this second layer resolves approximately 85% of cases that didn't pass the first one.

Field-level confidence, not just a global score

A global score of 0.75 doesn't mean every fact in the profile is equally reliable. The person's company and current role might be corroborated by five different sources, while their education appears in a single result from 2019. Treating both identically would be a mistake.

The system classifies each profile field by priority and by how many critical signals fired:

  • High-priority fields (company, role, LinkedIn): considered verified if score ≥ 0.6
  • Medium-priority fields (experience, corporate website): verified only if score ≥ 0.9 or at least 2 critical signals fired
  • Low-priority fields (education, old news): always treated as unverified

The briefing generator receives explicit instructions for each tier (see the sketch after this list):

  • Verified facts → written as direct statements
  • Unverified data → written with conditional language ("according to web sources…") or omitted entirely
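
A sketch that puts the two rules together. The field groupings and thresholds mirror the lists above; the post doesn't enumerate which signals count as "critical," so that set, the helper names, and the phrasing templates are assumptions:

```python
# Field-level verification tiers and the framing each tier receives.
HIGH_PRIORITY = {"company", "role", "linkedin"}
MEDIUM_PRIORITY = {"experience", "corporate_website"}
# Which signals count as "critical" is an assumption for this sketch.
CRITICAL_SIGNALS = {"email_exact_match", "linkedin_verified", "domain_name_consistency"}


def is_field_verified(field: str, score: float, fired_signals: set[str]) -> bool:
    critical_hits = len(fired_signals & CRITICAL_SIGNALS)
    if field in HIGH_PRIORITY:
        return score >= 0.6
    if field in MEDIUM_PRIORITY:
        return score >= 0.9 or critical_hits >= 2
    return False  # low-priority fields are always treated as unverified


def frame_fact(field: str, value: str, verified: bool) -> str:
    if verified:
        return f"{field}: {value}"  # direct statement
    return f"{field}: according to web sources, {value}"  # hedged (or omitted)
```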

The result is a report that's honest about its own degree of certainty — not an undifferentiated mix of facts and assumptions.

Why not just train a classifier?

Two reasons.

The pragmatic one: labeling enough cases to train a reliable model has real cost in time and money. Iteratively tuned weights produce 2–3% error rates on HIGH cases. That's sufficient for the problem being solved.

The more important reason is interpretability. When the system returns a MEDIUM score of 0.67, you can see exactly why: which signals fired, which didn't, and what information was specifically missing. A black-box model returns a number with no explanation. In a product where user trust depends on being able to audit decisions, that difference matters more than marginal accuracy gains.

Weights are also just parameters. If production data shows email_exact_match is more discriminative than expected, the weight goes from 0.25 to 0.30 — no retraining required.

Production numbers

Score distribution across all processed participants:

| Level | Share |
|---|---|
| HIGH (≥ 0.9) | 42% |
| MEDIUM (0.6–0.9) | 31% |
| LOW (0.3–0.6) | 18% |
| FAILED (< 0.3) | 9% |

73% of cases reach sufficient confidence to generate a briefing. The remaining 27% reflects the system's real limitations — not a design failure. A system that's honest about uncertainty is more useful than one that papers over it.

From weekly manual validation on 50 random cases:

  • False positives (incorrect HIGH): 2–3%
  • False negatives (LOW/FAILED for the correct person): 8–10%, mostly due to recent job changes not yet re-indexed
  • Precision in HIGH/MEDIUM cases: ~88%

23% of cases trigger the disambiguation layer. Of those, 85% are resolved with enough confidence to proceed.

Known failure modes

The system performs well for tech professionals with an active digital presence: engineers with GitHub profiles, LinkedIn, blog posts, or conference appearances. It also works well for executives with press coverage.

Three scenarios consistently produce lower performance:

Recent job changes. The web takes weeks or months to re-index. Someone who changed companies three weeks ago may still appear in their previous role across most results. The score drops appropriately — the email domain won't match what the results show — but the system can't determine which version is current.

Professionals in traditional sectors. The system is optimized for the tech environment, where digital footprint is high. An executive in manufacturing or distribution with low online presence can produce low scores even if they're clearly identifiable through other means.

Namesakes within the same company. Two people named "John Lee" at TechCorp will confuse the system even when the domain matches perfectly. This is the primary source of false positives, at a rate of 2–3% in HIGH cases.

Generated participant report view.

Conclusion

Resolving identity on the web is an evidence-accumulation problem — not a search problem. The gap between "finding information about someone" and "verifying that the found information belongs to that specific person" feels subtle, but it changes everything about how the system needs to be designed.

The scoring model described here isn't perfect — no heuristic system is. But it's interpretable, tunable, and calibrated in the right direction: it prefers silence over confident mistakes. In applications where an identity error undermines product credibility, that's exactly the right stance.
