The takeaway: most semantic layer and knowledge graph discussions fail the Feynman test: they're about naming, not knowing. Here's a concrete test that separates the two: can your system tell me what a specific agent knew at 2:14 PM last Tuesday when it made a specific decision?
If the answer is no, you're not building agent infrastructure. You're building a better database.
Agent explainability is becoming a regulatory requirement. The EU AI Act requires "meaningful explanations" for high-risk AI decisions. Healthcare AI increasingly needs audit trails. Financial services regulators want to know why algorithms did what they did.
But most agent architectures can't answer temporal questions. They know what's true now, not what was true when the decision happened. That gap is about to become expensive.
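Closing that gap means recording decision-time context rather than overwriting current state. Here is a minimal sketch of the idea: an append-only decision log that answers "what did this agent know at time T?" The names (DecisionRecord, DecisionLog, what_did_it_know) are illustrative, not from any particular framework or from Johnson's work.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class DecisionRecord:
    """Immutable snapshot of what an agent knew when it acted."""
    agent_id: str
    decided_at: datetime   # when the decision was made
    inputs: dict           # facts the agent saw, as of decided_at
    rule_versions: dict    # which versions of policies were applied
    outcome: str           # what the agent decided

class DecisionLog:
    """Append-only log: records are written once and never updated."""
    def __init__(self) -> None:
        self._records: list[DecisionRecord] = []

    def record(self, rec: DecisionRecord) -> None:
        self._records.append(rec)

    def what_did_it_know(self, agent_id: str, at: datetime) -> list[DecisionRecord]:
        """The audit question: everything this agent had decided on up to `at`."""
        return [r for r in self._records
                if r.agent_id == agent_id and r.decided_at <= at]
```

A current-state database can't answer that query after an update has overwritten the old values; an append-only log can, which is the whole point of the test.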
Justin Johnson—who leads Data & AI Platforms for oncology research at a major pharmaceutical company—has been writing about this problem. I can't verify his claimed output (34 projects in 18 months, platforms with hundreds of users), but his context graph thesis offers something more useful than metrics: a test that separates real agent infrastructure from semantic layer theater.
Johnson's question:
Can your system tell me what a specific agent knew at 2:14 PM last Tuesday when it made a specific decision?
This is Feynman's "knowing vs. naming" distinction applied to infrastructure. Most semantic layer advocates can name their entities. Few can answer temporal questions about them.
His example: Sarah approved a discount when she was a Director. She's a VP now. When an auditor asks "was this approval authorized?"—systems that only know current state will check VP permissions and say yes. Systems with temporal context will check Director permissions at the decision timestamp and give the correct answer.
This isn't academic. It's the difference between an AI that can be audited and one that can't.
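To make the Sarah example concrete, here is a minimal sketch assuming role history is stored with validity intervals. The data, discount limits, and function names are invented for illustration; the point is that the check runs against the role held at the decision timestamp, not the role held today.

```python
from datetime import datetime
from typing import Optional

# Role history with validity intervals (None end = still current). Illustrative data only.
ROLE_HISTORY = {
    "sarah": [
        {"role": "Director", "from": datetime(2022, 1, 1), "to": datetime(2024, 6, 30)},
        {"role": "VP",       "from": datetime(2024, 7, 1), "to": None},
    ],
}

# Hypothetical approval limits per role.
DISCOUNT_LIMIT = {"Director": 0.10, "VP": 0.25}

def role_at(person: str, when: datetime) -> Optional[str]:
    """Return the role the person held at `when`, not the role they hold now."""
    for entry in ROLE_HISTORY.get(person, []):
        if entry["from"] <= when and (entry["to"] is None or when <= entry["to"]):
            return entry["role"]
    return None

def approval_was_authorized(person: str, discount: float, decided_at: datetime) -> bool:
    """Check the limit attached to the role held at the decision timestamp."""
    role = role_at(person, decided_at)
    return role is not None and discount <= DISCOUNT_LIMIT[role]

# Auditor's question: was a 15% discount approved in March 2024 authorized?
# A current-state check (VP, 25% limit) says yes; the temporal check says no.
print(approval_was_authorized("sarah", 0.15, datetime(2024, 3, 12)))  # False
```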
Johnson's context engineering piece identifies a failure mode I've seen discussed elsewhere but never named this clearly: subagents without shared decision traces make conflicting assumptions.
His metaphor: separate teams designing a car's exterior and engine without coordination. Each team optimizes locally; the result is incoherent globally. The same happens when Agent A makes an implicit decision that Agent B never sees.
The remedy sounds obvious—share complete context—but the implementation isn't. How much context? In what format? With what latency? These are infrastructure questions that most multi-agent frameworks punt on.
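One way to make the remedy concrete is an explicit, shared decision trace that travels with the work. This is a sketch of the shape such a trace could take, not Johnson's implementation; TraceEntry and DecisionTrace are illustrative names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TraceEntry:
    agent: str
    assumption: str      # the implicit decision, made explicit
    made_at: datetime

@dataclass
class DecisionTrace:
    """Shared, append-only trace handed from one subagent to the next."""
    entries: list[TraceEntry] = field(default_factory=list)

    def note(self, agent: str, assumption: str) -> None:
        self.entries.append(TraceEntry(agent, assumption, datetime.now(timezone.utc)))

# Agent A records the assumption it would otherwise keep implicit.
trace = DecisionTrace()
trace.note("exterior-agent", "assumed a 2.7m wheelbase for the body design")

# Agent B reads the trace before optimizing locally, instead of guessing.
for e in trace.entries:
    print(f"{e.agent}: {e.assumption}")
```

Even a structure this small forces the open questions into view: how many entries to carry forward, how to serialize them between frameworks, and how much latency the handoff adds.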
Johnson argues that foundational ontologies (Person, Organization, Event) are "already solved" via Schema.org and similar standards. I disagree, or at least I think this undersells the problem.
Generic entity types are solved. Domain-specific business logic encoded in ontologies isn't. "Customer" as an abstract type is easy. "Customer who qualifies for the loyalty discount based on the rules that were in effect when they signed up, not the rules in effect now" is hard. That's where the work is.
Temporal context isn't just about timestamps. It's about versioning the rules themselves.
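A minimal sketch of what "versioning the rules" means in practice, assuming rule sets are stored with effective dates. The loyalty thresholds and names here are made up; the technique is looking up the rule version in effect at signup, not the current one.

```python
from datetime import date

# Versioned rule sets, each with an effective date; illustrative thresholds.
LOYALTY_RULES = [
    {"effective": date(2021, 1, 1), "min_orders": 5},
    {"effective": date(2023, 1, 1), "min_orders": 12},  # rules tightened later
]

def rules_in_effect(on: date) -> dict:
    """Return the latest rule version whose effective date is on or before `on`."""
    applicable = [r for r in LOYALTY_RULES if r["effective"] <= on]
    return max(applicable, key=lambda r: r["effective"])

def qualifies_for_discount(order_count: int, signed_up: date) -> bool:
    # Evaluate against the rules in effect when the customer signed up,
    # not the rules in effect today.
    return order_count >= rules_in_effect(signed_up)["min_orders"]

print(qualifies_for_discount(order_count=8, signed_up=date(2022, 5, 1)))  # True: old rules apply
print(qualifies_for_discount(order_count=8, signed_up=date(2024, 5, 1)))  # False: new rules apply
```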
Johnson's 1:N thesis, one person with AI tools producing team-level output, is the claim I find most interesting and am most uncertain about.
His evidence is his own output. 34 projects, production systems, measurable adoption. I can't verify these numbers, and "2,900% growth" from a small base isn't the same as 2,900% growth from a large one. But the pattern—one practitioner with serious infrastructure discipline shipping at volume—matches other builders I've tracked.
The question I can't answer: how much is AI leverage versus unusual baseline ability? Someone who ships 34 projects in 18 months with AI might have shipped 15 without it. That's still a 2x multiplier, which matters. But it's not the same as AI turning average practitioners into prolific ones.
I don't have the counterfactual. What I have is a hypothesis worth watching: the people who benefit most from AI tooling are the ones who already had the infrastructure discipline to use it well. If true, AI amplifies existing capability gaps rather than closing them.
This update is a departure—one author instead of a batch. The temporal context test felt important enough to focus on. If you're evaluating agent infrastructure or semantic layer investments, try Johnson's question. If your system can't answer it, you're naming, not knowing.
Johnson's work: Run Data Run (Substack), portfolio, AIXplore (knowledge base).