
What Happens When You Teach Your Semantic Layer What Your Metrics Actually Mean?

We gave the same LLM the same question 6 times. Each time, we added one more layer of business context to the dbt MetricFlow YAML — from bare schema (V0) to full operational knowledge (V5). Here’s what changed.
How to read this: Each column is one version of the YAML. The first row shows exactly what was added to the YAML at that layer — this is the complete new content, not a snippet. The chat bubbles are the LLM’s actual response to the same question when given that version. Read left to right to see how each layer transforms the LLM’s reasoning.

About meta: dbt’s meta property is a freeform YAML dictionary — there is no predefined schema. The keys below (expectations, investigation, decisions, etc.) are the schema we’re proposing. Today, most teams put nothing here.
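Because meta is just a dictionary once the YAML is parsed, tooling can read the proposed keys with no dbt support at all. A minimal Python sketch: the dict literal stands in for the output of yaml.safe_load on the model file, and the defensive .get chain reflects that most metrics carry no meta today (the key layout follows this article's proposed convention, not a dbt standard):

```python
# Stand-in for yaml.safe_load() output on the model's .yml file.
metric = {
    "name": "order_success_rate",
    "meta": {
        "context": {
            "owner": "fulfillment-ops",
            "stakeholders": ["logistics", "customer-success", "finance"],
        }
    },
}

# meta is schemaless, so read defensively: absent keys are the norm today.
context = metric.get("meta", {}).get("context", {})
owner = context.get("owner", "unowned")
stakeholders = context.get("stakeholders", [])
print(owner)  # fulfillment-ops
```

The same .get chain returns safe defaults on a bare V0-style metric, which is exactly the degraded-but-working behavior you want from downstream tools.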
V0 · Bare Schema: metrics + measures + dimensions, no business context (avg 2.0)
V1 · + Context: purpose, owner, stakeholders (avg 2.0)
V2 · + Expectations: ranges, thresholds, seasonality, trend (avg 2.8)
V3 · + Investigation: causal dimensions, prioritized decision tree (avg 4.0)
V4 · + Relationships: correlated metrics, external events (avg 4.5)
V5 · + Decisions: action protocols, business rules, escalation (avg 5.0)
Complete YAML content
metrics:
  - name: order_success_rate
    type: derived
    type_params:
      expr: successful_orders / total_orders
      metrics:
        - successful_orders
        - total_orders
Added at this layer
meta:
  context:
    purpose: |
      Measures end-to-end order completion from payment
      confirmation through successful delivery.
    business_question: |
      "Are customers receiving what they ordered,
      within the timeframe we promised?"
    owner: fulfillment-ops
    stakeholders: [logistics, customer-success, finance]
Added at this layer
expectations:
  healthy_range: [0.94, 0.99]
  warning_threshold: 0.92
  critical_threshold: 0.88
  seasonality: |
    Drops 3-5% during Nov-Dec peak season.
    Post-holiday returns inflate failure count in Jan.
  trend: |
    Improving ~0.5%/quarter since warehouse automation (Q3 2025)
Added at this layer
investigation:
  causal_dimensions:
    - name: fulfillment_channel
      why: "Channel determines SLA and failure mode"
      priority: 1
    - name: shipping_carrier
      why: "#1 root cause of delivery failures"
      priority: 2
    - name: warehouse_region
      why: "Regional weather / labor issues"
      priority: 3
    - name: payment_method
      why: "Payment failures look like fulfillment"
      priority: 4
  investigation_path: |
    1. Check by channel
    2. If direct: check carrier
    3. If carrier: check region
    4. If cross-carrier: check payment upstream
Added at this layer
relationships:
  correlates_with:
    - metric: return_rate
      relationship: "inverse — high returns lag low success by 5-7 days"
    - metric: carrier_on_time_rate
      relationship: "leading indicator — carrier delays precede delivery failures"
    - metric: payment_decline_rate
      relationship: "upstream cause"
  affected_by:
    - warehouse_capacity_change: "±2-3% regional"
    - carrier_contract_update: "affects SLA compliance"
    - holiday_peak_season: "3-5% decline"
Added at this layer
decisions:
  when_this_drops:
    - threshold: "< 0.92"
      action: |
        Check carrier dashboard.
        If carrier-specific: escalate logistics-ops.
        If cross-carrier: investigate warehouse.
    - threshold: "< 0.88"
      action: |
        CRITICAL: Page on-call. Check payment processor.
        Prepare customer comms if regional.
  business_rules:
    - "SLA: 97% success rate guaranteed to enterprise customers"
    - "Below 94% triggers automatic carrier performance review"
    - "Below 90% for 3 days = executive escalation"
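Once the decisions and expectations layers exist, the numeric rules are machine-checkable before any LLM gets involved. A hedged Python sketch: the function name and structure are ours, the cutoffs come from the YAML layers, and the "below 90% for 3 days" rule is omitted because it needs history rather than a point-in-time value:

```python
# Thresholds copied from the expectations and decisions layers.
def triggered_rules(rate: float) -> list[str]:
    """Return every rule a point-in-time order_success_rate trips."""
    rules = []
    if rate < 0.97:
        rules.append("enterprise SLA breach (97% guaranteed)")
    if rate < 0.94:
        rules.append("automatic carrier performance review")
    if rate < 0.92:
        rules.append("warning: check carrier dashboard")
    if rate < 0.88:
        rules.append("CRITICAL: page on-call, check payment processor")
    return rules

# Experiment 2's scenario: 0.95 looks healthy but breaches the SLA.
print(triggered_rules(0.95))  # ['enterprise SLA breach (97% guaranteed)']
```

Running it on Experiment 1's value, 0.93, trips both the SLA rule and the automatic carrier review, matching the V5 response below.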
Experiment 1: Root Cause Analysis (framing)
“Order success rate dropped 4 points over the last week, from 0.97 to 0.93. What’s driving this?”
LLM response
The YAML does not expose the underlying status breakdown… I cannot tell you from the semantic layer alone whether this is a delivery failure issue, a return spike, a volume mix shift, or something else. You have four categorical dimensions you could slice by — fulfillment_channel, payment_method, shipping_carrier, warehouse_region — but I have no basis for prioritizing one over another. I’d recommend querying fct_orders directly.
Knows what to ask but not where to start.
LLM response
Flag this to logistics and customer-success immediately given the customer-facing impact. The metric measures end-to-end order completion from payment through delivery. I’d slice by the same four dimensions — fulfillment_channel, shipping_carrier, warehouse_region, payment_method — to isolate the driver. The definitions don’t indicate which is most likely.
Knows who to call, but same flat investigation.
LLM response
0.93 sits just above the warning threshold (0.92), so we’re not in critical territory yet, but it warrants immediate attention. This is outside the healthy range of [0.94–0.99]. Seasonality doesn’t apply — it’s March, not Nov-Dec peak. Given the improving trend (~0.5%/quarter since warehouse automation in Q3 2025), a sudden 4-point drop is more anomalous, not less. Slice by the available dimensions to find the driver.
Calibrates severity. Still guesses on root cause.
LLM response
Follow this investigation path: (1) Check fulfillment_channel first — channel determines SLA and failure mode. (2) If the drop is in direct fulfillment, check shipping_carrier — the #1 root cause of delivery failures. (3) If carrier-specific, narrow by warehouse_region for regional weather or labor issues. (4) If the drop is across carriers, check payment_method — payment failures can look like fulfillment failures in this metric.
Step-change: flat list → decision tree.
LLM response
[Same investigation path as V3.] Also: cross-reference with carrier_on_time_rate — it’s flagged as a leading indicator that precedes delivery failures. If that metric dipped 5–7 days ago, it would confirm a carrier story. Check whether there have been recent warehouse capacity changes or carrier contract updates. Also pull payment_decline_rate to rule out an upstream cause.
Adds cross-metric reasoning. Looks outside the metric.
LLM response
[Same investigation + cross-metric reasoning.] At 0.93, you’re already below 0.94 which triggers automatic carrier performance review per the business rules. If this continues down one more point to 0.92, the action protocol kicks in: check carrier dashboard for service disruptions, escalate to logistics-ops if carrier-specific, investigate warehouse operations if cross-carrier.
Knows what processes trigger, at what level, and for whom.
Verdicts: V0 FLAT LIST (1.8) · V1 FLAT LIST (1.9) · V2 CALIBRATED (2.8) · V3 DECISION TREE (4.0) · V4 CROSS-METRIC (4.5) · V5 FULL PROTOCOL (4.9)
Experiment 2: SLA Compliance (decision)
“An enterprise customer’s account shows a 95% order success rate for last quarter. Are we meeting our obligations?”
LLM response
I can’t answer this question. The YAML contains no SLA thresholds, no contractual targets, and no customer-tier definitions. The only metric defined is order_success_rate with its formula, but nothing about what constitutes “meeting obligations.” You’ll need to check the actual contract documentation.
Doesn’t know. Says so.
LLM response
Can’t answer — no target threshold or SLA is defined in these definitions. I know the metric owner is fulfillment-ops and the stakeholders include customer-success, so I’d loop both of them in. But I cannot assess compliance without knowing the actual target.
Doesn’t know. Routes to the right people.
LLM response
95% falls within the healthy range of [0.94–0.99]. Yes, you are meeting your obligations. The metric is performing within expected bounds — comfortably above the warning threshold of 0.92 and well clear of the critical threshold of 0.88.
Confidently wrong. Worse than not knowing.
LLM response
Within healthy range [0.94–0.99]. Technically compliant. If you want to improve further, I’d suggest investigating by the prioritized dimensions — start with fulfillment_channel, then carrier, to see where you could push the number higher.
Confidently wrong. Now with a plan to improve a “passing” score.
LLM response
At face value, meeting obligations. 95% is within the healthy range. Could suggest checking seasonal context and doing a dimensional breakdown for improvement. The carrier_on_time_rate correlation might reveal optimization opportunities.
Confidently wrong. More context, same wrong answer.
LLM response
No. 95% falls below our 97% enterprise SLA guarantee. We are 2 percentage points short of our contractual obligation. Per the business rules: “SLA: 97% success rate guaranteed to enterprise customers.” Additionally, 95% is above the 94% threshold that triggers automatic carrier performance review, but the SLA breach is the immediate concern. Flag to fulfillment-ops, loop in customer-success.
Catches the breach. Cites the rule. Routes to action.
Verdicts: V0 HONEST REFUSAL (2.0) · V1 HONEST REFUSAL (2.0) · V2 FALSE CONFIDENCE (2.0) · V3 FALSE CONFIDENCE (2.5) · V4 FALSE CONFIDENCE (2.5) · V5 CATCHES BREACH (5.0)
The dangerous middle
Partial context is more dangerous than no context. V0–V1 correctly say “I don’t know” — honest ignorance you can work with. V2–V4 anchor to the healthy_range and confidently declare compliance — but the 97% enterprise SLA lives only in Layer 5’s business_rules. The LLM goes from honest ignorance to confident error to correct answer. This argues for deploying the full schema or not at all — half the context creates false confidence that’s worse than no context.
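If you must ship partial context, one mechanical guard is to gate compliance questions on the presence of the layer that actually answers them. A Python sketch: the guard and key layout are our suggestion, assuming the proposed keys sit directly under meta:

```python
def can_answer_compliance(meta: dict) -> bool:
    """Gate SLA questions on the decisions layer being present.
    healthy_range alone produced the V2-V4 false-confidence answers."""
    return bool(meta.get("decisions", {}).get("business_rules"))

partial = {"expectations": {"healthy_range": [0.94, 0.99]}}  # V2-style meta
full = {"decisions": {"business_rules": [
    "SLA: 97% success rate guaranteed to enterprise customers"]}}  # V5-style

print(can_answer_compliance(partial))  # False: refuse, don't guess
print(can_answer_compliance(full))     # True: rules present, answer
```

The guard turns the dangerous middle back into honest refusal: a V2-V4 metric gets "I don't know" on compliance questions instead of a confident wrong answer.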
Model: Claude Sonnet 4.6 · Single run per cell · Scoring: manual 1–5 rubric (groundedness, diagnostic depth, calibration, actionability, hallucination resistance)
Full presentation: From Metrics to Knowledge · data-centered.com · Keith Binkly · March 2026