Semantic Layers Deep Dive
Learn the full YAML specification for semantic layers and how metrics get compiled to SQL.
YAML Specification
Semantic layers are defined in YAML with three main sections:
Semantic Models
Models represent business entities (usually tables):
semantic_models:
- id: orders_model
description: Customer orders
source:
table: public.orders # schema.table reference
dimensions:
- id: order_date
type: time
expr_sql: created_at
- id: region
type: categorical
expr_sql: shipping_region
- id: customer_id
type: primary
expr_sql: customer_id
measures:
- id: order_count
agg: count
expr_sql: id
- id: revenue
agg: sum
expr_sql: order_total
Dimension Types
| Type | Use Case |
|---|---|
categorical | Text groupings (region, status) |
time | Date/timestamp columns |
primary | Primary key (for joins) |
Measure Aggregations
| Agg | SQL |
|---|---|
count | COUNT(expr) |
count_distinct | COUNT(DISTINCT expr) |
sum | SUM(expr) |
avg | AVG(expr) |
min | MIN(expr) |
max | MAX(expr) |
Relationships
Define how models join together:
relationships:
- id: orders_to_customers
from_model: orders_model
to_model: customers_model
sql_join: left
keys:
- left: customer_id
right: id
Join types: inner, left, right, full
Metrics
Metrics are derived from measures:
metrics:
- id: total_revenue
type: simple
measure: revenue # References measure from model
- id: average_order_value
type: derived
expr_sql: total_revenue / order_count
How Metrics Resolve to SQL
When you request a metric, 42Cells compiles it to SQL:
Request:
metric: total_revenue
group_by: [region, order_date]
Generated SQL:
SELECT
orders_model.shipping_region AS region,
orders_model.created_at AS order_date,
SUM(orders_model.order_total) AS total_revenue
FROM public.orders AS orders_model
GROUP BY
orders_model.shipping_region,
orders_model.created_at
Multi-Hop Joins
If you group by a dimension from a related model, joins are added automatically:
Request:
metric: total_revenue
group_by: [customer_name] # From customers_model
Generated SQL:
SELECT
customers_model.name AS customer_name,
SUM(orders_model.order_total) AS total_revenue
FROM public.orders AS orders_model
LEFT JOIN public.customers AS customers_model
ON orders_model.customer_id = customers_model.id
GROUP BY customers_model.name
Dialect Translation
SQL is automatically adapted to your database:
| Feature | PostgreSQL | MySQL |
|---|---|---|
| String concat | ` | |
| Date functions | DATE_TRUNC() | DATE() |
| Boolean | TRUE/FALSE | 1/0 |
Validation
Before publishing, bundles are validated against your live database:
| Check | Error |
|---|---|
| Table exists | Table 'xyz' not found |
| Column exists | Column 'abc' not found in table 'xyz' |
| Type compatibility | Type mismatch: expected numeric |
| Circular references | Circular dependency in views |
Errors include hints and candidate fixes to help the agent correct issues.
Pinning to Notebooks
Semantic layers are versioned. When you pin a version to a notebook:
- The notebook uses that exact version
- Updates to the semantic layer don't affect existing notebooks
- You can update the pin to use a newer version
This ensures reproducibility — your analysis won't break when someone updates a metric definition.
Agent Integration
When a semantic layer is pinned, the agent can:
- Search — Find relevant metrics/dimensions by name or description
- Resolve — Compile metrics to SQL automatically
- Validate — Check that queries use valid definitions
This grounds the agent in your business logic, reducing hallucinations.