Skip to main content

Semantic Layers Deep Dive

Learn the full YAML specification for semantic layers and how metrics get compiled to SQL.

YAML Specification

Semantic layers are defined in YAML with three main sections:

Semantic Models

Models represent business entities (usually tables):

semantic_models:
- id: orders_model
description: Customer orders
source:
table: public.orders # schema.table reference

dimensions:
- id: order_date
type: time
expr_sql: created_at

- id: region
type: categorical
expr_sql: shipping_region

- id: customer_id
type: primary
expr_sql: customer_id

measures:
- id: order_count
agg: count
expr_sql: id

- id: revenue
agg: sum
expr_sql: order_total

Dimension Types

TypeUse Case
categoricalText groupings (region, status)
timeDate/timestamp columns
primaryPrimary key (for joins)

Measure Aggregations

AggSQL
countCOUNT(expr)
count_distinctCOUNT(DISTINCT expr)
sumSUM(expr)
avgAVG(expr)
minMIN(expr)
maxMAX(expr)

Relationships

Define how models join together:

relationships:
- id: orders_to_customers
from_model: orders_model
to_model: customers_model
sql_join: left
keys:
- left: customer_id
right: id

Join types: inner, left, right, full

Metrics

Metrics are derived from measures:

metrics:
- id: total_revenue
type: simple
measure: revenue # References measure from model

- id: average_order_value
type: derived
expr_sql: total_revenue / order_count

How Metrics Resolve to SQL

When you request a metric, 42Cells compiles it to SQL:

Request:

metric: total_revenue
group_by: [region, order_date]

Generated SQL:

SELECT
orders_model.shipping_region AS region,
orders_model.created_at AS order_date,
SUM(orders_model.order_total) AS total_revenue
FROM public.orders AS orders_model
GROUP BY
orders_model.shipping_region,
orders_model.created_at

Multi-Hop Joins

If you group by a dimension from a related model, joins are added automatically:

Request:

metric: total_revenue
group_by: [customer_name] # From customers_model

Generated SQL:

SELECT
customers_model.name AS customer_name,
SUM(orders_model.order_total) AS total_revenue
FROM public.orders AS orders_model
LEFT JOIN public.customers AS customers_model
ON orders_model.customer_id = customers_model.id
GROUP BY customers_model.name

Dialect Translation

SQL is automatically adapted to your database:

FeaturePostgreSQLMySQL
String concat`
Date functionsDATE_TRUNC()DATE()
BooleanTRUE/FALSE1/0

Validation

Before publishing, bundles are validated against your live database:

CheckError
Table existsTable 'xyz' not found
Column existsColumn 'abc' not found in table 'xyz'
Type compatibilityType mismatch: expected numeric
Circular referencesCircular dependency in views

Errors include hints and candidate fixes to help the agent correct issues.

Pinning to Notebooks

Semantic layers are versioned. When you pin a version to a notebook:

  1. The notebook uses that exact version
  2. Updates to the semantic layer don't affect existing notebooks
  3. You can update the pin to use a newer version

This ensures reproducibility — your analysis won't break when someone updates a metric definition.

Agent Integration

When a semantic layer is pinned, the agent can:

  1. Search — Find relevant metrics/dimensions by name or description
  2. Resolve — Compile metrics to SQL automatically
  3. Validate — Check that queries use valid definitions

This grounds the agent in your business logic, reducing hallucinations.