The Three Levels of AI Analytics Maturity

A VP of Sales asks the AI assistant: "What was our revenue last quarter?" The system returns $4.2M. She follows up: "How does that compare to the previous quarter?" The system returns a different number — $3.9M — but now "revenue" means something slightly different. The first answer used net revenue. The second silently switched to gross. Nobody flagged it. The AI answered confidently both times.

This is the kind of failure that erodes trust in AI analytics. And it happens more often than most teams realize, because the root cause is architectural.

The industry has largely converged on a shared view: AI analytics needs a semantic layer. Direct text-to-SQL is too fragile. A governed layer of business definitions gives the AI grounding — metric definitions, join logic, relationships — so it stops guessing from raw schemas.

That consensus is correct. But it obscures a harder question.

How expressive is the semantic layer? Because that determines whether the AI stays governed for just the first question, or for the full range of questions your team actually asks.

A maturity model in three levels

There are three distinct architectural levels emerging in AI analytics. Each one handles the "revenue last quarter vs. previous quarter" scenario differently — and the differences matter.

Level 1: Text-to-SQL

At the first level, the AI reads your database schema and generates SQL directly from natural language.

This is the version you see in demos. Someone types "Show me revenue by region," and the system produces a query that runs. Impressive for a live audience. Unreliable in a production environment.

The problem is that raw schemas carry almost no business context. "Revenue" might map to three different columns depending on the table. Join paths are ambiguous. Metric definitions are contested. The AI has to infer all of this from column names and table structures — which is roughly equivalent to asking a new analyst to write queries on their first day, with no documentation and no one to ask.

Direct text-to-SQL is a probabilistic translation engine operating without guardrails. It produces executable SQL that may completely misrepresent what the business actually means by the question. (For a deeper look at why this translation step is so fragile, see our text-to-SQL explainer.)

For simple, well-structured queries, it works often enough to be seductive. For anything requiring business judgment — which metric definition to use, which join path to follow, how to handle time zones or fiscal calendars — it degrades fast.

Level 2: AI over a conventional semantic layer

At the second level, the AI queries a governed semantic layer instead of raw schemas. Metric definitions are centralized. Relationships are encoded. Join logic is controlled by the platform.

This is a real improvement. The AI no longer guesses what "revenue" means — it looks it up. For simple requests like "monthly revenue by region for the last 12 months," the system maps the question into a structured intermediary format (typically something like a metric + dimension + time range payload) and compiles it to SQL.
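To make the shape of that intermediary concrete, here is a minimal sketch in Python. The payload fields and the `compile_to_sql` helper are illustrative assumptions, not any specific vendor's format:

```python
# Hypothetical Level 2 intermediary: a flat metric + dimension + time-range
# payload. Field names and the target table are illustrative.
payload = {
    "metric": "revenue",          # looked up in the semantic layer, not guessed
    "dimensions": ["region"],     # group-by columns
    "time_range": {"grain": "month", "last_n": 12},
}

def compile_to_sql(p):
    """Compile the flat payload into SQL (simplified: one table, one metric)."""
    dims = ", ".join(p["dimensions"])
    grain = p["time_range"]["grain"]
    return (
        f"SELECT date_trunc('{grain}', order_date) AS {grain}, {dims}, "
        f"SUM(amount) AS {p['metric']} "
        f"FROM orders "
        f"WHERE order_date >= now() - interval '{p['time_range']['last_n']} {grain}s' "
        f"GROUP BY 1, {dims}"
    )

sql = compile_to_sql(payload)
```

As long as a question maps cleanly onto those three fields, the compilation is deterministic and the answer is governed.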

For first-order questions, Level 2 works well.

The ceiling appears when questions get harder.

A stakeholder sees the revenue-by-region chart and immediately asks: "How does this compare to last quarter?" That requires a period-over-period comparison. Or: "What's the average deal size by sales rep?" — nested aggregation. Or: "Show me the top 10 products as a percentage of total revenue" — ranking plus composition. Or: "What's the retention rate by acquisition cohort?" — multi-step analytical logic.

These are routine follow-up questions in any serious analytics workflow. They are also the questions that most conventional semantic layers cannot express natively.

The intermediary format — that metric-dimension-filter payload — was designed for simple retrieval. When the question exceeds what the format can represent, the system faces a choice: refuse the question, or fall back to generating SQL directly.

Most systems fall back. And when they do, they have re-entered the text-to-SQL loop from Level 1. The semantic layer is still there, technically. But the AI has stepped outside it.
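The ceiling is easy to see if you imagine validating requests against that flat schema. In this sketch (field names hypothetical), the follow-up question needs a second, time-shifted evaluation of the same metric, and the schema simply has no slot for it:

```python
# The same flat payload schema, validated strictly. Field names are hypothetical.
ALLOWED_FIELDS = {"metric", "dimensions", "time_range", "filters"}

def fits_intermediary(request):
    """True only if the request can be represented by the flat payload."""
    return set(request) <= ALLOWED_FIELDS

simple = {"metric": "revenue", "dimensions": ["region"], "time_range": {"last_n": 12}}

# "How does that compare to last quarter?" requires a shifted second evaluation
# of the same metric -- there is no field for that in the flat schema.
follow_up = {**simple, "compare_to": {"offset": "1 quarter"}}

fits_intermediary(simple)     # True  -> governed path
fits_intermediary(follow_up)  # False -> refuse, or fall back to raw text-to-SQL
```

Whatever the system does at that `False` branch is where conditional governance begins.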

This is what I call conditional governance: governed when the question is simple, probabilistic when it gets hard.

Level 3: AI over an expressive semantic layer with a composable query language

At the third level, the AI translates natural language into a composable, analytics-specific query language that operates on the semantic layer. The intermediary is no longer a rigid payload. It is an expressive language purpose-built for analytical reasoning.

In this architecture, the AI generates queries in a language that treats metrics as first-class objects, supports period-over-period comparisons as native operations, handles nested aggregations through composable pipes, and expresses cross-grain ratios without escaping to raw SQL.

The governed surface area expands dramatically. The AI can answer the simple question and the three follow-ups that come after it — all within the same governed model.
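The architectural difference can be sketched in ordinary Python: instead of a flat payload, the intermediary is a composable expression where metrics are first-class objects and period-over-period is one operation among many. This is an illustration of the idea only, not actual AQL syntax:

```python
from dataclasses import dataclass

@dataclass
class Query:
    """A composable analytical query: each operation returns a new Query."""
    metric: str
    steps: tuple = ()

    def _with(self, step):
        return Query(self.metric, self.steps + (step,))

    def by(self, *dims):          return self._with(("group", dims))
    def last(self, n, grain):     return self._with(("window", n, grain))
    def vs_previous(self, grain): return self._with(("period_over_period", grain))
    def top(self, n, order_by):   return self._with(("rank", n, order_by))

# The opener and its follow-ups are the same governed object, extended:
opener    = Query("revenue").by("region").last(12, "month")
follow_up = opener.vs_previous("quarter")   # still inside the governed model
ranked    = follow_up.top(10, "growth")     # ...and so is the next follow-up
```

Because every operation composes onto the same governed expression, there is no point at which a harder question forces an escape to raw SQL.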

Holistics is built around this architecture. Natural language is translated into AQL (Analytics Query Language), which operates on governed models, metrics, dimensions, and relationships defined in the semantic layer. AQL then compiles to SQL. The AI reasons in analytics-native abstractions. The compilation layer handles the dialect-specific details.

The result: governance extends across the full question space — every question, including the follow-ups.

Conditional governance: the hidden failure mode

The gap between Level 2 and Level 3 matters more than the gap between Level 1 and Level 2. Here is why.

When a team moves from text-to-SQL to a conventional semantic layer, they gain obvious reliability improvements. Simple questions get consistent answers. That feels like progress, and it is.

But the dangerous part is what happens at the boundary — when a question exceeds the semantic layer's expressive capacity.

In classic BI, this boundary was manageable. An analyst would hit the limit of the tool, recognize it, and write custom SQL themselves. The human was the safety net.

In AI analytics, there is no analyst in the loop for most queries. The system receives a question, determines it cannot express the answer within the semantic layer, and silently generates SQL. The user sees an answer. They have no way to know that this particular answer came from ungoverned SQL generation rather than the governed model.

For that question, the consequence is worse than having no semantic layer at all — because the user trusts the system more. They believe the semantic layer is protecting them. It was, for the first question. For this follow-up, it quietly stopped.

A weak semantic layer postpones the text-to-SQL problem rather than eliminating it. And postponement with false confidence is a worse failure mode than the original problem.

What expressiveness actually means

"Semantic layer present" is the wrong evaluation criterion. The right one: what percentage of the analytical question space can the layer express natively? (We compare how today's leading platforms handle this in our semantic layer BI tools comparison.)

Expressiveness means the system can represent these patterns without pushing logic to raw SQL:

  • Period-over-period comparisons. Revenue this quarter vs. last quarter. Growth rates. Year-over-year trends.
  • Nested aggregations. Average of a sum. Median deal size across reps. Aggregate-of-aggregate patterns.
  • Cross-grain ratios. Conversion rates where the numerator and denominator exist at different levels of granularity.
  • Ranking and composition. Top N products by revenue, with each product's share of total.
  • Cohort analysis. Retention by acquisition month. Behavior segmented by signup date.
  • Parameterized metrics. N-day conversion windows where N varies dynamically.
  • Context-aware calculations. Metrics that respond correctly when dimensions, filters, or time windows change around them.

These are routine analytical patterns — the second and third questions that follow every first question in a real analytics session. Any semantic layer that cannot express them natively will force the AI back to SQL generation — governed for the opener, ungoverned for the follow-up.

A practical diagnostic

If you are evaluating BI tools for AI readiness, the maturity model gives you a concrete way to test. (For a side-by-side look at how current vendors stack up, see our AI-powered BI tools comparison.)

Start with a simple question: "Show revenue by region." Every vendor will answer this correctly. It tells you almost nothing about their architecture.

Then ask the follow-ups:

  • "How does that compare to last quarter?" (period-over-period)
  • "Which regions grew fastest?" (ranking on a derived metric)
  • "What's the average deal size by rep in the top 3 regions?" (nested aggregation + filtering)
  • "Show retention by acquisition cohort for those regions." (multi-step analytical logic)

Watch what happens. Does the system stay within governed semantics for all four questions? Or does it silently switch to SQL generation after the first one?

Ask the vendor directly: what percentage of the analytical question space can your AI answer without escaping the governed layer?

If the answer is vague — "we handle most common questions" — you are likely looking at Level 2. Governed for simple queries, probabilistic for everything else.

If the answer is specific and architectural — "our query language natively expresses these patterns, and we can show you the governed query the AI generated" — you are closer to Level 3.

Where this leads

The market will figure this out. As more teams deploy AI analytics in production environments (beyond demos), the conditional governance problem will surface repeatedly. An executive will get contradictory answers to related questions. A finance team will find that AI-generated numbers diverge from the governed dashboard. Someone will trace the discrepancy back to an ungoverned SQL fallback.

When that happens, the evaluation criteria will shift. "Has a semantic layer" will stop being sufficient. Teams will start asking how much of the question space the semantic layer actually covers.

That shift favors architectures built for expressiveness from the ground up — systems where the AI reasons in a composable analytical language, where metrics are first-class objects, and where governance extends to the full range of questions real stakeholders ask — the first question and every follow-up after it.

The diagnostic is simple. The architectural implications are not. But the question you should ask your vendor today is clear enough: when the AI answers a hard question, is it still inside the governed model? Or has it quietly left?