Phase 2 of 4: Data & Ontology
Define what your data means
before you build on it.
Ontology design, classification, governed pipelines, and access control. In that order. Raw data is not an asset until someone defines what it is, who owns it, how reliable it is, and who can see it.
The data architecture
Six layers. Each one necessary.
Data moves from raw sources through classification and curation to a governed semantic model with controlled access. Every layer has a job. None are optional.
Source Systems
Where your data lives today.
Every piece of business data originates somewhere: relational databases, REST APIs, event streams, SaaS platforms, flat files, IoT feeds, legacy systems. Before we can govern anything, we need a complete inventory of what exists, who owns it, and what it actually means. Most organizations discover they have two or three systems that each claim to be the authoritative source of truth for the same entity, and none of them agree.
- Catalog all data sources: databases, APIs, files, streams, SaaS platforms
- Assess current data quality and completeness at each source
- Identify authoritative sources vs. derived or stale copies
- Map ownership: who controls each source, who depends on it
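A source inventory is most useful when it is machine-readable, so conflicting claims of authority can be detected automatically. A minimal sketch, assuming a hypothetical `DataSource` record shape (the field names are illustrative, not a standard):

```python
from dataclasses import dataclass, field

# Hypothetical inventory entry -- field names are illustrative, not a standard.
@dataclass
class DataSource:
    name: str
    kind: str                     # "database", "api", "file", "stream", "saas"
    owner: str                    # accountable team or person
    authoritative: bool           # source of truth, or a derived/stale copy?
    entities: list = field(default_factory=list)  # business objects it holds

catalog = [
    DataSource("crm_db", "database", "sales-ops", True, ["Customer", "Order"]),
    DataSource("billing_export", "file", "finance", False, ["Customer"]),
]

def conflicting_entities(catalog):
    """Flag entities with more than one claimed authoritative source --
    the 'two or three sources of truth' problem the inventory surfaces."""
    claims = {}
    for src in catalog:
        if src.authoritative:
            for entity in src.entities:
                claims.setdefault(entity, []).append(src.name)
    return {e: names for e, names in claims.items() if len(names) > 1}
```

Adding a second source that also claims authority over `Customer` makes `conflicting_entities` return it immediately, which is the kind of disagreement the inventory exists to expose.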
Ingestion Layer
Data moves. Reliably, with full fidelity.
The ingestion layer moves data from source systems into the governed environment at the right frequency, with the right fidelity, without breaking production systems. Poorly designed ingestion is the root cause of most downstream data quality failures. Not bad data, bad movement. A record that gets dropped, duplicated, or silently truncated in ingestion corrupts every downstream consumer.
- Define ingestion frequency per source: streaming, micro-batch, or scheduled
- Build connectors for each source type: standard and custom
- Implement dead-letter queues and retry logic for failed records
- Record raw-as-received data before any transformation (full fidelity landing)
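The retry-and-dead-letter pattern above can be sketched in a few lines. This is a conceptual illustration, not a production connector: the `ingest` function and its record shapes are assumptions, and a real pipeline would persist the raw landing and the dead-letter queue durably.

```python
import json

def ingest(records, parse, max_retries=3):
    """Land records with retry logic; route failures to a dead-letter queue.

    Each landed record keeps the raw-as-received payload alongside the
    parsed form (full-fidelity landing). Nothing is dropped silently:
    records that fail every attempt go to the DLQ with error context.
    """
    landed, dead_letter = [], []
    for raw in records:
        for attempt in range(1, max_retries + 1):
            try:
                landed.append({"raw": raw, "parsed": parse(raw)})
                break
            except Exception as exc:
                if attempt == max_retries:
                    dead_letter.append(
                        {"raw": raw, "error": str(exc), "attempts": attempt}
                    )
    return landed, dead_letter
```

Feeding it one good and one malformed JSON record yields one landed record and one dead-letter entry, so the failure is visible and replayable instead of silently corrupting downstream consumers.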
Classification Layer
Data is typed, tagged, scored, and tracked.
This is the layer most organizations skip, and the absence of it is exactly why their AI fails. Classification inspects incoming data and answers the questions that governance requires: What type is it? How sensitive is it? What business object does it represent? How trustworthy is it? How did it get here? Without classification, you do not have governed data. You have a pile of bytes that looks like data.
- Schema inference: column types, formats, nullable patterns, cardinality
- PII detection: names, emails, SSNs, addresses, financial data, health records
- Sensitivity classification: Public, Internal, Confidential, Restricted
- Business object mapping: tag which ontology entity each column or record represents
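A minimal sketch of column-level classification, assuming a toy rule set: the two regex patterns and the tiering rule (any PII implies Restricted) are illustrative stand-ins for what is, in production, a much larger detection and policy library.

```python
import re

# Illustrative patterns only -- real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify_column(name, sample_values):
    """Return detected PII types and a sensitivity tier for one column."""
    pii = sorted({
        kind
        for value in sample_values
        for kind, pattern in PII_PATTERNS.items()
        if pattern.search(str(value))
    })
    # Assumed tiering rule: any PII -> Restricted; otherwise Internal.
    tier = "Restricted" if pii else "Internal"
    return {"column": name, "pii": pii, "sensitivity": tier}
```

The point is the output shape: structured metadata per column that the catalog and the access layer can act on, rather than a naming convention someone has to remember.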
Curated Data Zones
Raw becomes reliable. Reliable becomes business-ready.
Curated zones apply progressive refinement: data moves from raw, to validated and cleaned, to enriched and business-ready. Each zone has explicit quality contracts. Nothing advances without passing them. This is where most of the transformation engineering lives, and where most DIY data pipelines quietly fail because the contracts were never defined.
- Bronze: raw as received, immutable, full fidelity, retained for audit and replay
- Silver: validated, deduplicated, type-cast, nulls handled, schema normalized
- Gold: joined, enriched, aggregated, business rules applied, ontology-aligned
- Quality gates at each zone boundary: data does not advance without passing defined tests
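A quality contract is just an explicit, executable list of checks at a zone boundary. A minimal sketch, assuming a hypothetical Bronze-to-Silver contract with two checks (the check names and batch shape are illustrative):

```python
# Hypothetical quality contract for the Bronze -> Silver boundary.
def no_null_ids(batch):
    return all(row.get("id") is not None for row in batch)

def unique_ids(batch):
    ids = [row.get("id") for row in batch]
    return len(ids) == len(set(ids))

SILVER_CONTRACT = [no_null_ids, unique_ids]

def advance(batch, contract):
    """Gate a batch at a zone boundary.

    Returns (advanced_batch, failed_checks). On any failure the batch
    does not advance -- the failed check names are reported instead.
    """
    failed = [check.__name__ for check in contract if not check(batch)]
    return (batch if not failed else None), failed
```

Because the contract is code, it can be versioned, tested, and reported on; a batch with duplicate IDs is held at Bronze with `unique_ids` named as the reason, rather than leaking into Gold.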
Ontology & Semantic Layer
Data becomes knowledge. Objects have relationships.
The semantic layer maps curated data to your business ontology: the formal definition of what your business objects are, how they relate, and what they mean. This is what gives AI real reasoning capability instead of pattern matching over raw column names. Palantir built a multibillion-dollar company around exactly this concept.
- Formal ontology definition: Customer, Order, Asset, Employee and all their properties
- Relationship mapping: Customer has Orders, Order contains Products, Asset belongs to Location
- Semantic API: query business objects by name, not by table join and column alias
- Knowledge graph: relationships traversable by AI agents and analytical query engines
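The difference between a semantic layer and a pile of tables is that relationships are named and traversable. A toy sketch under assumed names (`ONTOLOGY`, `GRAPH`, and the object identifiers are all illustrative): an agent asks for a Customer's Orders by relationship name, never by table join.

```python
# Minimal ontology sketch: object types and their named relationships.
ONTOLOGY = {
    "Customer": {"has": "Order"},
    "Order": {"contains": "Product"},
}

# Toy knowledge-graph instance data, keyed by (type, id).
GRAPH = {
    ("Customer", "c-1"): {"has": [("Order", "o-9")]},
    ("Order", "o-9"): {"contains": [("Product", "p-3")]},
}

def traverse(obj, relation):
    """Follow a named relationship from a business object to its targets.

    The ontology validates the relationship before the graph is consulted,
    so a caller cannot invent an edge that the business model does not define.
    """
    if ONTOLOGY[obj[0]].get(relation) is None:
        raise ValueError(f"{obj[0]} has no relationship '{relation}'")
    return GRAPH.get(obj, {}).get(relation, [])
```

Chaining traversals ("this Customer's Orders, those Orders' Products") is the graph walk that both AI agents and analytical engines perform against the semantic API.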
Service & Access Layer
Governed access. Role-aware. Fully audited.
The service layer is where governed data becomes accessible to applications, analysts, and AI. MCP servers live here. Every read, every action is mediated by access controls built on the ontology and sensitivity classifications from lower layers. AI does not reach raw data; it reaches a governed interface that knows exactly what it is allowed to see and do, and logs everything it touches.
- MCP servers: governed AI access to read data and take action in your systems
- REST APIs: application access to business objects, not raw tables or SQL
- Row-level security: data filtered by user role, team, or sensitivity classification tier
- Column-level masking: PII redacted or tokenized based on classification
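Column-level masking is mechanical once classification has done its job: the service layer only needs a role-to-tier policy and the per-column sensitivity tags. A minimal sketch, assuming a hypothetical `ROLE_TIERS` policy shape (roles, tiers, and the `"***"` mask are illustrative choices):

```python
# Assumed policy: each role maps to the sensitivity tiers it may read.
ROLE_TIERS = {
    "analyst": {"Public", "Internal"},
    "admin": {"Public", "Internal", "Confidential", "Restricted"},
}

def serve(rows, column_tiers, role):
    """Mask any column whose sensitivity tier the role cannot read.

    column_tiers comes from the classification layer; unclassified
    columns default to Internal rather than leaking as Public.
    """
    allowed = ROLE_TIERS[role]
    return [
        {
            col: (value if column_tiers.get(col, "Internal") in allowed else "***")
            for col, value in row.items()
        }
        for row in rows
    ]
```

The same tagged rows come back differently per caller: an analyst sees the Restricted email column masked, an admin sees it in the clear, and in a real system every such read would also be written to the audit log.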
The hard layer
Classification is where most data projects fail.
Most teams treat ingestion as the hard part. It is not. Getting data into a system is easy. Knowing what that data is (what it means, how sensitive it is, what business entity it represents, and whether it can be trusted) is what separates governed data from a pile of bytes.
We build the classification layer as a discrete, testable service. Not a convention, not a naming scheme, not documentation nobody reads. It runs on every record that lands, it emits structured metadata to the catalog, and it blocks bad data from advancing to curated zones.
When AI needs to reason over your data, this is the work that makes it possible. The model does not hallucinate a customer record when the classification layer has already told it what a customer is, what fields are authoritative, and what sensitivity tier applies.
The access bridge
MCP servers sit on top of the service layer.
That is not an accident.
MCP (Model Context Protocol) servers are the last-mile access layer for AI agents. They can only be built correctly when the ontology defines their schema, the classification layer tells them what is sensitive, and the service layer tells them who can see what. The data architecture is what makes governed AI access possible.
Find out where your data actually stands.
The AI Readiness Assessment scores your data architecture across all six layers. Know what is governed, what is missing, and what to fix first.