Landscape · what's already out there · what's missing

The standards landscape, honestly mapped.

Before publishing yet another standard, we did the survey. There are a lot of good, mature, well-adopted conventions touching this problem — and three specific gaps none of them fill. This page lays both out, and invites anyone with skin in the game to challenge the framing.

On this page

TL;DR — the three gaps
What exists today
Discovery & web compatibility
Financial & reporting
Identity, auth & legal entity
Trust, verification & provenance
Data governance & quality (ISO)
The emerging agent-web stack
Where the gaps are, in one picture
Where agentic-first fits
A starting point, not a fait accompli

TL;DR — the three gaps

We can't find an open, machine-readable, publisher-controlled standard for any of these:

Gap 01

Public general info about a company

Schema.org gets you a name, a URL, and a logo. There's nothing canonical for jurisdiction + registry ID + stage + headcount band + canonical contact channel — the things an investor agent actually needs in the first 30 seconds.

Gap 02

Public structured business info, beyond the regulators

XBRL covers regulated financial filings for listed companies. The other ~99% of companies have no equivalent — no banded revenue, no growth band, no traction summary, no consistent way to publish "here is our shape" in a non-promotional, FCA-aware form.

Gap 03

Private, diligence-grade info, on the company's terms

OAuth gates access. It doesn't tell you what's behind the gate. Verifiable Credentials cover individual claims, not whole company files. Nobody has standardised the shape of the deal-grade detail an investor wants once they've been let in.

The rest of this page goes through every standard worth taking seriously, what it does well, what it doesn't cover, and where agentic-first sits without competing with any of them.

What exists today, at a glance

All of these are in production somewhere; none of them, on its own, gives an agent the answer to "who is this company, and what do they want a serious reader to know?"

Per-standard deep dives

Five of the rows below have their own page, each walking through a side-by-side comparison, when to use each standard, how they compose, and an honest summary: Schema.org · XBRL · mcp.json · Verifiable Credentials · GLEIF / LEI.

Standard	Owner	Covers	Doesn't cover
Schema.org `Organization` / `Person`	Schema.org community (Google, Microsoft, Yahoo, Yandex)	Name, URL, logo, address, social profiles, simple contact info (for SEO and rich results)	Stage, funding, banded financials, structured contact preference, evidence-backed claims, anything diligence-grade
OpenGraph & Twitter Cards	Meta, then de-facto	Social-share previews — title, image, description	Anything a machine wants to act on; everything below the fold
XBRL / iXBRL	XBRL International + national filing regulators	Mandatory machine-readable financial filings for listed firms (SEC EDGAR, Companies House, ESEF in the EU)	Private companies, banded summaries, anything outside the statutory P&L / balance sheet
OAuth 2.0 / OIDC	IETF / OpenID Foundation	Token-based authentication and consent for accessing a protected resource	The shape of the resource itself — OAuth doesn't tell you what's behind the gate, only that one is there
W3C Verifiable Credentials (VC) + DIDs	W3C	Cryptographically signed, issuer-attested individual claims (your degree, your professional licence, your KYC)	A whole company profile object; the day-to-day "this company has 11–50 staff" non-credential information
GLEIF / LEI (ISO 17442)	Global Legal Entity Identifier Foundation	20-character globally unique legal-entity identifier, mandated for financial counterparties since 2017	Anything beyond the identifier itself: not a profile, not a schema
Companies House & equivalents (Delaware, EDGAR, BvD/Orbis)	National registries	Statutory filings, directors, share capital, accounts (where required)	Anything voluntary, current, marketing-shaped, or under NDA; foreign jurisdictions
ISO 8000 (data quality)	ISO	Process and quality framework for master data management	Specific schemas; nothing immediately implementable
ISO 27001 / 27701	ISO	Information-security and privacy management systems	Data shape; these are management systems, not formats
`/.well-known/mcp.json`	modelcontextprotocol working group (SEP-1960 / SEP-2127)	Discovery of an MCP server: endpoint, transport, tools, auth, capabilities	Identity of the publisher running the MCP; it covers the protocol, not who operates it
`/.well-known/agent-card.json`	A2A protocol (Linux Foundation, IANA-registered Aug 2025)	An A2A agent's capabilities, identity, contact-on-behalf-of	The company or individual behind the agent
`/llms.txt`	De-facto, ~844k adopters incl. Anthropic, Cloudflare, Stripe	A Markdown index of your site for LLMs to read instead of crawling everything	Structure — by design it's narrative Markdown, not data
`/agents-brief.txt`	Draft v0.4, early 2026	What an AI agent is permitted to do on your site (book, buy, submit)	Identity; this is permissions, not content
`robots.txt` + TDM Reservation Protocol (W3C, EU AI Act-aligned)	De-facto / W3C	Crawler permissions; opt-out for AI training	Anything affirmative — these are signals about what not to do
JSON-LD context (W3C)	W3C	Linked-data serialisation; the syntax Schema.org rides on	Specific company / person vocabulary — JSON-LD is a transport, not a schema

Discovery & web compatibility

What works well: Schema.org's Organization + Person vocabulary, embedded as JSON-LD on a homepage, gives Google enough to render a Knowledge Panel and gives most LLM crawlers enough to know your name, URL and logo. OpenGraph gives every social product a preview. These are mature and worth implementing on day one.

Where the gap is: both are descriptive of content, not of capability. Schema.org has no field for "fundraise stage", no banded headcount, no structured contact-channel preference, no protected-tier pointer. By design — that's not what it's for. It was built when the consumer was a search engine, not an agent doing diligence.

How agentic-first relates: we explicitly recommend adopters publish both. Schema.org JSON-LD on the homepage covers SEO; /.well-known/agentic-profile.json covers the agent-readable diligence shape. They don't conflict; they describe different things.
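As a rough illustration of "publish both", the sketch below derives the two documents from one company record. Only `profile_kind`, `tier` and `company.registry.{type,id,url}` are named on this page; the nesting, the `"company"` value for `profile_kind`, the registry type string and every sample value are assumptions made for the example, not the published schema.

```python
import json

# Hypothetical company record; every value here is illustrative.
company = {
    "name": "Example Ltd",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
    "registry": {
        "type": "companies-house",  # assumed type label
        "id": "00000000",
        "url": "https://example.com/registry-entry",
    },
}


def schema_org_jsonld(c):
    """Schema.org Organization markup for the homepage (the SEO surface)."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": c["name"],
        "url": c["url"],
        "logo": c["logo"],
    }


def agentic_profile(c):
    """Public-tier document intended for /.well-known/agentic-profile.json.

    The field layout is an assumption based on the names used on this page.
    """
    return {
        "profile_kind": "company",
        "tier": "public",
        "company": {
            "name": c["name"],
            "url": c["url"],
            "registry": c["registry"],
        },
    }


print(json.dumps(schema_org_jsonld(company), indent=2))
print(json.dumps(agentic_profile(company), indent=2))
```

The two outputs share a name and URL but otherwise do not overlap, which is the sense in which they describe different things.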

Financial & reporting

What works well: XBRL and its inline form iXBRL have done the hard work of giving regulators, exchanges, and auditors a consistent vocabulary for financial filings. SEC EDGAR, Companies House (UK), and the EU's ESEF regime for listed firms all rely on it. The taxonomies are exhaustive.

Where the gap is: XBRL is for the regulated minority. Private companies — the vast majority of any directory's population — have no obligation, no tooling, and no reason to produce XBRL filings of their own. There is no equivalent for "here's our shape, in bands, deliberately non-promotional, hosted at our own website" — exactly the surface a private-company directory needs.

How agentic-first relates: the public tier borrows the discipline of XBRL — explicit currencies, explicit reporting periods (as_of), enumerated bands — without trying to model the full P&L. The protected tier carries precise figures, where the audience is identified and the controls are the publisher's own. We don't compete with XBRL; we sit below the regulatory threshold most adopters live below.
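To make "explicit currencies, explicit reporting periods, enumerated bands" concrete, here is a minimal sketch of turning a precise figure into a public banded block. The band edges and labels are hypothetical; the real enumeration lives in the schema and is not reproduced here. Only `as_of` is a name taken from this page.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical band edges in whole currency units (lower bound inclusive).
REVENUE_EDGES = [100_000, 1_000_000, 10_000_000]
REVENUE_BANDS = ["<100k", "100k-1m", "1m-10m", "10m+"]


def band_revenue(amount, currency, as_of):
    """Return a public metric block that exposes a band, never the figure."""
    if amount < 0:
        raise ValueError("revenue cannot be negative")
    band = REVENUE_BANDS[bisect_right(REVENUE_EDGES, amount)]
    return {"band": band, "currency": currency, "as_of": as_of.isoformat()}


print(band_revenue(2_400_000, "GBP", date(2026, 1, 31)))
# {'band': '1m-10m', 'currency': 'GBP', 'as_of': '2026-01-31'}
```

The precise figure never leaves the function, which mirrors the split described above: bands in the public tier, exact numbers only in the protected tier.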

Identity, auth & legal entity

What works well: OAuth 2.0 + OIDC is the universal auth substrate; every serious protected-tier MCP should use it. GLEIF's LEI (ISO 17442) is the closest thing the world has to a unique global company identifier; it's mandatory for financial counterparties already.

Where the gap is: these tell you that a principal is authenticated, and that a legal entity is uniquely named. They say nothing about what data the entity publishes about itself. Token-gated access has no standard, profile-shaped object on the other side of the gate.

How agentic-first relates: we anchor identity on the standards that already exist — adopters declare company.registry.{type,id,url} (Companies House, Delaware, EDGAR, …) and company.lei (GLEIF) on their public profile, and the directory verifies them publicly via the registry's own URL. The protected tier expects OAuth with section-scoped tokens (financials:read, fundraise:read, …). We don't reinvent any of this layer — we describe how to wire to it.
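Before a declared `company.lei` is looked up anywhere, a consumer can at least check that it is structurally sound. ISO 17442 LEIs are 20 characters long and end in two check digits computed with the ISO 7064 MOD 97-10 scheme. The sketch below implements that check; the function names and the sample prefix are made up for illustration, and passing the check says nothing about whether the LEI is actually registered with GLEIF.

```python
import re

LEI_PATTERN = re.compile(r"[A-Z0-9]{18}[0-9]{2}")


def _to_digits(text):
    # ISO 7064 letter mapping: digits stay as-is, A=10 ... Z=35.
    return "".join(str(int(ch, 36)) for ch in text)


def lei_check_digits(prefix18):
    """Compute the two check digits for an 18-character LEI prefix."""
    remainder = int(_to_digits(prefix18 + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei):
    """Structural check only: 20 characters and a MOD 97-10 remainder of 1."""
    if not LEI_PATTERN.fullmatch(lei):
        return False
    return int(_to_digits(lei)) % 97 == 1


# Hypothetical prefix, not a registered entity.
prefix = "EXAMPLE" + "0" * 11
sample = prefix + lei_check_digits(prefix)
print(sample, is_valid_lei(sample))
```

A directory would presumably still resolve the identifier against the registry's own URL, as described above; the checksum only filters out typos cheaply.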

Trust, verification & provenance

What works well: W3C Verifiable Credentials (VCs) give you cryptographically signed, issuer-attested claims with a clean revocation model. Pair them with a DID and you have a portable, self-sovereign identity layer that survives platform churn.

Where the gap is: VCs are claim-shaped, not file-shaped. There's no canonical VC for "this company's whole public profile, signed by the company"; nor is there a clean way to mark a single field inside a larger document as independently verified versus self-asserted. In practice every commercial company-data product invents its own confidence model.

How agentic-first relates: v0.1 takes a pragmatic first step: every material claim can carry an evidence entry pointing at a public URL (a press release, a Companies House filing, a third-party article), and the directory's confidence score weights evidence density. Provenance signing — either VCs over individual fields or a JWS envelope over the whole file — is on the v0.2 roadmap. We'd much rather adopt the W3C work than fork it.
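One way to picture "evidence density" is as the share of material claims that carry at least one evidence URL. The directory's actual scoring formula is not described on this page, so the function below is a hypothetical stand-in, and the claim layout (`field`, `value`, `evidence`) is likewise an assumption for the example.

```python
def evidence_density(claims):
    """Share of claims backed by at least one evidence entry (0.0 to 1.0)."""
    if not claims:
        return 0.0
    backed = sum(1 for claim in claims if claim.get("evidence"))
    return backed / len(claims)


# Illustrative claims; the URL is a placeholder.
claims = [
    {
        "field": "funding.stage",
        "value": "seed",
        "evidence": [{"url": "https://example.com/press/seed-round"}],
    },
    {"field": "team.headcount_band", "value": "11-50", "evidence": []},
]

print(evidence_density(claims))  # 0.5
```

A signed envelope or per-field credential would replace "has a URL" with "has a verifiable proof", which is the step the roadmap item above points at.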

Data governance & quality (the ISO layer)

What works well: ISO 8000 (data quality), ISO 27001 (information security), and ISO 27701 (privacy) are the reference frameworks regulated buyers expect to hear named in any data-handling pitch.

Where the gap is: all three are management system standards, not formats. They tell you how to run processes, not what fields to publish. They give credibility but not implementability.

How agentic-first relates: we name them as the reference frame our governance practices map onto, without pretending v0.1 of an open spec ships an ISO certification. As and when the project gets serious, formal alignment is on the table.

The emerging agent-web stack

The most adjacent — and most likely to be confused with what we're doing — is the small cluster of /.well-known/ conventions that have appeared in the last 18 months for the agent web specifically:

/.well-known/mcp.json describes your MCP server (SEP-1960 / 2127).
/.well-known/agent-card.json describes your A2A agent (IANA-registered, Linux Foundation, Aug 2025).
/llms.txt describes your site, as Markdown, for LLM crawlers (~844k adopters).
/agents-brief.txt describes what an agent is allowed to do on your site (draft v0.4).
robots.txt + the W3C TDM Reservation Protocol describe what crawlers shouldn't do.

These are all about protocol, permission, or content. None of them describes the publisher: the company or individual operating the site, their identity, their banded shape, their preferred contact channel. That is the slot we're trying to fill, and it's why publisher identity gets its own well-known file rather than an extension to one of the five conventions above.
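From an agent's point of view, the conventions listed above plus the publisher profile amount to a short list of paths to probe on any origin. The sketch below only builds the URLs and performs no network requests; the dictionary keys are labels invented for the example, while the paths are the ones named on this page.

```python
from urllib.parse import urljoin

# Paths as named on this page; the keys are illustrative labels.
DISCOVERY_PATHS = {
    "mcp": "/.well-known/mcp.json",
    "a2a_agent_card": "/.well-known/agent-card.json",
    "llms_txt": "/llms.txt",
    "agents_brief": "/agents-brief.txt",
    "robots": "/robots.txt",
    "publisher_profile": "/.well-known/agentic-profile.json",
}


def discovery_urls(origin):
    """Map each convention to the absolute URL an agent would fetch."""
    return {name: urljoin(origin, path) for name, path in DISCOVERY_PATHS.items()}


for name, url in discovery_urls("https://example.com").items():
    print(f"{name}: {url}")
```

Only the last entry answers "who runs this site"; the others describe how to talk to it, what it contains, or what is permitted.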

Where the gaps are, in one picture

Layer	Public, generally about the publisher	Public, structured business info	Private, diligence-grade detail
Today	Schema.org Organization / Person — name, URL, logo. Nothing canonical for jurisdiction, registry ID, stage, headcount band, contact preference.	XBRL for listed firms only (~1% of companies). Nothing for the rest.	Bilateral data rooms behind manual NDAs. No standard shape.
Gap	An open, machine-readable, publisher-controlled file that an agent can fetch in one HTTP GET.	An open, banded, FCA-aware vocabulary for the 99% of companies XBRL doesn't reach.	An open schema for what sits behind a scoped OAuth token, served from the publisher's own MCP.
agentic-first v0.1	`/.well-known/agentic-profile.json` with `profile_kind` + `tier: "public"`	The same file's `funding`, `team`, `metrics` sections — banded by the schema	The matching `*-private-profile` schema, served from the publisher's own MCP at their own auth
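The private column can be read as "each section of the private profile is released only if the token carries the matching scope". The sketch below shows that filtering step under the assumption that scopes follow a `<section>:read` pattern, as in the `financials:read` and `fundraise:read` examples elsewhere on this page. The profile contents are invented, and token validation itself is left to whatever OAuth/OIDC library the publisher already uses.

```python
def visible_sections(private_profile, granted_scopes):
    """Return only the sections whose '<section>:read' scope was granted."""
    return {
        section: data
        for section, data in private_profile.items()
        if f"{section}:read" in granted_scopes
    }


# Hypothetical private profile; figures are illustrative.
private_profile = {
    "financials": {"revenue": 2_400_000, "currency": "GBP"},
    "fundraise": {"target": 5_000_000, "currency": "GBP"},
}

# OAuth 2.0 scope strings are space-delimited.
granted = set("financials:read".split())

print(visible_sections(private_profile, granted))
# {'financials': {'revenue': 2400000, 'currency': 'GBP'}}
```

Defaulting to "not visible" for any section without a granted scope keeps a newly added section private until the publisher explicitly issues a scope for it.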

Where agentic-first fits — and where it doesn't

The point of this page is not to claim agentic-first replaces any of the standards above. It doesn't. It sits next to them and describes the one thing none of them describe: the publisher themselves, in a shape an agent can act on.

Adopt Schema.org for SEO and Knowledge-Panel coverage. Embed it as JSON-LD on the homepage.
Adopt /.well-known/mcp.json if you publish an MCP server. Adopt agent-card.json if you ship an A2A agent.
Carry your LEI (GLEIF) and your registry ID inside agentic-first — that's the verifiability anchor.
Use OAuth/OIDC for your protected-tier MCP. Don't roll your own.
Match the discipline of XBRL on currency, period, and revision dates inside the metric blocks.
Use Verifiable Credentials for individual signed claims once provenance signing lands (it is on the v0.2 roadmap).

The contribution agentic-first makes is the composition: one well-known file, two tiers, four schemas, a publisher-controlled directory that indexes them. Every layer it touches is built on the existing standard for that layer.

A starting point, not a fait accompli

v0.1 is deliberately small. We'd rather ship something three adopters have shipped against than a 200-page spec nobody has implemented. The schemas, the validator, the directory MCP, and the deployment scripts are all open-source under Apache-2.0; the governance lives in SCHEMA.md and decisions of consequence land as ADRs in docs/adr.

The honest pitch: nobody has done this for general company information yet. We've put a small, opinionated first cut in the open, with a working directory and four published schemas, so the conversation about what an open publisher-controlled company-data layer should look like can happen against running code rather than slides. If you've got strong views — or you've already published your profile — we'd genuinely like to hear from you.
