Landscape · what's already out there · what's missing
The standards landscape, honestly mapped.
Before publishing yet another standard, we did the survey. There are a lot of good, mature, well-adopted conventions touching this problem — and three specific gaps none of them fill. This page lays both out, and invites anyone with skin in the game to challenge the framing.
- TL;DR — the three gaps
- What exists today
- Discovery & web compatibility
- Financial & reporting
- Identity, auth & legal entity
- Trust, verification & provenance
- Data governance & quality (ISO)
- The emerging agent-web stack
- Where the gaps are, in one picture
- Where agentic-first fits
- A starting point, not a fait accompli
TL;DR — the three gaps
We can't find an open, machine-readable, publisher-controlled standard for any of these:
Public general info about a company
Schema.org gets you a name, a URL, and a logo. There's nothing canonical for jurisdiction + registry ID + stage + headcount band + canonical contact channel — the things an investor agent actually needs in the first 30 seconds.
Public structured business info, beyond the regulators
XBRL covers regulated financial filings for listed companies. The other ~99% of companies have no equivalent — no banded revenue, no growth band, no traction summary, no consistent way to publish "here is our shape" in a non-promotional, FCA-aware form.
Private, diligence-grade info, on the company's terms
OAuth gates access. It doesn't tell you what's behind the gate. Verifiable Credentials cover individual claims, not whole company files. Nobody has standardised the shape of the deal-grade detail an investor wants once they've been let in.
The rest of this page goes through every standard worth taking seriously, what it does well, what it doesn't cover, and where agentic-first sits without competing with any of them.
What exists today, at a glance
All of these are in production somewhere; none of them, on its own, gives an agent the answer to "who is this company, and what do they want a serious reader to know?"
Per-standard deep dives
Five of the rows below have their own page, each with a side-by-side comparison, guidance on when to use each standard, how they compose, and an honest summary: Schema.org · XBRL · mcp.json · Verifiable Credentials · GLEIF / LEI.
| Standard | Owner | Covers | Doesn't cover |
|---|---|---|---|
| Schema.org Organization / Person | Schema.org community (Google, Microsoft, Yahoo, Yandex) | Name, URL, logo, address, social profiles, simple contact info, for SEO and rich results | Stage, funding, banded financials, structured contact preference, evidence-backed claims, anything diligence-grade |
| OpenGraph & Twitter Cards | Meta, then de-facto | Social-share previews: title, image, description | Anything a machine wants to act on; everything below the fold |
| XBRL / iXBRL | XBRL International + national filing regulators | Mandatory machine-readable financial filings for listed firms (SEC EDGAR, Companies House, ESEF in the EU) | Private companies, banded summaries, anything outside the statutory P&L / balance sheet |
| OAuth 2.0 / OIDC | IETF / OpenID Foundation | Token-based authentication and consent for accessing a protected resource | The shape of the resource itself: OAuth doesn't tell you what's behind the gate, only that one is there |
| W3C Verifiable Credentials (VC) + DIDs | W3C | Cryptographically signed, issuer-attested individual claims (your degree, your professional licence, your KYC) | A whole company profile object; the day-to-day "this company has 11–50 staff" non-credential information |
| GLEIF / LEI (ISO 17442) | Global Legal Entity Identifier Foundation | 20-character globally unique legal-entity identifier, mandated for financial counterparties since 2017 | Anything beyond the identifier itself: not a profile, not a schema |
| Companies House & equivalents (Delaware, EDGAR, BvD/Orbis) | National registries | Statutory filings, directors, share capital, accounts (where required) | Anything voluntary, current, marketing-shaped, or under NDA; foreign jurisdictions |
| ISO 8000 (data quality) | ISO | Process and quality framework for master data management | Specific schemas; nothing immediately implementable |
| ISO 27001 / 27701 | ISO | Information-security and privacy management systems | Data shape; these are management systems, not formats |
| /.well-known/mcp.json | modelcontextprotocol working group (SEP-1960 / SEP-2127) | Discovery of an MCP server: endpoint, transport, tools, auth, capabilities | Identity of the publisher running the MCP: covers protocol, not who |
| /.well-known/agent-card.json | A2A protocol (Linux Foundation, IANA-registered Aug 2025) | An A2A agent's capabilities, identity, contact-on-behalf-of | The company or individual behind the agent |
| /llms.txt | De-facto, ~844k adopters incl. Anthropic, Cloudflare, Stripe | A Markdown index of your site for LLMs to read instead of crawling everything | Structure: by design it's narrative Markdown, not data |
| /agents-brief.txt | Draft v0.4, early 2026 | What an AI agent is permitted to do on your site (book, buy, submit) | Identity; this is permissions, not content |
| robots.txt + TDM Reservation Protocol (W3C, EU AI Act-aligned) | De-facto / W3C | Crawler permissions; opt-out for AI training | Anything affirmative: these are signals about what not to do |
| JSON-LD context (W3C) | W3C | Linked-data serialisation; the syntax Schema.org rides on | Specific company / person vocabulary: JSON-LD is a transport, not a schema |
Discovery & web compatibility
What works well: Schema.org's
Organization + Person vocabulary, embedded
as JSON-LD on a homepage, gives Google enough to render a Knowledge
Panel and gives most LLM crawlers enough to know your name, URL and
logo. OpenGraph gives every social product a preview. These are
mature and worth implementing on day one.
Where the gap is: both are descriptive of content, not of capability. Schema.org has no field for "fundraise stage", no banded headcount, no structured contact-channel preference, no protected-tier pointer. By design — that's not what it's for. It was built when the consumer was a search engine, not an agent doing diligence.
How agentic-first relates: we explicitly recommend
adopters publish both. Schema.org JSON-LD on the homepage covers
SEO; /.well-known/agentic-profile.json covers the
agent-readable diligence shape. They don't conflict; they describe
different things.
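As a rough illustration of the Schema.org half of that pairing, the sketch below builds a minimal Organization object and wraps it in the script element used to embed JSON-LD on a page. The company name, the URLs, and the helper names organization_jsonld and as_script_tag are placeholders invented for this example; only the @context, @type, name, url, logo and sameAs keys come from the Schema.org vocabulary.

```python
import json


def organization_jsonld(name, url, logo, same_as=()):
    """Build a minimal Schema.org Organization object as a plain dict."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
    }
    if same_as:
        # sameAs lists other profiles that refer to the same organisation.
        doc["sameAs"] = list(same_as)
    return doc


def as_script_tag(doc):
    """Serialise the dict and wrap it in a JSON-LD script element."""
    # Escaping "</" keeps a stray "</script>" inside a value from closing the tag;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, not a real company.
example = organization_jsonld(
    name="Example Ltd",
    url="https://example.com",
    logo="https://example.com/logo.png",
    same_as=["https://www.linkedin.com/company/example"],
)
print(as_script_tag(example))
```

Nothing in this object says anything about stage, headcount band or contact preference, which is the part the separate well-known file is meant to carry.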
Financial & reporting
What works well: XBRL and its inline form iXBRL have done the hard work of giving regulators, exchanges, and auditors a consistent vocabulary for financial filings. SEC EDGAR, Companies House (UK), and ESEF (EU listed firms) all consume it. The taxonomies are exhaustive.
Where the gap is: XBRL is for the regulated minority. Private companies — the vast majority of any directory's population — have no obligation, no tooling, and no reason to produce XBRL filings of their own. There is no equivalent for "here's our shape, in bands, deliberately non-promotional, hosted at our own website" — exactly the surface a private-company directory needs.
How agentic-first relates: the public tier
borrows the discipline of XBRL — explicit currencies,
explicit reporting periods (as_of), enumerated bands —
without trying to model the full P&L. The protected tier carries
precise figures, where the audience is identified and the controls
are the publisher's own. We don't compete with XBRL; we sit
below the regulatory threshold most adopters live below.
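To make the "enumerated bands plus explicit currency and period" idea concrete, here is a minimal sketch of turning a precise figure into a public-tier entry. The band edges, the band labels, and the revenue_band and currency field names are assumptions made up for illustration; only as_of is named on this page, and the real enumerations are whatever the published schema defines.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical band edges (in whole currency units) and their labels.
REVENUE_EDGES = [100_000, 500_000, 1_000_000, 5_000_000]
REVENUE_BANDS = ["<100k", "100k-500k", "500k-1m", "1m-5m", "5m+"]


def revenue_band(amount):
    """Map a precise amount to its band; lower edges are inclusive."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return REVENUE_BANDS[bisect_right(REVENUE_EDGES, amount)]


def public_metric(amount, currency, as_of):
    """Return a banded entry: no precise figure leaves this function."""
    return {
        "revenue_band": revenue_band(amount),
        "currency": currency,          # explicit currency, never implied
        "as_of": as_of.isoformat(),    # explicit reporting date
    }


entry = public_metric(750_000, "GBP", date(2025, 12, 31))
print(entry)
```

The precise amount stays on the publisher's side; under this sketch it would only ever appear in the protected tier.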
Identity, auth & legal entity
What works well: OAuth 2.0 + OIDC is the universal auth substrate; every serious protected-tier MCP should use it. GLEIF's LEI (ISO 17442) is the closest thing the world has to a unique global company identifier; it's mandatory for financial counterparties already.
Where the gap is: these tell you that a principal is authenticated, and that a legal entity is uniquely named; they say nothing about what data the entity publishes about itself. Token-gated access has no standard, profile-shaped object on the other side of the gate.
How agentic-first relates: we anchor identity on
the standards that already exist — adopters declare
company.registry.{type,id,url} (Companies House,
Delaware, EDGAR, …) and company.lei (GLEIF) on
their public profile, and the directory verifies them publicly
via the registry's own URL. The protected tier expects OAuth
with section-scoped tokens (financials:read,
fundraise:read, …). We don't reinvent any of this
layer — we describe how to wire to it.
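Two of the checks implied above can be sketched in a few lines: a structural check on an LEI, and reading section scopes off a token. The LEI check uses the ISO 7064 MOD 97-10 scheme that ISO 17442 specifies for the two trailing check digits. The scope parsing assumes the usual OAuth convention of a space-delimited scope string. The function names, the sample identifier, and the registry values are hypothetical; this is not the directory's actual verification code.

```python
import re

LEI_PATTERN = re.compile(r"[0-9A-Z]{18}[0-9]{2}")


def _to_number(text):
    """Digits stay as-is; letters map A=10 ... Z=35, then read as one integer."""
    return int("".join(str(int(ch, 36)) for ch in text))


def lei_check_digits(base18):
    """Compute the two check digits for an 18-character LEI prefix."""
    return f"{98 - _to_number(base18 + '00') % 97:02d}"


def lei_is_valid(lei):
    """Structural check only: format plus MOD 97-10. Says nothing about registration."""
    return bool(LEI_PATTERN.fullmatch(lei)) and _to_number(lei) % 97 == 1


def readable_sections(scope_string):
    """Sections a token may read, e.g. 'financials:read' -> 'financials'."""
    return {s.rsplit(":", 1)[0] for s in scope_string.split() if s.endswith(":read")}


# Made-up 18-character prefix, used only to exercise the arithmetic.
sample_base = "EXAMPLE" + "0" * 10 + "A"
sample_lei = sample_base + lei_check_digits(sample_base)

# Hypothetical public identity block using the field paths named above.
company = {
    "registry": {"type": "companies-house", "id": "00000000",
                 "url": "https://example.com/registry/00000000"},
    "lei": sample_lei,
}

print(lei_is_valid(company["lei"]))
print(readable_sections("financials:read fundraise:read openid"))
```

A passing checksum only shows the identifier is well formed; confirming that it belongs to the publisher still means looking it up at GLEIF or the registry URL.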
Trust, verification & provenance
What works well: W3C Verifiable Credentials (VCs) give you cryptographically signed, issuer-attested claims with a clean revocation model. Pair them with a DID and you have a portable, self-sovereign identity layer that survives platform churn.
Where the gap is: VCs are claim-shaped, not file-shaped. There's no canonical VC for "this company's whole public profile, signed by the company"; nor is there a clean way to mark a single field inside a larger document as independently verified versus self-asserted. In practice every commercial company-data product invents its own confidence model.
How agentic-first relates: v0.1 takes a pragmatic
first step: every material claim can carry an evidence
entry pointing at a public URL (a press release, a Companies
House filing, a third-party article), and the directory's
confidence score weights evidence density. Provenance signing —
either VCs over individual fields or a JWS envelope over the
whole file — is on the v0.2 roadmap. We'd much rather adopt the
W3C work than fork it.
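One simple reading of "evidence density" is the share of material claims that cite at least one public URL. The sketch below computes that ratio. The claim structure, the material flag, and the function name are assumptions for illustration; the directory's real confidence score is not specified on this page and may weight evidence quite differently.

```python
def evidence_density(claims):
    """Share of material claims that cite at least one evidence URL."""
    # Claims are treated as material unless explicitly flagged otherwise.
    material = [c for c in claims if c.get("material", True)]
    if not material:
        return 0.0
    backed = sum(
        1 for c in material
        if any(e.get("url") for e in c.get("evidence", []))
    )
    return backed / len(material)


# Hypothetical claims; the URLs are placeholders.
claims = [
    {"field": "company.registry.id",
     "evidence": [{"url": "https://example.com/registry-filing"}]},
    {"field": "funding.stage",
     "evidence": [{"url": "https://example.com/press/seed-round"}]},
    {"field": "team.headcount_band", "evidence": []},
    {"field": "metrics.revenue_band"},
    {"field": "tagline", "material": False},
]

print(evidence_density(claims))
```

A ratio like this rewards coverage, not evidence quality, which is one reason signed provenance is a natural next step.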
Data governance & quality (the ISO layer)
What works well: ISO 8000 (data quality), ISO 27001 (information security), and ISO 27701 (privacy) are the reference frameworks regulated buyers expect to hear named in any data-handling pitch.
Where the gap is: all three are management system standards, not formats. They tell you how to run processes, not what fields to publish. They give credibility but not implementability.
How agentic-first relates: we name them as the reference frame our governance practices map onto, without pretending v0.1 of an open spec ships an ISO certification. As and when the project gets serious, formal alignment is on the table.
The emerging agent-web stack
The most adjacent work, and the most likely to be confused with what
we're doing, is the small cluster of /.well-known/ and root-level
conventions that have appeared in the last 18 months for the
agent web specifically:
- /.well-known/mcp.json describes your MCP server (SEP-1960 / 2127).
- /.well-known/agent-card.json describes your A2A agent (IANA-registered, Linux Foundation, Aug 2025).
- /llms.txt describes your site, as Markdown, for LLM crawlers (~844k adopters).
- /agents-brief.txt describes what an agent is allowed to do on your site (draft v0.4).
- robots.txt + the W3C TDM Reservation Protocol describe what crawlers shouldn't do.
These are all about protocol, permission, or content. None of them describe the publisher: the company or individual operating the site, their identity, their banded shape, their preferred contact channel. That's the slot we're trying to fill, and it's why the publisher-identity slot is its own well-known file rather than an extension to one of the existing files.
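For an agent, the practical upshot is a handful of fixed paths per site. A minimal sketch of that lookup follows. The paths are the ones named on this page; the dictionary keys, function names and timeout are illustrative assumptions, and the fetch helper is defined but not called here since it needs network access.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Paths as named on this page; the keys are labels invented for this sketch.
AGENT_WEB_PATHS = {
    "mcp": "/.well-known/mcp.json",
    "a2a_agent_card": "/.well-known/agent-card.json",
    "llms_txt": "/llms.txt",
    "agents_brief": "/agents-brief.txt",
    "robots": "/robots.txt",
    "publisher_profile": "/.well-known/agentic-profile.json",
}


def discovery_urls(origin):
    """Resolve every known path against a site's origin."""
    return {name: urljoin(origin, path) for name, path in AGENT_WEB_PATHS.items()}


def fetch_publisher_profile(origin, timeout=5.0):
    """Fetch and parse the publisher profile with a single HTTP GET."""
    url = discovery_urls(origin)["publisher_profile"]
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


for name, url in discovery_urls("https://example.com").items():
    print(f"{name}: {url}")
```

Only the last entry describes who is publishing; the others describe servers, agents, content and permissions.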
Where the gaps are, in one picture
| Layer | Public, generally about the publisher | Public, structured business info | Private, diligence-grade detail |
|---|---|---|---|
| Today | Schema.org Organization / Person — name, URL, logo. Nothing canonical for jurisdiction, registry ID, stage, headcount band, contact preference. | XBRL for listed firms only (~1% of companies). Nothing for the rest. | Bilateral data rooms behind manual NDAs. No standard shape. |
| Gap | An open, machine-readable, publisher-controlled file that an agent can fetch in one HTTP GET. | An open, banded, FCA-aware vocabulary for the 99% of companies XBRL doesn't reach. | An open schema for what sits behind a scoped OAuth token, served from the publisher's own MCP. |
| agentic-first v0.1 | /.well-known/agentic-profile.json with profile_kind + tier: "public" | The same file's funding, team, metrics sections, banded by the schema | The matching *-private-profile schema, served from the publisher's own MCP at their own auth |
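The bottom row can be read as a simple publishing rule: the public file declares its kind and tier, and its funding, team and metrics sections carry bands, not precise numbers. The sketch below checks a document against that reading. It is an assumption-laden illustration, not the project's validator: the rule that any bare number in those sections is a problem, the "company" value for profile_kind, and the field names inside each section are all invented here.

```python
BANDED_SECTIONS = ("funding", "team", "metrics")


def public_tier_problems(profile):
    """Return a list of reasons the document should not be published as public."""
    problems = []
    if profile.get("tier") != "public":
        problems.append('tier must be "public"')
    if not profile.get("profile_kind"):
        problems.append("profile_kind is missing")
    for section in BANDED_SECTIONS:
        for key, value in profile.get(section, {}).items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number:
                problems.append(f"{section}.{key} is a precise number, expected a band")
    return problems


# Hypothetical documents for illustration.
ok_profile = {
    "profile_kind": "company",
    "tier": "public",
    "team": {"headcount_band": "11-50"},
    "metrics": {"revenue_band": "500k-1m", "currency": "GBP", "as_of": "2025-12-31"},
}
leaky_profile = {
    "profile_kind": "company",
    "tier": "public",
    "metrics": {"revenue": 1_234_567},
}

print(public_tier_problems(ok_profile))
print(public_tier_problems(leaky_profile))
```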
Where agentic-first fits — and where it doesn't
The point of this page is not to claim agentic-first replaces any of the standards above. It doesn't. It sits next to them and describes the one thing none of them describe: the publisher themselves, in a shape an agent can act on.
- Adopt Schema.org for SEO and Knowledge-Panel coverage. Embed it as JSON-LD on the homepage.
- Adopt /.well-known/mcp.json if you publish an MCP server. Adopt agent-card.json if you ship an A2A agent.
- Carry your LEI (GLEIF) and your registry ID inside agentic-first; that's the verifiability anchor.
- Use OAuth/OIDC for your protected-tier MCP. Don't roll your own.
- Match the discipline of XBRL on currency, period, and revision dates inside the metric blocks.
- Use Verifiable Credentials for individual signed claims once provenance signing, currently on the v0.2 roadmap, lands.
The contribution agentic-first makes is the composition: one well-known file, two tiers, four schemas, a publisher-controlled directory that indexes them. Every layer it touches is built on the existing standard for that layer.
A starting point, not a fait accompli
v0.1 is deliberately small. We'd rather ship something three adopters have shipped against than a 200-page spec nobody has implemented. The schemas, the validator, the directory MCP, and the deployment scripts are all open-source under Apache-2.0; the governance lives in SCHEMA.md and decisions of consequence land as ADRs in docs/adr.
The honest pitch: nobody has done this for general company information yet. We've put a small, opinionated first cut in the open, with a working directory and four published schemas, so the conversation about what an open publisher-controlled company-data layer should look like can happen against running code rather than slides. If you've got strong views — or you've already published your profile — we'd genuinely like to hear from you.