From da9207ded368370d6a417b9587499528b053365e Mon Sep 17 00:00:00 2001
From: Bruno Sarlo
Date: Wed, 14 Jan 2026 09:18:13 -0300
Subject: [PATCH] v1.0.0-draft: First public draft release

agents.md protocol for AI agent web discovery.

Key features:
- Two formats: Pure Markdown (simple) or YAML frontmatter (structured)
- MCP gateway integration for tool access
- Discovery via /.well-known/agents.md
- Security: origin trust, endpoint validation, auth guidance
- Backward compatible with robots.txt and llms.txt

Design based on 3-iteration process:
1. Gap analysis and planning
2. Multi-model consensus on format decisions
3. Code review for completeness and clarity

Philosophy: robots.txt says what agents CANNOT do, agents.md says what they CAN do.

Co-Authored-By: Claude Opus 4.5
---
 CHANGELOG.md   |  52 +++++++
 README.md      |  60 ++++++---
 spec/README.md | 360 ++++++++++++++++++++++++++++++++-----------------
 3 files changed, 327 insertions(+), 145 deletions(-)
 create mode 100644 CHANGELOG.md

diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..26f0583
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,52 @@
+# Changelog
+
+All notable changes to the agents.md protocol specification.
+
+## [1.0.0-draft] - 2026-01-14
+
+First public draft release.
+
+### Added
+
+- **Core specification** defining agents.md file format
+- **Two format options**: Pure Markdown (simple) and YAML frontmatter + Markdown (structured)
+- **MCP gateway integration** for pointing agents to Model Context Protocol servers
+- **Discovery mechanism** via `/.well-known/agents.md` (primary) and `/agents.md` (fallback)
+- **Security section** with origin trust, endpoint validation, and authentication guidance
+- **Backward compatibility** guidance for robots.txt and llms.txt coexistence
+- **Caching recommendations** for both servers and agents
+- **Examples** covering minimal sites, MCP-enabled APIs, and OAuth-protected e-commerce
+
+### Design Decisions
+
+Based on multi-model consensus:
+
+1. **Hybrid format** - Optional YAML frontmatter for machine-readable config, Markdown body for human context
+2. **Simple naming** - Keep `Can/Cannot` section names for intuitiveness
+3. **MCP as pointer** - Reference MCP endpoints, don't embed tool schemas
+4. **Websites first** - Focus on web use case, extensible for future
+
+### Philosophy
+
+| Standard | Purpose |
+|----------|---------|
+| robots.txt | What agents cannot access |
+| llms.txt | What content is important |
+| agents.md | What agents can do |
+
+## [0.2.0] - 2026-01-14
+
+Internal revision with MCP integration.
+
+### Changed
+- Simplified format from API-centric to text-centric
+- Added MCP gateway pointer concept
+- Removed JSON Schema tool definitions
+
+## [0.1.0] - 2026-01-14
+
+Initial internal draft.
+
+### Added
+- Basic protocol concept
+- API tool definitions (later removed)
diff --git a/README.md b/README.md
index c171f6a..2e42906 100644
--- a/README.md
+++ b/README.md
@@ -2,15 +2,18 @@

 **Tell AI agents what they can do on your website.**

-## The Gap
+[![Version](https://img.shields.io/badge/version-1.0.0--draft-blue)]()
+[![License](https://img.shields.io/badge/license-CC0-green)]()

-| File | Purpose |
-|------|---------|
-| robots.txt | What bots **cannot access** |
+## The Problem
+
+| File | What it tells agents |
+|------|---------------------|
+| robots.txt | What you **cannot access** |
 | llms.txt | What **content matters** |
-| **agents.md** | What agents **can do** |
+| **???** | What you **can do** |

-## Quick Start
+## The Solution

 Create `/.well-known/agents.md`:

@@ -31,13 +34,20 @@ An online bookstore.
 hello@mysite.com
 ```

-That's it. Plain text. Human readable.
+That's it. Plain text. Human readable. Machine parseable.

 ## With MCP Gateway

-Point agents to your MCP server for structured tool access:
+Point agents to your [MCP](https://modelcontextprotocol.io/) server for structured tool access:

-```markdown
+```yaml
+---
+version: "1.0"
+mcp:
+  endpoint: https://mysite.com/.well-known/mcp
+  transport: streamable-http
+  auth: oauth2
+---
 # My Site

 An online bookstore.
@@ -47,11 +57,6 @@ An online bookstore.
 - Check prices
 - Place orders (authenticated)

-## MCP
-endpoint: https://mysite.com/.well-known/mcp
-transport: streamable-http
-auth: oauth2
-
 ## Contact
 hello@mysite.com
 ```
@@ -68,20 +73,31 @@ Agent requests /.well-known/agents.md

 ## Documentation

-- [Specification](./spec/README.md) - Full protocol spec
-- [Examples](./examples/) - Real-world examples
-- [FAQ](./docs/FAQ.md) - Common questions
+- **[Specification](./spec/README.md)** - Full protocol spec (v1.0.0-draft)
+- **[Examples](./examples/)** - Real-world examples
+- **[FAQ](./docs/FAQ.md)** - Common questions
+- **[Changelog](./CHANGELOG.md)** - Version history
+
+## Quick Comparison
+
+| Aspect | robots.txt | llms.txt | agents.md |
+|--------|------------|----------|-----------|
+| Purpose | Crawl control | Content summary | Capabilities |
+| Format | Custom | Markdown | Markdown + YAML |
+| MCP | No | No | Yes |

 ## Status

-**Draft** - Version 0.2.0
+**v1.0.0-draft** - First public draft release
+
+Feedback welcome via issues.

 ## Related Standards

-- [robots.txt](https://www.rfc-editor.org/rfc/rfc9309) - Crawl restrictions (1994)
-- [llms.txt](https://llmstxt.org/) - Content for LLMs (2024)
-- [AGENTS.md](https://agents.md/) - Repository instructions (2025)
-- [MCP](https://modelcontextprotocol.io/) - Tool protocol (2024)
+- [robots.txt](https://www.rfc-editor.org/rfc/rfc9309) - RFC 9309 (1994)
+- [llms.txt](https://llmstxt.org/) - Jeremy Howard (2024)
+- [AGENTS.md](https://agents.md/) - OpenAI/Sourcegraph (2025)
+- [MCP](https://modelcontextprotocol.io/) - Anthropic (2024)

 ## License

diff --git a/spec/README.md b/spec/README.md
index b2c4251..1e4a66f 100644
--- a/spec/README.md
+++ b/spec/README.md
@@ -1,6 +1,6 @@
 # agents.md Protocol Specification

-**Version:** 0.2.0
+**Version:** 1.0.0-draft
 **Status:** Draft
 **Updated:** 2026-01-14

@@ -18,23 +18,40 @@ A simple text file that tells AI agents what they can do on a website and option

 ## 1. Discovery

-**Location:** `/.well-known/agents.md` or `/agents.md`
+### Location

-**Content-Type:** `text/markdown` or `text/plain`
+Primary: `/.well-known/agents.md`
+Fallback: `/agents.md`

-Agents request the file like any HTTP resource:
+### Content-Type

-```
+`text/markdown` or `text/plain`
+
+### Request
+
+```http
 GET /.well-known/agents.md HTTP/1.1
 Host: example.com
-User-Agent: MyAgent/1.0
+User-Agent: MyAgent/1.0 (AI Agent)
 ```

+### Caching
+
+Servers SHOULD set appropriate `Cache-Control` headers:
+
+```http
+Cache-Control: public, max-age=86400
+```
+
+Agents SHOULD cache the `agents.md` file for 24 hours unless HTTP headers specify otherwise. Agents MUST NOT request this file more than once per hour for the same origin.
+
 ## 2. Format

-Plain Markdown. Human readable. Machine parseable.
+Two formats are supported:

-### Minimal Example
+### Format A: Pure Markdown (Simple)
+
+Plain Markdown with section headers. Best for simple sites.

 ```markdown
 # Example Site
@@ -54,9 +71,18 @@ A bookstore since 2010.
 agents@example.com
 ```

-### With MCP Gateway
+### Format B: YAML Frontmatter + Markdown (Structured)
+
+YAML frontmatter for machine-readable configuration, Markdown body for human context. Recommended when using MCP.

 ```markdown
+---
+version: "1.0"
+mcp:
+  endpoint: https://example.com/.well-known/mcp
+  transport: streamable-http
+  auth: none
+---
 # Example Bookstore

 Online bookstore with 50,000 titles.
@@ -71,10 +97,6 @@ Online bookstore with 50,000 titles.
 - Modify user accounts
 - Access admin functions

-## MCP
-endpoint: https://example.com/.well-known/mcp
-transport: streamable-http
-
 ## Behavior
 - Respect 1 request/second
 - Cache product data 1 hour
@@ -84,11 +106,18 @@ transport: streamable-http
 agents@example.com
 ```

+### Parsing Rules
+
+1. If file starts with `---`, parse YAML frontmatter until closing `---`
+2. Parse remaining content as Markdown
+3. Section headers (`## Name`) define semantic sections
+4. List items under sections define capabilities/rules
+
 ## 3. Sections

-All sections are optional. Use what makes sense.
+All sections are optional. Use what makes sense for your site.

-### Identity (Header)
+### Identity (H1 Header)

 ```markdown
 # Site Name
@@ -110,24 +139,35 @@ Brief description of what this site is.

 ### MCP Gateway

-```markdown
-## MCP
-endpoint:
-transport: streamable-http | sse | stdio
-auth: none | api_key | oauth2
+Defined in YAML frontmatter (preferred) or Markdown section:
+
+**YAML Frontmatter (preferred):**
+```yaml
+---
+mcp:
+  endpoint: https://example.com/.well-known/mcp
+  transport: streamable-http
+  auth: none
+---
 ```

-The MCP section points agents to a [Model Context Protocol](https://modelcontextprotocol.io/) server for structured tool access. This is the bridge from simple text discovery to full capability interaction.
+**Markdown Section (fallback):**
+```markdown
+## MCP
+endpoint: https://example.com/.well-known/mcp
+transport: streamable-http
+auth: none
+```

-**Transport options:**
-- `streamable-http` - HTTP with streaming (recommended for web)
-- `sse` - Server-Sent Events
-- `stdio` - Standard I/O (local only)
+When using Markdown section format, content MUST be valid YAML key-value pairs.

-**Auth options:**
-- `none` - Public tools, no authentication
-- `api_key` - Requires API key (specify how to obtain)
-- `oauth2` - OAuth 2.0 flow
+**Fields:**
+
+| Field | Required | Values | Description |
+|-------|----------|--------|-------------|
+| `endpoint` | Yes | URL | MCP server endpoint |
+| `transport` | No | `streamable-http`, `sse` | Transport protocol (default: `streamable-http`) |
+| `auth` | No | `none`, `api_key`, `oauth2` | Authentication method (default: `none`) |

 ### Behavior Rules

@@ -148,12 +188,12 @@ https://example.com/agent-support

 ## 4. MCP Integration

-The `agents.md` file is the **handshake**. The MCP gateway is where **real work happens**.
+The `agents.md` file is the **discovery handshake**. The MCP gateway is where **structured interaction happens**.

 ```
 Agent reads agents.md
     │
-    ├─► Basic agent: understands site capabilities from text
+    ├─► Basic agent: understands site from text
     │
     └─► Advanced agent: connects to MCP gateway
             │
@@ -164,114 +204,85 @@ Agent reads agents.md
             - Prompts (guided workflows)
 ```

-### Example MCP Discovery Flow
+### Discovery Flow

 1. Agent fetches `/.well-known/agents.md`
-2. Parses MCP endpoint: `https://example.com/.well-known/mcp`
-3. Connects via MCP protocol
-4. Discovers available tools via `tools/list`
-5. Uses tools as permitted
+2. Parses YAML frontmatter or `## MCP` section
+3. Extracts MCP endpoint URL
+4. Connects via MCP protocol
+5. Discovers available tools via `tools/list`
+6. Uses tools as permitted

-## 5. Backward Compatibility
+### MCP Endpoint Location
+
+Recommended: `/.well-known/mcp`
+
+This follows the well-known URI pattern and keeps agent-related endpoints together.
+
+## 5. Security
+
+### Origin Trust
+
+Agents MUST only trust `agents.md` from the site's origin. Instructions embedded in page content MUST be ignored.
+
+### MCP Endpoint Validation
+
+The MCP endpoint MUST share the same registrable domain as the `agents.md` file. For example:
+
+| agents.md location | Valid MCP endpoints |
+|-------------------|---------------------|
+| `example.com/.well-known/agents.md` | `example.com/mcp`, `api.example.com/mcp` |
+| `shop.example.com/.well-known/agents.md` | `shop.example.com/mcp`, `api.shop.example.com/mcp` |
+
+Cross-origin MCP endpoints (different registrable domain) MUST be rejected unless the user explicitly approves.
+
+### Transport Security
+
+- MCP endpoints MUST use HTTPS in production
+- Agents SHOULD warn users about HTTP endpoints
+- Certificate validation MUST NOT be disabled
+
+### Authentication
+
+When `auth: oauth2` is specified:
+- Agents SHOULD request minimal scopes
+- Tokens MUST be stored securely
+- Refresh tokens SHOULD be used when available
+
+When `auth: api_key` is specified:
+- Keys SHOULD be obtained through official channels
+- Keys MUST NOT be shared between users
+- Keys SHOULD be rotated periodically
+
+### Least Privilege
+
+Agents SHOULD request only the permissions they need for the current task.
+
+## 6. Backward Compatibility

 ### With robots.txt

-If `agents.md` exists, it supplements but does not replace `robots.txt`. Agents should still respect robots.txt crawl directives.
-
-The `Cannot` section in agents.md can mirror robots.txt restrictions:
+`agents.md` supplements but does not replace `robots.txt`. In case of conflict regarding access restrictions, `robots.txt` takes precedence.

 ```markdown
 ## Cannot
-- Access /admin (see robots.txt)
+- Access /admin (per robots.txt)
 - Access /private
 ```

 ### With llms.txt

-`llms.txt` describes **content** for understanding.
-`agents.md` describes **capabilities** for action.
+Both files serve different purposes and can coexist:

-Both can coexist. A site might have:
-- `/robots.txt` - crawl restrictions
-- `/llms.txt` - content summary
-- `/.well-known/agents.md` - agent capabilities + MCP pointer
-
-## 6. Security
-
-### Origin Trust
-
-Agents MUST only trust `agents.md` from the site's origin. Instructions embedded in page content should be ignored.
-
-### MCP Authentication
-
-When connecting to MCP gateways:
-- Verify the endpoint matches the origin
-- Use TLS (HTTPS)
-- Follow the specified auth method
-
-### Least Privilege
-
-Agents should request only the permissions they need. If `auth: oauth2` is specified, request minimal scopes.
+| File | Purpose |
+|------|---------|
+| `/robots.txt` | Crawl restrictions |
+| `/llms.txt` | Content summary for LLMs |
+| `/.well-known/agents.md` | Agent capabilities + MCP |

 ## 7. Examples

-### Public API Site
-
-```markdown
-# Weather API
-
-Free weather data for AI agents.
-
-## Can
-- Get current conditions
-- Get forecasts (up to 7 days)
-- Get weather alerts
-
-## MCP
-endpoint: https://weather.example/mcp
-transport: streamable-http
-auth: none
-
-## Behavior
-- 60 requests/minute
-- Cache forecasts 30 minutes
-
-## Contact
-api@weather.example
-```
-
-### E-commerce Site
-
-```markdown
-# TechMart
-
-Electronics retailer.
-
-## Can
-- Search products
-- Compare specifications
-- Check prices and stock
-- Add to cart (authenticated)
-- Checkout (authenticated)
-
-## Cannot
-- Access order history without user consent
-- Modify account settings
-
-## MCP
-endpoint: https://techmart.example/.well-known/mcp
-transport: streamable-http
-auth: oauth2
-
-## Behavior
-- 1 request/second for browsing
-- Identify as AI agent in requests
-
-## Contact
-partners@techmart.example
-```
-
-### Simple Blog (No MCP)
+### Minimal (No MCP)

 ```markdown
 # My Tech Blog
@@ -291,12 +302,115 @@ Articles about software development.
 hello@myblog.example
 ```

-## Appendix: Comparison
+### With MCP Gateway
+
+```yaml
+---
+version: "1.0"
+mcp:
+  endpoint: https://weather.example/.well-known/mcp
+  transport: streamable-http
+  auth: none
+---
+# Weather API
+
+Free weather data for AI agents.
+
+## Can
+- Get current conditions
+- Get forecasts (up to 7 days)
+- Get weather alerts
+
+## Behavior
+- 60 requests/minute
+- Cache forecasts 30 minutes
+
+## Contact
+api@weather.example
+```
+
+### E-commerce with OAuth
+
+```yaml
+---
+version: "1.0"
+mcp:
+  endpoint: https://techmart.example/.well-known/mcp
+  transport: streamable-http
+  auth: oauth2
+---
+# TechMart
+
+Electronics retailer.
+
+## Can
+- Search products
+- Compare specifications
+- Check prices and stock
+- Add to cart (authenticated)
+- Checkout (authenticated)
+
+## Cannot
+- Access order history without user consent
+- Modify account settings
+
+## Behavior
+- 1 request/second for browsing
+- Identify as AI agent in requests
+
+## Contact
+partners@techmart.example
+```
+
+## 8. Implementation Notes
+
+### For Site Owners
+
+1. Create `/.well-known/agents.md` on your server
+2. Start with Format A (pure Markdown) for simplicity
+3. Add MCP gateway later if you want structured tool access
+4. Set `Cache-Control` header for efficient agent behavior
+
+### For Agent Developers
+
+1. Check `/.well-known/agents.md` first, fall back to `/agents.md`
+2. Parse YAML frontmatter if present
+3. Cache responses per HTTP headers (default: 24 hours)
+4. Respect `Cannot` restrictions and `Behavior` rules
+5. Connect to MCP gateway for structured tools
+
+### Versioning
+
+The `version` field in YAML frontmatter indicates spec compatibility:
+
+- `1.x` - Compatible with this specification
+- Future versions will maintain backward compatibility within major version
+
+## Appendix A: Comparison Table

 | Aspect | robots.txt | llms.txt | agents.md |
 |--------|------------|----------|-----------|
 | Purpose | Crawl control | Content summary | Capabilities |
-| Format | Custom syntax | Markdown | Markdown |
+| Format | Custom syntax | Markdown | Markdown + optional YAML |
 | Focus | Restrictions | Understanding | Actions |
 | MCP | No | No | Yes (optional) |
 | Year | 1994 | 2024 | 2026 |
+
+## Appendix B: YAML Frontmatter Schema
+
+```yaml
+# All fields optional except where noted
+version: string        # Spec version (e.g., "1.0")
+
+mcp:
+  endpoint: string     # Required if mcp present. MCP server URL
+  transport: string    # "streamable-http" | "sse" (default: streamable-http)
+  auth: string         # "none" | "api_key" | "oauth2" (default: none)
+```
+
+## Appendix C: References
+
+- [Model Context Protocol](https://modelcontextprotocol.io/) - MCP Specification
+- [RFC 8615](https://www.rfc-editor.org/rfc/rfc8615) - Well-Known URIs
+- [RFC 9309](https://www.rfc-editor.org/rfc/rfc9309) - robots.txt
+- [llms.txt](https://llmstxt.org/) - LLM Content Discovery
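
Review note: the four Parsing Rules this patch adds to spec/README.md (frontmatter first, then Markdown, `## Name` sections, list items) could be sketched in Python roughly as follows. This is a minimal illustration and not part of the patch. The names `AgentsFile` and `parse_agents_md` are hypothetical, the frontmatter is returned as raw text instead of being parsed (a real agent would presumably hand it to a YAML library), and treating an unterminated `---` as plain Markdown is an assumption, since the spec does not say what to do in that case.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AgentsFile:
    """Hypothetical container for a parsed agents.md file."""
    frontmatter: str = ""   # raw YAML text, deliberately left unparsed here
    title: str = ""         # text of the first H1 header, if any
    sections: dict[str, list[str]] = field(default_factory=dict)


def parse_agents_md(text: str) -> AgentsFile:
    """Sketch of Parsing Rules 1-4; not a reference implementation."""
    result = AgentsFile()
    lines = text.splitlines()

    # Rule 1: a leading `---` opens YAML frontmatter, up to the closing `---`.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                result.frontmatter = "\n".join(lines[1:i])
                lines = lines[i + 1:]
                break
        # Assumption: with no closing `---`, everything is treated as Markdown.

    # Rules 2-4: walk the Markdown body, tracking the current `## Name` section.
    current = None
    for line in lines:
        h2 = re.match(r"^##\s+(.*)$", line)
        h1 = re.match(r"^#\s+(.*)$", line)
        item = re.match(r"^[-*]\s+(.*)$", line)
        if h2:
            current = h2.group(1).strip()
            result.sections[current] = []
        elif h1 and not result.title:
            result.title = h1.group(1).strip()
        elif item and current is not None:
            result.sections[current].append(item.group(1).strip())
    return result


sample = """---
version: "1.0"
mcp:
  endpoint: https://example.com/.well-known/mcp
---
# Example Bookstore

Online bookstore with 50,000 titles.

## Can
- Search products
- Check prices

## Cannot
- Modify user accounts
"""

parsed = parse_agents_md(sample)
print(parsed.title)
print(parsed.sections)
```

Non-list lines inside a section, such as the `key: value` pairs of the Markdown `## MCP` fallback, are ignored by this sketch; handling them would need an extra branch.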