v0.2: Simplify spec + add MCP gateway integration

Major revision based on first-principles thinking:

- Simplified format: plain Markdown, human readable
- Focus on capabilities (Can/Cannot), not API schemas
- MCP gateway pointer for structured tool access
- Clear positioning vs. robots.txt and llms.txt

The agents.md file is the handshake. The MCP gateway is where the real work happens.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-14 09:01:56 -03:00
commit f2b3a14685
5 changed files with 772 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,88 @@
 # agents.md
 **Tell AI agents what they can do on your website.**
 ## The Gap
 | File | Purpose |
 |------|---------|
 | robots.txt | What bots **cannot access** |
 | llms.txt | What **content matters** |
 | **agents.md** | What agents **can do** |
 ## Quick Start
 Create `/.well-known/agents.md`:
 ```markdown
 # My Site
 An online bookstore.
 ## Can
 - Search catalog
 - Read book details
 - Check availability
 ## Cannot
 - Place orders (requires human)
 ## Contact
 hello@mysite.com
 ```
 That's it. Plain text. Human readable.
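A static file on any web server is enough. If you would rather serve it from application code, here is a minimal Python sketch using the standard library's WSGI support. The file contents are copied from the example above; answering on both `/.well-known/agents.md` and `/agents.md` follows the two locations named in the specification, and everything else (names, headers) is illustrative.

```python
# Contents copied from the Quick Start example above.
AGENTS_MD = """# My Site
An online bookstore.
## Can
- Search catalog
- Read book details
- Check availability
## Cannot
- Place orders (requires human)
## Contact
hello@mysite.com
"""

# Both discovery locations named in the specification.
DISCOVERY_PATHS = {"/.well-known/agents.md", "/agents.md"}


def app(environ, start_response):
    """WSGI app that serves agents.md as text/markdown and 404s everything else."""
    path = environ.get("PATH_INFO", "")
    method = environ.get("REQUEST_METHOD", "GET")
    if path in DISCOVERY_PATHS and method in ("GET", "HEAD"):
        body = AGENTS_MD.encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/markdown; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        # HEAD gets the same headers but no body.
        return [] if method == "HEAD" else [body]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not Found"]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server("", 8000, app)` and call `serve_forever()` on the result.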
 ## With MCP Gateway
 Point agents to your MCP server for structured tool access:
 ```markdown
 # My Site
 An online bookstore.
 ## Can
 - Search and browse
 - Check prices
 - Place orders (authenticated)
 ## MCP
 endpoint: https://mysite.com/.well-known/mcp
 transport: streamable-http
 auth: oauth2
 ## Contact
 hello@mysite.com
 ```
 ## How It Works
 ```
 Agent requests /.well-known/agents.md
              │
              ├─► Basic: reads text, understands capabilities
              │
              └─► Advanced: connects to MCP gateway for tools
 ```
 ## Documentation
 - [Specification](./spec/README.md) - Full protocol spec
 - [Examples](./examples/) - Real-world examples
 - [FAQ](./docs/FAQ.md) - Common questions
 ## Status
 **Draft** - Version 0.2.0
 ## Related Standards
 - [robots.txt](https://www.rfc-editor.org/rfc/rfc9309) - Crawl restrictions (1994)
 - [llms.txt](https://llmstxt.org/) - Content for LLMs (2024)
 - [AGENTS.md](https://agents.md/) - Repository instructions (2025)
 - [MCP](https://modelcontextprotocol.io/) - Tool protocol (2024)
 ## License
 CC0 1.0 Universal - Public Domain
--- a/docs/FAQ.md
+++ b/docs/FAQ.md
@@ -0,0 +1,111 @@
 # Frequently Asked Questions
 ## General
 ### Why not just use robots.txt?
 robots.txt tells bots what they *cannot* do. agents.md tells AI agents what they *can* do. They're complementary:
 - robots.txt: "Don't crawl /admin"
 - agents.md: "You can search our catalog via this API"
 ### Why Markdown?
 1. Human readable and editable
 2. Widely supported in documentation tools
 3. YAML frontmatter is a proven pattern
 4. Renders nicely on GitHub and documentation sites
 ### Is this related to MCP (Model Context Protocol)?
 Inspired by MCP, but designed for web discovery rather than local tool execution. The tool definition format is deliberately similar so that agents.md feels familiar to developers already using MCP.
 ### Why /.well-known/?
 Following RFC 8615 for well-known URIs. This is the same pattern used by:
 - `/.well-known/security.txt`
 - `/.well-known/apple-app-site-association`
 - `/.well-known/openid-configuration`
 ## Implementation
 ### Do I need to remove my robots.txt?
 No. Keep your robots.txt for traditional crawlers. The `robots` section in agents.md can mirror or extend those rules for AI agents.
 ### Can I have different capabilities for different agents?
 Yes. Use content negotiation based on User-Agent or implement OAuth2 scopes for fine-grained access control.
 ### How should agents authenticate?
 Start simple:
 1. No auth for public read-only tools
 2. API keys for rate-limited or premium features
 3. OAuth2 for user-specific actions
 ### What if my API changes?
 Update your agents.md. Agents should re-fetch it periodically, respecting Cache-Control headers.
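How an agent decides it is time to re-fetch is left open. A minimal Python sketch, assuming the agent honors only `max-age`, `no-cache` and `no-store`, and falls back to a hypothetical one-hour default when no usable header is present (it ignores `Expires`, ETag revalidation and all other directives):

```python
import re
import time
from typing import Optional

# Hypothetical fallback lifetime when no usable Cache-Control header is sent.
DEFAULT_TTL_SECONDS = 3600


def max_age_seconds(cache_control: Optional[str]) -> Optional[int]:
    """Freshness lifetime from a Cache-Control header, or None if unspecified.

    no-store and no-cache are treated as a lifetime of 0 (always re-fetch).
    """
    if not cache_control:
        return None
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        match = re.fullmatch(r'max-age="?(\d+)"?', directive)
        if match:
            return int(match.group(1))
    return None


def needs_refetch(fetched_at: float, cache_control: Optional[str],
                  now: Optional[float] = None) -> bool:
    """True if an agents.md fetched at `fetched_at` (epoch seconds) is stale."""
    now = time.time() if now is None else now
    ttl = max_age_seconds(cache_control)
    if ttl is None:
        ttl = DEFAULT_TTL_SECONDS
    return now - fetched_at >= ttl
```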
 ## Security
 ### Can malicious sites trick agents?
 The protocol specifies that agents MUST only parse agents.md from the site's origin. Instructions embedded in page content should be ignored.
 ### How do I prevent abuse?
 1. Use rate limits (per-tool and global)
 2. Require API keys for sensitive operations
 3. Use OAuth2 scopes for user actions
 4. Monitor usage patterns
 ### Should I expose all my APIs?
 No. Only expose what you want agents to use. Internal APIs, admin endpoints, and sensitive operations should not be in agents.md.
 ## Compatibility
 ### What about GraphQL APIs?
 You can define tools that call GraphQL endpoints:
 ```yaml
 tools:
  - name: query_products
    endpoint: "POST /graphql"
    parameters:
      type: object
      properties:
        query:
          type: string
          description: "GraphQL query (limited to Product type)"
 ```
 ### Can I use this with OpenAPI/Swagger?
 Yes. You can generate agents.md from OpenAPI specs; we're working on tooling for this.
 ### What about WebSocket endpoints?
 agents.md focuses on request-response patterns. For real-time features, document WebSocket endpoints in the Markdown section but define polling alternatives as tools.
 ## Adoption
 ### How do I tell if a site supports agents.md?
 1. Check `/.well-known/agents.md` (or `/agents.md`)
 2. Look for `<link rel="agent">` in HTML
 3. Check for `Link` header in HTTP response
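Checks 2 and 3 can be sketched with the Python standard library. The `rel="agent"` value is the one named above, not a registered link relation, and splitting the `Link` header on commas is deliberately naive (a URL containing a comma would break it):

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# Relation name taken from this FAQ; not an IANA-registered link relation.
REL = "agent"


def agents_md_from_link_header(link_header: str, base_url: str) -> Optional[str]:
    """Return the first URL in an HTTP Link header whose rel includes "agent"."""
    for part in link_header.split(","):  # naive: assumes no commas inside <...>
        match = re.match(r'\s*<([^>]*)>(.*)', part)
        if not match:
            continue
        url, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', params, re.IGNORECASE)
        if rel and REL in rel.group(1).lower().split():
            return urljoin(base_url, url)
    return None


class _LinkFinder(HTMLParser):
    """Collects href values of <link> elements whose rel includes "agent"."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if REL in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def agents_md_from_html(html: str, base_url: str) -> Optional[str]:
    """Return the resolved href of the first <link rel="agent"> in a page."""
    finder = _LinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.hrefs[0]) if finder.hrefs else None
```

Whatever these return, the Security section still applies: a discovered URL on a different origin than the site should not be trusted.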
 ### What if the site doesn't have agents.md?
 Fall back to:
 1. Traditional web scraping (respecting robots.txt)
 2. Looking for documented APIs
 3. Using general browsing capabilities
 ### Who decides what goes in agents.md?
 Site owners. This is an opt-in protocol. Sites choose what capabilities to expose.
--- a/examples/ecommerce.md
+++ b/examples/ecommerce.md
@@ -0,0 +1,165 @@
 # Example: E-commerce Site
 This example shows how an e-commerce site might expose its API to AI agents.
 ## agents.md
 ```markdown
 ---
 protocol_version: "0.1"
 name: "Tech Store"
 description: "Browse products, check prices, and manage wishlists"
 robots:
  disallow:
    - /admin
    - /checkout
    - /account/orders
  crawl_delay: 2
 tools:
  - name: search_products
    description: "Search for products by name, category, or features"
    endpoint: "GET /api/products/search"
    parameters:
      type: object
      properties:
        q:
          type: string
          description: "Search query"
        category:
          type: string
          enum: ["laptops", "phones", "tablets", "accessories"]
        min_price:
          type: number
          minimum: 0
        max_price:
          type: number
        in_stock:
          type: boolean
          default: true
        sort:
          type: string
          enum: ["price_asc", "price_desc", "rating", "newest"]
          default: "rating"
        limit:
          type: integer
          default: 20
          maximum: 50
      required:
        - q
    auth: none
    rate_limit: "100/minute"
  - name: get_product
    description: "Get detailed product information including specs and reviews"
    endpoint: "GET /api/products/{id}"
    parameters:
      type: object
      properties:
        id:
          type: string
        include_reviews:
          type: boolean
          default: false
      required:
        - id
    auth: none
  - name: compare_products
    description: "Compare specifications of multiple products"
    endpoint: "POST /api/products/compare"
    parameters:
      type: object
      properties:
        product_ids:
          type: array
          items:
            type: string
          minItems: 2
          maxItems: 5
      required:
        - product_ids
    auth: none
  - name: get_price_history
    description: "Get price history for a product"
    endpoint: "GET /api/products/{id}/price-history"
    parameters:
      type: object
      properties:
        id:
          type: string
        days:
          type: integer
          default: 30
          maximum: 365
      required:
        - id
    auth: api_key
    scopes:
      - price:read
  - name: add_to_wishlist
    description: "Add a product to user's wishlist"
    endpoint: "POST /api/wishlist"
    parameters:
      type: object
      properties:
        product_id:
          type: string
        notify_on_sale:
          type: boolean
          default: true
      required:
        - product_id
    auth: oauth2
    scopes:
      - wishlist:write
 auth:
  api_key:
    header: "X-API-Key"
    obtain: "https://techstore.example/developers"
  oauth2:
    authorization_url: "https://techstore.example/oauth/authorize"
    token_url: "https://techstore.example/oauth/token"
    scopes:
      wishlist:read: "View your wishlist"
      wishlist:write: "Modify your wishlist"
      price:read: "Access price history data"
 rate_limits:
  global: "1000/hour"
  per_tool: true
 contact:
  email: "api@techstore.example"
  url: "https://techstore.example/developers/docs"
 ---
 # Tech Store Agent API
 AI agents can help users find products, compare prices, and track deals.
 ## Public Tools (No Auth)
 - **search_products** - Find products by name or category
 - **get_product** - Get detailed product info
 - **compare_products** - Side-by-side comparison
 ## Authenticated Tools
 ### API Key Required
 - **get_price_history** - Historical pricing data
 ### OAuth2 Required
 - **add_to_wishlist** - Save products for later
 ## Best Practices
 1. Cache product details (they don't change often)
 2. Use price history to advise on purchase timing
 3. Respect rate limits during peak hours
 ```
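A client has to turn a tool entry like the ones above into an HTTP request. This Python sketch assumes conventions the example implies but does not spell out: `{name}` placeholders in the path are filled from arguments, remaining arguments go in the query string for `GET` and in a JSON body otherwise, and API keys are sent in the header named under `auth.api_key`. The tool is written as a Python dict because the standard library has no YAML parser; OAuth2 is not covered, and `build_request` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Optional, Tuple
from urllib.parse import quote, urlencode

# The get_product entry from the frontmatter above, as a Python dict.
GET_PRODUCT: Dict[str, Any] = {
    "name": "get_product",
    "endpoint": "GET /api/products/{id}",
    "parameters": {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "include_reviews": {"type": "boolean", "default": False},
        },
        "required": ["id"],
    },
    "auth": "none",
}


def build_request(tool: Dict[str, Any], args: Dict[str, Any], base_url: str,
                  api_key: Optional[str] = None,
                  api_key_header: str = "X-API-Key",
                  ) -> Tuple[str, str, Dict[str, str], Optional[bytes]]:
    """Turn a tool call into (method, url, headers, body)."""
    method, path = tool["endpoint"].split(" ", 1)
    schema = tool.get("parameters", {})
    props = schema.get("properties", {})

    missing = [name for name in schema.get("required", []) if name not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    # Start from schema defaults, then overlay the caller's arguments.
    values = {k: p["default"] for k, p in props.items() if "default" in p}
    values.update(args)

    # Fill {name} placeholders in the path and remove those values.
    for name in list(values):
        placeholder = "{" + name + "}"
        if placeholder in path:
            path = path.replace(placeholder, quote(str(values.pop(name)), safe=""))

    headers: Dict[str, str] = {}
    if tool.get("auth") == "api_key":
        if api_key is None:
            raise ValueError(f"{tool['name']} requires an API key")
        headers[api_key_header] = api_key

    url = base_url.rstrip("/") + path
    body: Optional[bytes] = None
    if method == "GET":
        if values:
            # Booleans as "true"/"false" in query strings is an assumption.
            query = {k: str(v).lower() if isinstance(v, bool) else v
                     for k, v in values.items()}
            url += "?" + urlencode(query)
    else:
        headers["Content-Type"] = "application/json"
        body = json.dumps(values).encode("utf-8")
    return method, url, headers, body
```

Applying schema defaults on the client keeps requests explicit; a server that applies its own defaults would work just as well if the client sent only the caller's arguments.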
--- a/examples/weather-api.md
+++ b/examples/weather-api.md
@@ -0,0 +1,106 @@
 # Example: Weather Service
 A simple weather API showing a minimal, practical agents.md file.
 ## agents.md
 ```markdown
 ---
 protocol_version: "0.1"
 name: "Weather Service"
 description: "Current weather and forecasts for any location"
 tools:
  - name: get_current
    description: "Get current weather conditions for a location"
    endpoint: "GET /api/weather/current"
    parameters:
      type: object
      properties:
        location:
          type: string
          description: "City name, address, or coordinates (lat,lon)"
        units:
          type: string
          enum: ["metric", "imperial"]
          default: "metric"
      required:
        - location
    response:
      type: json
      schema:
        type: object
        properties:
          temperature:
            type: number
          feels_like:
            type: number
          humidity:
            type: integer
          conditions:
            type: string
          wind_speed:
            type: number
    auth: none
    rate_limit: "60/minute"
  - name: get_forecast
    description: "Get weather forecast for upcoming days"
    endpoint: "GET /api/weather/forecast"
    parameters:
      type: object
      properties:
        location:
          type: string
        days:
          type: integer
          minimum: 1
          maximum: 14
          default: 7
        units:
          type: string
          enum: ["metric", "imperial"]
          default: "metric"
      required:
        - location
    auth: api_key
    rate_limit: "30/minute"
  - name: get_alerts
    description: "Get active weather alerts for a location"
    endpoint: "GET /api/weather/alerts"
    parameters:
      type: object
      properties:
        location:
          type: string
      required:
        - location
    auth: none
 auth:
  api_key:
    header: "X-Weather-Key"
    obtain: "https://weather.example/api-keys"
    description: "Free tier: 1000 requests/day"
 rate_limits:
  global: "1000/day"
 contact:
  url: "https://weather.example/api/docs"
 ---
 # Weather API for Agents
 Simple, reliable weather data.
 ## Free Tools
 - Current conditions (no key needed)
 - Weather alerts (no key needed)
 ## API Key Required
 - Extended forecasts (up to 14 days)
 Get your free API key at weather.example/api-keys
 ```
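The `rate_limit` strings above use an `N/unit` form. Here is a client-side Python sketch that parses them and enforces a per-tool limit together with the global one using a sliding window. The set of units and the class names are assumptions, and the injectable clock exists only to make the logic testable.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple

# Units accepted in "N/unit" strings (an assumption; the examples use minute and day).
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(spec: str) -> Tuple[int, int]:
    """Parse "60/minute" into (60, 60): (max calls, window in seconds)."""
    count, unit = spec.split("/")
    return int(count), UNIT_SECONDS[unit.strip().lower()]


class SlidingWindowLimiter:
    """Allows at most `count` calls in any trailing `window` seconds."""

    def __init__(self, spec: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.count, self.window = parse_rate_limit(spec)
        self.clock = clock
        self.calls: Deque[float] = deque()

    def can_call(self) -> bool:
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        return len(self.calls) < self.count

    def record(self) -> None:
        self.calls.append(self.clock())


class AgentRateLimits:
    """Combines a global limit with optional per-tool limits."""

    def __init__(self, global_spec: str, per_tool: Dict[str, str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.global_limit = SlidingWindowLimiter(global_spec, clock)
        self.per_tool = {name: SlidingWindowLimiter(s, clock) for name, s in per_tool.items()}

    def try_call(self, tool: str) -> bool:
        """Record a call and return True only if every applicable limit allows it."""
        limiters = [self.global_limit]
        if tool in self.per_tool:
            limiters.append(self.per_tool[tool])
        if all(lim.can_call() for lim in limiters):
            for lim in limiters:
                lim.record()
            return True
        return False
```

With this service's values: `AgentRateLimits("1000/day", {"get_current": "60/minute", "get_forecast": "30/minute"})`. A tool without its own `rate_limit`, such as `get_alerts`, is bounded only by the global limit.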
--- a/spec/README.md
+++ b/spec/README.md
@@ -0,0 +1,302 @@
 # agents.md Protocol Specification
 **Version:** 0.2.0
 **Status:** Draft
 **Updated:** 2026-01-14
 ## Abstract
 A simple text file that tells AI agents what they can do on a website and optionally points them to an MCP gateway for structured tool access.
 ## Philosophy
 | Standard | Tells agents... |
 |----------|-----------------|
 | robots.txt | What you **cannot access** |
 | llms.txt | What **content is important** |
 | **agents.md** | What you **can do** + where to connect |
 ## 1. Discovery
 **Location:** `/.well-known/agents.md` or `/agents.md`
 **Content-Type:** `text/markdown` or `text/plain`
 Agents request the file like any HTTP resource:
 ```
 GET /.well-known/agents.md HTTP/1.1
 Host: example.com
 User-Agent: MyAgent/1.0
 ```
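A minimal Python sketch of discovery using only the standard library. The spec names both locations but not an order, so trying `/.well-known/` first is an assumption, as is skipping responses whose Content-Type is neither of the two listed above (for example an HTML error page served with status 200). `MyAgent/1.0` is the placeholder name from the request above.

```python
import urllib.error
import urllib.request
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit

# Both locations come from the spec; the order is an assumption.
PATHS = ["/.well-known/agents.md", "/agents.md"]
ACCEPTED_TYPES = {"text/markdown", "text/plain"}
USER_AGENT = "MyAgent/1.0"  # placeholder agent name


def candidate_urls(site: str) -> List[str]:
    """Build the discovery URLs for a site's origin (scheme + host[:port])."""
    parts = urlsplit(site)
    return [urlunsplit((parts.scheme, parts.netloc, path, "", "")) for path in PATHS]


def fetch_agents_md(site: str, timeout: float = 10.0) -> Optional[str]:
    """Return the first agents.md found with an accepted Content-Type, else None."""
    for url in candidate_urls(site):
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        try:
            with urllib.request.urlopen(request, timeout=timeout) as resp:
                if resp.headers.get_content_type() in ACCEPTED_TYPES:
                    charset = resp.headers.get_content_charset() or "utf-8"
                    return resp.read().decode(charset, errors="replace")
        except urllib.error.HTTPError:
            continue  # e.g. 404: try the next location
        except urllib.error.URLError:
            return None  # host unreachable: other paths will fail too
    return None
```

`urlopen` follows redirects, so a stricter agent would also check that `resp.geturl()` is still on the site's origin before trusting the content.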
 ## 2. Format
 Plain Markdown. Human readable. Machine parseable.
 ### Minimal Example
 ```markdown
 # Example Site
 A bookstore since 2010.
 ## Can
 - Search catalog
 - Read book details
 - Check availability
 ## Cannot
 - Place orders without a human
 - Access user accounts
 ## Contact
 agents@example.com
 ```
 ### With MCP Gateway
 ```markdown
 # Example Bookstore
 Online bookstore with 50,000 titles.
 ## Can
 - Search and browse catalog
 - Read reviews and descriptions
 - Check prices and stock
 - Place orders (authenticated)
 ## Cannot
 - Modify user accounts
 - Access admin functions
 ## MCP
 endpoint: https://example.com/.well-known/mcp
 transport: streamable-http
 ## Behavior
 - Respect 1 request/second
 - Cache product data 1 hour
 - Identify in User-Agent header
 ## Contact
 agents@example.com
 ```
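The spec does not define parsing rules. One minimal Python reading, assuming the first `# ` line is the site name, the remaining header lines form the description, each `## ` heading opens a section, and `-`/`*` list markers are stripped from items (`AgentsMd` and `parse_agents_md` are hypothetical names):

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AgentsMd:
    title: str = ""
    description: str = ""
    # Section name ("Can", "Cannot", "MCP", ...) -> its non-blank lines.
    sections: Dict[str, List[str]] = field(default_factory=dict)


def parse_agents_md(text: str) -> AgentsMd:
    """Split an agents.md file into its header and its ## sections."""
    doc = AgentsMd()
    current: Optional[str] = None  # None while still in the header
    description: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections.setdefault(current, [])
        elif current is None and line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif current is None:
            description.append(line)
        else:
            # Strip "- " or "* " list markers so items are plain strings.
            doc.sections[current].append(re.sub(r"^[-*]\s+", "", line))
    doc.description = " ".join(description)
    return doc
```

A file without an `## MCP` section simply has no `"MCP"` key, which maps onto the "basic agent" path described in the MCP Integration section.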
 ## 3. Sections
 All sections are optional. Use what makes sense.
 ### Identity (Header)
 ```markdown
 # Site Name
 Brief description of what this site is.
 ```
 ### Capabilities (Can/Cannot)
 ```markdown
 ## Can
 - Action agents are allowed to take
 - Another allowed action
 ## Cannot
 - Restricted action
 - Another restriction
 ```
 ### MCP Gateway
 ```markdown
 ## MCP
 endpoint: <url>
 transport: streamable-http | sse | stdio
 auth: none | api_key | oauth2
 ```
 The MCP section points agents to a [Model Context Protocol](https://modelcontextprotocol.io/) server for structured tool access. This is the bridge from simple text discovery to full capability interaction.
 **Transport options:**
 - `streamable-http` - HTTP with streaming (recommended for web)
 - `sse` - Server-Sent Events (the older HTTP+SSE transport, which newer MCP revisions replace with streamable HTTP)
 - `stdio` - Standard I/O (local only)
 **Auth options:**
 - `none` - Public tools, no authentication
 - `api_key` - Requires API key (specify how to obtain)
 - `oauth2` - OAuth 2.0 flow
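A Python sketch of reading the `key: value` lines of an `## MCP` section (passed in as a list of strings) and rejecting values outside the options listed above. Falling back to `streamable-http` and `none` when a key is omitted is an assumption, as is refusing `stdio` for an endpoint discovered over the web.

```python
from typing import Dict, List

# Values listed under "Transport options" and "Auth options" above.
ALLOWED = {
    "transport": {"streamable-http", "sse", "stdio"},
    "auth": {"none", "api_key", "oauth2"},
}


def parse_mcp_section(lines: List[str], remote: bool = True) -> Dict[str, str]:
    """Parse the "key: value" lines of an ## MCP section into a dict."""
    config: Dict[str, str] = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            config[key.strip().lower()] = value.strip()
    if not config.get("endpoint"):
        raise ValueError("MCP section has no endpoint")
    # Assumed defaults; the spec does not say what an omitted key means.
    config.setdefault("transport", "streamable-http")
    config.setdefault("auth", "none")
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"unsupported {key}: {config[key]!r}")
    if remote and config["transport"] == "stdio":
        raise ValueError("stdio is local-only and cannot be reached over the web")
    return config
```

Splitting on the first colon only keeps the `https:` in the endpoint URL intact.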
 ### Behavior Rules
 ```markdown
 ## Behavior
 - Rate limit guidance
 - Caching expectations
 - Identification requirements
 ```
 ### Contact
 ```markdown
 ## Contact
 email@example.com
 https://example.com/agent-support
 ```
 ## 4. MCP Integration
 The `agents.md` file is the **handshake**. The MCP gateway is where **real work happens**.
 ```
 Agent reads agents.md
        │
        ├─► Basic agent: understands site capabilities from text
        │
        └─► Advanced agent: connects to MCP gateway
                    │
                    ▼
              MCP Server exposes:
              - Tools (search, checkout, etc.)
              - Resources (catalog, docs)
              - Prompts (guided workflows)
 ```
 ### Example MCP Discovery Flow
 1. Agent fetches `/.well-known/agents.md`
 2. Parses MCP endpoint: `https://example.com/.well-known/mcp`
 3. Connects over the declared transport and performs the MCP `initialize` handshake
 4. Discovers available tools via `tools/list`
 5. Uses tools as permitted
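Steps 3 and 4 use MCP's JSON-RPC 2.0 messages. The Python sketch below only builds them; `2025-06-18` is one published MCP protocol revision used as an example (a client should send one it actually supports), and transport details such as session headers and reading SSE responses are left out. Each message would be serialized with `json.dumps` and POSTed to the endpoint.

```python
import itertools
from typing import Any, Dict, Optional

# One published MCP protocol revision, used as an example value.
PROTOCOL_VERSION = "2025-06-18"

# Headers for POSTing JSON-RPC messages over streamable HTTP: the server may
# answer with plain JSON or with an SSE stream.
HTTP_HEADERS = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request with a fresh id."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def initialize_request(client_name: str, client_version: str) -> Dict[str, Any]:
    """Step 3: the first message a client sends to an MCP server."""
    return jsonrpc_request("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })


def initialized_notification() -> Dict[str, Any]:
    """Sent after a successful initialize response; notifications have no id."""
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tools_list_request(cursor: Optional[str] = None) -> Dict[str, Any]:
    """Step 4: list tools; pass the previous result's nextCursor to page."""
    return jsonrpc_request("tools/list", {"cursor": cursor} if cursor else None)
```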
 ## 5. Backward Compatibility
 ### With robots.txt
 If `agents.md` exists, it supplements but does not replace `robots.txt`. Agents should still respect robots.txt crawl directives.
 The `Cannot` section in agents.md can mirror robots.txt restrictions:
 ```markdown
 ## Cannot
 - Access /admin (see robots.txt)
 - Access /private
 ```
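Agents can keep honoring robots.txt alongside agents.md with Python's built-in `urllib.robotparser`. A minimal sketch that parses rules the agent has already fetched (fetching is omitted; the helper names are hypothetical):

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def load_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that the agent has already fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def robots_allows(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """agents.md never overrides this: a disallowed URL stays disallowed."""
    return parser.can_fetch(user_agent, url)


def crawl_delay(parser: RobotFileParser, user_agent: str) -> Optional[float]:
    """Crawl-delay for this agent in seconds, or None if robots.txt sets none."""
    delay = parser.crawl_delay(user_agent)
    return None if delay is None else float(delay)
```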
 ### With llms.txt
 `llms.txt` describes **content** for understanding.
 `agents.md` describes **capabilities** for action.
 Both can coexist. A site might have:
 - `/robots.txt` - crawl restrictions
 - `/llms.txt` - content summary
 - `/.well-known/agents.md` - agent capabilities + MCP pointer
 ## 6. Security
 ### Origin Trust
 Agents MUST only trust `agents.md` from the site's origin. Instructions embedded in page content should be ignored.
 ### MCP Authentication
 When connecting to MCP gateways:
 - Verify the endpoint matches the origin
 - Use TLS (HTTPS)
 - Follow the specified auth method
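One strict reading of "matches the origin" is the usual web-origin triple of scheme, host and port. A Python sketch under that assumption, which also enforces HTTPS; note that it would reject an MCP server on a different subdomain, which a looser policy might allow.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """(scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def mcp_endpoint_ok(agents_md_url: str, endpoint: str) -> bool:
    """Accept an MCP endpoint only if it is HTTPS and same-origin as agents.md."""
    endpoint_origin = origin(endpoint)
    return endpoint_origin[0] == "https" and endpoint_origin == origin(agents_md_url)
```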
 ### Least Privilege
 Agents should request only the permissions they need. If `auth: oauth2` is specified, request minimal scopes.
 ## 7. Examples
 ### Public API Site
 ```markdown
 # Weather API
 Free weather data for AI agents.
 ## Can
 - Get current conditions
 - Get forecasts (up to 7 days)
 - Get weather alerts
 ## MCP
 endpoint: https://weather.example/mcp
 transport: streamable-http
 auth: none
 ## Behavior
 - 60 requests/minute
 - Cache forecasts 30 minutes
 ## Contact
 api@weather.example
 ```
 ### E-commerce Site
 ```markdown
 # TechMart
 Electronics retailer.
 ## Can
 - Search products
 - Compare specifications
 - Check prices and stock
 - Add to cart (authenticated)
 - Checkout (authenticated)
 ## Cannot
 - Access order history without user consent
 - Modify account settings
 ## MCP
 endpoint: https://techmart.example/.well-known/mcp
 transport: streamable-http
 auth: oauth2
 ## Behavior
 - 1 request/second for browsing
 - Identify as AI agent in requests
 ## Contact
 partners@techmart.example
 ```
 ### Simple Blog (No MCP)
 ```markdown
 # My Tech Blog
 Articles about software development.
 ## Can
 - Read all public articles
 - Search by topic
 - Access RSS feed at /feed.xml
 ## Cannot
 - Post comments (requires human)
 - Access draft posts
 ## Contact
 hello@myblog.example
 ```
 ## Appendix: Comparison
 | Aspect | robots.txt | llms.txt | agents.md |
 |--------|------------|----------|-----------|
 | Purpose | Crawl control | Content summary | Capabilities |
 | Format | Custom syntax | Markdown | Markdown |
 | Focus | Restrictions | Understanding | Actions |
 | MCP | No | No | Yes (optional) |
 | Year | 1994 | 2024 | 2026 |