Introduction
In 2026, the primary traffic source for many SaaS products is no longer a human sitting in front of a browser. It is increasingly an AI agent — autonomously discovering tools, reading capabilities, and calling APIs on behalf of users.
If your website is still optimized only for human readers, you are already behind. AI agents do not skim marketing copy or click buttons. They look for structured, machine-readable information: what you do, how to call you, and what you return.
This tutorial walks you through a lightweight, practical strategy to make any website — especially a web-based SaaS — discoverable, readable, and callable by AI agents, without rebuilding your architecture.
Prerequisites
- Basic understanding of JSON and HTML
- Access to deploy static files or configure routes on your web server (Nginx, Cloudflare, Vercel, etc.)
- A working website with at least one API endpoint
No MCP server required. No complex OpenAPI tooling required. This guide prioritizes the minimum viable implementation.
How AI Agents Read Your Website
Before writing any code, it helps to understand how modern AI agents currently access web content. They operate in three broad modes:
- HTML scraping — similar to a traditional web crawler, reading page content as text
- Structured data parsing — reading JSON-LD, Schema.org markup, OpenAPI specs, or custom schemas
- Direct API or tool calls — invoking an MCP endpoint or REST API using a discovered tool schema
The goal of this tutorial is to satisfy modes 2 and 3 with minimal effort. You want an agent to understand in a single request:
- Who you are
- What you can do
- How to call you
Step 1: Create a Machine-Readable Schema at /.well-known/ai.json
The .well-known/ directory is an established web convention (originally RFC 5785, since superseded by RFC 8615) for hosting discovery metadata. In 2026, AI agents from multiple vendors are beginning to probe this path automatically.
Create the file at /.well-known/ai.json with the following structure:
{
"name": "AnalyticsPlatform",
"description": "Location analytics and risk scoring platform for commercial real estate.",
"version": "1.0",
"base_url": "https://example.com",
"geo_support": true,
"coordinate_system": "WGS84",
"endpoints": [
{
"name": "Generate Location Report",
"path": "/api/report",
"method": "POST",
"description": "Returns a structured risk and accessibility score for a given coordinate.",
"input": {
"lat": "number",
"lng": "number"
},
"output": {
"score": "number",
"risk_level": "string",
"summary": "string"
}
},
{
"name": "Accessibility Analysis",
"path": "/api/accessibility",
"method": "GET",
"description": "Returns a reachability polygon for a given coordinate and travel mode.",
"input": {
"lat": "number",
"lng": "number",
"mode": "string"
},
"output": {
"polygon": "GeoJSON"
}
}
],
"formats_supported": ["JSON", "GeoJSON", "CSV"]
}
Why this works: This schema does not require OpenAPI tooling or a running MCP server. It is a static JSON file that any agent can fetch, parse, and understand. It answers the three fundamental questions an agent needs answered before attempting to use your service.
Tips:
- Keep descriptions short and imperative: "Returns a risk score" not "Our powerful AI-driven scoring engine analyzes..."
- List every public-facing endpoint you want agents to call
- Include geo_support and coordinate_system fields if your platform handles geographic data — agents working with spatial context will actively look for these
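To illustrate how an agent consumes this schema, here is a minimal Python sketch. The schema is inlined rather than fetched over the network, and the build_call helper is illustrative, not part of any standard:

```python
import json

# The schema an agent would fetch from /.well-known/ai.json
# (inlined here so the sketch runs without network access).
AI_JSON = json.loads("""
{
  "base_url": "https://example.com",
  "endpoints": [
    {"name": "Generate Location Report", "path": "/api/report",
     "method": "POST", "input": {"lat": "number", "lng": "number"}}
  ]
}
""")

def build_call(schema, endpoint_name, **params):
    """Resolve an endpoint by name and return (method, url, params)."""
    endpoint = next(e for e in schema["endpoints"] if e["name"] == endpoint_name)
    missing = set(endpoint["input"]) - set(params)
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    url = schema["base_url"] + endpoint["path"]
    return endpoint["method"], url, params

method, url, body = build_call(AI_JSON, "Generate Location Report", lat=35.68, lng=139.76)
print(method, url)  # POST https://example.com/api/report
```

The point is that everything the agent needs — endpoint, method, and required parameters — is resolvable from the static file alone.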
Serving the File
If you use Nginx:
location /.well-known/ai.json {
alias /var/www/.well-known/ai.json;
default_type application/json;
}
If you use Next.js, place the file in public/.well-known/ai.json and it will be served automatically.
If you use Cloudflare Pages, place it in the public/.well-known/ directory of your build output.
Step 2: Add a <link> Tag in Your Homepage <head>
This optional but recommended step tells crawlers and agents that a machine schema exists for your site, without waiting for them to probe .well-known/ on their own.
<head>
<link rel="alternate" type="application/json" href="/.well-known/ai.json" />
</head>
This mirrors the established pattern for RSS feeds (<link rel="alternate" type="application/rss+xml">), which is widely understood by automated tools.
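An agent-side sketch of this discovery step, using only Python's standard library (the sample HTML mirrors the tag above):

```python
from html.parser import HTMLParser

class SchemaLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="application/json"> tags."""
    def __init__(self):
        super().__init__()
        self.schema_urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("rel") == "alternate" and a.get("type") == "application/json":
            self.schema_urls.append(a.get("href"))

homepage = '<head><link rel="alternate" type="application/json" href="/.well-known/ai.json" /></head>'
finder = SchemaLinkFinder()
finder.feed(homepage)
print(finder.schema_urls)  # ['/.well-known/ai.json']
```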
Step 3: Add JSON-LD Structured Data to Your Homepage
JSON-LD (JavaScript Object Notation for Linked Data) is the Schema.org-standard way to embed structured metadata directly in your HTML. Both search engines and AI agents parse it.
Add this inside your homepage <body> or <head>:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "SoftwareApplication",
"name": "AnalyticsPlatform",
"description": "Location analytics and risk scoring for commercial real estate.",
"applicationCategory": "BusinessApplication",
"operatingSystem": "Web",
"url": "https://example.com",
"offers": {
"@type": "Offer",
"priceCurrency": "USD",
"description": "Subscription-based access for enterprise clients."
}
}
</script>
Why bother with both JSON-LD and ai.json? They serve different layers. JSON-LD helps with semantic understanding of what your product is. The ai.json schema tells agents exactly how to call it. Use both.
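To check programmatically that your JSON-LD block actually parses, you can extract and decode it with Python's standard library. This is a sketch of a local sanity check, not a full validator:

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects decoded contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buf)))
            self._buf = []
            self._in_jsonld = False

page = '''<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "SoftwareApplication", "name": "AnalyticsPlatform"}
</script>'''
extractor = JsonLdExtractor()
extractor.feed(page)
print(extractor.blocks[0]["name"])  # AnalyticsPlatform
```

In practice you would feed it the output of curl against your live homepage.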
Step 4: Create a Dedicated AI Summary Page at /ai
This is the highest-value addition you can make for discoverability. Create a plain, static HTML page at the /ai route. Its purpose is to give any LLM or agent an unambiguous, structured description of your product — with no JavaScript required, no login wall, and no marketing fluff.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<title>AnalyticsPlatform - AI Summary</title>
</head>
<body>
<h1>AnalyticsPlatform - Machine Readable Summary</h1>
<p>
AnalyticsPlatform is a SaaS platform that provides location risk scoring,
accessibility analysis, and demographic data for commercial real estate research.
</p>
<h2>Capabilities</h2>
<ul>
<li>Location risk scoring based on environmental and infrastructure data</li>
<li>Accessibility analysis with polygon output (transit, walking, driving)</li>
<li>Demographic mesh data at district resolution</li>
<li>Hazard overlay integration (flood, earthquake risk)</li>
</ul>
<h2>API Endpoints</h2>
<ul>
<li><strong>POST /api/report</strong> — Generate a location risk report given latitude and longitude</li>
<li><strong>GET /api/accessibility</strong> — Return a reachability polygon for a coordinate and travel mode</li>
</ul>
<h2>Machine Schema</h2>
<p>
Full machine-readable schema available at:
<a href="/.well-known/ai.json">/.well-known/ai.json</a>
</p>
<h2>Output Formats</h2>
<ul>
<li>JSON</li>
<li>GeoJSON</li>
<li>CSV</li>
</ul>
<h2>Pricing</h2>
<p>Subscription-based access for enterprise and agency clients. Contact for pricing.</p>
<h2>Contact</h2>
<p>support@example.com</p>
</body>
</html>
Critical requirements for this page:
- Must be server-side rendered or fully static. Agents often do not execute JavaScript.
- No authentication barrier. This page must be publicly accessible.
- No redirects that require cookies or sessions.
- Use plain semantic HTML: <h1>, <h2>, <ul>, <p>. Avoid div soup.
Add a footer link from your main site to this page. Keep it understated:
<footer>
<a href="/ai">AI Access</a>
</footer>
Step 5: Review Your robots.txt
Many SaaS products have overly aggressive bot-blocking policies. Check your robots.txt and ensure you are not accidentally blocking legitimate AI agent crawlers.
Minimal safe configuration:
User-agent: *
Allow: /ai
Allow: /.well-known/ai.json
Disallow: /dashboard
Disallow: /account
This allows agents to reach your public-facing discovery files while still protecting authenticated areas.
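You can sanity-check these rules locally with Python's built-in robots.txt parser before deploying (the rules below mirror the configuration above):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /ai
Allow: /.well-known/ai.json
Disallow: /dashboard
Disallow: /account
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Discovery paths must be reachable; authenticated areas must not be.
print(parser.can_fetch("SomeAgentBot", "/ai"))                     # True
print(parser.can_fetch("SomeAgentBot", "/.well-known/ai.json"))    # True
print(parser.can_fetch("SomeAgentBot", "/dashboard"))              # False
```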
Additionally, check that your CDN or WAF (e.g., Cloudflare Bot Fight Mode) is not silently blocking requests to these paths. Overly strict WAF rules are one of the most common reasons AI-optimized endpoints fail in practice.
Step 6: Verify the Setup
Run a quick sanity check from the command line:
# Check that ai.json is publicly accessible and returns valid JSON
curl -s https://example.com/.well-known/ai.json | python3 -m json.tool
# Check that the AI summary page returns HTML without redirect loops
curl -I https://example.com/ai
# Verify JSON-LD is present in the homepage source
curl -s https://example.com | grep "application/ld+json"
If all three commands return expected output, your site is now AI-agent ready.
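Beyond checking that ai.json is reachable, you may want to validate its structure. A small Python sketch — the required keys follow the example schema from Step 1, not any formal standard:

```python
import json

# Keys the example schema from Step 1 is expected to carry.
REQUIRED_TOP_LEVEL = {"name", "description", "base_url", "endpoints"}
REQUIRED_ENDPOINT = {"name", "path", "method", "description", "input", "output"}

def validate_ai_json(raw):
    """Return a list of problems found in an ai.json document (empty list = OK)."""
    problems = []
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    for key in sorted(REQUIRED_TOP_LEVEL - doc.keys()):
        problems.append(f"missing top-level key: {key}")
    for i, ep in enumerate(doc.get("endpoints", [])):
        for key in sorted(REQUIRED_ENDPOINT - ep.keys()):
            problems.append(f"endpoint {i}: missing key: {key}")
        if ep.get("method") not in {"GET", "POST", "PUT", "PATCH", "DELETE"}:
            problems.append(f"endpoint {i}: unexpected method: {ep.get('method')}")
    return problems

print(validate_ai_json('{"name": "X"}'))
```

Run it against the output of the first curl command to catch missing fields before an agent does.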
Complete Implementation Checklist
[ ] Create /.well-known/ai.json with product name, description, and endpoint list
[ ] Add <link rel="alternate" type="application/json"> to homepage <head>
[ ] Add JSON-LD structured data block to homepage
[ ] Create /ai static HTML summary page (SSR or static, no auth required)
[ ] Add "AI Access" link in site footer pointing to /ai
[ ] Review robots.txt to allow /.well-known/ and /ai
[ ] Verify WAF/CDN is not blocking agent-like requests to these paths
[ ] Test all paths with curl
What Not to Do
These are common pitfalls that actively prevent agents from reading your site:
- Client-side-only rendering (pure CSR/SPA): If your content only appears after JavaScript runs, many agents cannot see it. Provide at least an SSR or static version for public marketing and capability pages.
- Blocking all bots in robots.txt: If Disallow: / is present for all user agents, you block agents from reaching your schema files.
- Hiding API docs behind login: Any capability description locked behind authentication is invisible to agents.
- Writing marketing copy instead of capability descriptions: Agents do not benefit from "the industry's most powerful analytics suite." They need "POST /api/report returns a JSON object with score, risk_level, and summary."
Why This Lightweight Approach Is Sufficient
In 2026, the dominant agent discovery pattern follows a predictable sequence:
1. Probe /.well-known/ for known schema files
2. Check robots.txt for access rules
3. Parse <link> tags in the homepage <head>
4. Crawl structured semantic pages
Two files and one static route — ai.json, a <link> tag, and the /ai page — satisfy all four steps. You do not need an MCP server, a full OpenAPI specification, or an agent orchestration layer to be discoverable. Those are worth building eventually, but this foundation gets you into the agent-accessible web immediately.
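The discovery sequence can be sketched as a single function. Here fetch is an injected callable so the sketch runs without network access; a real agent would pass an HTTP client, and would also consult robots.txt before fetching anything:

```python
def discover(base_url, fetch):
    """Sketch of an agent's discovery flow; fetch(url) returns body text or None."""
    # Step 1: probe the well-known path directly.
    schema = fetch(base_url + "/.well-known/ai.json")
    if schema:
        return ("well-known", schema)
    # Step 3: fall back to the homepage and look for a <link> hint.
    # (String containment stands in for real <link> tag parsing here.)
    homepage = fetch(base_url + "/") or ""
    if "/.well-known/ai.json" in homepage:
        return ("link-tag", fetch(base_url + "/.well-known/ai.json"))
    # Step 4: fall back to crawling the structured summary page.
    return ("summary-page", fetch(base_url + "/ai"))

# Simulated site that only exposes an /ai summary page:
site = {"https://example.com/ai": "<h1>Machine Readable Summary</h1>"}
result = discover("https://example.com", site.get)
print(result[0])  # summary-page
```

Each of the three artifacts you deployed gives the agent one more path to a successful resolution, which is why the lightweight setup covers the whole sequence.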
Summary
This tutorial covered a minimal, deployable strategy for making any SaaS website readable and callable by AI agents:
| Step | What You Add | Why It Matters |
|---|---|---|
| 1 | /.well-known/ai.json | Primary machine-readable schema entry point |
| 2 | <link rel="alternate"> in <head> | Signals schema availability to crawlers |
| 3 | JSON-LD block in homepage | Semantic product identity for agents and search |
| 4 | /ai static summary page | Plain-text, no-JS capability description for LLMs |
| 5 | Updated robots.txt | Ensures agents can reach the above paths |
| 6 | curl verification | Confirms everything is publicly accessible |
The web is moving from human-first pages to agent-callable tool nodes. This structure gives you a clean foundation to build on — whether you eventually add a full MCP server, an OpenAPI spec, or a tool registry integration.
Start with the two files. Ship it this week.
