I built rawdocs — a clean documentation viewer for GitHub repos, built for humans and LLMs
The problem
If you use AI coding assistants, you've done this dance:
- Find the docs for a library on GitHub
- Click through the file tree to find the right `.md` file
- Click "Raw"
- Copy the entire thing
- Paste it into ChatGPT or Claude
- Repeat for the next file
GitHub only surfaces the top-level README. The rest of the documentation — CONTRIBUTING.md, docs/, .github/, hidden SKILL.md files buried in subdirectories — requires manually walking the file tree.
And if you're building an AI agent that needs to fetch docs programmatically? You get GitHub's rendered HTML with all the chrome, or you have to construct raw.githubusercontent.com URLs by hand.
I built rawdocs to fix this.
What rawdocs does
Given any public GitHub repo, rawdocs:
- Fetches the entire file tree
- Filters to documentation files only (`.md`, `.txt`)
- Renders a navigable tree view with syntax-highlighted markdown
- Serves raw markdown to bots and LLMs via content negotiation
- Provides one-click "Copy raw markdown" for the human-in-the-loop workflow
- Auto-generates `/llms.txt` and `/llms-full.txt` per repo
Zero indexing. Zero storage. Zero AI processing. A thin, fast viewer that respects the original author's exact words.
Try it now: rawdocs.sivaram.dev/facebook/react
The key insight: content negotiation
The same URL serves different formats based on what you ask for:
```sh
# Browser gets rendered HTML with syntax highlighting
open https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md

# curl gets raw markdown — just request a .md URL
curl https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md

# Or be explicit with the Accept header
curl -H "Accept: text/markdown" \
  https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md

# Get JSON metadata
curl -H "Accept: application/json" \
  https://rawdocs.sivaram.dev/facebook/react
```

This isn't cloaking — it's standard HTTP content negotiation (RFC 7231). APIs have served `Accept: application/json` vs. `Accept: text/xml` for decades. rawdocs does the same for documentation.
Built for LLMs, not just humans
Every response to an AI agent includes headers that help it understand what it's getting:
```
Content-Type: text/markdown; charset=utf-8
x-markdown-tokens: 31371
Content-Signal: ai-train=yes, search=yes, ai-input=yes
X-Llms-Txt: /facebook/react/llms.txt
Link: </facebook/react/llms.txt>; rel="llms-txt"
Vary: Accept
```

The `x-markdown-tokens` header tells the agent the estimated token count before it reads the body — useful for context-window sizing. `Content-Signal` explicitly signals that this content is available for AI training, search, and inference input.
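An agent can use these headers to decide whether a document fits its budget before downloading the body. A minimal sketch of that client-side check (the helper name and the budget numbers are illustrative, not part of rawdocs):

```typescript
// Decide whether a rawdocs response fits a model's context window,
// using only the x-markdown-tokens header from a HEAD/GET response.
function fitsContextWindow(headers: Headers, budgetTokens: number): boolean {
  const estimate = Number(headers.get("x-markdown-tokens") ?? "0");
  return estimate > 0 && estimate <= budgetTokens;
}

// Example with the headers shown above:
const h = new Headers({
  "Content-Type": "text/markdown; charset=utf-8",
  "x-markdown-tokens": "31371",
});
console.log(fitsContextWindow(h, 128_000)); // true — fits a 128k window
console.log(fitsContextWindow(h, 8_000));   // false — too big for an 8k window
```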
Every repo automatically gets two LLM-focused endpoints:
- `/owner/repo/llms.txt` — an index of all documentation files
- `/owner/repo/llms-full.txt` — all docs concatenated into a single file for large-context models
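How those two endpoints could be assembled from the filtered tree can be sketched as follows (the types and separator format are illustrative assumptions, not rawdocs internals):

```typescript
// A documentation file after tree filtering.
type DocFile = { path: string; content: string };

// llms.txt: an index of documentation paths for a repo.
function llmsTxt(repo: string, files: DocFile[]): string {
  const lines = files.map((f) => `- /${repo}/${f.path}`);
  return `# ${repo} documentation index\n${lines.join("\n")}\n`;
}

// llms-full.txt: every doc concatenated, each preceded by a path marker,
// so a large-context model can ingest the whole repo's docs in one request.
function llmsFullTxt(files: DocFile[]): string {
  return files.map((f) => `<!-- ${f.path} -->\n${f.content}`).join("\n\n");
}

const files: DocFile[] = [
  { path: "README.md", content: "# Hello" },
  { path: "docs/guide.md", content: "Guide body" },
];
console.log(llmsTxt("facebook/react", files));
```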
The architecture
```
Browser / LLM / curl
        |
Cloudflare (edge cache + compression)
        |
Railway CDN (Fastly, 100+ PoPs)
        |
Hono app (Bun runtime)
  - Content negotiation
  - Tree filtering
  - Markdown rendering + Shiki highlighting
  - Frontmatter parsing
  - LRU cache (5 min TTL)
        |
ungh.cc                    -> tree + metadata (primary)
jsDelivr                   -> file content (SHA-pinned, immutable)
raw.githubusercontent.com  -> fallback
GitHub API                 -> fallback (circuit breaker)
```

No database. No Redis. No queue. No background jobs.
The entire thing is a single Hono app running on Bun. Here's why:
- Hono + Bun — Smallest viable web framework. Native TypeScript, built-in JSX for SSR, fast cold starts. No Next.js because there's no React UI complexity to justify it.
- ungh.cc — Free, cached GitHub proxy. Returns the recursive file tree in one call with the commit SHA. No auth needed.
- jsDelivr — Serves any file from any public GitHub repo at `cdn.jsdelivr.net/gh/{owner}/{repo}@{sha}/{path}`. By using the SHA from ungh's tree response, content URLs become immutable and cache forever. Falls back to `raw.githubusercontent.com` when jsDelivr returns 403.
- LRU cache — The `lru-cache` npm package handles 500 entries with a 5-minute TTL. Rendered HTML (the expensive Shiki output) is cached, so the first request takes ~2s but subsequent requests are instant.
- Cloudflare + Railway CDN — Two layers of edge caching. Raw markdown responses cache at the CDN edge. HTML is served fresh from origin, but the LRU cache keeps it fast.
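The SHA-pinned URL scheme above can be sketched as two small builders (the function names are illustrative; the URL patterns are jsDelivr's and GitHub's public schemes):

```typescript
// Immutable content URL: pinning to the commit SHA from ungh's tree
// response means this URL never changes, so it can be cached forever.
function jsdelivrUrl(owner: string, repo: string, sha: string, path: string): string {
  return `https://cdn.jsdelivr.net/gh/${owner}/${repo}@${sha}/${path}`;
}

// Fallback for when jsDelivr returns 403 (e.g. oversized files).
function rawFallbackUrl(owner: string, repo: string, sha: string, path: string): string {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${sha}/${path}`;
}

console.log(jsdelivrUrl("facebook", "react", "abc123", "README.md"));
// https://cdn.jsdelivr.net/gh/facebook/react@abc123/README.md
```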
What I learned building this
Hono JSX escapes everything
The biggest recurring issue: Hono's JSX escapes HTML entities in attributes and text. Font URLs with `&` become `&amp;`. Emoji HTML entities like `&#x1F4C4;` (📄) get double-escaped. CSS with quotes gets mangled inside `<style>` tags.
The fix: use `raw()` from `hono/html` for anything that needs unescaped HTML — font links, style blocks, emoji icons. For JavaScript strings in `onclick` handlers, use Unicode escapes (`\u2713`) instead of HTML entities.
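A minimal model of the behavior (not Hono's actual internals): an escaper that treats every string as untrusted turns the `&` of an entity into `&amp;`, double-escaping it, while a "raw"-marked string passes through untouched — which is what `raw()` from `hono/html` provides for trusted markup.

```typescript
// Wrapper type marking a string as pre-escaped, trusted HTML.
const RAW = Symbol("raw");
type Raw = { [RAW]: true; value: string };
const raw = (value: string): Raw => ({ [RAW]: true, value });

// Render text content: escape plain strings, pass raw strings through.
function renderText(input: string | Raw): string {
  if (typeof input !== "string") return input.value; // trusted markup
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(renderText("&#x1F4C4;"));      // "&amp;#x1F4C4;" — double-escaped
console.log(renderText(raw("&#x1F4C4;"))); // "&#x1F4C4;" — left intact
```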
Content negotiation is trickier than it looks
The `Accept` header priority order matters. Browsers send `Accept: text/html,application/xhtml+xml,...`, which includes `text/html`. But curl sends `*/*` by default. And `.md` URLs should return raw markdown to curl users without requiring an `Accept` header.
The final priority: explicit Accept header > URL extension hint > User-Agent heuristic > default to HTML.
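That priority chain can be sketched as a single function (names and the exact User-Agent patterns are illustrative, not rawdocs' actual code):

```typescript
type Format = "html" | "markdown" | "json";

function negotiate(accept: string | null, path: string, userAgent: string): Format {
  // 1. Explicit Accept header wins: browsers send text/html, APIs ask for json.
  if (accept?.includes("text/markdown")) return "markdown";
  if (accept?.includes("application/json")) return "json";
  if (accept?.includes("text/html")) return "html";
  // 2. URL extension hint: .md/.txt paths default to raw markdown
  //    (covers curl's default Accept: */*).
  if (path.endsWith(".md") || path.endsWith(".txt")) return "markdown";
  // 3. User-Agent heuristic: CLI clients get markdown.
  if (/curl|wget/i.test(userAgent)) return "markdown";
  // 4. Default to HTML.
  return "html";
}

// Browser at a .md URL still gets HTML, because its Accept header wins:
console.log(negotiate("text/html,application/xhtml+xml", "/facebook/react/README.md", "Mozilla/5.0")); // "html"
// curl at the same URL gets raw markdown via the extension hint:
console.log(negotiate("*/*", "/facebook/react/README.md", "curl/8.0")); // "markdown"
```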
YAML frontmatter needs special handling
Many documentation files (especially SKILL.md files in agent repos) start with YAML frontmatter between `---` markers. `marked` doesn't parse this natively — it renders the `---` as horizontal rules and the key-value pairs as paragraphs.
I convert frontmatter into a clean HTML table injected before the rendered markdown body. No data lost, looks like GitHub's rendering.
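A sketch of that handling: split the frontmatter block off before markdown rendering, parse simple `key: value` lines (a real YAML parser would be needed for nested data), and emit an HTML table ahead of the body. The helper names are illustrative.

```typescript
// Split a document into frontmatter metadata and the markdown body.
function splitFrontmatter(doc: string): { meta: Record<string, string>; body: string } {
  const match = doc.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) return { meta: {}, body: doc };
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: doc.slice(match[0].length) };
}

// Render the metadata as a table injected before the rendered markdown.
function frontmatterTable(meta: Record<string, string>): string {
  const rows = Object.entries(meta)
    .map(([k, v]) => `<tr><td>${k}</td><td>${v}</td></tr>`)
    .join("");
  return rows ? `<table>${rows}</table>` : "";
}

const { meta, body } = splitFrontmatter("---\nname: my-skill\nversion: 1.0\n---\n# Title");
console.log(frontmatterTable(meta));
console.log(body); // "# Title" — frontmatter no longer confuses the renderer
```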
Cache the rendered HTML, not just the raw content
Shiki syntax highlighting is the most expensive operation (~1-2s for large files). I initially only cached the raw markdown from jsDelivr. Adding a second cache layer for the rendered HTML (keyed by owner/repo@sha:path) made subsequent page loads near-instant.
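The two layers and their keying can be sketched with a plain Map plus TTL (the real app uses the `lru-cache` package, which adds size-bounded eviction on top of this):

```typescript
// Minimal TTL cache: entries expire after ttlMs milliseconds.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}
  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expires < Date.now()) return undefined;
    return hit.value;
  }
  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}

const rawCache = new TtlCache<string>(5 * 60 * 1000);  // raw markdown from jsDelivr
const htmlCache = new TtlCache<string>(5 * 60 * 1000); // expensive Shiki-rendered HTML

// Keying by owner/repo@sha:path means a new commit SHA can never serve stale HTML.
const key = "facebook/react@abc123:README.md";
htmlCache.set(key, "<h1>React</h1>");
console.log(htmlCache.get(key)); // "<h1>React</h1>"
```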
The tech stack
| Layer | Choice |
|---|---|
| Runtime | Bun |
| Framework | Hono (JSX SSR) |
| Markdown | marked + sanitize-html |
| Syntax Highlighting | Shiki (GitHub Dark theme) |
| Fonts | DM Sans + Space Mono |
| Cache | lru-cache (in-process) |
| Upstream | ungh.cc + jsDelivr + GitHub API |
| Hosting | Railway + Cloudflare |
Total dependencies: 5 runtime packages. The entire codebase is ~2,000 lines across 27 files.
SEO and AI discoverability
rawdocs implements the full emerging stack for AI-discoverable web content:
- JSON-LD structured data — `TechArticle` for file views, `CollectionPage` for tree views, `WebSite` + `WebApplication` for the homepage. Server-rendered, because AI crawlers don't execute JavaScript.
- `<link rel="alternate" type="text/markdown">` — On every HTML page, so AI agents can discover the raw markdown version.
- robots.txt — Explicitly welcomes GPTBot, ClaudeBot, PerplexityBot, and every other major AI crawler.
- `/llms.txt` — Site-level and per-repo documentation indexes following the llms.txt spec.
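As a sketch, the server-rendered JSON-LD for a file view might look like this, using schema.org's `TechArticle` type (the specific properties chosen here are illustrative, not rawdocs' exact output):

```typescript
// Structured data for a file view, serialized into the HTML head so
// crawlers that don't run JavaScript still see it.
const jsonLd = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  name: "CONTRIBUTING.md",
  url: "https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md",
  encodingFormat: "text/markdown",
};

const tag = `<script type="application/ld+json">${JSON.stringify(jsonLd)}</script>`;
console.log(tag);
```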
Try it
- Browse: rawdocs.sivaram.dev/openclaw/openclaw
- Raw markdown: `curl https://rawdocs.sivaram.dev/facebook/react/README.md`
- Full docs dump: `curl https://rawdocs.sivaram.dev/vercel/next.js/llms-full.txt`
- Source: github.com/SivaramPg/rawdocs
It works on any public GitHub repo. Paste a URL and go.
rawdocs is open source and deployed on Railway. Built by Sivaram Pandariganthan.