I built rawdocs — a clean documentation viewer for GitHub repos, built for humans and LLMs
The problem
If you use AI coding assistants, you've done this dance:
- Find the docs for a library on GitHub
- Click through the file tree to find the right `.md` file
- Click "Raw"
- Copy the entire thing
- Paste it into ChatGPT or Claude
- Repeat for the next file
GitHub only surfaces the top-level README. The rest of the documentation — CONTRIBUTING.md, docs/, .github/, hidden SKILL.md files buried in subdirectories — requires manually walking the file tree.
And if you're building an AI agent that needs to fetch docs programmatically? You get GitHub's rendered HTML with all the chrome, or you have to construct raw.githubusercontent.com URLs by hand.
I built rawdocs to fix this.
What rawdocs does
Given any public GitHub repo, rawdocs:
- Fetches the entire file tree
- Filters to documentation files only (`.md`, `.txt`)
- Renders a navigable tree view with syntax-highlighted markdown
- Serves raw markdown to bots and LLMs via content negotiation
- Provides one-click "Copy raw markdown" for the human-in-the-loop workflow
- Auto-generates `/llms.txt` and `/llms-full.txt` per repo
Zero indexing. Zero storage. Zero AI processing. A thin, fast viewer that respects the original author's exact words.
Try it now: rawdocs.sivaram.dev/facebook/react
The key insight: content negotiation
The same URL serves different formats based on what you ask for:
```sh
# Browser gets rendered HTML with syntax highlighting
open https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md

# curl gets raw markdown — just request a .md URL
curl https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md

# Or be explicit with the Accept header
curl -H "Accept: text/markdown" \
  https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md

# Get JSON metadata
curl -H "Accept: application/json" \
  https://rawdocs.sivaram.dev/facebook/react
```

This isn't cloaking — it's standard HTTP content negotiation (RFC 7231). APIs have served `Accept: application/json` vs. `Accept: text/xml` for decades. rawdocs does the same for documentation.
Built for LLMs, not just humans
Every response to an AI agent includes headers that help it understand what it's getting:
```
Content-Type: text/markdown; charset=utf-8
x-markdown-tokens: 31371
Content-Signal: ai-train=yes, search=yes, ai-input=yes
X-Llms-Txt: /facebook/react/llms.txt
Link: </facebook/react/llms.txt>; rel="llms-txt"
Vary: Accept
```

The `x-markdown-tokens` header tells the agent the estimated token count before it reads the body — useful for context-window sizing. `Content-Signal` explicitly signals that this content is available for AI training, search, and inference input.
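An agent can use these headers to decide whether a document fits its budget before downloading the body. A minimal sketch of that client-side check (the helper name and the budget numbers are illustrative, not part of rawdocs):

```typescript
// Decide whether a rawdocs response fits a model's context window,
// using only the x-markdown-tokens header from a HEAD/GET response.
function fitsContextWindow(headers: Headers, budgetTokens: number): boolean {
  const estimate = Number(headers.get("x-markdown-tokens") ?? "0");
  return estimate > 0 && estimate <= budgetTokens;
}

// Example with the headers shown above:
const h = new Headers({
  "Content-Type": "text/markdown; charset=utf-8",
  "x-markdown-tokens": "31371",
});
console.log(fitsContextWindow(h, 128_000)); // true — fits a 128k window
console.log(fitsContextWindow(h, 8_000));   // false — too big for an 8k window
```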
Every repo automatically gets two LLM-focused endpoints:
- `/owner/repo/llms.txt` — an index of all documentation files
- `/owner/repo/llms-full.txt` — all docs concatenated into a single file for large-context models
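How those two endpoints could be assembled from the filtered tree can be sketched as follows (the types and separator format are illustrative assumptions, not rawdocs internals):

```typescript
// A documentation file after tree filtering.
type DocFile = { path: string; content: string };

// llms.txt: an index of documentation paths for a repo.
function llmsTxt(repo: string, files: DocFile[]): string {
  const lines = files.map((f) => `- /${repo}/${f.path}`);
  return `# ${repo} documentation index\n${lines.join("\n")}\n`;
}

// llms-full.txt: every doc concatenated, each preceded by a path marker,
// so a large-context model can ingest the whole repo's docs in one request.
function llmsFullTxt(files: DocFile[]): string {
  return files.map((f) => `<!-- ${f.path} -->\n${f.content}`).join("\n\n");
}

const files: DocFile[] = [
  { path: "README.md", content: "# Hello" },
  { path: "docs/guide.md", content: "Guide body" },
];
console.log(llmsTxt("facebook/react", files));
```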
The architecture
```
Browser / LLM / curl
        |
Cloudflare (edge cache + compression)
        |
Railway CDN (Fastly, 100+ PoPs)
        |
Hono app (Bun runtime)
  - Content negotiation
  - Tree filtering
  - Markdown rendering + Shiki highlighting
  - Frontmatter parsing
  - LRU cache (5 min TTL)
        |
ungh.cc                    -> tree + metadata (primary)
jsDelivr                   -> file content (SHA-pinned, immutable)
raw.githubusercontent.com  -> fallback
GitHub API                 -> fallback (circuit breaker)
```

No database. No Redis. No queue. No background jobs.
The entire thing is a single Hono app running on Bun. Here's why:
- Hono + Bun — Smallest viable web framework. Native TypeScript, built-in JSX for SSR, fast cold starts. No Next.js because there's no React UI complexity to justify it.
- ungh.cc — Free, cached GitHub proxy. Returns the recursive file tree in one call with the commit SHA. No auth needed.
- jsDelivr — Serves any file from any public GitHub repo at `cdn.jsdelivr.net/gh/{owner}/{repo}@{sha}/{path}`. By using the SHA from ungh's tree response, content URLs become immutable and cache forever. Falls back to `raw.githubusercontent.com` when jsDelivr returns 403.
- LRU cache — The `lru-cache` npm package handles 500 entries with a 5-minute TTL. Rendered HTML (the expensive Shiki output) is cached, so the first request takes ~2s but subsequent requests are instant.
- Cloudflare + Railway CDN — Two layers of edge caching. Raw markdown responses cache at the CDN edge. HTML is served fresh from origin, but the LRU cache keeps it fast.
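The SHA-pinned URL scheme above can be sketched as two small builders (the function names are illustrative; the URL patterns are jsDelivr's and GitHub's public schemes):

```typescript
// Immutable content URL: pinning to the commit SHA from ungh's tree
// response means this URL never changes, so it can be cached forever.
function jsdelivrUrl(owner: string, repo: string, sha: string, path: string): string {
  return `https://cdn.jsdelivr.net/gh/${owner}/${repo}@${sha}/${path}`;
}

// Fallback for when jsDelivr returns 403 (e.g. oversized files).
function rawFallbackUrl(owner: string, repo: string, sha: string, path: string): string {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${sha}/${path}`;
}

console.log(jsdelivrUrl("facebook", "react", "abc123", "README.md"));
// https://cdn.jsdelivr.net/gh/facebook/react@abc123/README.md
```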
What I learned building this
Hono JSX escapes everything
The biggest recurring issue: Hono's JSX escapes HTML entities in attributes and text. Font URLs with `&` become `&amp;`. Emoji HTML entities like `&#x1F4C4;` (📄) get double-escaped. CSS with quotes gets mangled inside `<style>` tags.
The fix: use `raw()` from `hono/html` for anything that needs unescaped HTML — font links, style blocks, emoji icons. For JavaScript strings in `onclick` handlers, use Unicode escapes (`\u2713`) instead of HTML entities.
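A minimal model of the behavior (not Hono's actual internals): an escaper that treats every string as untrusted turns the `&` of an entity into `&amp;`, double-escaping it, while a "raw"-marked string passes through untouched — which is what `raw()` from `hono/html` provides for trusted markup.

```typescript
// Wrapper type marking a string as pre-escaped, trusted HTML.
const RAW = Symbol("raw");
type Raw = { [RAW]: true; value: string };
const raw = (value: string): Raw => ({ [RAW]: true, value });

// Render text content: escape plain strings, pass raw strings through.
function renderText(input: string | Raw): string {
  if (typeof input !== "string") return input.value; // trusted markup
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

console.log(renderText("&#x1F4C4;"));      // "&amp;#x1F4C4;" — double-escaped
console.log(renderText(raw("&#x1F4C4;"))); // "&#x1F4C4;" — left intact
```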
Content negotiation is trickier than it looks
The `Accept` header priority order matters. Browsers send `Accept: text/html,application/xhtml+xml,...`, which includes `text/html`. But curl sends `*/*` by default. And `.md` URLs should return raw markdown to curl users without requiring an `Accept` header.
The final priority: explicit Accept header > URL extension hint > User-Agent heuristic > default to HTML.
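That priority chain can be sketched as a single function (names and the exact User-Agent patterns are illustrative, not rawdocs' actual code):

```typescript
type Format = "html" | "markdown" | "json";

function negotiate(accept: string | null, path: string, userAgent: string): Format {
  // 1. Explicit Accept header wins: browsers send text/html, APIs ask for json.
  if (accept?.includes("text/markdown")) return "markdown";
  if (accept?.includes("application/json")) return "json";
  if (accept?.includes("text/html")) return "html";
  // 2. URL extension hint: .md/.txt paths default to raw markdown
  //    (covers curl's default Accept: */*).
  if (path.endsWith(".md") || path.endsWith(".txt")) return "markdown";
  // 3. User-Agent heuristic: CLI clients get markdown.
  if (/curl|wget/i.test(userAgent)) return "markdown";
  // 4. Default to HTML.
  return "html";
}

// Browser at a .md URL still gets HTML, because its Accept header wins:
console.log(negotiate("text/html,application/xhtml+xml", "/facebook/react/README.md", "Mozilla/5.0")); // "html"
// curl at the same URL gets raw markdown via the extension hint:
console.log(negotiate("*/*", "/facebook/react/README.md", "curl/8.0")); // "markdown"
```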
YAML frontmatter needs special handling
Many documentation files (especially SKILL.md files in agent repos) start with YAML frontmatter between `---` markers. `marked` doesn't parse this natively — it renders the `---` as horizontal rules and the key-value pairs as paragraphs.
I convert frontmatter into a clean HTML table injected before the rendered markdown body. No data lost, looks like GitHub's rendering.
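A sketch of that handling: split the frontmatter block off before markdown rendering, parse simple `key: value` lines (a real YAML parser would be needed for nested data), and emit an HTML table ahead of the body. The helper names are illustrative.

```typescript
// Split a document into frontmatter metadata and the markdown body.
function splitFrontmatter(doc: string): { meta: Record<string, string>; body: string } {
  const match = doc.match(/^---\n([\s\S]*?)\n---\n?/);
  if (!match) return { meta: {}, body: doc };
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx > 0) meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: doc.slice(match[0].length) };
}

// Render the metadata as a table injected before the rendered markdown.
function frontmatterTable(meta: Record<string, string>): string {
  const rows = Object.entries(meta)
    .map(([k, v]) => `<tr><td>${k}</td><td>${v}</td></tr>`)
    .join("");
  return rows ? `<table>${rows}</table>` : "";
}

const { meta, body } = splitFrontmatter("---\nname: my-skill\nversion: 1.0\n---\n# Title");
console.log(frontmatterTable(meta));
console.log(body); // "# Title" — frontmatter no longer confuses the renderer
```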
Cache the rendered HTML, not just the raw content
Shiki syntax highlighting is the most expensive operation (~1-2s for large files). I initially only cached the raw markdown from jsDelivr. Adding a second cache layer for the rendered HTML (keyed by owner/repo@sha:path) made subsequent page loads near-instant.
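The two layers and their keying can be sketched with a plain Map plus TTL (the real app uses the `lru-cache` package, which adds size-bounded eviction on top of this):

```typescript
// Minimal TTL cache: entries expire after ttlMs milliseconds.
class TtlCache<V> {
  private store = new Map<string, { value: V; expires: number }>();
  constructor(private ttlMs: number) {}
  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expires < Date.now()) return undefined;
    return hit.value;
  }
  set(key: string, value: V): void {
    this.store.set(key, { value, expires: Date.now() + this.ttlMs });
  }
}

const rawCache = new TtlCache<string>(5 * 60 * 1000);  // raw markdown from jsDelivr
const htmlCache = new TtlCache<string>(5 * 60 * 1000); // expensive Shiki-rendered HTML

// Keying by owner/repo@sha:path means a new commit SHA can never serve stale HTML.
const key = "facebook/react@abc123:README.md";
htmlCache.set(key, "<h1>React</h1>");
console.log(htmlCache.get(key)); // "<h1>React</h1>"
```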
The tech stack
| Layer | Choice |
|---|---|
| Runtime | Bun |
| Framework | Hono (JSX SSR) |
| Markdown | marked + sanitize-html |
| Syntax Highlighting | Shiki (GitHub Dark theme) |
| Fonts | DM Sans + Space Mono |
| Cache | lru-cache (in-process) |
| Upstream | ungh.cc + jsDelivr + GitHub API |
| Hosting | Railway + Cloudflare |
Total dependencies: 5 runtime packages. The entire codebase is ~2,000 lines across 27 files.
SEO and AI discoverability
rawdocs implements the full emerging stack for AI-discoverable web content:
- JSON-LD structured data — `TechArticle` for file views, `CollectionPage` for tree views, `WebSite` + `WebApplication` for the homepage. Server-rendered, because AI crawlers don't execute JavaScript.
- `<link rel="alternate" type="text/markdown">` — On every HTML page, so AI agents can discover the raw markdown version.
- robots.txt — Explicitly welcomes GPTBot, ClaudeBot, PerplexityBot, and every other major AI crawler.
- `/llms.txt` — Site-level and per-repo documentation indexes following the llms.txt spec.
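As a sketch, the server-rendered JSON-LD for a file view might look like this, using schema.org's `TechArticle` type (the specific properties chosen here are illustrative, not rawdocs' exact output):

```typescript
// Structured data for a file view, serialized into the HTML head so
// crawlers that don't run JavaScript still see it.
const jsonLd = {
  "@context": "https://schema.org",
  "@type": "TechArticle",
  name: "CONTRIBUTING.md",
  url: "https://rawdocs.sivaram.dev/facebook/react/CONTRIBUTING.md",
  encodingFormat: "text/markdown",
};

const tag = `<script type="application/ld+json">${JSON.stringify(jsonLd)}</script>`;
console.log(tag);
```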
Try it
- Browse: rawdocs.sivaram.dev/openclaw/openclaw
- Raw markdown: `curl https://rawdocs.sivaram.dev/facebook/react/README.md`
- Full docs dump: `curl https://rawdocs.sivaram.dev/vercel/next.js/llms-full.txt`
- Source: github.com/SivaramPg/rawdocs
It works on any public GitHub repo. Paste a URL and go.
rawdocs is open source and deployed on Railway. Built by Sivaram Pandariganthan.