I kept getting "your AI visibility is low" reports from various tools that wouldn't tell me *why*. Was the block in robots.txt? At the CDN? At origin? Different fixes, different teams.
I guess this sits somewhere in the "generative engine optimization" bucket, but I wanted the tool to stay very concrete: can these crawlers reach the site, and if not, where are they being blocked?
So I wrote a small Node CLI that just answers that question deterministically:
```
npx @geosuite/ai-crawler-bots robots https://my-site.com
```
What it actually does:
- Parses robots.txt with line-level provenance: when a bot is Disallow'd, it tells me *which line in which group* is responsible.
- For each tracked bot (24 right now: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, PerplexityBot, Perplexity-User, Bytespider, etc.), reports the verdict.
- Detects Cloudflare's "Managed Content" markers (`# BEGIN Cloudflare Managed content` … `# END`) and tells me whether my own rules, outside that block, would've allowed the bot.
- Also has a `check <url>` mode that does a real HTTP probe with each bot's UA and distinguishes edge blocks (CDN fingerprints) from origin blocks. Different remediation.
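For the curious, here's a minimal TypeScript sketch of robots.txt evaluation with line-level provenance. It is not the package's source: `parseRobots`, `evaluate`, and the result shapes are hypothetical, and the matching assumes RFC 9309 rules (a bot uses the groups naming it and falls back to `*`; the longest matching pattern wins; Allow beats Disallow on a tie; `*` and a trailing `$` act as wildcards).

```typescript
// Hypothetical sketch, not the package's code: robots.txt evaluation that
// remembers which line and which group produced each verdict.
type Rule = { kind: "allow" | "disallow"; pattern: string; line: number };
type Group = { agents: string[]; startLine: number; rules: Rule[] };
type Verdict = { allowed: boolean; rule: Rule | null; groupLine: number | null };

function parseRobots(text: string): Group[] {
  const groups: Group[] = [];
  const lines = text.split(/\r?\n/);
  let current: Group | null = null;
  let inAgentRun = false; // true while reading consecutive User-agent lines
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].replace(/#.*/, "").trim(); // strip comments
    const m = /^([A-Za-z-]+)\s*:\s*(.*)$/.exec(line);
    if (!m) continue;
    const key = m[1].toLowerCase();
    const value = m[2].trim();
    if (key === "user-agent") {
      // A User-agent line that follows rules starts a new group.
      if (current === null || !inAgentRun) {
        current = { agents: [], startLine: i + 1, rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      inAgentRun = true;
    } else if ((key === "allow" || key === "disallow") && current !== null) {
      // An empty Disallow means "nothing disallowed", so it adds no rule.
      if (value !== "") {
        current.rules.push({ kind: key === "allow" ? "allow" : "disallow", pattern: value, line: i + 1 });
      }
      inAgentRun = false;
    }
    // Other fields (Sitemap, Crawl-delay, ...) are ignored.
  }
  return groups;
}

const escapeRe = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// "*" matches any run of characters; a trailing "$" anchors the end of the path.
function matches(pattern: string, path: string): boolean {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = "^" + body.split("*").map(escapeRe).join(".*") + (anchored ? "$" : "");
  return new RegExp(source).test(path);
}

function evaluate(groups: Group[], botToken: string, path: string): Verdict {
  const token = botToken.toLowerCase();
  // Use the groups naming this bot if any exist; otherwise fall back to "*".
  let selected = groups.filter((g) => g.agents.includes(token));
  if (selected.length === 0) selected = groups.filter((g) => g.agents.includes("*"));
  let best: Rule | null = null;
  let bestGroup: Group | null = null;
  for (const g of selected) {
    for (const r of g.rules) {
      if (!matches(r.pattern, path)) continue;
      // Longest pattern wins; on a tie, Allow beats Disallow.
      const longer = best === null || r.pattern.length > best.pattern.length;
      const tieAllow = best !== null && r.pattern.length === best.pattern.length && r.kind === "allow";
      if (longer || tieAllow) {
        best = r;
        bestGroup = g;
      }
    }
  }
  return {
    allowed: best === null || best.kind === "allow",
    rule: best,
    groupLine: bestGroup === null ? null : bestGroup.startLine,
  };
}
```

Checking every tracked bot is then a loop over product tokens such as `GPTBot` or `ClaudeBot`. For a file whose lines 4 to 6 read `User-agent: GPTBot`, `Disallow: /`, `Allow: /blog/`, `evaluate(groups, "GPTBot", "/pricing")` points at the Disallow on line 5 in the group starting at line 4, while `/blog/post` is allowed by line 6.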
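The Cloudflare case mostly comes down to splitting the file in two. Here's a hypothetical sketch (`splitCloudflareManaged` is a made-up name), assuming the managed block opens with the `# BEGIN Cloudflare Managed content` line and closes at the next line starting with `# END`; the exact closing text is an assumption.

```typescript
// Hypothetical sketch: separate Cloudflare's injected block from the site's own rules.
type Split = { managed: string; own: string; managedLines: [number, number] | null };

function splitCloudflareManaged(text: string): Split {
  const lines = text.split(/\r?\n/);
  const begin = lines.findIndex((l) => /^#\s*BEGIN Cloudflare Managed content/i.test(l.trim()));
  if (begin === -1) return { managed: "", own: text, managedLines: null };
  // Assumed end marker: the first later line starting with "# END".
  let end = lines.findIndex((l, i) => i > begin && /^#\s*END\b/i.test(l.trim()));
  if (end === -1) end = lines.length - 1; // unterminated block: treat the rest as managed
  return {
    managed: lines.slice(begin, end + 1).join("\n"),
    own: lines.slice(0, begin).concat(lines.slice(end + 1)).join("\n"),
    managedLines: [begin + 1, end + 1], // 1-based, inclusive
  };
}
```

The `own` text can be run through any robots.txt evaluator a second time to answer "would my rules alone have allowed this bot?"; if the full file blocks a bot but `own` doesn't, the injected block is what's doing the blocking.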
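The edge-vs-origin distinction can be framed as comparing a bot-UA probe with a browser-UA baseline. This sketch is hypothetical: the `Probe` shape, the outcome names, and especially the fingerprints (a `cf-mitigated` header for Cloudflare, `Server: AkamaiGHost` for Akamai) are assumptions, and a real tool would need a longer, tested list.

```typescript
// Hypothetical sketch: tell edge blocks from origin blocks by comparing a
// bot-UA probe with a browser-UA baseline. The fingerprints are assumptions.

// Header names are expected in lowercase, as fetch's Headers object yields them.
type Probe = { status: number; headers: Record<string, string | undefined> };
type Outcome = "allowed" | "blocked-for-everyone" | "edge-block" | "origin-block";

// Returns a CDN name if the response looks edge-generated, else null.
function edgeFingerprint(h: Record<string, string | undefined>): string | null {
  const server = (h["server"] ?? "").toLowerCase();
  // Assumed marker: Cloudflare sets cf-mitigated on challenge responses it issues.
  if (h["cf-mitigated"] !== undefined) return "cloudflare";
  // Assumed marker: Akamai-generated denials carry Server: AkamaiGHost.
  if (server === "akamaighost") return "akamai";
  return null;
}

function classify(bot: Probe, baseline: Probe): { outcome: Outcome; cdn: string | null } {
  if (bot.status < 400) return { outcome: "allowed", cdn: null };
  // If a browser UA fails too, the block isn't specific to this crawler.
  if (baseline.status >= 400) return { outcome: "blocked-for-everyone", cdn: null };
  const cdn = edgeFingerprint(bot.headers);
  return { outcome: cdn === null ? "origin-block" : "edge-block", cdn };
}
```

In practice each `Probe` would come from a `fetch(url, { headers: { "user-agent": ... } })` call, with `Object.fromEntries(res.headers)` providing lowercased header names. A 403 that only the bot UA receives, with no CDN fingerprint, points at origin config (web server or app middleware) rather than the CDN.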
Zero runtime dependencies, MIT, Node 20+. Source: github.com/TryGeoSuite/ai-crawler-bots
There are three companion tools under the same `@geosuite` npm scope:
- `@geosuite/schema-templates` — 23 schema.org JSON-LD templates + offline validator.
- `@geosuite/llms-txt-generator` — sitemap.xml → llms.txt.
- `@geosuite/sitemap-builder` — crawl + valid sitemap.xml for custom sites without one.
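As for the sitemap.xml → llms.txt direction, here's a hypothetical, dependency-free sketch (not the package's code). It assumes a plain `<urlset>` sitemap, pulls `<loc>` values with a regex rather than an XML parser, and emits the llms.txt layout of an H1 title, a `>` summary, and an H2 section of Markdown links; `sitemapToLlmsTxt` and the "Pages" section name are made up.

```typescript
// Hypothetical sketch: sitemap.xml -> llms.txt, not the package's actual code.
// Assumes a plain <urlset>; a <sitemapindex> would need an extra fetch step.

const ENTITIES: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode the five predefined XML entities in one pass (so "&amp;lt;" becomes "&lt;").
function decodeXml(s: string): string {
  return s.replace(/&(amp|lt|gt|quot|apos);/g, (_match: string, name: string) => ENTITIES[name]);
}

// Use the last path segment as link text, or "Home" for the site root.
function linkName(url: string): string {
  const path = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]+/i, "").split(/[?#]/)[0];
  const segments = path.split("/").filter((seg) => seg !== "");
  return segments.length === 0 ? "Home" : segments[segments.length - 1];
}

function sitemapToLlmsTxt(sitemapXml: string, title: string, summary: string): string {
  const urls: string[] = [];
  const locRe = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let m: RegExpExecArray | null;
  while ((m = locRe.exec(sitemapXml)) !== null) {
    urls.push(decodeXml(m[1]));
  }
  const out = [`# ${title}`, "", `> ${summary}`, "", "## Pages", ""];
  for (const url of urls) out.push(`- [${linkName(url)}](${url})`);
  return out.join("\n") + "\n";
}
```

A real generator would likely also want per-page descriptions (the `: notes` part of an llms.txt link line) and some grouping by section, which a bare sitemap doesn't provide.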
Honest disclaimer: I also build a hosted SaaS (trygeosuite.it) on top of similar logic, but the four CLIs are MIT and stand alone. I open-sourced them because I find it dishonest to sell a black box that does things any dev can verify.
Curious what other people are using to debug AI bot reachability — especially anyone running through Cloudflare, Akamai, or Vercel. The "managed content" injection broke my mental model the first time I hit it.