# ============================================================================
# WINGS INSTITUTE - ROBOTS.TXT
# 2026 Generative Engine Optimization (GEO) Standard
# Last Updated: 2026-05-16 (Phase 4 / Discipline 2 - agentic browsing hardening)
# Charter: https://wingsinstitute.com/AGENTS.md
# ============================================================================
# Strategic Philosophy: Maximize visibility in AI-powered search engines
# while protecting server resources from non-converting scrapers.
# ============================================================================

# ============================================================================
# SECTION 0: ROBOTS-WIDE PREVIEW DIRECTIVES
# ----------------------------------------------------------------------------
# Phase I12 (Google Discover eligibility). Surface large image previews on
# Search + Discover for every Wings page (culinary pillar especially).
# Snippet length unrestricted; video preview length unrestricted.
#
# max-image-preview, max-snippet and max-video-preview are NOT robots.txt
# rules, so crawlers ignore them in this file. Google reads them only from a
# robots meta tag or an X-Robots-Tag HTTP response header, e.g.:
#   <meta name="robots" content="max-image-preview:large, max-snippet:-1, max-video-preview:-1">
#   X-Robots-Tag: max-image-preview:large, max-snippet:-1, max-video-preview:-1
# RFC 9309 parsers combine this "*" group with the Section 6 "*" group.
# ============================================================================
User-agent: *
Allow: /

# ============================================================================
# SECTION 1: PREMIUM SEARCH ENGINE CRAWLERS (FULL ACCESS)
# These bots directly impact organic rankings and AI search visibility.
# ============================================================================

# Google Search & AI (Gemini, SGE, AI Overviews)
# Googlebot does not support Crawl-delay (Google sets its own crawl rate);
# the line is kept for parsers that honor it.
User-agent: Googlebot
Allow: /
Crawl-delay: 1

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-Video
Allow: /

# Google-Extended - see Section 2b for the explicit allow/disallow stanza.
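
# Group-selection note (illustrative only; assumes an RFC 9309-compliant
# parser, and "ExampleNewBot" is a hypothetical crawler name):
#   Googlebot      GET /admin/  -> allowed: the Googlebot group above holds
#                                  only Allow: /, and a matched named group
#                                  replaces the "*" group, so Section 6's
#                                  Disallow: /admin/ is never consulted.
#   ExampleNewBot  GET /admin/  -> disallowed: no named group matches, so it
#                                  falls back to the "*" group (Section 6).
# robots.txt is advisory only; /admin/, /login and /api/ still need
# server-side authentication.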
# Bing & Microsoft Copilot
User-agent: Bingbot
Allow: /
Crawl-delay: 1

User-agent: msnbot
Allow: /

# Apple (Siri, Spotlight, Safari Suggestions)
User-agent: Applebot
Allow: /

# ============================================================================
# SECTION 2: AI SEARCH ENGINES (GEO PRIORITY - 2026)
# These power the new generation of AI-first search experiences.
# Allowing them = visibility in AI chat interfaces and answer engines.
# ============================================================================

# OpenAI - ChatGPT user-initiated (canonical surface)
# GPTBot / OAI-SearchBot / PerplexityBot / ClaudeBot / Anthropic-AI /
# Claude-Web / Google-Extended / cohere-ai / META-ExternalAgent are
# specified in detail in Section 2b below (Phase 4 hardening). Only the
# bots NOT covered there are left here as simple Allow: / blocks.
User-agent: ChatGPT-User
Allow: /
Crawl-delay: 2

# FacebookBot (Meta). Meta's crawler documentation describes it as gathering
# public pages to improve language models for speech recognition; rich link
# previews are fetched by facebookexternalhit, which has no group here and
# therefore falls back to "*".
User-agent: FacebookBot
Allow: /

# You.com AI Search
User-agent: YouBot
Allow: /

# Brave Search
User-agent: BraveBot
Allow: /

# DuckDuckGo
User-agent: DuckDuckBot
Allow: /

# Yandex (Russia - International reach)
User-agent: Yandex
Allow: /
Crawl-delay: 2

# Baidu (China - International students)
User-agent: Baiduspider
Allow: /
Crawl-delay: 2

# ============================================================================
# SECTION 2b: EXPLICIT AGENTIC PATH RULES (Phase 4 / Discipline 2)
# Per-bot rules for the nine agentic crawlers the 100x plan calls out by
# name. Under RFC 9309 a crawler obeys the group whose User-agent matches its
# product token (case-insensitively) and uses the "*" group only when no such
# group exists, so these stanzas replace the * fallback entirely; any path
# not disallowed in a block (e.g. /student-portal, /.git/) is crawlable by
# that bot. Each block lists the canonical allowed surfaces (data feeds,
# manifest/OpenAPI, MCP, /.well-known/, AGENTS.md, llms.txt, ai.txt) and
# the forbidden surfaces (auth, admin, form ingestion, OTP, internal AI).
# Charter: /AGENTS.md
# ============================================================================

User-agent: GPTBot
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: ClaudeBot
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: OAI-SearchBot
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: PerplexityBot
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: Google-Extended
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: Anthropic-AI
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: Claude-Web
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: cohere-ai
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/

# Meta AI agentic crawler. User-agent matching is case-insensitive
# (RFC 9309), so this one stanza also covers meta-externalagent.
User-agent: META-ExternalAgent
Allow: /
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/
Disallow: /login
Disallow: /register
Disallow: /account/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/franchise/
Disallow: /api/student-login/
Disallow: /api/events/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/

# ============================================================================
# SECTION 2c: EXTENDED AGENTIC ROSTER (10x plan - May 2026)
# Grok, xAI-Bot, Applebot-Extended, MistralAI-User, Kagi, DuckAssist,
# Perplexity-User, Phind, Amazonbot, plus India-native engines (Krutrim,
# Hanooman, BharatGPT) for Hindi/Gujarati LLM citation surface. Each bot
# follows a reduced form of the Section 2b pattern: explicit Allow for
# canonical agentic surfaces + Disallow for auth/admin/form ingestion, with
# Disallow lists that vary per bot. Because a named group replaces "*", any
# path not disallowed in a bot's own block is crawlable by that bot.
# /hi/ and /gu/ subpaths are already covered by Allow: /.
# ============================================================================

User-agent: Grok
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /.well-known/
Allow: /AGENTS.md
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /dashboard/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/otp/
Disallow: /api/ai/
Disallow: /api/recruitment/
Crawl-delay: 2

User-agent: xAI-Bot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-full.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: Applebot-Extended
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /ai.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Disallow: /api/ai/
Crawl-delay: 2

User-agent: MistralAI-User
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-full.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: Kagibot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: DuckAssistBot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: Perplexity-User
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: PhindBot
Allow: /
Allow: /api/data/
Allow: /.well-known/
Disallow: /admin/
Disallow: /api/admissions/
Crawl-delay: 3

User-agent: Amazonbot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 3

User-agent: KrutrimBot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-hi.txt
Allow: /llms-gu.txt
Allow: /llms-faqs-hi.txt
Allow: /llms-faqs-gu.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: HanoomanBot
Allow: /
Allow: /hi/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-hi.txt
Allow: /llms-faqs-hi.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

User-agent: BharatGPTBot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-hi.txt
Allow: /llms-gu.txt
Allow: /llms-faqs-hi.txt
Allow: /llms-faqs-gu.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/otp/
Crawl-delay: 2

# ============================================================================
# SECTION 3: SEO & ANALYTICS TOOLS (ALLOWED)
# These help monitor and improve our search performance.
# ============================================================================

User-agent: AhrefsBot
Allow: /
Crawl-delay: 5

User-agent: SemrushBot
Allow: /
Crawl-delay: 5

User-agent: rogerbot
Allow: /
Crawl-delay: 5

User-agent: DotBot
Allow: /
Crawl-delay: 5

User-agent: MJ12bot
Allow: /
Crawl-delay: 10

# ============================================================================
# SECTION 4: SOCIAL MEDIA CRAWLERS (ALLOWED)
# Enable rich link previews on social platforms.
# ============================================================================

User-agent: Twitterbot
Allow: /

User-agent: LinkedInBot
Allow: /

User-agent: WhatsApp
Allow: /

User-agent: TelegramBot
Allow: /

User-agent: Slackbot
Allow: /

User-agent: Discordbot
Allow: /

User-agent: PinterestBot
Allow: /

# ============================================================================
# SECTION 5: BLOCKED BOTS (RESOURCE PROTECTION)
# Commercial scrapers, content thieves, and non-converting crawlers.
# These provide zero SEO value and drain server resources.
# Exceptions: Diffbot and Bytespider receive scoped access (public and
# agentic surfaces only), as noted on their stanzas.
# ============================================================================

# Content Scrapers & Copiers
User-agent: CCBot
Disallow: /

User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

# AI Training Scrapers (Non-Search)
# These scrape for model training but don't provide search visibility
User-agent: AI2Bot
Disallow: /

## Diffbot - enterprise knowledge graph builder. Allow public agentic
## surfaces so KG entries reference Wings; block sensitive paths.
User-agent: Diffbot
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-full.txt
Allow: /AGENTS.md
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/otp/
Disallow: /api/ai/
Crawl-delay: 5

User-agent: ImagesiftBot
Disallow: /

# Aggressive Marketing/Sales Bots
User-agent: SalesIntelligent
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

## Bytespider - TikTok/Doubao crawler. Allow public agentic surfaces for
## international reach via Doubao (Chinese LLM with Indian-student audience);
## block sensitive paths.
User-agent: Bytespider
Allow: /
Allow: /hi/
Allow: /gu/
Allow: /api/data/
Allow: /.well-known/
Allow: /llms.txt
Allow: /llms-full.txt
Disallow: /admin/
Disallow: /api/admissions/
Disallow: /api/contact/
Disallow: /api/otp/
Disallow: /api/ai/
Crawl-delay: 3

# Archive Bots (Optional - uncomment to block)
# User-agent: ia_archiver
# Disallow: /

# Generic Spam Bots
User-agent: MegaIndex
Disallow: /

User-agent: Seekport
Disallow: /

User-agent: Sogou
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: Exabot
Disallow: /

User-agent: ZoominfoBot
Disallow: /

User-agent: Screaming Frog SEO Spider
Disallow: /

# Ad-tech contextual classification crawler (Proximic)
User-agent: proximic
Disallow: /

# Email Harvesters
User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

# ============================================================================
# SECTION 6: DEFAULT RULE FOR ALL OTHER BOTS
# Allow crawling with rate limiting, protect sensitive paths.
# ============================================================================
User-agent: *
Allow: /
Crawl-delay: 10

# Public agent-facing data feeds (CC-BY-4.0)
# Precedence is by longest matching path (RFC 9309), not by line order: for
# /api/data/... the 10-character Allow: /api/data/ beats the 5-character
# Disallow: /api/. Listing the Allow lines first also keeps legacy
# first-match parsers from blocking the feeds.
Allow: /api/data/
Allow: /api/manifest.json
Allow: /api/openapi.json
Allow: /api/mcp
Allow: /schemas/

# API Endpoints
Disallow: /api/
Disallow: /_api/

# Admin & Internal
Disallow: /admin/
Disallow: /dashboard/
Disallow: /internal/

# User Data & Auth
Disallow: /login
Disallow: /login/
Disallow: /student-portal
Disallow: /student-portal/
Disallow: /register
Disallow: /register/
Disallow: /account/
Disallow: /user/
Disallow: /profile/

# Development & Build
Disallow: /node_modules/
Disallow: /.git/
Disallow: /.env*
Disallow: /dist/
Disallow: /build/

# Temporary & Cache
Disallow: /tmp/
Disallow: /cache/

# ============================================================================
# SECTION 7: SITEMAP & LLM KNOWLEDGE BASE DECLARATION
# ============================================================================
Sitemap: https://wingsinstitute.com/sitemap.xml
Sitemap: https://wingsinstitute.com/sitemap-0.xml
Sitemap: https://wingsinstitute.com/sitemap-1.xml
Sitemap: https://wingsinstitute.com/sitemap-2.xml
Sitemap: https://wingsinstitute.com/sitemap-3.xml
Sitemap: https://wingsinstitute.com/sitemap-4.xml
Sitemap: https://wingsinstitute.com/sitemap-5.xml
Sitemap: https://wingsinstitute.com/sitemap-6.xml
Sitemap: https://wingsinstitute.com/sitemap-7.xml
Sitemap: https://wingsinstitute.com/sitemap-8.xml
Sitemap: https://wingsinstitute.com/sitemap-9.xml
Sitemap: https://wingsinstitute.com/sitemap-10.xml
Sitemap: https://wingsinstitute.com/sitemap-11.xml
Sitemap: https://wingsinstitute.com/sitemap-12.xml
Sitemap: https://wingsinstitute.com/sitemap-images.xml
Sitemap: https://wingsinstitute.com/sitemap-culinary.xml

# sitemap-news.xml omitted - a Google News sitemap should list only articles
# from the last 2 days (≤48h); re-add once fresh press releases ship
# (/news/ pages stay in the numbered sitemaps).
# sitemap-discovery.xml omitted from the Google-facing index - it lists
# non-HTML, noindex agent resources (llms*.txt, ai.txt, *.json, /api/mcp).
# It is advertised to AI crawlers via the llms.txt / AGENTS.md / manifest
# comments below instead.
# sitemap-audio.xml omitted - empty stub; add back once /podcast or
# AudioObject content ships.

# LLMs.txt (llms.txt v1.1.1 Spec - AI-Readable Site Index)
# https://wingsinstitute.com/llms.txt
# https://wingsinstitute.com/.well-known/llms.txt
# https://wingsinstitute.com/llms-full.txt
# Topic-sharded variants (May 2026):
# https://wingsinstitute.com/llms-courses.txt
# https://wingsinstitute.com/llms-cities.txt
# https://wingsinstitute.com/llms-careers.txt
# https://wingsinstitute.com/llms-faqs.txt

# ai.txt (AI usage preferences - RAG / citation / training opt-in/out)
# https://wingsinstitute.com/ai.txt
# https://wingsinstitute.com/.well-known/ai.txt

# Agentic surfaces
# https://wingsinstitute.com/AGENTS.md
# https://wingsinstitute.com/.well-known/AGENTS.md
# https://wingsinstitute.com/api/manifest.json
# https://wingsinstitute.com/api/openapi.json
# https://wingsinstitute.com/api/mcp
# https://wingsinstitute.com/.well-known/mcp.json
# https://wingsinstitute.com/.well-known/ai-plugin.json
# https://wingsinstitute.com/.well-known/security.txt

# ============================================================================
# SECTION 8: HOST DECLARATION (Canonical Domain)
# Host is not part of RFC 9309. It was a Yandex-only extension that Yandex
# no longer supports; enforce the canonical domain with 301 redirects and
# rel="canonical" instead. The line is kept for legacy parsers only.
# ============================================================================
Host: https://wingsinstitute.com

# ============================================================================
# END OF ROBOTS.TXT
#
# GEO Strategy Summary:
# ✅ ALLOWED: Google, Bing, Apple, OpenAI, Perplexity, Anthropic, Meta, Brave
# ✅ ALLOWED: Social Media Bots (rich previews)
# ✅ ALLOWED: SEO Tools (with rate limiting)
# ✅ SCOPED: Diffbot, Bytespider (public and agentic surfaces only)
# ❌ BLOCKED: Commercial scrapers, content thieves, spam bots
#
# Review Schedule: Quarterly (as new AI search engines emerge)
# ============================================================================