Jul 11, 2024 · 11 min read
AI Crawlers

AI Crawler IP Addresses: Known Ranges, Verification, and Blocking

A technical reference for the IP address ranges used by GPTBot, ClaudeBot, Bytespider, and other AI crawlers — plus how to verify bot identity and block at the network level.

Most major AI companies publish their crawler IP ranges; the rest can be identified through reverse DNS

Use reverse DNS and IP verification to confirm that AI bots really are who they claim to be

Why AI Crawler IP Verification Matters

Every request to your web server arrives with two key identifiers: a user-agent string and a source IP address. While user-agent strings tell you what a bot claims to be, the IP address tells you where the request actually originates. For AI crawler management, this distinction is critical.

User-agent spoofing is trivial. Any script can set its user-agent to "GPTBot" or "ClaudeBot" and your robots.txt rules will apply to it, but the request may actually come from a scraper with no connection to OpenAI or Anthropic. Conversely, a malicious actor could strip the AI bot user-agent entirely and crawl your site without triggering any bot-specific rules.

IP verification closes this gap. By checking the source IP of a request against the known IP ranges published by AI companies, you can confirm whether a bot is genuinely operated by OpenAI, Anthropic, Google, or ByteDance — or whether it is an impersonator. This is the same technique Google has recommended for years to verify Googlebot.
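The range check itself is simple bit arithmetic. A minimal pure-bash sketch (IPv4 only; the addresses below are documentation examples, not real crawler IPs):

```shell
#!/bin/bash
# Check whether an IPv4 address falls inside a CIDR block, in pure bash.

ip_to_int() {
  # Convert dotted-quad notation to a 32-bit integer
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {
  # Usage: ip_in_cidr 203.0.113.45 203.0.113.0/24
  local ip base bits mask
  ip=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( base & mask )) ]
}

ip_in_cidr 203.0.113.45 203.0.113.0/24 && echo "in range"   # prints: in range
```

In production you would load the published CIDR lists into whatever your firewall or WAF natively supports rather than shelling out per request; the function just shows what the match means.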

Known AI Crawler IP Ranges by Company

Most major AI companies now publish their crawler IP ranges publicly. The format and update frequency vary by company, but the information is available if you know where to look.

| Crawler | Company | IP Publication Method | Verification Method |
| --- | --- | --- | --- |
| GPTBot / ChatGPT-User | OpenAI | JSON file at openai.com/gptbot-ranges.json | Match IP against published CIDR ranges |
| ClaudeBot | Anthropic | No static list published | Reverse DNS resolves to *.anthropic.com |
| Google-Extended | Google | JSON file at developers.google.com | Reverse DNS to *.googlebot.com or *.google.com |
| Bytespider | ByteDance | No official list published | Reverse DNS to ByteDance-owned domains |
| Meta-ExternalAgent | Meta | Published via ASN lookups | Reverse DNS to *.facebook.com or *.meta.com |
| PerplexityBot | Perplexity | No official list published | Reverse DNS to *.perplexity.ai |

OpenAI publishes GPTBot and ChatGPT-User IP ranges in a JSON file at https://openai.com/gptbot-ranges.json. This file contains CIDR blocks that are updated periodically as OpenAI expands its crawling infrastructure. As of early 2026, the ranges include multiple /24 and /23 blocks across several cloud providers.

Anthropic does not publish a static IP list for ClaudeBot, but all ClaudeBot requests originate from IP addresses that resolve via reverse DNS to hostnames ending in *.anthropic.com. This makes reverse DNS the primary verification method for ClaudeBot traffic.

Google-Extended shares its IP infrastructure with Googlebot. Google publishes Googlebot IP ranges at https://developers.google.com/search/apis/ipranges/googlebot.json, and any IP verified as Googlebot via reverse DNS to *.googlebot.com or *.google.com may also be used by Google-Extended.

ByteDance operates Bytespider from data center IPs, primarily in US and Singapore regions. ByteDance does not publish an official IP list, but the IPs are identifiable by reverse DNS resolution to ByteDance-owned domains and by their consistent presence in known ByteDance CIDR ranges.

IP Range Updates

AI companies regularly add new IP ranges as they scale their crawling operations. OpenAI's published JSON file has grown from 8 CIDR blocks in 2024 to over 20 in 2026. Bookmark the source URLs and re-check them regularly, or automate the fetch on a schedule.

How to Verify AI Crawler IPs with Reverse DNS

Reverse DNS (rDNS) lookup is the most reliable method for verifying that an AI crawler is genuinely operated by the company it claims to represent. The process works in two steps: a reverse lookup from IP to hostname, followed by a forward lookup to confirm the hostname resolves back to the same IP.

Two-Step DNS Verification Process

  1. Run a reverse DNS lookup on the source IP: <code>dig -x 203.0.113.45</code> — this returns a hostname like <code>crawl-203-0-113-45.anthropic.com</code>.
  2. Verify the hostname resolves back to the original IP: <code>dig +short crawl-203-0-113-45.anthropic.com</code> — the result should match <code>203.0.113.45</code>.
  3. If both lookups match, the bot is verified. If the reverse lookup returns no result, or the forward lookup points to a different IP, the request is likely spoofed.

This two-step verification prevents DNS spoofing. An attacker could set up a reverse DNS record pointing their IP to crawl.anthropic.com, but the forward lookup would fail because Anthropic controls the A records for that domain.

For ClaudeBot, the reverse DNS should resolve to a hostname ending in .anthropic.com. For GPTBot, reverse DNS resolves to hostnames within OpenAI-controlled ranges. For Google-Extended, the hostname should end in .googlebot.com or .google.com — the same verification used for regular Googlebot.
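Those suffix rules can be collected into a small helper that maps a forward-confirmed rDNS hostname to its operator. The mapping below just restates the verification table above; strip the trailing dot that dig appends before calling it:

```shell
#!/bin/bash
# Map a forward-confirmed rDNS hostname to the AI company that controls it.
verified_operator() {
  case "$1" in
    *.anthropic.com)                echo "Anthropic (ClaudeBot)" ;;
    *.googlebot.com|*.google.com)   echo "Google (Googlebot / Google-Extended)" ;;
    *.facebook.com|*.meta.com)      echo "Meta (Meta-ExternalAgent)" ;;
    *.perplexity.ai)                echo "Perplexity (PerplexityBot)" ;;
    *)                              echo "unverified"; return 1 ;;
  esac
}

verified_operator "crawl-203-0-113-45.anthropic.com"   # prints: Anthropic (ClaudeBot)
```

Only trust this mapping for hostnames that passed the full two-step check — the suffix alone proves nothing, since anyone can put any name in their reverse zone.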

verify-ai-crawler.sh (bash)
#!/bin/bash
# Verify an AI crawler IP via two-step DNS lookup
IP="$1"
HOST=$(dig -x "$IP" +short | head -1)
HOST=${HOST%.}   # strip the trailing dot dig appends
echo "Reverse DNS: $IP -> $HOST"
if [ -z "$HOST" ]; then
  echo "FAIL: No reverse DNS record found"
  exit 1
fi
# A hostname may have several A records; pass if any of them matches
if dig +short "$HOST" | grep -qxF "$IP"; then
  echo "PASS: Bot identity verified"
else
  echo "FAIL: Forward lookup does not return $IP"
  exit 1
fi


AI Crawler IP Blocking vs User-Agent Blocking

There are two primary methods for blocking AI crawlers: user-agent-based rules (via robots.txt or server configuration) and IP-based blocking (via firewall rules or CDN configuration). Each approach has strengths and limitations, and the most effective strategy uses both.

User-Agent Blocking

robots.txt and server rules

Simple to implement — just edit robots.txt or add a server config rule. Works across all IPs as long as the bot identifies itself correctly.

Limitations: purely advisory, bots can ignore it, does not prevent the TCP connection from being established. No protection against user-agent spoofing.

Best for: compliant crawlers like GPTBot, ClaudeBot, Google-Extended
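For these compliant crawlers, a user-agent rule is one robots.txt stanza per bot token — for example:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that Google-Extended is a robots.txt control token only — it never appears as a user-agent string in your logs, which is exactly why the robots.txt rule is the right tool for it.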

IP-Based Blocking

Firewall and CDN rules

Enforced at the network level — connections are dropped before any HTTP exchange. Cannot be ignored or bypassed by the crawler software.

Limitations: requires maintaining an up-to-date IP list, may block legitimate traffic if ranges overlap, higher operational overhead.

Best for: non-compliant bots, scrapers impersonating AI crawlers

The trade-off is maintenance. AI companies rotate and expand their IP ranges regularly. A firewall rule that blocks all current GPTBot IPs may miss new ranges added next month. User-agent rules, by contrast, remain effective as long as the bot identifies itself consistently.

Do Not Block Shared IP Ranges

Google-Extended shares IP infrastructure with Googlebot. If you block Google-Extended IPs, you will also block Googlebot and disappear from Google Search results. Use user-agent rules for Google-Extended instead of IP blocking.

Implementing IP-Based AI Crawler Blocks

Once you have the IP ranges you want to block, implementation depends on your infrastructure. The three most common approaches are server-level firewall rules, web server configuration, and CDN or WAF rules.

nginx-block-ai-crawlers.conf (nginx)
# Block known AI crawler IP ranges in Nginx
# GPTBot ranges (from openai.com/gptbot-ranges.json)
deny 20.15.240.0/24;
deny 20.15.241.0/24;
deny 20.15.242.0/23;

# Bytespider ranges (known ByteDance data center IPs)
deny 110.249.196.0/22;
deny 111.225.148.0/22;

# Allow all other traffic
allow all;
auto-update-blocklist.sh (bash)
#!/bin/bash
# Auto-update GPTBot IP blocklist from OpenAI's published ranges
set -euo pipefail

TMP=$(mktemp)
curl -fsS https://openai.com/gptbot-ranges.json \
  | jq -r '.prefixes[].ipv4Prefix // empty' \
  | while read -r cidr; do
      echo "deny $cidr;"
    done > "$TMP"

# Refuse to install an empty blocklist (e.g. if the fetch failed)
[ -s "$TMP" ] || { echo "ERROR: no ranges fetched" >&2; exit 1; }

mv "$TMP" /etc/nginx/conf.d/block-gptbot.conf
nginx -t && systemctl reload nginx
echo "GPTBot blocklist updated: $(date)"

For dynamic or large-scale blocking, a CDN-level approach is more practical. Cloudflare, AWS WAF, and Fastly all support IP list rules that can be updated via API. This lets you automate updates — pull the latest IP ranges from OpenAI or Google, push them to your CDN, and the rules take effect globally within seconds.
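A sketch of that push step, using Cloudflare's Lists API as the example target. The ACCOUNT_ID, LIST_ID, and CF_API_TOKEN variables are placeholders you supply, and you should confirm the endpoint shape against your provider's current API documentation:

```shell
#!/bin/bash
# Sketch: replace the contents of a Cloudflare IP list with OpenAI's
# published GPTBot ranges. ACCOUNT_ID, LIST_ID, and CF_API_TOKEN are
# hypothetical environment variables you set yourself.

ranges_to_cf_items() {
  # Convert range JSON on stdin into the Cloudflare Lists item format:
  # [{"ip": "20.15.240.0/24"}, ...]
  jq -c '[.prefixes[] | select(.ipv4Prefix) | {ip: .ipv4Prefix}]'
}

# curl -fsS https://openai.com/gptbot-ranges.json \
#   | ranges_to_cf_items \
#   | curl -X PUT "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/rules/lists/$LIST_ID/items" \
#       -H "Authorization: Bearer $CF_API_TOKEN" \
#       -H "Content-Type: application/json" \
#       --data @-
```

The PUT-the-whole-list pattern keeps the remote list exactly in sync with the published file, so retired ranges are dropped automatically.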

On Linux servers, iptables or nftables provide the most direct firewall-level blocking. You can script the process to fetch published IP ranges and update your firewall rules automatically on a cron schedule.
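One way to script that (a sketch assuming jq and ipset are installed; the JSON field names follow the GPTBot file format described above):

```shell
#!/bin/bash
# Sketch: turn OpenAI's range JSON into an `ipset restore` script, then
# load it and drop matching traffic. Root is required for the commented steps.

json_to_ipset() {
  # Reads range JSON on stdin; emits ipset-restore lines for set "$1"
  local set_name=$1
  echo "create $set_name hash:net"
  echo "flush $set_name"
  jq -r --arg s "$set_name" '.prefixes[].ipv4Prefix // empty | "add \($s) \(.)"'
}

# curl -fsS https://openai.com/gptbot-ranges.json | json_to_ipset gptbot | ipset -exist restore
# iptables -C INPUT -m set --match-set gptbot src -j DROP 2>/dev/null \
#   || iptables -I INPUT -m set --match-set gptbot src -j DROP
```

Using an ipset rather than one iptables rule per CIDR keeps the rule count constant no matter how many ranges the published list grows to.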

Keeping Your AI Crawler IP Blocklist Current

The biggest challenge with IP-based AI crawler management is maintenance. AI companies regularly expand their infrastructure, adding new IP ranges and retiring old ones. A blocklist that was complete last month may have gaps today.

OpenAI updates their published GPTBot ranges without announcement. Google updates their Googlebot IP list as they add new crawling infrastructure. Anthropic and ByteDance do not publish static lists at all, which means you need to discover new ClaudeBot and Bytespider IPs through log analysis.

Maintenance Strategies

Automated Fetching

Script a cron job to pull OpenAI and Google's published IP range JSON files daily and update your firewall rules automatically.

Log Aggregation

For crawlers without published IPs (ClaudeBot, Bytespider), aggregate unknown bot IPs from your server logs weekly and verify via reverse DNS.

Analytics Monitoring

Use Copper Analytics to track all AI crawler IPs in real time. New ranges are detected automatically and flagged in your dashboard.

The practical approach is a combination strategy: use robots.txt for first-line blocking of compliant crawlers, IP blocking for enforcement against non-compliant bots, and continuous monitoring to detect new IPs that slip through both layers.

Automating the update process is essential. Set up a cron job or CI/CD pipeline step that fetches the latest published IP ranges, compares them against your current blocklist, and applies changes automatically. For crawlers without published IP lists, aggregate new IPs from your server logs weekly.
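The log-aggregation step can be sketched as a small filter over combined-format access logs (the first-field-is-client-IP assumption holds for default nginx and Apache combined logs; adjust if you log behind a proxy):

```shell
#!/bin/bash
# Sketch: extract unique client IPs whose user-agent claims to be a given
# bot, from combined-format access logs on stdin.

claimed_bot_ips() {
  # $1: case-insensitive substring to look for in the log line (e.g. claudebot)
  awk -v pat="$1" 'tolower($0) ~ tolower(pat) { print $1 }' | sort -u
}

# Usage: claimed_bot_ips claudebot < /var/log/nginx/access.log
# Feed each resulting IP to the two-step DNS verification script above.
```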

Monitoring Tip

Copper Analytics tracks AI crawler IPs automatically and alerts you when a new IP range appears for a known crawler. This eliminates the manual log-scraping step and ensures your blocklist data stays current.

Automate AI Crawler IP Tracking with Copper Analytics

Manual IP verification and blocklist maintenance is effective but time-consuming. Copper Analytics automates the entire process — from identifying AI crawlers by IP to verifying their identity via reverse DNS to tracking new IP ranges as they appear.

The Crawlers dashboard shows every AI bot that has visited your site, organized by company and verified by IP. You can see which IP ranges GPTBot is currently using, confirm that ClaudeBot requests genuinely resolve to Anthropic hostnames, and spot unknown bots that may be impersonating legitimate crawlers.

For security teams, the IP-level visibility means you can export verified IP data to feed your firewall rules, WAF policies, or CDN blocklists. Instead of manually scraping logs and running DNS lookups, you get a continuously updated, verified dataset of every AI crawler touching your infrastructure.

Stop Guessing Which IPs Are AI Crawlers

Copper Analytics verifies AI crawler IPs automatically with reverse DNS. See every bot, every IP, every request — no manual lookups required.

What to Do Next

The right stack depends on how much visibility, workflow control, and reporting depth you need. If you want a simpler way to centralize site reporting and operational data, compare plans on the pricing page and start with a free Copper Analytics account.

You can also keep exploring related guides from the Copper Analytics blog to compare tools, setup patterns, and reporting workflows before making a decision.
