
How Much Bandwidth Do AI Crawlers Actually Use?

Concrete numbers, percentages, and real-world bandwidth data from AI bots crawling websites of every size

AI crawlers consume 2-500+ GB per month — 10% to 40% of your total bandwidth

Real numbers from real websites reveal the true bandwidth cost of GPTBot, Bytespider, and dozens more AI bots

The Short Answer: How Much Bandwidth AI Crawlers Use

AI crawlers use between 2 GB and 500+ GB of bandwidth per month on a typical website, depending on the size of your site and how aggressively bots are crawling it. For the average small-to-medium website, AI bots now consume 10-30% of total bandwidth — and that number is climbing every quarter.

These are not theoretical projections. They come from server logs and analytics data across thousands of sites tracked through 2024 and 2025. The bandwidth consumption from AI crawlers has roughly tripled since 2023, the year GPTBot, ClaudeBot, and other large language model training bots began crawling the web at scale.

The exact amount your site loses depends on three factors: how many pages you have, how frequently AI bots re-crawl them, and whether you have any rate-limiting or robots.txt rules in place. Without any controls, AI crawlers will consume as much as your server will give them.

  • Small blog (10K pages/mo): 2-5 GB/month from AI bots (10-20% of total bandwidth)
  • Medium site (100K pages/mo): 20-50 GB/month from AI bots (15-30% of total bandwidth)
  • Large content site (1M+ pages/mo): 200-500+ GB/month from AI bots (20-40% of total bandwidth)

Key Numbers

On a median website with 50,000 monthly pageviews, AI crawlers consume approximately 8-15 GB of bandwidth per month. That is roughly the equivalent of 3,000-6,000 additional human visitors (assuming a typical 2-3 MB transferred per visit), all of whom generate zero revenue.

AI Crawler Bandwidth Breakdown by Site Size

The bandwidth impact of AI crawlers scales non-linearly with site size. Larger sites do not just see proportionally more crawling — they see disproportionately more, because AI companies prioritize content-rich domains for training data. A site with 10x more pages might see 15-20x more AI bot bandwidth.

For a small blog or personal site serving around 10,000 pages per month to human visitors, AI crawlers typically add 2-5 GB of monthly bandwidth. That translates to roughly 10-20% of your total server bandwidth being consumed by bots that provide no direct value to your business. On a shared hosting plan with a 50 GB monthly limit, that is a meaningful chunk.

Medium-sized sites — think a SaaS documentation portal, a regional news outlet, or an e-commerce store with 100,000 monthly page serves — face a steeper burden. AI bots typically consume 20-50 GB per month on these sites, representing 15-30% of total bandwidth. At cloud hosting rates of $0.08-0.12 per GB, that is $1.60-6.00 per month in pure waste.

Large content sites with over a million monthly page serves are hit hardest. Publishers, recipe sites, forums, and large documentation portals routinely see 200-500+ GB of monthly bandwidth consumed by AI crawlers. On some sites, AI bots account for 40% or more of total bandwidth, exceeding the consumption of all human visitors combined.

Which AI Bots Use the Most Bandwidth?

Not all AI crawlers are equal when it comes to bandwidth consumption. The distribution is heavily skewed, with a handful of bots responsible for the vast majority of AI-related bandwidth usage. Understanding which bots are your biggest consumers is essential for targeted mitigation.

Bytespider — the crawler operated by ByteDance for TikTok and its AI products — is consistently the single largest bandwidth consumer among AI bots. On most websites, Bytespider accounts for 40-60% of all AI bot bandwidth. It crawls aggressively, often re-requesting pages multiple times per day, and does not always respect crawl-delay directives in robots.txt.

GPTBot, operated by OpenAI, is typically the second-largest consumer at 15-25% of AI bot bandwidth. GPTBot is somewhat more polite than Bytespider — it generally respects robots.txt rules and crawls at a more moderate rate. However, its share has been growing steadily as OpenAI expands its training data needs.

The remaining 20-40% of AI bot bandwidth is split among ClaudeBot (Anthropic), Google-Extended (Google DeepMind), CCBot (Common Crawl, used by many AI companies), Meta-ExternalAgent (Meta AI), and a growing list of smaller crawlers. Each individually may consume only 3-8% of your AI bot bandwidth, but collectively they add up fast.

  • Bytespider (ByteDance/TikTok): 40-60% of AI bot bandwidth — the single biggest offender
  • GPTBot (OpenAI): 15-25% of AI bot bandwidth — growing steadily each quarter
  • ClaudeBot (Anthropic): 5-10% of AI bot bandwidth — relatively well-behaved crawling patterns
  • Google-Extended (Google DeepMind): 5-8% of AI bot bandwidth — separate from regular Googlebot
  • CCBot (Common Crawl): 3-7% of AI bot bandwidth — used by multiple AI companies
  • Meta-ExternalAgent (Meta): 3-5% of AI bot bandwidth — newer entrant, rapidly scaling
  • Other AI crawlers: 5-15% combined — includes Apple, Amazon, Perplexity, and dozens more


The Real Cost of AI Crawler Bandwidth

Bandwidth is not free, and the costs from AI crawlers add up in ways that many website owners do not realize until they check their hosting bills. The financial impact goes beyond raw bandwidth charges — it includes increased server load, slower page delivery to real users, and potential overage fees.

At typical cloud hosting rates of $0.08-0.12 per GB for bandwidth, a medium-sized site paying for 20-50 GB of monthly AI bot bandwidth is spending $1.60-6.00 per month — roughly $20-72 per year — on serving content to AI crawlers. For a large content site at 200-500 GB, the annual cost ranges from $192 to $720 in bandwidth alone.
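
If you want to plug your own numbers into that arithmetic, it is trivial to script. A minimal Python sketch (the helper name and default rate are illustrative, not from any billing API):

  # Hypothetical helper: estimate what AI crawler bandwidth costs
  # at your host's per-GB rate. All numbers are illustrative.
  def ai_crawler_cost(gb_per_month: float, usd_per_gb: float = 0.10) -> tuple[float, float]:
      monthly = gb_per_month * usd_per_gb
      return round(monthly, 2), round(monthly * 12, 2)

  # A medium site at 35 GB/month of AI bot traffic, $0.10/GB:
  print(ai_crawler_cost(35))         # (3.5, 42.0)  -> monthly, annual USD
  # A large site at 500 GB/month, $0.12/GB:
  print(ai_crawler_cost(500, 0.12))  # (60.0, 720.0)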

But the real cost is often higher than raw bandwidth pricing suggests. AI crawlers hit your server with thousands of requests, consuming CPU cycles, database connections, and memory alongside bandwidth. Sites running on shared hosting or limited VPS plans frequently experience performance degradation during heavy AI crawl periods, which directly impacts page load times for human visitors.

Hidden Cost Alert

Many hosting providers charge overage fees of $0.15-0.25 per GB once you exceed your plan limit. A single aggressive Bytespider crawl session can push a small site over its monthly bandwidth cap in a matter of hours, triggering surprise charges.

How AI Crawler Bandwidth Has Grown Since 2023

The bandwidth consumed by AI crawlers has exploded since the launch of ChatGPT in late 2022 set off an AI arms race. Before 2023, the only significant AI-related crawlers were CCBot and a handful of research bots, which collectively represented no more than 2-3% of a typical site's bandwidth.

GPTBot launched in August 2023, and bandwidth from AI crawlers quickly jumped to 5-8% of total traffic on most sites. By late 2023, Bytespider had ramped up its crawling dramatically, and the combined AI bot share reached 10-15%. Through 2024, ClaudeBot, Google-Extended, Meta-ExternalAgent, and a wave of newer crawlers pushed the share to 15-25%.

As of early 2026, AI crawlers account for 20-35% of bandwidth on the average content website. For sites that have not implemented any bot management, the figure can be even higher. Some publishers have reported AI bot bandwidth exceeding 50% of their total, particularly on sites with large archives of text-heavy content.

The growth shows no sign of slowing down. New AI companies continue to launch crawlers, existing crawlers are expanding their scope, and the shift toward retrieval-augmented generation (RAG) means bots need to re-crawl pages more frequently to keep their data fresh.

How to Measure AI Crawler Bandwidth on Your Site

Before you can control AI crawler bandwidth, you need to know exactly how much they are consuming. Standard analytics tools such as Google Analytics will not help here: they only track JavaScript-enabled browser sessions, and AI crawlers generally do not execute JavaScript. You need server-side measurement.

The most direct approach is analyzing your server access logs. Every request from an AI crawler is recorded with its user agent string, the URL requested, and the response size. By filtering logs for known AI crawler user agents (GPTBot, Bytespider, ClaudeBot, etc.) and summing the response sizes, you can calculate exact bandwidth consumption.
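
As a rough sketch of that log analysis, the short Python script below sums bytes per bot from a combined-format access log. The bot list and log path are assumptions to adapt to your own stack:

  # Sketch: sum bytes served to known AI crawlers from a combined-format
  # access log. Bot list and file path are illustrative, not exhaustive.
  import re
  from collections import defaultdict

  AI_BOTS = ["GPTBot", "Bytespider", "ClaudeBot", "Google-Extended",
             "CCBot", "meta-externalagent"]

  # Combined format: ... "METHOD /url HTTP/x" status bytes "referer" "user-agent"
  LINE = re.compile(r'"\S+ \S+ \S+" \d{3} (?P<bytes>\d+) ".*?" "(?P<ua>.*?)"')

  totals = defaultdict(int)            # bytes served, keyed by bot name
  with open("access.log") as log:      # assumed log location
      for raw in log:
          m = LINE.search(raw)
          if not m:
              continue               # skip malformed or zero-byte entries
          ua = m["ua"].lower()
          for bot in AI_BOTS:
              if bot.lower() in ua:
                  totals[bot] += int(m["bytes"])
                  break

  for bot, nbytes in sorted(totals.items(), key=lambda kv: -kv[1]):
      print(f"{bot}: {nbytes / 1e9:.2f} GB")

Dividing each bot's total by your overall transfer for the same 30-day window gives you the percentage figures used throughout this article.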

For sites that want a more automated and visual approach, Copper Analytics provides a dedicated AI crawler tracking dashboard. It identifies and categorizes every AI bot hitting your site, shows bandwidth consumption per bot and over time, and alerts you when crawling spikes. You can see exactly how much each AI crawler costs you in bandwidth — down to the individual bot and URL level.

  1. Check your server access logs for user agents containing GPTBot, Bytespider, ClaudeBot, Google-Extended, CCBot, or meta-externalagent
  2. Sum the response sizes (bytes transferred) for all matching requests over a 30-day period
  3. Compare the AI bot total against your overall bandwidth to calculate the percentage
  4. Identify which specific bots and which URLs consume the most bandwidth
  5. Set up ongoing monitoring to track trends and catch sudden crawl spikes

Pro Tip

Copper Analytics automatically identifies 50+ known AI crawler user agents and calculates bandwidth consumption in real time. Instead of manually parsing server logs, you get a live dashboard showing exactly how much bandwidth each AI bot uses on your site.

Reducing AI Crawler Bandwidth Without Blocking All Bots

Once you know how much bandwidth AI crawlers are using, the next question is what to do about it. Blocking all AI crawlers via robots.txt is one option, but many site owners want a more nuanced approach — allowing some crawlers (like those that drive referral traffic through AI search) while limiting the most aggressive ones.

The most effective first step is adding crawl-delay directives and targeted disallow rules in your robots.txt file. Blocking or rate-limiting Bytespider alone typically reduces AI bot bandwidth by 40-60%, since it is the single largest consumer. You can block it specifically while still allowing GPTBot, ClaudeBot, and others.
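
As a sketch, a robots.txt along those lines might look like this (Crawl-delay is a non-standard directive that some crawlers ignore, so treat it as a polite request rather than a guarantee):

  User-agent: Bytespider
  Disallow: /

  User-agent: GPTBot
  Crawl-delay: 10

  User-agent: ClaudeBot
  Crawl-delay: 10

Because Bytespider does not always respect these directives, it is worth pairing robots.txt rules with the server-side controls described next.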

For more granular control, consider implementing server-side rate limiting for AI crawler user agents. Setting a maximum of 1-2 requests per second per bot prevents aggressive crawling while still allowing legitimate indexing. Some CDNs like Cloudflare now offer built-in AI bot management features that make this configuration straightforward.
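
One way to implement this is nginx's limit_req module, keyed on the user agent so that human visitors are never throttled. A minimal sketch (zone name, bot list, and rate are illustrative):

  # Map known AI crawler user agents to a rate-limit key.
  # Requests with an empty key are not rate-limited at all.
  map $http_user_agent $ai_bot {
      default          "";
      ~*GPTBot         gptbot;
      ~*Bytespider     bytespider;
      ~*ClaudeBot      claudebot;
      ~*CCBot          ccbot;
  }

  # Each matched bot shares a 2-requests-per-second bucket.
  limit_req_zone $ai_bot zone=ai_bots:1m rate=2r/s;

  server {
      listen 80;
      location / {
          # A small burst absorbs brief spikes; excess requests get HTTP 503.
          limit_req zone=ai_bots burst=5;
          # ... your usual static/proxy configuration ...
      }
  }

Keying on a normalized bot name rather than the raw user agent string keeps the zone small and gives each crawler its own bucket.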

The most sustainable approach combines measurement with selective control. Use Copper Analytics to identify which AI crawlers are consuming the most bandwidth on your site, then make informed decisions about which to allow, rate-limit, or block entirely based on the actual data from your own traffic.

What to Do Next

The right stack depends on how much visibility, workflow control, and reporting depth you need. If you want a simpler way to centralize site reporting and operational data, compare plans on the pricing page and start with a free Copper Analytics account.

You can also keep exploring related guides from the Copper Analytics blog to compare tools, setup patterns, and reporting workflows before making a decision.