Here is a number that gets repeated in every ad fraud article on the internet: "51% of all internet traffic is bots."
That number comes from an Imperva report. It is accurate at the macro level. It is also completely useless for a media buyer running pop traffic in 2026.
Why? Because that 51% includes search engine crawlers, monitoring services, API scrapers, and every other legitimate automated request on the internet. It tells you nothing about your campaigns, your zones, or your budget waste.
The question that actually matters is this: what percentage of the traffic you are paying for right now is fake?
We built a framework to answer that question. It is based on analyzing 12.7 million pop traffic events through 18+ detection layers over 14 months of production traffic. Every step below is something you can implement today, regardless of what traffic source you use or what detection tool you choose.
Step 1: Establish Your Baseline (Days 1-3)
Before you can fix a bot problem, you need to know how big the problem is. Most media buyers skip this step. They either assume their traffic is clean because "the network said so," or they assume everything is bots because "pop traffic is cheap." Both assumptions cost money.
Here is what establishing a baseline actually means:
Run traffic through a detection engine for 48-72 hours
This is not optional. You cannot audit traffic by looking at conversion rates or bounce rates in your tracker. Bots click, bots load pages, some bots even trigger tracker pixels. The only way to separate real humans from fake traffic is to analyze the technical fingerprint of every single visitor.
A proper detection engine checks signals that bots cannot reliably fake:
- Browser-enforced headers: Sec-Fetch-Site, Sec-Fetch-Mode, Sec-Fetch-Dest. These headers are set by the browser engine itself, and page JavaScript cannot set or override them. A real Chrome browser sending a navigational click over HTTPS includes them. Most script-based bot clients do not.
- Chrome build forensics: Chrome 110+ uses User-Agent Reduction, sending version numbers like Chrome/131.0.0.0 instead of full build numbers. This is a legitimate privacy feature. But Chrome versions below 110 sending .0.0.0 patterns are suspicious, because that combination is not what those releases normally sent. The rollout was phased, so treat versions just below 110 with more caution than much older ones.
- Network origin analysis: Is the visitor connecting from a residential ISP, a mobile carrier, or an AWS datacenter? Real humans browse from homes and phones. Bots run on cloud servers. ASN lookups reveal the difference in milliseconds.
- IP reputation: Known threat intelligence feeds (FireHOL, CrowdSec) maintain real-time lists of IPs involved in scanning, brute-force attacks, and bot networks. A single IP match is not proof of a bot, but combined with other signals it is damning.
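As a rough illustration of the first two signals, here is a minimal Python sketch. The function names, the plain-dict header format, and the `reduction_since` parameter are assumptions made for this example, not part of any particular detection product. The 110 cutoff is simply the one used in the text above.

```python
import re

# Hypothetical helpers for illustration only.
SEC_FETCH = ("sec-fetch-site", "sec-fetch-mode", "sec-fetch-dest")
CHROME_RE = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")


def has_navigation_sec_fetch(headers):
    """True when all three Sec-Fetch headers are present and
    describe a top-level document navigation."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    if not all(name in h for name in SEC_FETCH):
        return False
    return h["sec-fetch-mode"] == "navigate" and h["sec-fetch-dest"] == "document"


def chrome_zero_build_suspicious(user_agent, reduction_since=110):
    """True when a Chrome UA older than the cutoff reports a zeroed
    minor.build.patch, the pattern the text calls suspicious."""
    match = CHROME_RE.search(user_agent)
    if match is None:
        return False  # not a Chrome UA; other checks apply
    major, minor, build, patch = (int(part) for part in match.groups())
    zeroed = (minor, build, patch) == (0, 0, 0)
    return zeroed and major < reduction_since
```

Neither check is conclusive on its own. In a scoring system each would typically contribute a penalty instead of triggering an outright block.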
Track these four baseline metrics
| Metric | What It Tells You | Healthy Range |
|---|---|---|
| Accept Rate | Percentage of clicks that pass all detection layers | 65-85% |
| Average Trust Score | Mean score across all clicks (0-10 scale) | 6.0-8.0 |
| Hard Block Rate | Percentage killed by definitive signals (bot UA, blocklist IP) | < 5% |
| Datacenter Traffic % | Percentage of clicks from cloud/hosting ASNs | < 8% |
Your baseline numbers are not "good" or "bad" in isolation. They are the starting point. A 70% accept rate means roughly 30% of the clicks you pay for are being flagged as non-human. Whether that matters depends on your margins and your willingness to act on the data.
After 48-72 hours, you should have enough volume to see stable patterns. If your accept rate is fluctuating wildly hour to hour, you either need more volume or you have a source that mixes clean and dirty traffic throughout the day (which is itself useful intelligence).
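The four baseline metrics are simple aggregates over your click log. The sketch below assumes each click has already been scored and stored with the fields shown. The `Click` record and its field names are hypothetical, so map them to whatever your detection tool exports.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Click:
    trust_score: float     # 0-10 scale
    accepted: bool         # passed all detection layers
    hard_blocked: bool     # killed by a definitive signal
    from_datacenter: bool  # IP belongs to a cloud/hosting ASN


def baseline_metrics(clicks):
    """Compute the four baseline metrics from a list of scored clicks."""
    if not clicks:
        raise ValueError("no clicks recorded yet")
    total = len(clicks)

    def pct(count):
        return round(100.0 * count / total, 1)

    return {
        "accept_rate_pct": pct(sum(c.accepted for c in clicks)),
        "avg_trust_score": round(mean(c.trust_score for c in clicks), 2),
        "hard_block_rate_pct": pct(sum(c.hard_blocked for c in clicks)),
        "datacenter_pct": pct(sum(c.from_datacenter for c in clicks)),
    }
```

Running the same function per hour instead of over the whole window is one way to spot the hour-to-hour fluctuation described above.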
Step 2: Analyze by Signal, Not Just Score (Days 3-5)
A single trust score is a summary. Summaries hide details. The details are where the money is.
After you have your baseline, drill into why traffic is being scored the way it is. Every blocked click should have a reason. Every penalty should trace back to a specific, verifiable signal. If your detection tool gives you a score but not a breakdown, you are flying blind.
What each signal category reveals
| Signal Category | What It Detects | Why It Matters for Media Buyers |
|---|---|---|
| Sec-Fetch Headers | Whether the browser engine itself confirms this is a real navigation | The highest-confidence browser signal available. Cannot be set by page JavaScript. Absent by default from curl, wget, and other plain HTTP clients. |
| Chrome Build Analysis | Whether the Chrome version number is real or fabricated | Headless Chrome and automation frameworks often ship with outdated or impossible version strings. UA Reduction awareness prevents false positives on modern Chrome. |
| Datacenter ASN Detection | Whether the IP belongs to a cloud hosting provider vs residential ISP | Legitimate users browse from ISPs like Comcast, Vodafone, or True Corp. Traffic from AWS, Hetzner, or DigitalOcean is almost never a real human clicking a pop ad. |
| Device Fingerprinting | OS/browser/device consistency across all headers | A request claiming to be Chrome on Android but sending Linux desktop headers is a misconfigured bot. These inconsistencies are invisible in tracker reports. |
| Behavioral Patterns | Click frequency, burst rates, timing patterns | A single IP sending 20 clicks in 60 seconds is not a human exploring the internet. Burst detection catches bot farms that rotate user agents but reuse IP addresses. |
| Threat Intelligence | Known malicious IPs from community blocklists | FireHOL and CrowdSec aggregate threat data from thousands of honeypots worldwide. An IP on these lists is involved in active attacks — not casual browsing. |
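The behavioral row is the easiest to prototype. Below is a minimal sliding-window sketch of burst detection using the 20-clicks-in-60-seconds example from the table. The `BurstDetector` class is hypothetical, keeps state in memory only, and assumes timestamps arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flags an IP once it reaches max_clicks inside a sliding time window."""

    def __init__(self, max_clicks=20, window_seconds=60.0):
        self.max_clicks = max_clicks
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ip -> timestamps still inside the window

    def record(self, ip, timestamp):
        """Record one click and return True if this IP is now bursting."""
        recent = self._seen[ip]
        recent.append(timestamp)
        # Drop clicks that have aged out of the window.
        while timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_clicks
```

A production version would need shared storage and eviction of idle IPs. This sketch only shows the windowing logic.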
The signal mix tells the story
Here is what we have learned from analyzing millions of events:
- If your top block reason is "bot UA" — your traffic source is sending you crawlers and scrapers. This is the cheapest, lowest-effort bot traffic. The source either has no filtering or is actively selling junk inventory.
- If your top block reason is "datacenter ASN" — you are getting proxy and VPN traffic. This is more sophisticated than crawler traffic but still detectable at the network level. Common in programmatic/RTB inventory.
- If your top block reason is "missing Sec-Fetch headers" — you are getting headless browser traffic. These bots have real-looking user agents and reasonable headers but fail the browser-engine verification that only a real browser can pass. This is the most expensive bot traffic to produce and the hardest to catch with simple tools.
- If your block reasons are evenly distributed — your traffic source has a diverse bot problem across multiple categories. This often indicates the source aggregates from many sub-publishers with varying quality.
The goal of signal analysis is not to catch more bots. It is to understand what kind of bots you are dealing with so you can make better source-level decisions. A source with 5% bot rate from datacenter proxies is a different problem than a source with 5% bot rate from headless Chrome farms.
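To read the signal mix in practice, tally the block reasons and look at the shares. The sketch below is one possible way to do it. The reason labels and the `even_spread_ratio` heuristic for deciding that reasons are "evenly distributed" are assumptions for illustration, not thresholds taken from the analysis above.

```python
from collections import Counter


def summarize_block_reasons(reasons, even_spread_ratio=1.5):
    """Return the top block reason, each reason's share, and whether the
    mix looks evenly spread (largest/smallest count within the ratio)."""
    counts = Counter(reasons)
    total = sum(counts.values())
    if total == 0:
        return {"top_reason": None, "shares": {}, "evenly_distributed": False}
    ranked = counts.most_common()
    shares = {reason: n / total for reason, n in ranked}
    top_n, least_n = ranked[0][1], ranked[-1][1]
    # Require at least three categories before calling the mix "even".
    evenly = len(ranked) >= 3 and top_n / least_n <= even_spread_ratio
    return {
        "top_reason": ranked[0][0],
        "shares": shares,
        "evenly_distributed": evenly,
    }
```

Running this per traffic source, not across all sources combined, keeps one noisy source from hiding the pattern in another.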
Step 3: Build Your Zone Intelligence (Days 5-10)
This is where the real money is. Click-level analysis tells you what happened. Zone-level analysis tells you what to do about it.
Every pop traffic source assigns each click a zone ID — the publisher site where the pop was triggered. Some sources call it a zone, some call it a site ID, some call it a source ID. Whatever the label, it represents the publisher origin. And publisher origins are not created equal.
Traffic Quality Tiers
After sufficient data (typically 200+ clicks per zone for statistical significance), every zone falls into one of three tiers:
| Tier | Criteria | What It Means | Action |
|---|---|---|---|
| Gold | Accept rate > 85%, avg trust > 7.0, real device signals present | Genuine human traffic from a real publisher. These zones are rare and valuable. | Scale budget, increase bids, protect at all costs |
| Silver | Accept rate 50-85%, avg trust 5.0-7.0, mixed signals | Some real traffic mixed with some bot traffic. The publisher likely has real visitors but also sells remnant inventory to bot sources. | Monitor closely, set tighter thresholds, review weekly |
| Filtered | Accept rate < 50%, avg trust < 5.0, or hard kill signals dominant | Predominantly bot traffic. The publisher is either a bot farm or does not control their traffic quality. | Block immediately, add to zone blocklist, stop wasting budget |
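The tier table translates directly into a small classifier. This is a sketch under stated assumptions: rates are percentages, `real_device_signals` and `hard_kill_dominant` are booleans you derive elsewhere, and zones below the 200-click minimum are left unclassified. The function name and signature are hypothetical.

```python
from typing import Optional


def classify_zone(clicks: int, accept_rate: float, avg_trust: float,
                  real_device_signals: bool, hard_kill_dominant: bool = False,
                  min_clicks: int = 200) -> Optional[str]:
    """Map zone metrics onto the Gold / Silver / Filtered tiers.

    Returns None when the zone has too little data to judge.
    """
    if clicks < min_clicks:
        return None
    # Filtered: any one of the three failure conditions is enough.
    if accept_rate < 50.0 or avg_trust < 5.0 or hard_kill_dominant:
        return "Filtered"
    # Gold: all three conditions must hold.
    if accept_rate > 85.0 and avg_trust > 7.0 and real_device_signals:
        return "Gold"
    return "Silver"
```

Zones that clear the Filtered bar but miss any Gold condition fall through to Silver, which matches the "mixed signals" description in the table.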
The zone distribution that changes everything
Here is the pattern we see consistently across every traffic source we have analyzed: roughly 40% of all zones across the pop traffic ecosystem are confirmed bot zones, while only 8-12% qualify as Gold.
That means about two in five of the publisher zones you can buy are fake. But the rest, and especially that 8-12% of Gold zones, contains genuinely valuable human traffic available at pop traffic prices.
This is why we call it "finding gold in trash." The gold exists. It is just buried under a mountain of bot zones that you need a system to identify and remove.
Zone-level metrics to track
For each zone, build a profile that includes:
- Accept rate — what percentage passes detection (most important single metric)
- Average trust score — mean score across all clicks from that zone
- Real device rate — percentage sending legitimate device fingerprints
- Sec-Fetch valid rate — percentage with browser-enforced navigation headers
- Datacenter rate — percentage from hosting/cloud ASNs
- Chrome zero rate — percentage with suspicious Chrome UA reduction patterns
- Revenue attribution — actual conversions or monetization from that zone (if applicable)
A zone that produces revenue is never blocked, regardless of its bot metrics. Revenue is the ultimate proof of human traffic. If real humans are converting, the detection engine should protect that zone even if some signals look suspicious.
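A zone profile is a group-by over click records. The sketch below assumes each click is a dict carrying a `zone_id`, a `trust_score`, boolean flags for each signal, and an optional `revenue` amount. All of those key names are hypothetical. The `revenue_protected` flag encodes the rule above that a zone with revenue is never blocked.

```python
from collections import defaultdict

# Boolean per-click flags that become percentage rates per zone.
RATE_FIELDS = ("accepted", "real_device", "sec_fetch_valid",
               "datacenter", "chrome_zero")


def build_zone_profiles(clicks):
    """Aggregate click dicts into one profile dict per zone."""
    grouped = defaultdict(list)
    for click in clicks:
        grouped[click["zone_id"]].append(click)

    profiles = {}
    for zone_id, rows in grouped.items():
        n = len(rows)
        profile = {
            "clicks": n,
            "avg_trust": sum(r["trust_score"] for r in rows) / n,
            "revenue": sum(r.get("revenue", 0.0) for r in rows),
        }
        for field in RATE_FIELDS:
            profile[field + "_rate"] = 100.0 * sum(bool(r[field]) for r in rows) / n
        # Revenue overrides bot metrics: converting zones are protected.
        profile["revenue_protected"] = profile["revenue"] > 0
        profiles[zone_id] = profile
    return profiles
```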
Step 4: Implement and Monitor (Days 10-25)
You have your baseline. You understand your signal distribution. You have zone-level intelligence. Now it is time to act on it — carefully.
The 15-day shadow mode approach
Do not start blocking traffic on day one. Instead, run in shadow mode for 15 days:
| Phase | Duration | What Happens |
|---|---|---|
| Shadow Mode | Days 1-15 | Detection runs on every click. Every bot is identified and logged. But nothing is blocked. All traffic passes through to your destination. You see exactly what would have been filtered — without changing anything. |
| Hard Kill Only | Days 16-20 | Enable blocking for definitive signals only: bot user agents, blocklist IPs, known automation frameworks. These are the lowest-false-positive blocks. A legitimate human is very unlikely to be caught by these rules. |
| Full Protection | Day 21+ | Enable trust scoring, zone reputation, and all detection layers. Filtered zones are blocked. Gold zones are protected. Silver zones are monitored with tighter thresholds. |
Why 15 days of shadow mode? Because it gives you two things that are worth their weight in gold:
- Proof — You can show yourself (or your team, or your client) exactly how much money was wasted on bots during those 15 days. Real dollar amounts, real zone IDs, real evidence. No estimates, no industry averages. Your actual waste.
- Confidence — You have 15 days of data proving the detection engine does not flag your converting traffic. If your conversions stayed the same during shadow mode, you know that enabling blocking will not cost you money.
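The three phases differ only in which verdicts are enforced, so the detection logic can stay identical throughout. Here is a minimal sketch of that separation. The `Phase` enum, the `decide` function, the default threshold of 5.0, and the flat-CPM waste estimate are all assumptions for illustration.

```python
from enum import Enum


class Phase(Enum):
    SHADOW = "shadow"                  # log only, block nothing
    HARD_KILL_ONLY = "hard_kill_only"  # block definitive signals only
    FULL = "full"                      # block on score as well


def decide(trust_score, hard_kill, phase, threshold=5.0):
    """Return (would_block, enforce).

    would_block is what detection concluded; enforce is whether the
    current phase actually blocks the click.
    """
    would_block = hard_kill or trust_score < threshold
    if phase is Phase.SHADOW:
        enforce = False
    elif phase is Phase.HARD_KILL_ONLY:
        enforce = hard_kill
    else:
        enforce = would_block
    return would_block, enforce


def shadow_waste(decisions, cpm):
    """Estimate spend on clicks detection would have blocked.

    decisions is an iterable of (would_block, enforce) tuples and cpm is
    the price per 1,000 clicks, assumed uniform across the log.
    """
    flagged = sum(1 for would_block, _ in decisions if would_block)
    return flagged * cpm / 1000.0
```

Logging `would_block` in every phase is what lets you compare shadow-mode verdicts against conversions before anything is enforced.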
Automated zone blocklists
Manual zone blocking does not scale. With 21,000+ zones across a typical traffic source, you cannot review each one by hand. You need automated rules that evaluate zone quality on a schedule and export campaign-ready blocklists.
A good zone quality engine should run on a schedule (every 10-30 minutes is ideal) and evaluate zones against multiple criteria simultaneously:
- Bot evidence score combined with accept rate
- Minimum hit thresholds to avoid blocking zones with insufficient data
- Revenue protection — zones with conversions are never auto-blocked
- Chrome masquerade detection — zones where headless Chrome imitates real browsers
- Geo mismatch patterns — zones claiming one country but sending traffic from another
- IP concentration — zones where a small number of IPs generate most of the traffic
The output should be a blocklist file you can import directly into your traffic source's campaign settings. Most ad networks support zone exclusion lists in CSV or plain text format.
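A scheduled blocklist job can be sketched in a few lines. The example below covers three of the criteria listed above (minimum hits, revenue protection, and IP concentration combined with accept rate). The profile keys, the default thresholds, and the one-zone-per-line output format are assumptions, so check what your traffic source actually accepts.

```python
import io


def build_blocklist(profiles, min_hits=200, max_accept_rate=50.0,
                    max_top_ip_share=0.5):
    """Return zone IDs to exclude, given a dict of per-zone profiles."""
    blocked = []
    for zone_id, p in sorted(profiles.items()):
        if p["clicks"] < min_hits:
            continue  # not enough data to judge this zone
        if p.get("conversions", 0) > 0:
            continue  # revenue protection: never auto-block converters
        low_quality = p["accept_rate"] < max_accept_rate
        concentrated = p.get("top_ip_share", 0.0) > max_top_ip_share
        if low_quality or concentrated:
            blocked.append(zone_id)
    return blocked


def write_blocklist(zone_ids, stream):
    """Write one zone ID per line to any text stream (file, StringIO)."""
    for zone_id in zone_ids:
        stream.write(f"{zone_id}\n")
```

In use, the job would call `build_blocklist` on fresh profiles every 10-30 minutes and pass an open file to `write_blocklist`.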
Configure trust thresholds per source
Not every traffic source needs the same threshold. A source that sells premium, compliance-verified traffic deserves a lower threshold (more permissive) than a source that sells bulk remnant inventory at $0.30 CPM.
Start with these guidelines:
| Traffic Type | Recommended Threshold | Rationale |
|---|---|---|
| Premium pop (higher CPM, verified) | 4.0 - 4.5 | Source does its own filtering. Your detection catches what slips through. Lower threshold avoids double-filtering good traffic. |
| Standard pop (mid CPM, general) | 5.0 - 5.5 | Default threshold. Balances protection with volume. Good starting point for most campaigns. |
| Cheap pop (low CPM, remnant) | 6.0 - 6.5 | Higher bot rate expected. Tighter threshold is justified because the low CPM gives you margin to be aggressive with filtering. |
| RTB/programmatic | 5.5 (pre-bid) + 5.0 (click) | Dual-phase detection. Pre-bid filtering at the bid level, then click-level verification for traffic that wins. |
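Per-source thresholds fit naturally into a small lookup table. The sketch below uses the lower bound of each range from the table as a starting value. The key names are hypothetical, as is the choice to accept a click when its score is greater than or equal to the threshold.

```python
# Starting thresholds per traffic type (lower bound of each table range).
THRESHOLDS = {
    "premium_pop": 4.0,
    "standard_pop": 5.0,
    "cheap_pop": 6.0,
}

# RTB uses two checks: one before bidding, one on the resulting click.
RTB_PRE_BID = 5.5
RTB_CLICK = 5.0


def passes(trust_score, traffic_type):
    """True when the click's score meets the threshold for its source type."""
    return trust_score >= THRESHOLDS[traffic_type]


def rtb_passes(pre_bid_score, click_score):
    """Dual-phase check: both the bid request and the click must pass."""
    return pre_bid_score >= RTB_PRE_BID and click_score >= RTB_CLICK
```

The same score of 5.2 passes for premium and standard pop but fails for cheap pop, which is the intended effect of a tighter threshold on remnant inventory.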
What Good Looks Like After the Audit
After completing all four steps, here is what your traffic operation should look like:
- You know your real bot rate — not an industry average, not a guess. Your actual percentage, broken down by source, by zone, by signal type.
- You have a zone blocklist — automatically maintained, updated on a schedule, exportable to your campaigns. Every dollar you spend goes to a zone that has been evaluated and scored.
- You understand your signal distribution — you know whether your traffic problem is datacenter proxies, headless Chrome, or crawler bots. This tells you whether to change sources or tighten detection.
- You have Traffic Quality Tiers — Gold zones getting your best offers, Silver zones being monitored, Filtered zones blocked. Every click goes to the right place.
- You have proof — 15 days of shadow mode data proving exactly how much you were wasting and exactly what the detection engine catches. Numbers you can act on.
The Mistake Most Media Buyers Make
Having bots is not the biggest mistake. Every traffic source has bots. The mistake is not knowing how many you have, or knowing and doing nothing about it.
Media buyers who run without any detection are paying full price for traffic that will never convert, never click an ad, never generate a lead. That money is gone. And the worst part is they often blame the offer, the landing page, or the traffic source — when the real problem is that 20-40% of their "visitors" were never human to begin with.
The second biggest mistake is using the wrong tool. PPC-focused tools that rely on client-side JavaScript do not work for pop traffic. Pop traffic opens in a new window or tab. The page loads, the JavaScript runs (maybe), and by the time the client-side check reports back, the click has already been counted and billed. Server-side detection — running before the redirect — is the only approach that stops the waste before it happens.
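To make "server-side detection before the redirect" concrete, here is a minimal WSGI sketch. It scores the request from its IP and headers and only then decides whether to redirect. The `make_app` factory, the `score_request` callback, and the choice to answer rejected clicks with `204 No Content` are assumptions for illustration. A real deployment would plug in its own scoring and decide for itself how to handle rejected clicks.

```python
def make_app(destination_url, score_request, threshold=5.0):
    """Build a WSGI app that scores each click before redirecting it.

    score_request(ip, headers) is a caller-supplied function returning a
    0-10 trust score; headers is a dict with lowercase header names.
    """
    def app(environ, start_response):
        # WSGI exposes request headers as HTTP_* keys in environ.
        headers = {
            key[5:].replace("_", "-").lower(): value
            for key, value in environ.items()
            if key.startswith("HTTP_")
        }
        ip = environ.get("REMOTE_ADDR", "")
        if score_request(ip, headers) >= threshold:
            start_response("302 Found", [("Location", destination_url)])
        else:
            # Rejected before the redirect: the visitor never reaches the offer.
            start_response("204 No Content", [])
        return []

    return app
```

For a local experiment, the standard library's `wsgiref.simple_server.make_server` can serve the returned app. No client-side JavaScript is involved at any point, because the decision is made on the request itself.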
The question is never "does my traffic have bots?" It does. The question is "do I know which zones are bots, and am I paying for them?" If the answer is no, you are leaving 15-30% of your budget on the table.
Start Your Audit Today
Everything in this guide can be implemented with PureGuard's free tier. 100,000 checks per month, all 18+ detection layers included, zone intelligence built automatically. No credit card, no commitment, no JavaScript to install.
The audit framework works the same whether you spend $10/day or $10,000/day. The only difference is how much money you save.