How we use Columbus AEO to create research-backed product comparisons
Learn how we use our tool Columbus AEO to create product comparisons across different industries, based on thousands of AI recommendations and the sources they cite.
Emilio Irmscher
March 28, 2026
How We Collect AI Visibility Data: The Columbus Methodology
Last updated: March 2026
Every data point published in Columbus research reports — including our industry comparison series — comes from the same methodology. This page explains exactly how we collect data, why we made the technical decisions we did, and what the limitations are.
We're publishing this because we think transparency matters, especially in a space where a lot of "AI visibility data" is either API-approximated, sampled from small prompt sets, or not explained at all.
The core approach: real browser sessions, not APIs
Most AEO tools query AI platforms through their developer APIs. We don't.
API responses are different from what actual users see. The same prompt submitted via API versus via a logged-in browser session can return meaningfully different results — different sources cited, different brands mentioned, different levels of detail. If you're trying to understand what your customers actually see when they ask ChatGPT about your industry, API data gives you a proxy at best.
Columbus runs every prompt through real browser sessions using authenticated user accounts. The desktop app opens the AI platform in a controlled browser environment, submits the prompt exactly as a human would, and captures the full response including all cited sources. This is the same data your customers are seeing.
The tradeoff is that this requires a desktop app and authenticated accounts for each platform. That's a higher setup bar than a web dashboard. We think it's worth it for data accuracy.
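To make the difference concrete, here's a minimal sketch of what a browser-session capture can look like, using Playwright as a stand-in for the controlled browser environment. The selectors, the wait logic, and the persistent profile path are illustrative assumptions, not Columbus internals.

```python
# Minimal sketch of a browser-session capture; not Columbus's actual code.
# Assumes an already-authenticated browser profile; selectors are placeholders.
from playwright.sync_api import sync_playwright

def capture_response(prompt: str, profile_dir: str) -> dict:
    with sync_playwright() as p:
        # Reuse a persistent, logged-in profile so the session matches what a real user sees
        context = p.chromium.launch_persistent_context(profile_dir, headless=False)
        page = context.new_page()
        page.goto("https://chatgpt.com/")
        page.fill("textarea", prompt)            # placeholder selector
        page.keyboard.press("Enter")
        page.wait_for_timeout(30_000)            # crude wait; a real capture detects completion
        response_text = page.inner_text("main")  # placeholder selector
        cited_urls = [a.get_attribute("href") for a in page.query_selector_all("main a[href]")]
        context.close()
    return {"prompt": prompt, "response": response_text, "sources": cited_urls}
```

The detail that matters is the persistent, authenticated profile: the session the script drives is the same kind of session a real user has, which is what keeps the captured responses and sources representative.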
Platforms covered
All research runs across six platforms:
- ChatGPT
- Gemini
- Perplexity
- Claude
- Google AI Overviews (AIO)
- Google AI Mode
Each platform is tracked independently. We do not aggregate results across platforms by default because, as our research consistently shows, platform behavior varies significantly — sometimes dramatically.
Prompt design
For each industry analysis, we design a set of prompts that represent realistic user queries. These are questions real people plausibly ask when researching tools or services in that category.
For our SEO tools study, examples included:
- "What are the best SEO tools for a small business?"
- "Which SEO tool should I use for keyword research?"
- "What do professionals use for SEO audits?"
A few principles we follow when designing prompts:
Realistic phrasing. We write prompts the way users actually search, not the way a marketer would frame the category. "Best SEO tools" not "top enterprise SEO platforms."
Intent variation. We include informational, commercial, and comparison-style queries. Different intent types surface different sources.
No leading prompts. We don't include brand names or leading qualifiers in the prompts themselves. The goal is to see what AI recommends organically, not to test whether AI will mention a specific brand when prompted.
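To illustrate how these principles fit together, a labeled prompt set for the SEO tools example might look like the sketch below. The structure and the intent labels are an assumption for illustration, not a Columbus file format.

```python
# Illustrative prompt set with intent labels; the structure is an assumption, not a Columbus format.
PROMPTS = [
    {"text": "What are the best SEO tools for a small business?", "intent": "commercial"},
    {"text": "Which SEO tool should I use for keyword research?", "intent": "comparison"},
    {"text": "What do professionals use for SEO audits?", "intent": "informational"},
]

# In the spirit of "no leading prompts": the tracked brand never appears in the prompt text
TRACKED_BRAND = "columbus"  # hypothetical example
assert not any(TRACKED_BRAND in p["text"].lower() for p in PROMPTS)
```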
Volume: why we run each prompt many times
AI responses are non-deterministic. The same prompt submitted twice to the same platform can return different sources, different brands, and different recommendations. This is a fundamental property of how large language models work — they sample from probability distributions, not a fixed lookup table.
A single run tells you almost nothing. One mention could be noise; one non-appearance doesn't mean you're invisible.
We run each prompt a minimum of 40 times per platform. For our industry reports, with 25 prompts across 6 platforms, this produces 6,000+ individual response captures. Across all captures, we typically collect several thousand unique source citations.
This volume is what allows us to speak in terms of relative frequency — "Reddit was cited 240 times across all platforms" — rather than binary yes/no presence.
What this still doesn't tell you: Frequency is an estimate, not a guarantee. Saying a domain was cited in 18% of runs for a given prompt doesn't mean it will appear in exactly 18 of your next 100 queries. It means that's the observed rate in our sample. Treat it as a directional signal, not a precise prediction.
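One way to make that "directional signal" point concrete: an observed citation rate from a finite number of runs carries sampling uncertainty, which you can quantify with a simple confidence interval. A sketch, assuming each run records whether a given domain was cited:

```python
# Sketch: observed citation rate plus a 95% Wilson interval to show sampling uncertainty.
import math

def citation_rate(cited_runs: int, total_runs: int, z: float = 1.96) -> tuple[float, float, float]:
    p = cited_runs / total_runs
    denom = 1 + z**2 / total_runs
    center = (p + z**2 / (2 * total_runs)) / denom
    half = z * math.sqrt(p * (1 - p) / total_runs + z**2 / (4 * total_runs**2)) / denom
    return p, center - half, center + half

# An observed ~18% over 40 runs is consistent with a true rate anywhere from roughly 9% to 32%
rate, lo, hi = citation_rate(cited_runs=7, total_runs=40)
print(f"observed {rate:.0%}, 95% interval roughly {lo:.0%}-{hi:.0%}")
```

This is also why we report frequency across many runs rather than presence in any single run: the interval narrows as the run count grows, but it never collapses to a point.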
What we capture per response
For each prompt response we record:
- All URLs cited as sources
- The domain of each cited URL
- Whether the tracked brand was mentioned by name in the response body
- The platform and prompt that generated the response
- Timestamp of the capture
We distinguish between citations (a URL was included as a source) and mentions (the brand name appeared in the response text). These are different signals. A brand can be heavily cited without being explicitly recommended by name, and vice versa.
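In code terms, each capture can be thought of as a record like the one below. The field names are illustrative, not Columbus's export schema; note the separate brand_mentioned flag (a mention) versus the sources list (citations).

```python
# Illustrative per-response record; field names are an assumption, not the export schema.
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse

@dataclass
class ResponseCapture:
    platform: str          # e.g. "ChatGPT", "Perplexity"
    prompt: str            # the exact prompt submitted
    sources: list[str]     # all URLs cited as sources (citations)
    brand_mentioned: bool  # brand name appeared in the response body (mention)
    captured_at: datetime

    @property
    def source_domains(self) -> set[str]:
        # Domain of each cited URL, used for counts like "Reddit was cited 240 times"
        return {urlparse(u).netloc for u in self.sources}
```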
Data export
All raw data is exportable from Columbus as a structured Excel file. The export includes every individual response capture, the full source list per response, and aggregate metrics by prompt and platform. The filename format is ColumbusAEO-export-[date-range].xlsx.
For our industry reports, the underlying export is linked in the post so readers can verify findings or run their own analysis.
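If you download an export, a few lines of pandas are enough to reproduce the aggregate counts. A sketch, assuming one row per cited URL with platform, prompt, and domain columns (the actual column names in the file may differ):

```python
# Sketch: re-aggregating an export with pandas.
# Column names ("domain", "platform") are assumptions; check the actual file before running.
import pandas as pd

export_path = "ColumbusAEO-export-<date-range>.xlsx"  # fill in your export's date range
df = pd.read_excel(export_path)

# Total citations per domain across all platforms
top_domains = df.groupby("domain").size().sort_values(ascending=False).head(20)

# Citation counts per domain, split by platform, to see how platforms differ
by_platform = df.pivot_table(index="domain", columns="platform", aggfunc="size", fill_value=0)

print(top_domains)
print(by_platform.loc[top_domains.index])
```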
Limitations
We'd rather be upfront about these than have someone find them later.
Sampling variance. As described above, more runs reduce variance but don't eliminate it. Unusual results in a single run get averaged out over hundreds of runs, but our estimates still carry uncertainty.
Temporal snapshot. AI model behavior changes over time as models are updated, fine-tuned, or trained on new data. A report published in March 2026 reflects platform behavior at that point in time. Sources that ranked highly six months ago may have shifted.
Account and region effects. We run scans from accounts in specific geographic regions. AI responses can vary by region, account history, and other factors we don't fully control. For multi-region analysis, we use proxy infrastructure to test specific markets — this is noted when relevant.
Platform access. If a platform changes its interface, authentication flow, or rate limiting behavior, it can temporarily affect our ability to collect data. We monitor for this and note any gaps in coverage.
Prompt coverage. No prompt set covers every possible way a user might ask about a category. Our findings reflect the specific prompts we designed. Different prompts would likely surface somewhat different results.
How to use this data
Our industry reports are most useful for directional decisions:
- Which platforms prioritize which types of sources (UGC, brand sites, media)
- Which domains have established strong AI visibility in a category
- How behavior differs from one platform to the next
- What the competitive landscape looks like in AI responses, independent of Google rankings
They're less suited for precise predictions ("I will appear in X% of queries") or real-time monitoring of your own brand (for that, use Columbus to run your own prompt sets against your own brand).
Running your own analysis
The methodology described here is exactly what Columbus does when you configure it for your own brand. You define the prompts, choose the platforms, set the run frequency, and the desktop app handles the rest.
The free tier includes 5 prompts across all 6 platforms. Paid tiers remove that limit.