How we use Columbus AEO to create research-backed product comparisons
Learn how we use our tool Columbus AEO to create product comparisons across different industries, based on thousands of AI recommendations and the sources they cite.
Emilio Irmscher
March 28, 2026
How We Collect AI Visibility Data: The Columbus Methodology
Last updated: March 2026
Every data point published in Columbus research reports — including our industry comparison series — comes from the same methodology. This page explains exactly how we collect data, why we made the technical decisions we did, and what the limitations are.
We're publishing this because we think transparency matters, especially in a space where a lot of "AI visibility data" is either API-approximated, sampled from small prompt sets, or not explained at all.
The core approach: real browser sessions, not APIs
Most AEO tools query AI platforms through their developer APIs. We don't.
API responses are different from what actual users see. The same prompt submitted via API versus via a logged-in browser session can return meaningfully different results — different sources cited, different brands mentioned, different levels of detail. If you're trying to understand what your customers actually see when they ask ChatGPT about your industry, API data gives you a proxy at best.
Columbus runs every prompt through real browser sessions using authenticated user accounts. The desktop app opens the AI platform in a controlled browser environment, submits the prompt exactly as a human would, and captures the full response including all cited sources. This is the same data your customers are seeing.
The tradeoff is that this requires a desktop app and authenticated accounts for each platform. That's a higher setup bar than a web dashboard. We think it's worth it for data accuracy.
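To make the difference concrete, here's a minimal sketch of what a browser-session capture can look like, using Playwright as a stand-in for the controlled browser environment. The selectors, the wait logic, and the persistent profile path are illustrative assumptions, not Columbus internals.

```python
# Minimal sketch of a browser-session capture; not Columbus's actual code.
# Assumes an already-authenticated browser profile; selectors are placeholders.
from playwright.sync_api import sync_playwright

def capture_response(prompt: str, profile_dir: str) -> dict:
    with sync_playwright() as p:
        # Reuse a persistent, logged-in profile so the session matches what a real user sees
        context = p.chromium.launch_persistent_context(profile_dir, headless=False)
        page = context.new_page()
        page.goto("https://chatgpt.com/")
        page.fill("textarea", prompt)            # placeholder selector
        page.keyboard.press("Enter")
        page.wait_for_timeout(30_000)            # crude wait; a real capture detects completion
        response_text = page.inner_text("main")  # placeholder selector
        cited_urls = [a.get_attribute("href") for a in page.query_selector_all("main a[href]")]
        context.close()
    return {"prompt": prompt, "response": response_text, "sources": cited_urls}
```

The detail that matters is the persistent, authenticated profile: the session the script drives is the same kind of session a real user has, which is what keeps the captured responses and sources representative.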
Platforms covered
All research runs across six platforms:
- ChatGPT
- Gemini
- Perplexity
- Claude
- Google AI Overviews (AIO)
- Google AI Mode
Each platform is tracked independently. We do not aggregate results across platforms by default because, as our research consistently shows, platform behavior varies significantly — sometimes dramatically.
Prompt design
For each industry analysis, we design a set of prompts that represent realistic user queries. These are questions real people plausibly ask when researching tools or services in that category.
For our SEO tools study, examples included:
- "What are the best SEO tools for a small business?"
- "Which SEO tool should I use for keyword research?"
- "What do professionals use for SEO audits?"
A few principles we follow when designing prompts:
Realistic phrasing. We write prompts the way users actually search, not the way a marketer would frame the category. "Best SEO tools" not "top enterprise SEO platforms."
Intent variation. We include informational, commercial, and comparison-style queries. Different intent types surface different sources.
No leading prompts. We don't include brand names or leading qualifiers in the prompts themselves. The goal is to see what AI recommends organically, not to test whether AI will mention a specific brand when prompted.
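To illustrate how these principles fit together, a labeled prompt set for the SEO tools example might look like the sketch below. The structure and the intent labels are an assumption for illustration, not a Columbus file format.

```python
# Illustrative prompt set with intent labels; the structure is an assumption, not a Columbus format.
PROMPTS = [
    {"text": "What are the best SEO tools for a small business?", "intent": "commercial"},
    {"text": "Which SEO tool should I use for keyword research?", "intent": "comparison"},
    {"text": "What do professionals use for SEO audits?", "intent": "informational"},
]

# In the spirit of "no leading prompts": the tracked brand never appears in the prompt text
TRACKED_BRAND = "columbus"  # hypothetical example
assert not any(TRACKED_BRAND in p["text"].lower() for p in PROMPTS)
```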
Volume: why we run each prompt many times
AI responses are non-deterministic. The same prompt submitted twice to the same platform can return different sources, different brands, and different recommendations. This is a fundamental property of how large language models work — they sample from probability distributions, not a fixed lookup table.
A single run tells you almost nothing. One mention could be noise; one non-appearance doesn't mean you're invisible.
We run each prompt a minimum of 40 times per platform. For our industry reports, with 25 prompts across 6 platforms, this produces 6,000+ individual response captures. Across all captures, we typically collect several thousand unique source citations.
This volume is what allows us to speak in terms of relative frequency — "Reddit was cited 240 times across all platforms" — rather than binary yes/no presence.
What this still doesn't tell you: Frequency is an estimate, not a guarantee. Saying a domain was cited in 18% of runs for a given prompt doesn't mean it will appear in exactly 18 of your next 100 queries. It means that's the observed rate in our sample. Treat it as a directional signal, not a precise prediction.
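One way to make that "directional signal" point concrete: an observed citation rate from a finite number of runs carries sampling uncertainty, which you can quantify with a simple confidence interval. A sketch, assuming each run records whether a given domain was cited:

```python
# Sketch: observed citation rate plus a 95% Wilson interval to show sampling uncertainty.
import math

def citation_rate(cited_runs: int, total_runs: int, z: float = 1.96) -> tuple[float, float, float]:
    p = cited_runs / total_runs
    denom = 1 + z**2 / total_runs
    center = (p + z**2 / (2 * total_runs)) / denom
    half = z * math.sqrt(p * (1 - p) / total_runs + z**2 / (4 * total_runs**2)) / denom
    return p, center - half, center + half

# An observed ~18% over 40 runs is consistent with a true rate anywhere from roughly 9% to 32%
rate, lo, hi = citation_rate(cited_runs=7, total_runs=40)
print(f"observed {rate:.0%}, 95% interval roughly {lo:.0%}-{hi:.0%}")
```

This is also why we report frequency across many runs rather than presence in any single run: the interval narrows as the run count grows, but it never collapses to a point.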
What we capture per response
For each prompt response we record:
- All URLs cited as sources
- The domain of each cited URL
- Whether the tracked brand was mentioned by name in the response body
- The platform and prompt that generated the response
- Timestamp of the capture
We distinguish between citations (a URL was included as a source) and mentions (the brand name appeared in the response text). These are different signals. A brand can be heavily cited without being explicitly recommended by name, and vice versa.
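In code terms, each capture can be thought of as a record like the one below. The field names are illustrative, not Columbus's export schema; note the separate brand_mentioned flag (a mention) versus the sources list (citations).

```python
# Illustrative per-response record; field names are an assumption, not the export schema.
from dataclasses import dataclass
from datetime import datetime
from urllib.parse import urlparse

@dataclass
class ResponseCapture:
    platform: str          # e.g. "ChatGPT", "Perplexity"
    prompt: str            # the exact prompt submitted
    sources: list[str]     # all URLs cited as sources (citations)
    brand_mentioned: bool  # brand name appeared in the response body (mention)
    captured_at: datetime

    @property
    def source_domains(self) -> set[str]:
        # Domain of each cited URL, used for counts like "Reddit was cited 240 times"
        return {urlparse(u).netloc for u in self.sources}
```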
Data export
All raw data is exportable from Columbus as a structured Excel file. The export includes every individual response capture, the full source list per response, and aggregate metrics by prompt and platform. The filename format is ColumbusAEO-export-[date-range].xlsx.
For our industry reports, the underlying export is linked in the post so readers can verify findings or run their own analysis.
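If you download an export, a few lines of pandas are enough to reproduce the aggregate counts. A sketch, assuming one row per cited URL with platform, prompt, and domain columns (the actual column names in the file may differ):

```python
# Sketch: re-aggregating an export with pandas.
# Column names ("domain", "platform") are assumptions; check the actual file before running.
import pandas as pd

export_path = "ColumbusAEO-export-<date-range>.xlsx"  # fill in your export's date range
df = pd.read_excel(export_path)

# Total citations per domain across all platforms
top_domains = df.groupby("domain").size().sort_values(ascending=False).head(20)

# Citation counts per domain, split by platform, to see how platforms differ
by_platform = df.pivot_table(index="domain", columns="platform", aggfunc="size", fill_value=0)

print(top_domains)
print(by_platform.loc[top_domains.index])
```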
Limitations
We'd rather be upfront about these than have someone find them later.
Sampling variance. As described above, more runs reduce variance but don't eliminate it. Unusual results in a single run get averaged out over hundreds of runs, but our estimates still carry uncertainty.
Temporal snapshot. AI model behavior changes over time as models are updated, fine-tuned, or trained on new data. A report published in March 2026 reflects platform behavior at that point in time. Sources that ranked highly six months ago may have shifted.
Account and region effects. We run scans from accounts in specific geographic regions. AI responses can vary by region, account history, and other factors we don't fully control. For multi-region analysis, we use proxy infrastructure to test specific markets — this is noted when relevant.
Platform access. If a platform changes its interface, authentication flow, or rate limiting behavior, it can temporarily affect our ability to collect data. We monitor for this and note any gaps in coverage.
Prompt coverage. No prompt set covers every possible way a user might ask about a category. Our findings reflect the specific prompts we designed. Different prompts would likely surface somewhat different results.
How to use this data
Our industry reports are most useful for directional decisions:
- Which platforms prioritize which types of sources (UGC, brand sites, media)
- Which domains have established strong AI visibility in a category
- How behavior differs from one platform to the next
- What the competitive landscape looks like in AI responses, independent of Google rankings
They're less suited for precise predictions ("I will appear in X% of queries") or real-time monitoring of your own brand (for that, use Columbus to run your own prompt sets against your own brand).
Running your own analysis
The methodology described here is exactly what Columbus does when you configure it for your own brand. You define the prompts, choose the platforms, set the run frequency, and the desktop app handles the rest.
The free tier includes 5 prompts across all 6 platforms. Paid tiers remove that limit.