Derived and Aggregate Metrics in BrandWise
Overall Score, Confidence, Mention Rate, Top-1 Rate, and other aggregate metrics in BrandWise. How they're calculated, what they mean, and how to use them.
Aggregation Levels
BrandWise aggregates scores at three levels — from individual model responses to scenario-wide summaries:
| Level | Description |
|---|---|
| Item | One response from one model to one intent |
| Model | All responses from one model within a run |
| Scenario | All models, all intents — the run summary |
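As a rough illustration of this hierarchy, the three levels can be modelled as nested records. This is a sketch only; the field names below are illustrative, not BrandWise's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ItemResult:
    """One response from one model to one intent."""
    model: str
    intent: str
    overall_item_score: float | None  # None when no metrics are applicable

@dataclass
class ModelResult:
    """All responses from one model within a run."""
    model: str
    items: list[ItemResult] = field(default_factory=list)

@dataclass
class ScenarioResult:
    """All models and all intents in a run: the scenario summary."""
    models: list[ModelResult] = field(default_factory=list)
```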

Overall Item Score (0–100)
The score for a single response — a weighted average of applicable metrics:
| Metric | Weight |
|---|---|
| Top of Mind | 30% |
| Consideration | 25% |
| Visibility | 20% |
| Relevance | 10% |
| Positioning Match | 10% |
| Usefulness | 5% |
Calculation rules:
- Only metrics with an applicable score are included in the calculation
- Visibility, Top of Mind, and Consideration additionally require Query Context = Organic
- Mention context multiplier: if the brand is mentioned negatively or with caveats, a multiplier is applied to all metrics except Visibility (Positive = ×1.0, Mixed = ×0.5, Negative = ×0.2)
- Final score: the sum of (weight × adjusted value) divided by the sum of weights of applicable metrics (see the sketch after this list)
- If no metrics are applicable — Overall Score = N/A
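Putting the rules above together, here is a minimal Python sketch of the Overall Item Score calculation. The metric names, weights, context multipliers, and the Organic requirement come from the tables above; the function signature and field names are illustrative, not BrandWise's actual implementation.

```python
# Metric weights from the Overall Item Score table
WEIGHTS = {
    "top_of_mind": 0.30, "consideration": 0.25, "visibility": 0.20,
    "relevance": 0.10, "positioning_match": 0.10, "usefulness": 0.05,
}
ORGANIC_ONLY = {"visibility", "top_of_mind", "consideration"}
CONTEXT_MULTIPLIER = {"positive": 1.0, "mixed": 0.5, "negative": 0.2}

def overall_item_score(scores: dict[str, float | None],
                       query_context: str,
                       mention_context: str) -> float | None:
    """Weighted average of applicable metrics for a single response."""
    multiplier = CONTEXT_MULTIPLIER.get(mention_context, 1.0)
    weighted_sum, weight_sum = 0.0, 0.0
    for metric, weight in WEIGHTS.items():
        value = scores.get(metric)
        if value is None:
            continue                          # metric not applicable: skip it
        if metric in ORGANIC_ONLY and query_context != "organic":
            continue                          # these three require Organic context
        if metric != "visibility":
            value *= multiplier               # tone multiplier, Visibility is exempt
        weighted_sum += weight * value
        weight_sum += weight
    return weighted_sum / weight_sum if weight_sum else None  # N/A when nothing applies
```

For example, a response where only Top of Mind = 90 and Visibility = 70 are applicable (organic query, positive mention) would score (0.30 × 90 + 0.20 × 70) / 0.50 = 82.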
Overall Model Score (0–100)
The median of Overall Item Scores across all of a given model's responses within a run. Shows how that model generally represents the brand.
Overall Scenario Score (0–100)
A weighted average of Overall Model Scores across all models in the run. This is the bottom-line indicator of brand representation quality in AI for the given scenario.
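Continuing the sketch above, the model-level and scenario-level aggregations could look like this. The per-model weighting scheme for the scenario score is not spelled out in this section, so equal weights are used as a placeholder.

```python
from statistics import median

def overall_model_score(item_scores: list[float | None]) -> float | None:
    """Median of Overall Item Scores across one model's responses in a run."""
    valid = [s for s in item_scores if s is not None]
    return median(valid) if valid else None

def overall_scenario_score(model_scores: dict[str, float],
                           model_weights: dict[str, float] | None = None) -> float:
    """Weighted average of Overall Model Scores across all models in the run."""
    weights = model_weights or {m: 1.0 for m in model_scores}  # placeholder: equal weights
    total_weight = sum(weights[m] for m in model_scores)
    return sum(model_scores[m] * weights[m] for m in model_scores) / total_weight
```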
Confidence (0–1)
A measure of evaluation reliability. Higher Confidence means more trustworthy results.
Confidence is calculated per metric based on agreement between multiple judge runs:
- For each metric, standard deviation across judge runs is measured
- Per-metric confidence: 1.0 − min(1.0, std_dev / 25.0) — the lower the spread, the higher the confidence
- With a single judge run, per-metric confidence defaults to 0.5
- Overall confidence: average of all per-metric confidence values
At the scenario level, confidence is additionally multiplied by coverage — the ratio of evaluated items to total items. This penalizes incomplete evaluations.
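A sketch of the confidence calculation described above. Whether BrandWise uses population or sample standard deviation is not stated here, so population standard deviation is assumed, and all names are illustrative.

```python
from statistics import pstdev

def metric_confidence(judge_values: list[float]) -> float:
    """Confidence for one metric, based on agreement across judge runs."""
    if len(judge_values) < 2:
        return 0.5                              # single judge run: default 0.5
    spread = pstdev(judge_values)               # standard deviation across judge runs
    return 1.0 - min(1.0, spread / 25.0)        # lower spread, higher confidence

def overall_confidence(per_metric_values: dict[str, list[float]],
                       evaluated_items: int | None = None,
                       total_items: int | None = None) -> float:
    """Average of per-metric confidences, scaled by coverage at the scenario level."""
    confidences = [metric_confidence(v) for v in per_metric_values.values()]
    if not confidences:
        return 0.0
    confidence = sum(confidences) / len(confidences)
    if evaluated_items is not None and total_items:
        confidence *= evaluated_items / total_items   # coverage penalty
    return confidence
```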
Interpreting Confidence
| Range | Level | Recommendation |
|---|---|---|
| 0.8–1.0 | High | Results can be fully trusted |
| 0.6–0.79 | Medium | Results are reliable, but worth verifying evidence quotes |
| 0.4–0.59 | Low | Verification recommended — potential data quality issues |
| 0–0.39 | Very Low | Data unreliable — check original model responses |
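When filtering results programmatically, the bands above map to a small helper. This is an illustrative sketch, not part of the BrandWise API.

```python
def confidence_level(confidence: float) -> str:
    """Map a 0-1 confidence value to the bands in the table above."""
    if confidence >= 0.8:
        return "High"
    if confidence >= 0.6:
        return "Medium"
    if confidence >= 0.4:
        return "Low"
    return "Very Low"
```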

Statistical Metrics
Mention Rate %
Share of responses where the brand is mentioned. The baseline presence indicator.
Mention Rate = (responses with mention / total responses) × 100%
Eligible Rate %
Share of responses where the brand is deemed eligible (Eligibility = Eligible). Shows what percentage of the time the brand is relevant to the query.
Organic Share %
Share of responses with organic context (Query Context = Organic). Higher Organic Share means more data for Visibility, Top of Mind, and Consideration metrics.
Top-1 Rate %
Share of responses where the brand is mentioned first. A key indicator of category leadership.
Top-3 Rate %
Share of responses where the brand appears among the first three brands mentioned. A softer criterion showing consistent presence at the top.
Avg Mention Order
Average position of the brand's first mention. Lower numbers mean the brand appears earlier. Calculated only from responses where the brand is mentioned.
vs Target Brand Delta
Difference between a competitor's TOM score and your brand's TOM score. Positive values mean the competitor leads; negative values mean your brand is ahead.
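All of the statistical metrics above are simple ratios over a run's responses. Here is a sketch, assuming each response carries the relevant flags; the `Response` fields and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Response:
    mentioned: bool
    eligible: bool
    organic: bool
    mention_order: int | None   # 1-based position of the brand's first mention, None if absent

def statistical_metrics(responses: list[Response]) -> dict[str, float | None]:
    """Compute the rate metrics and average mention order for one run."""
    if not responses:
        return {}
    n = len(responses)
    orders = [r.mention_order for r in responses if r.mentioned and r.mention_order is not None]
    return {
        "mention_rate_pct": 100 * sum(r.mentioned for r in responses) / n,
        "eligible_rate_pct": 100 * sum(r.eligible for r in responses) / n,
        "organic_share_pct": 100 * sum(r.organic for r in responses) / n,
        "top1_rate_pct": 100 * sum(o == 1 for o in orders) / n,
        "top3_rate_pct": 100 * sum(o <= 3 for o in orders) / n,
        "avg_mention_order": sum(orders) / len(orders) if orders else None,
    }

def vs_target_delta(competitor_tom: float, target_tom: float) -> float:
    """Positive: the competitor leads; negative: your brand is ahead."""
    return competitor_tom - target_tom
```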
Mention Context
Every response where the brand is mentioned is classified by mention context — the tone in which the brand is described:
| Context | Multiplier | Description |
|---|---|---|
| Positive | ×1.0 | Brand presented positively or neutrally |
| Mixed | ×0.5 | Brand mentioned with caveats ("good, but...") |
| Negative | ×0.2 | Brand used as a counter-example, not recommended, criticized |
The multiplier applies to all metrics except Visibility when computing the Overall Score. This means a negative mention of the brand significantly reduces the Overall Score even if individual metrics are high.
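For example, if all six metrics score 80 on an organic query but the mention is classified as Negative, Visibility stays at 80 while the other five are adjusted to 80 × 0.2 = 16, giving an Overall Score of 0.20 × 80 + 0.80 × 16 ≈ 29 instead of 80.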
Query Context Classification
Every model response is classified by context type:
| Type | Description | Impact on Metrics |
|---|---|---|
| Organic | User didn't name the brand in their query | All 6 metrics applicable and count toward Overall Score |
| Brand Prompted | Brand explicitly mentioned in query | Visibility, Top of Mind, Consideration excluded from Overall Score |
| Unclear | Ambiguous context | Top of Mind and Consideration not applicable |
Organic context is the most valuable for analysis because it shows the model's autonomous behavior without prompting.
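The exclusions in the table above can be expressed as a small mapping. This sketch reuses the `WEIGHTS` dictionary from the Overall Item Score example and follows the table's rules (note that it leaves Visibility applicable under Unclear).

```python
# Metrics excluded from the Overall Score for each query context type
EXCLUDED_BY_CONTEXT = {
    "organic": set(),                                                # all 6 metrics count
    "brand_prompted": {"visibility", "top_of_mind", "consideration"},
    "unclear": {"top_of_mind", "consideration"},
}

def applicable_metrics(query_context: str) -> set[str]:
    """Metrics that count toward the Overall Score for a given query context."""
    return set(WEIGHTS) - EXCLUDED_BY_CONTEXT[query_context]
```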
How to Use Aggregate Metrics
- Overall Score — for quick assessment and comparison between runs
- Confidence — for filtering unreliable results
- Mention Rate — for baseline brand presence evaluation
- Top-1 / Top-3 Rate — for competitive analysis
- vs Target Delta — for tracking movement relative to competitors over time
All these metrics are available on the overview dashboard and in custom reports.
Related Sections
- Metrics System Overview — the 6 core metrics
- Top of Mind — Brand TOM competitive table
- First Project in 10 Minutes — how to run an evaluation