Derived and Aggregate Metrics in BrandWise
Overall Score, Confidence, Mention Rate, Top-1 Rate, and other aggregate metrics in BrandWise. How they're calculated, what they mean, and how to use them.
Aggregation Levels
BrandWise aggregates scores at three levels — from individual model responses to scenario-wide summaries:
| Level | Description |
|---|---|
| Item | One response from one model to one intent |
| Model | All responses from one model within a run |
| Scenario | All models, all intents — the run summary |
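As a rough illustration of this hierarchy, the three levels can be modelled as nested records. This is a sketch only; the field names below are illustrative, not BrandWise's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ItemResult:
    """One response from one model to one intent."""
    model: str
    intent: str
    overall_item_score: float | None  # None when no metrics are applicable

@dataclass
class ModelResult:
    """All responses from one model within a run."""
    model: str
    items: list[ItemResult] = field(default_factory=list)

@dataclass
class ScenarioResult:
    """All models and all intents in a run: the scenario summary."""
    models: list[ModelResult] = field(default_factory=list)
```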

Overall Item Score (0–100)
The score for a single response — a weighted average of applicable metrics:
| Metric | Weight |
|---|---|
| Top of Mind | 30% |
| Consideration | 25% |
| Visibility | 20% |
| Relevance | 10% |
| Positioning Match | 10% |
| Usefulness | 5% |
Calculation rules:
- Only metrics with an applicable score are included in the calculation
- Visibility, Top of Mind, and Consideration additionally require Query Context = Organic
- Mention context multiplier: if the brand is mentioned negatively or with caveats, a multiplier is applied to all metrics except Visibility (Positive = ×1.0, Mixed = ×0.5, Negative = ×0.2)
- Final score: the sum of (weight × adjusted value) divided by the sum of weights of applicable metrics (see the sketch after this list)
- If no metrics are applicable — Overall Score = N/A
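Putting the rules above together, here is a minimal Python sketch of the Overall Item Score calculation. The metric names, weights, context multipliers, and the Organic requirement come from the tables above; the function signature and field names are illustrative, not BrandWise's actual implementation.

```python
# Metric weights from the Overall Item Score table
WEIGHTS = {
    "top_of_mind": 0.30, "consideration": 0.25, "visibility": 0.20,
    "relevance": 0.10, "positioning_match": 0.10, "usefulness": 0.05,
}
ORGANIC_ONLY = {"visibility", "top_of_mind", "consideration"}
CONTEXT_MULTIPLIER = {"positive": 1.0, "mixed": 0.5, "negative": 0.2}

def overall_item_score(scores: dict[str, float | None],
                       query_context: str,
                       mention_context: str) -> float | None:
    """Weighted average of applicable metrics for a single response."""
    multiplier = CONTEXT_MULTIPLIER.get(mention_context, 1.0)
    weighted_sum, weight_sum = 0.0, 0.0
    for metric, weight in WEIGHTS.items():
        value = scores.get(metric)
        if value is None:
            continue                          # metric not applicable: skip it
        if metric in ORGANIC_ONLY and query_context != "organic":
            continue                          # these three require Organic context
        if metric != "visibility":
            value *= multiplier               # tone multiplier, Visibility is exempt
        weighted_sum += weight * value
        weight_sum += weight
    return weighted_sum / weight_sum if weight_sum else None  # N/A when nothing applies
```

For example, a response where only Top of Mind = 90 and Visibility = 70 are applicable (organic query, positive mention) would score (0.30 × 90 + 0.20 × 70) / 0.50 = 82.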
Overall Model Score (0–100)
The median of Overall Item Scores across all of a given model's responses within a run. Shows how that model generally represents the brand.
Overall Scenario Score (0–100)
A weighted average of Overall Model Scores across all models in the run. This is the bottom-line indicator of brand representation quality in AI for the given scenario.
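Continuing the sketch above, the model-level and scenario-level aggregations could look like this. The per-model weighting scheme for the scenario score is not spelled out in this section, so equal weights are used as a placeholder.

```python
from statistics import median

def overall_model_score(item_scores: list[float | None]) -> float | None:
    """Median of Overall Item Scores across one model's responses in a run."""
    valid = [s for s in item_scores if s is not None]
    return median(valid) if valid else None

def overall_scenario_score(model_scores: dict[str, float],
                           model_weights: dict[str, float] | None = None) -> float:
    """Weighted average of Overall Model Scores across all models in the run."""
    weights = model_weights or {m: 1.0 for m in model_scores}  # placeholder: equal weights
    total_weight = sum(weights[m] for m in model_scores)
    return sum(model_scores[m] * weights[m] for m in model_scores) / total_weight
```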
Confidence (0–1)
A measure of evaluation reliability. Higher Confidence means more trustworthy results.
Confidence is calculated per metric based on agreement between multiple judge runs:
- For each metric, standard deviation across judge runs is measured
- Per-metric confidence: 1.0 − min(1.0, std_dev / 25.0) — the lower the spread, the higher the confidence
- With a single judge run, per-metric confidence defaults to 0.5
- Overall confidence: average of all per-metric confidence values
At the scenario level, confidence is additionally multiplied by coverage — the ratio of evaluated items to total items. This penalizes incomplete evaluations.
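A sketch of the confidence calculation described above. Whether BrandWise uses population or sample standard deviation is not stated here, so population standard deviation is assumed, and all names are illustrative.

```python
from statistics import pstdev

def metric_confidence(judge_values: list[float]) -> float:
    """Confidence for one metric, based on agreement across judge runs."""
    if len(judge_values) < 2:
        return 0.5                              # single judge run: default 0.5
    spread = pstdev(judge_values)               # standard deviation across judge runs
    return 1.0 - min(1.0, spread / 25.0)        # lower spread, higher confidence

def overall_confidence(per_metric_values: dict[str, list[float]],
                       evaluated_items: int | None = None,
                       total_items: int | None = None) -> float:
    """Average of per-metric confidences, scaled by coverage at the scenario level."""
    confidences = [metric_confidence(v) for v in per_metric_values.values()]
    if not confidences:
        return 0.0
    confidence = sum(confidences) / len(confidences)
    if evaluated_items is not None and total_items:
        confidence *= evaluated_items / total_items   # coverage penalty
    return confidence
```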
Interpreting Confidence
| Range | Level | Recommendation |
|---|---|---|
| 0.8–1.0 | High | Results can be fully trusted |
| 0.6–0.79 | Medium | Results are reliable, but worth verifying evidence quotes |
| 0.4–0.59 | Low | Verification recommended — potential data quality issues |
| 0–0.39 | Very Low | Data unreliable — check original model responses |
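When filtering results programmatically, the bands above map to a small helper. This is an illustrative sketch, not part of the BrandWise API.

```python
def confidence_level(confidence: float) -> str:
    """Map a 0-1 confidence value to the bands in the table above."""
    if confidence >= 0.8:
        return "High"
    if confidence >= 0.6:
        return "Medium"
    if confidence >= 0.4:
        return "Low"
    return "Very Low"
```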

Statistical Metrics
Mention Rate %
Share of responses where the brand is mentioned. The baseline presence indicator.
Mention Rate = (responses with mention / total responses) × 100%
Eligible Rate %
Share of responses where the brand is deemed eligible (Eligibility = Eligible). Shows what percentage of the time the brand is relevant to the query.
Organic Share %
Share of responses with organic context (Query Context = Organic). Higher Organic Share means more data for Visibility, Top of Mind, and Consideration metrics.
Top-1 Rate %
Share of responses where the brand is mentioned first. A key indicator of category leadership.
Top-3 Rate %
Share of responses where the brand appears among the first three brands mentioned. A softer criterion showing consistent presence at the top.
Avg Mention Order
Average position of the brand's first mention. Lower numbers mean the brand appears earlier. Calculated only from responses where the brand is mentioned.
vs Target Brand Delta
Difference between a competitor's TOM score and your brand's TOM score. Positive values mean the competitor leads; negative values mean your brand is ahead.
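All of the statistical metrics above are simple ratios over a run's responses. Here is a sketch, assuming each response carries the relevant flags; the `Response` fields and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Response:
    mentioned: bool
    eligible: bool
    organic: bool
    mention_order: int | None   # 1-based position of the brand's first mention, None if absent

def statistical_metrics(responses: list[Response]) -> dict[str, float | None]:
    """Compute the rate metrics and average mention order for one run."""
    if not responses:
        return {}
    n = len(responses)
    orders = [r.mention_order for r in responses if r.mentioned and r.mention_order is not None]
    return {
        "mention_rate_pct": 100 * sum(r.mentioned for r in responses) / n,
        "eligible_rate_pct": 100 * sum(r.eligible for r in responses) / n,
        "organic_share_pct": 100 * sum(r.organic for r in responses) / n,
        "top1_rate_pct": 100 * sum(o == 1 for o in orders) / n,
        "top3_rate_pct": 100 * sum(o <= 3 for o in orders) / n,
        "avg_mention_order": sum(orders) / len(orders) if orders else None,
    }

def vs_target_delta(competitor_tom: float, target_tom: float) -> float:
    """Positive: the competitor leads; negative: your brand is ahead."""
    return competitor_tom - target_tom
```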
Mention Context
Every response where the brand is mentioned is classified by mention context — the tone in which the brand is described:
| Context | Multiplier | Description |
|---|---|---|
| Positive | ×1.0 | Brand presented positively or neutrally |
| Mixed | ×0.5 | Brand mentioned with caveats ("good, but...") |
| Negative | ×0.2 | Brand used as a counter-example, not recommended, criticized |
The multiplier applies to all metrics except Visibility when computing the Overall Score. This means a negative mention of the brand significantly reduces the Overall Score even if individual metrics are high.
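For example, if all six metrics score 80 on an organic query but the mention is classified as Negative, Visibility stays at 80 while the other five are adjusted to 80 × 0.2 = 16, giving an Overall Score of 0.20 × 80 + 0.80 × 16 ≈ 29 instead of 80.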
Query Context Classification
Every model response is classified by context type:
| Type | Description | Impact on Metrics |
|---|---|---|
| Organic | User didn't name the brand in their query | All 6 metrics applicable and count toward Overall Score |
| Brand Prompted | Brand explicitly mentioned in query | Visibility, Top of Mind, Consideration excluded from Overall Score |
| Unclear | Ambiguous context | Top of Mind and Consideration not applicable |
Organic context is the most valuable for analysis because it shows the model's autonomous behavior without prompting.
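The exclusions in the table above can be expressed as a small mapping. This sketch reuses the `WEIGHTS` dictionary from the Overall Item Score example and follows the table's rules (note that it leaves Visibility applicable under Unclear).

```python
# Metrics excluded from the Overall Score for each query context type
EXCLUDED_BY_CONTEXT = {
    "organic": set(),                                                # all 6 metrics count
    "brand_prompted": {"visibility", "top_of_mind", "consideration"},
    "unclear": {"top_of_mind", "consideration"},
}

def applicable_metrics(query_context: str) -> set[str]:
    """Metrics that count toward the Overall Score for a given query context."""
    return set(WEIGHTS) - EXCLUDED_BY_CONTEXT[query_context]
```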
How to Use Aggregate Metrics
- Overall Score — for quick assessment and comparison between runs
- Confidence — for filtering unreliable results
- Mention Rate — for baseline brand presence evaluation
- Top-1 / Top-3 Rate — for competitive analysis
- vs Target Delta — for tracking movement relative to competitors over time
All these metrics are available on the overview dashboard and in custom reports.
Related Sections
- Metrics System Overview — the 6 core metrics
- Top of Mind — Brand TOM competitive table
- First Project in 10 Minutes — how to run an evaluation