BETA — This feature is in beta testing. Metrics are for informational purposes only and should not be used for go/no-go decisions. v0.1

Executive Report

Hallucination Detection Metrics - Measuring AI Trustworthiness

Filters
Report Period: Dec 3, 2025 to Dec 10, 2025

Reports Analyzed: 9
Total Insights: 36
RAG Queries: 188

Overall Status: ✗ FAIL
1 Passing, 0 Below, 4 Failing
Hallucination Detection Metrics
Metric                          | Description                                  | Current | Target | Status | Details
Citation Coverage Rate          | Insights with documentation references       | 11.1%   | ≥85%   | ✗ Fail | 4 / 36
Documentation Grounding Rate    | References with specific page numbers        | 0.0%    | ≥80%   | ✗ Fail | 0 / 40
Recommendation Specificity Rate | Recommendations with navigation steps        | 97.2%   | ≥75%   | ✓ Pass | 35 / 36
RAG Retrieval Success Rate      | RAG queries returning high-relevance results | 0.0%    | ≥80%   | ✗ Fail | 0 / 188
Average RAG Relevance Score     | Mean relevance across all RAG queries        | 0.0%    | ≥70%   | ✗ Fail | 0 / 188

Understanding the Metrics
Citation Coverage Rate

Measures whether the AI is grounding its observations and recommendations in documentation. A high rate means the AI is citing sources for its claims.
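
As a rough illustration of how this rate could be computed, the sketch below counts insights that carry at least one documentation reference. The `Insight` record and its field names are hypothetical, and "Admin Guide" is a placeholder reference; the sample data only reproduces the 4 / 36 count shown in the table.

```python
from dataclasses import dataclass, field


@dataclass
class Insight:
    """Hypothetical record for one AI-generated insight."""
    text: str
    references: list[str] = field(default_factory=list)  # cited documentation, if any


def citation_coverage_rate(insights: list[Insight]) -> float:
    """Percentage of insights that carry at least one documentation reference."""
    if not insights:
        return 0.0  # avoid dividing by zero on an empty report period
    cited = sum(1 for insight in insights if insight.references)
    return 100.0 * cited / len(insights)


# Reproduce the counts shown in the table: 4 cited insights out of 36.
sample = [Insight("cited", ["Admin Guide"]) for _ in range(4)]
sample += [Insight("uncited") for _ in range(32)]
rate = citation_coverage_rate(sample)
print(f"{rate:.1f}%")  # 11.1%
```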

Documentation Grounding Rate

Measures how specific the citations are. References with page numbers (e.g., "Page 256, Section: Procedure") are verifiable and actionable.
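
One plausible way to check for page numbers is a pattern match over each reference string. The report does not state the actual matching rule, so the regular expression and function names below are assumptions for illustration only.

```python
import re

# Assumed rule: a reference is "grounded" if it names a page number,
# e.g. "Page 256", "pg. 4" or "p. 12". The real rule is not given in the report.
PAGE_PATTERN = re.compile(r"\b(?:page|pg\.?|p\.)\s*\d+", re.IGNORECASE)


def is_grounded(reference: str) -> bool:
    """True if the reference text contains something that looks like a page number."""
    return PAGE_PATTERN.search(reference) is not None


def documentation_grounding_rate(references: list[str]) -> float:
    """Percentage of references that include a page number."""
    if not references:
        return 0.0
    grounded = sum(1 for ref in references if is_grounded(ref))
    return 100.0 * grounded / len(references)


refs = [
    "Page 256, Section: Procedure",  # the example format quoted above
    "See the admin guide",           # no page number
    "User manual, p. 12",
    "Security chapter",              # no page number
]
print(documentation_grounding_rate(refs))  # 50.0
```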

Recommendation Specificity Rate

Measures whether recommendations include specific navigation steps (e.g., "Navigate to Settings → Security → Access Control"). Specific steps are actionable; vague advice is not.
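
A minimal sketch of how navigation steps might be detected, assuming a recommendation counts as specific when it contains a menu path of two or more labels joined by an arrow. This heuristic is an assumption, not the report's documented rule, and it is deliberately rough: a plain ">" in unrelated text would also match.

```python
import re

# Assumed heuristic: a menu path is two or more labels joined by "→", "->" or ">".
NAV_PATH = re.compile(r"\w[\w ]*\s*(?:→|->|>)\s*\w[\w ]*")


def has_navigation_steps(recommendation: str) -> bool:
    """True if the recommendation contains something that looks like a menu path."""
    return NAV_PATH.search(recommendation) is not None


def recommendation_specificity_rate(recommendations: list[str]) -> float:
    """Percentage of recommendations that include navigation steps."""
    if not recommendations:
        return 0.0
    specific = sum(1 for rec in recommendations if has_navigation_steps(rec))
    return 100.0 * specific / len(recommendations)


recs = [
    "Navigate to Settings → Security → Access Control",  # specific
    "Improve your security posture",                      # vague
]
print(recommendation_specificity_rate(recs))  # 50.0
```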

RAG Retrieval Success Rate

Measures whether the RAG system is finding relevant documentation. A query is "successful" if it returns at least one result with relevance score ≥0.70.
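
The success rule above can be sketched as follows. Each query is represented simply as a list of relevance scores, which is an assumed data shape; only the 0.70 threshold comes from the definition.

```python
RELEVANCE_THRESHOLD = 0.70  # threshold stated in the definition above


def is_successful(scores: list[float], threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """A query succeeds if at least one result scores at or above the threshold."""
    return any(score >= threshold for score in scores)


def retrieval_success_rate(queries: list[list[float]]) -> float:
    """Percentage of queries with at least one high-relevance result."""
    if not queries:
        return 0.0
    successes = sum(1 for scores in queries if is_successful(scores))
    return 100.0 * successes / len(queries)


# Illustrative scores only, not data from the report.
queries = [
    [0.82, 0.41],  # success: 0.82 >= 0.70
    [0.69, 0.55],  # failure: best score is just under the threshold
    [],            # failure: no results returned
    [0.70],        # success: the threshold is inclusive
]
print(retrieval_success_rate(queries))  # 50.0
```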

Average RAG Relevance Score

The mean of each query's top relevance score, taken across all RAG queries. Higher scores mean the RAG system is returning more semantically relevant documentation.
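
A small sketch of this calculation, reading the definition as "take the best score from each query, then average those". Treating a query with no results as a score of 0.0 is an assumption; the report does not say how empty result sets are handled.

```python
from statistics import mean


def average_top_relevance(queries: list[list[float]]) -> float:
    """Mean of each query's best relevance score, on a 0 to 1 scale."""
    if not queries:
        return 0.0
    # Assumption: a query that returned nothing contributes a top score of 0.0.
    top_scores = [max(scores, default=0.0) for scores in queries]
    return mean(top_scores)


# Illustrative scores only, not data from the report.
queries = [[0.80, 0.41], [0.60], [], [0.70]]
avg = average_top_relevance(queries)
print(f"{avg:.1%}")  # 52.5%
```

The result is formatted as a percentage so it can be compared directly with the ≥70% target in the table.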

Tip: If RAG metrics are low, consider uploading better documentation or adjusting chunking strategies. If citation metrics are low, review the system prompts to emphasize documentation grounding.
Status Legend
✓ Pass: Metric meets or exceeds target threshold
⚠ Below: Metric is below target but within acceptable range
✗ Fail: Metric is significantly below target; action required
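
The legend can be expressed as a simple classification, sketched below. The report does not define the "acceptable range", so the 10-percentage-point tolerance here is purely hypothetical. With that assumption, the current and target values from the table reproduce the 1 Passing, 0 Below, 4 Failing tally.

```python
from collections import Counter

# Hypothetical tolerance: how many percentage points under target still count as "Below".
BELOW_TOLERANCE = 10.0


def classify(current: float, target: float, tolerance: float = BELOW_TOLERANCE) -> str:
    """Map a metric value to Pass, Below, or Fail relative to its target."""
    if current >= target:
        return "Pass"
    if current >= target - tolerance:
        return "Below"
    return "Fail"


# (current %, target %) pairs copied from the metrics table.
metrics = {
    "Citation Coverage Rate": (11.1, 85.0),
    "Documentation Grounding Rate": (0.0, 80.0),
    "Recommendation Specificity Rate": (97.2, 75.0),
    "RAG Retrieval Success Rate": (0.0, 80.0),
    "Average RAG Relevance Score": (0.0, 70.0),
}

statuses = {name: classify(cur, tgt) for name, (cur, tgt) in metrics.items()}
tally = Counter(statuses.values())
print(tally["Pass"], tally["Below"], tally["Fail"])  # 1 0 4
```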