Executive Report
Hallucination Detection Metrics - Measuring AI Trustworthiness
Report Period: Dec 3, 2025 to Dec 10, 2025
Reports Analyzed: 9
Total Insights: 36
RAG Queries: 188
Overall Status
Hallucination Detection Metrics
| Metric | Current | Target | Status | Details |
|---|---|---|---|---|
| Citation Coverage Rate (insights with documentation references) | 11.1% | ≥85% | ✗ Fail | 4 / 36 |
| Documentation Grounding Rate (references with specific page numbers) | 0.0% | ≥80% | ✗ Fail | 0 / 40 |
| Recommendation Specificity Rate (recommendations with navigation steps) | 97.2% | ≥75% | ✓ Pass | 35 / 36 |
| RAG Retrieval Success Rate (RAG queries returning high-relevance results) | 0.0% | ≥80% | ✗ Fail | 0 / 188 |
| Average RAG Relevance Score (mean relevance across all RAG queries) | 0.0% | ≥70% | ✗ Fail | 0 / 188 |
Understanding the Metrics
Citation Coverage Rate
Measures whether the AI is grounding its observations and recommendations in documentation. A high rate means the AI is citing sources for its claims.
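As a rough illustration of how such a rate could be computed, the sketch below counts insights that carry at least one documentation reference and compares the result with the ≥85% target. The insight structure and the `references` field name are assumptions made for this example, not the report's actual schema.

```python
def citation_coverage_rate(insights):
    """Fraction of insights that carry at least one documentation reference."""
    if not insights:
        return 0.0
    # "references" is a hypothetical field name used only for illustration.
    cited = sum(1 for insight in insights if insight.get("references"))
    return cited / len(insights)


# Made-up data shaped like the reported counts: 4 of 36 insights cite a source.
insights = [{"references": ["doc"]}] * 4 + [{"references": []}] * 32

rate = citation_coverage_rate(insights)
status = "Pass" if rate >= 0.85 else "Fail"
print(f"{rate:.1%} {status}")
```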
Documentation Grounding Rate
Measures how specific the citations are. References with page numbers (e.g., "Page 256, Section: Procedure") are verifiable and actionable.
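One possible way to check a citation for a page number is a simple pattern match, sketched below. The regular expression and the helper names are assumptions for illustration; the report does not say how page numbers are actually detected.

```python
import re

# Hypothetical pattern: "Page 256", "page 12", or "p. 7" count as page references.
PAGE_PATTERN = re.compile(r"\b(?:page|p\.)\s*\d+", re.IGNORECASE)


def has_page_number(reference):
    """True if the reference text names a specific page."""
    return PAGE_PATTERN.search(reference) is not None


def grounding_rate(references):
    """Fraction of references that include a page number."""
    if not references:
        return 0.0
    return sum(has_page_number(ref) for ref in references) / len(references)


# Made-up references for illustration.
references = [
    "Page 256, Section: Procedure",
    "Administrator Guide, Security chapter",
]
print(grounding_rate(references))
```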
Recommendation Specificity Rate
Measures whether recommendations include specific navigation steps (e.g., "Navigate to Settings → Security → Access Control"). Specific steps are actionable; vague advice is not.
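A minimal heuristic for spotting navigation steps might look for menu-path arrows or phrases such as "Navigate to". This is a sketch under assumed rules; the pattern and function names are hypothetical, and the report's real classifier may work differently.

```python
import re

# Hypothetical heuristic: a menu-path arrow or a "navigate to" / "go to" phrase.
NAV_PATTERN = re.compile(r"(?:→|->)|\b(?:navigate|go)\s+to\b", re.IGNORECASE)


def has_navigation_steps(recommendation):
    """True if the recommendation text appears to contain navigation steps."""
    return NAV_PATTERN.search(recommendation) is not None


def specificity_rate(recommendations):
    """Fraction of recommendations that include navigation steps."""
    if not recommendations:
        return 0.0
    hits = sum(has_navigation_steps(rec) for rec in recommendations)
    return hits / len(recommendations)


# Made-up recommendations for illustration.
recommendations = [
    "Navigate to Settings → Security → Access Control",
    "Review your security settings regularly.",
]
print(specificity_rate(recommendations))
```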
RAG Retrieval Success Rate
Measures whether the RAG system is finding relevant documentation. A query is "successful" if it returns at least one result with relevance score ≥0.70.
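The success rule above can be sketched as follows, assuming each query is represented as a list of relevance scores in the range 0 to 1. That representation and the function names are assumptions for this example.

```python
RELEVANCE_THRESHOLD = 0.70  # threshold stated in the metric definition


def is_successful(scores):
    """A query succeeds if at least one result scores at or above the threshold."""
    return any(score >= RELEVANCE_THRESHOLD for score in scores)


def retrieval_success_rate(queries):
    """Fraction of queries with at least one high-relevance result."""
    if not queries:
        return 0.0
    return sum(is_successful(scores) for scores in queries) / len(queries)


# Made-up scores: the first and last queries succeed, the other two do not.
queries = [[0.82, 0.41], [0.69, 0.55], [], [0.70]]
print(retrieval_success_rate(queries))
```

Note that a query returning no results at all counts as unsuccessful under this rule, since `any()` over an empty list is false.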
Average RAG Relevance Score
The mean, across all RAG queries, of each query's top relevance score. Higher scores mean the RAG system is returning more semantically relevant documentation.
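A short sketch of this average, again assuming each query is a list of relevance scores between 0 and 1. Treating a query with no results as scoring 0.0 is an assumption of this example; the report does not state how empty result sets are handled.

```python
from statistics import fmean


def average_top_relevance(queries):
    """Mean of the best relevance score per query."""
    if not queries:
        return 0.0
    # Assumption: a query with no results contributes a top score of 0.0.
    top_scores = [max(scores, default=0.0) for scores in queries]
    return fmean(top_scores)


# Made-up scores: the top scores are 0.82, 0.69, and 0.0 (no results).
queries = [[0.82, 0.41], [0.69, 0.55], []]
average = average_top_relevance(queries)
print(f"{average:.1%}", "Pass" if average >= 0.70 else "Fail")
```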