The Six Dimensions of Data Quality in Surveillance
A framework commonly used in WHO and DHIS2 data quality assessments measures six dimensions:
- Accuracy: Does the data correctly reflect the events it represents?
- Completeness: What proportion of expected data values were actually reported?
- Timeliness: Was data submitted within the required timeframe?
- Consistency: Is the data internally consistent within a report and across periods?
- Integrity: Has data been modified after submission? Is there an audit trail?
- Precision: Does the data capture enough granularity to support the required decisions?
A DHIS2 data quality strategy must address all six, not just the one that is easiest to measure.
Layer 1: Validation Rules at the Point of Entry
The most cost-effective quality assurance is prevention at source. DHIS2's validation rules engine checks logical constraints when a data entry form is validated or completed, alerting the data entry officer to violations before the report is finalised.
Essential validation rules for disease surveillance:
- Range checks: A facility with a catchment population of 5,000 cannot plausibly report 10,000 outpatient visits in one month. Set plausible maximums based on facility size and historical data.
- Consistency checks: Confirmed cases cannot exceed suspected cases. Laboratory-confirmed cases cannot exceed specimens collected.
- Date logic (Tracker): Date of specimen collection cannot precede registration date. Date of outcome cannot precede investigation date.
- Zero vs. missing distinction: Configure the form to distinguish a reported zero from a blank field. These mean very different things for data quality analysis.
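The rule logic above can be prototyped outside DHIS2 before it is configured in the validation rules engine. The following is a minimal Python sketch, not DHIS2's own rule syntax. `MonthlyReport`, `check_report`, `check_tracker_dates`, and the default limit of one outpatient visit per catchment resident per month are all hypothetical choices for illustration. A blank field is modelled as `None` so that it stays distinct from a reported zero.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MonthlyReport:
    """Hypothetical aggregate report. None means the field was left blank."""
    catchment_population: int
    outpatient_visits: Optional[int]
    suspected_cases: Optional[int]
    confirmed_cases: Optional[int]
    specimens_collected: Optional[int]
    lab_confirmed_cases: Optional[int]


def check_report(r: MonthlyReport, max_visits_per_capita: float = 1.0) -> list[str]:
    """Return rule violations for one report; an empty list means it passes."""
    problems = []
    # Zero vs. missing: a blank (None) is flagged, an explicit 0 is accepted.
    for name in ("outpatient_visits", "suspected_cases", "confirmed_cases",
                 "specimens_collected", "lab_confirmed_cases"):
        if getattr(r, name) is None:
            problems.append(f"{name} is blank (missing, not zero)")
    # Range check: plausible maximum scaled by catchment size (assumed ratio).
    if r.outpatient_visits is not None:
        limit = r.catchment_population * max_visits_per_capita
        if r.outpatient_visits > limit:
            problems.append(f"outpatient_visits {r.outpatient_visits} "
                            f"exceeds plausible max {limit:.0f}")
    # Consistency checks, skipped when either side is blank.
    if (r.confirmed_cases is not None and r.suspected_cases is not None
            and r.confirmed_cases > r.suspected_cases):
        problems.append("confirmed_cases exceeds suspected_cases")
    if (r.lab_confirmed_cases is not None and r.specimens_collected is not None
            and r.lab_confirmed_cases > r.specimens_collected):
        problems.append("lab_confirmed_cases exceeds specimens_collected")
    return problems


def check_tracker_dates(registered: date, specimen_collected: Optional[date],
                        investigated: Optional[date],
                        outcome: Optional[date]) -> list[str]:
    """Date-logic checks for a hypothetical Tracker case record."""
    problems = []
    if specimen_collected is not None and specimen_collected < registered:
        problems.append("specimen collection date precedes registration date")
    if (outcome is not None and investigated is not None
            and outcome < investigated):
        problems.append("outcome date precedes investigation date")
    return problems
```

For example, `check_report(MonthlyReport(5000, 10000, 12, 15, None, 3))` reports three problems: the blank specimen count, the visit count above 5,000, and confirmed cases exceeding suspected cases. The laboratory check is skipped because its denominator is blank.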
Calibrate rules carefully: rules that fire on 40% of submissions are obstacles, not controls.
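One practical way to calibrate is to measure how often each rule fires across past submissions. The sketch below assumes you already have the set of rule names each submission violated. The 10% review threshold is an arbitrary placeholder, not a DHIS2 default or a published standard.

```python
from collections import Counter


def rule_firing_rates(results: list[set[str]], all_rules: list[str]) -> dict[str, float]:
    """Share of submissions (0 to 1) on which each rule fired.

    `results` holds, for each submission, the set of rule names it violated.
    """
    counts = Counter(rule for fired in results for rule in fired)
    n = len(results)
    return {rule: (counts[rule] / n if n else 0.0) for rule in all_rules}


def rules_to_review(rates: dict[str, float], max_rate: float = 0.10) -> list[str]:
    """Rules that fire more often than `max_rate` are candidates for recalibration."""
    return sorted(rule for rule, rate in rates.items() if rate > max_rate)
```

A rule that fires on most submissions is usually mis-specified, for example a maximum set without reference to historical data, rather than evidence that most facilities are reporting badly.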
Layer 2: Reporting Completeness Monitoring
Completeness is a prerequisite for interpreting any other data quality metric. A surveillance system showing low case counts may reflect genuinely low transmission, or it may reflect 40% of reports going missing. Without completeness data, these interpretations are indistinguishable.
- Define expected reporting units. Every facility expected to submit reports must be pre-defined in the system.
- Set reporting periods and deadlines. Track whether reports arrive before or after the deadline as a timeliness metric.
- Build the completeness dashboard. A single-value card showing percentage of expected reports received, colour-coded green/amber/red, should be the first view surveillance managers open each week.
- Automate completeness reminders. Configure DHIS2's messaging system to send deadline reminders to facilities that have not yet reported. This simple step raised on-time reporting rates from approximately 71% to 89% in the WHO polio programmes within two reporting cycles.
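The completeness, timeliness, and reminder steps reduce to simple set arithmetic over expected and received reports. This is a hedged Python sketch using assumed inputs. The `Submission` record, the single deadline date, and the 80%/60% green/amber cut-offs are illustrative. The `remind` list only identifies which facilities a DHIS2 reminder would target; the sketch does not call DHIS2's messaging system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Submission:
    facility: str
    submitted_on: date


def rag_status(pct: float, green: float = 80.0, amber: float = 60.0) -> str:
    """Colour-code a percentage; the cut-offs are assumed, not DHIS2 defaults."""
    if pct >= green:
        return "green"
    if pct >= amber:
        return "amber"
    return "red"


def completeness_summary(expected: set[str], submissions: list[Submission],
                         deadline: date) -> dict:
    """Completeness and timeliness as percentages of expected reports."""
    # Ignore submissions from units not in the expected list.
    received = {s.facility for s in submissions if s.facility in expected}
    on_time = {s.facility for s in submissions
               if s.facility in expected and s.submitted_on <= deadline}
    n = len(expected)
    completeness = 100 * len(received) / n if n else 0.0
    timeliness = 100 * len(on_time) / n if n else 0.0
    return {
        "completeness_pct": completeness,
        "timeliness_pct": timeliness,
        "status": rag_status(completeness),
        "remind": sorted(expected - received),  # facilities still missing
    }
```

Both percentages use expected reports as the denominator, which is why the expected reporting units have to be defined before either metric means anything.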
Layer 3: Data Quality Review Structure and Frequency
Structure data quality reviews at three levels:
- Facility-level review (monthly): A facility supervisor checks the previous month's data for implausible values, unexplained trends, and completeness gaps. Findings are shared with the facility data entry officer within two weeks of the reporting deadline.
- LGA-level review (quarterly): The review compares reported data against source documents (facility registers, case investigation forms, laboratory log books) and produces a data quality score for each indicator in each facility.
- State or national-level review (semi-annual): A cross-facility review that identifies systematic patterns, such as LGAs that consistently over-report or facilities that always report exactly at the threshold. These patterns indicate systemic problems requiring intervention beyond individual facility correction.
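The cross-facility patterns sought in state or national reviews can be screened for automatically. This minimal sketch assumes two inputs. The first is a per-period ratio of reported to source-verified values for each LGA. The second is a series of reported values for each facility on an indicator with a known target. The 10% tolerance and the 80%-of-periods cut-off are hypothetical parameters.

```python
import math


def consistent_over_reporters(ratios: dict[str, list[float]],
                              tolerance: float = 0.10) -> list[str]:
    """LGAs whose reported/verified ratio exceeds 1 + tolerance in every period."""
    return sorted(lga for lga, rs in ratios.items()
                  if rs and all(r > 1 + tolerance for r in rs))


def threshold_huggers(values: dict[str, list[float]], threshold: float,
                      min_share: float = 0.8) -> list[str]:
    """Facilities reporting exactly `threshold` in at least `min_share` of periods."""
    flagged = []
    for facility, vs in values.items():
        if not vs:
            continue
        # isclose guards against float noise in computed percentages.
        hits = sum(1 for v in vs if math.isclose(v, threshold, abs_tol=1e-9))
        if hits / len(vs) >= min_share:
            flagged.append(facility)
    return sorted(flagged)
```

Requiring the pattern in every period, or in most periods, is what separates a systematic problem from a single bad month.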
Track the data quality score for each indicator over time. If quality is improving, document why and replicate the approach. If it is declining, identify where and investigate the cause.
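To decide whether an indicator's score is improving or declining, one simple option is a least-squares slope over review periods. This sketch assumes one score per period on a 0 to 100 scale. The one-point-per-period cut-off for calling a trend is an illustrative assumption.

```python
def trend_slope(scores: list[float]) -> float:
    """Ordinary least-squares slope of score against period index 0, 1, 2, ..."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two periods")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def classify_trend(scores: list[float], min_change_per_period: float = 1.0) -> str:
    """Label a score series as improving, declining, or stable."""
    slope = trend_slope(scores)
    if slope >= min_change_per_period:
        return "improving"
    if slope <= -min_change_per_period:
        return "declining"
    return "stable"
```

A slope is less sensitive to a single unusual review than comparing only the first and last periods.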
Layer 4: Data Quality Feedback Loops
A data quality review that produces a report but no feedback creates no improvement. Effective feedback loops include:
- Report back to facilities, not just upward. When a review identifies errors, the facility concerned should receive written feedback within two weeks specifying the indicator, the period, the issue, and the corrective action required.
- Recognise good data quality. Positive recognition of facilities that maintain high completeness and accuracy over multiple cycles is often sufficient: acknowledgement in programme communications works without financial incentives.
- Track corrections. Verify that requested corrections are made and that corrected data is accurate. A feedback loop that triggers corrections that are never verified is only half a loop.
- Use findings to improve forms. When the same error occurs consistently across multiple facilities, the form or indicator definition needs revision, not just data correction.
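A feedback loop is easier to close when each finding is stored as a structured record carrying the fields listed above. The sketch below is a hypothetical Python data model, not a DHIS2 feature. `FeedbackItem`, `loop_status`, and `recurring_issues` are illustrative names, and the three-facility threshold for suspecting a form or definition problem is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FeedbackItem:
    facility: str
    indicator: str
    period: str              # e.g. "2024-03"
    issue: str
    corrective_action: str
    review_date: date
    corrected_on: Optional[date] = None
    verified: bool = False   # set only after the corrected value is re-checked

    @property
    def feedback_due(self) -> date:
        # Two-week window from the review for sending written feedback.
        return self.review_date + timedelta(days=14)


def loop_status(item: FeedbackItem) -> str:
    """A loop is closed only when the correction is made and verified."""
    if item.corrected_on is None:
        return "awaiting correction"
    if not item.verified:
        return "awaiting verification"
    return "closed"


def recurring_issues(items: list[FeedbackItem],
                     min_facilities: int = 3) -> list[tuple[str, str]]:
    """(indicator, issue) pairs seen at many facilities: likely form or definition problems."""
    facilities_by_issue: dict[tuple[str, str], set[str]] = {}
    for i in items:
        facilities_by_issue.setdefault((i.indicator, i.issue), set()).add(i.facility)
    return sorted(k for k, f in facilities_by_issue.items() if len(f) >= min_facilities)
```

Keeping "corrected" and "verified" as separate states is what stops a loop from being counted as closed after only half of it has happened.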
The Data Quality Audit: A Field-Level Protocol
For programmes conducting formal Data Quality Audits (standard in WHO and USAID-funded programmes), the field-level protocol should cover:
- Source document review: Compare DHIS2-reported data against facility registers and investigation forms. Calculate the discrepancy rate: the percentage of reported values that differ from source documents by more than ±10%.
- Register completeness: Review the facility register for completeness of required fields. An incomplete register is a data quality failure upstream of DHIS2 entry.
- Data flow verification: Confirm the chain from source document to DHIS2 entry. Who transfers data? How often? What verification exists before submission?
- Cross-check with independent data sources: Compare DHIS2 case counts with laboratory records or hospital admission registers. Significant divergence indicates either a completeness failure or a data manipulation concern.
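The discrepancy rate from the source document review can be computed as follows. This minimal sketch rests on stated assumptions. The relative difference is measured against the source document value. A zero in the register counts as discrepant only if DHIS2 reports a non-zero value. `divergence` is a hypothetical helper for the independent-source cross-check.

```python
def discrepancy_rate(pairs: list[tuple[float, float]], tolerance: float = 0.10) -> float:
    """Percentage of (dhis2_value, source_value) pairs differing by more than ±tolerance.

    The relative difference is taken against the source document value (an
    assumption). A zero in the source counts as discrepant only if DHIS2 is non-zero.
    """
    if not pairs:
        raise ValueError("no values to compare")
    discrepant = 0
    for reported, source in pairs:
        if source == 0:
            if reported != 0:
                discrepant += 1
        elif abs(reported - source) / source > tolerance:
            discrepant += 1
    return 100 * discrepant / len(pairs)


def divergence(dhis2_total: float, independent_total: float) -> float:
    """Hypothetical helper: signed relative gap between DHIS2 and an independent
    source such as laboratory records. Positive means DHIS2 reports more."""
    if independent_total == 0:
        raise ValueError("independent total is zero")
    return (dhis2_total - independent_total) / independent_total
```

A value exactly 10% away from the source is not counted, matching "more than ±10%". The sign of `divergence` shows which source is higher, but it does not by itself distinguish a completeness failure from manipulation.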
When Data Quality Is a Political Problem
Not all data quality problems are technical. Health workers under pressure to meet immunisation coverage targets may report above actual performance. Surveillance officers under pressure to show outbreak control may under-report cases. These are programme governance problems. Their symptoms are detectable through data (implausible values, sudden improvements without field evidence), but they are caused by results frameworks that reward reported performance over actual performance.
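Sudden improvements without field evidence can be screened for with a simple z-score against prior periods. This sketch is a screening aid only and makes assumed choices: at least three prior periods and a threshold of three sample standard deviations. A flag is a prompt for field verification, not evidence of misreporting. The hypothetical `higher_is_better` switch covers both inflated coverage and suppressed case counts.

```python
from statistics import mean, stdev


def sudden_improvement(history: list[float], current: float,
                       higher_is_better: bool = True,
                       z_threshold: float = 3.0) -> bool:
    """True if `current` improves on prior periods by more than `z_threshold`
    sample standard deviations. Returns False with fewer than three prior values."""
    if len(history) < 3:
        return False
    mu, sd = mean(history), stdev(history)
    # "Improvement" is a rise for coverage-type indicators, a fall for case counts.
    change = current - mu if higher_is_better else mu - current
    if sd == 0:
        # A perfectly flat history: any improvement at all is unusual.
        return change > 0
    return change / sd > z_threshold
```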
A programme whose honest data shows it is struggling is further from impact than a genuinely high-performing one, but it is closer to the evidence-based course correction that will eventually produce results than a programme reporting inflated figures. That is the core of accountability in M&E design.