DHIS2 Data Quality: Common Problems and How to Fix Them

Moving immunisation data quality from 54% to 81% across all reporting sites in the WHO polio programmes did not come from training health workers to enter data more carefully. It came from redesigning the system so that poor data quality was harder to produce than good data quality. Data quality is a design problem with a design solution.

The Six Dimensions of Data Quality in Surveillance

The standard framework used in WHO and DHIS2 data quality assessments measures six dimensions:

Accuracy: Does the data correctly reflect the events it represents?
Completeness: What proportion of expected data values were actually reported?
Timeliness: Was data submitted within the required timeframe?
Consistency: Is the data internally consistent within a report and across periods?
Integrity: Has data been modified after submission? Is there an audit trail?
Precision: Does the data capture enough granularity to support the required decisions?

A DHIS2 data quality strategy must address all six, not just the one that is easiest to measure.

Layer 1: Validation Rules at the Point of Entry

The most cost-effective quality assurance is prevention at source. DHIS2's validation rules engine checks logical constraints at submission time and alerts the data entry officer before data is saved.

Essential validation rules for disease surveillance:

Range checks: A facility with 5,000 catchment population cannot report 10,000 outpatient visits in one month. Set plausible maximums based on facility size and historical data.
Consistency checks: Confirmed cases cannot exceed suspected cases. Laboratory-confirmed cases cannot exceed specimens collected.
Date logic (Tracker): Date of specimen collection cannot precede registration date. Date of outcome cannot precede investigation date.
Zero vs. missing distinction: Configure the form to distinguish a reported zero from a blank field. These mean very different things for data quality analysis.
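
As a rough illustration of the four rule types above, the sketch below expresses them as plain Python checks on a single monthly report. It is not DHIS2 configuration or a DHIS2 API: the field names (opd_visits, catchment_population, and so on) and the ceiling of one visit per person per month are hypothetical and would need to be set from real facility data.

```python
from datetime import date


def check_report(report: dict, max_visits_per_person: float = 1.0) -> list[str]:
    """Return a list of violations for one monthly facility report.

    All field names are hypothetical. None means "not reported";
    0 means "reported zero".
    """
    violations = []

    # Zero vs. missing: a blank (None) is flagged, a reported 0 is accepted.
    for field in ("opd_visits", "suspected_cases", "confirmed_cases"):
        if report.get(field) is None:
            violations.append(f"{field}: missing value (blank is not zero)")

    # Range check: visits should not exceed a plausible ceiling for the
    # facility's catchment population (the ceiling is an assumption).
    visits = report.get("opd_visits")
    population = report.get("catchment_population")
    if visits is not None and population is not None:
        if visits > population * max_visits_per_person:
            violations.append("opd_visits: exceeds plausible maximum")

    # Consistency check: confirmed cases cannot exceed suspected cases.
    suspected = report.get("suspected_cases")
    confirmed = report.get("confirmed_cases")
    if suspected is not None and confirmed is not None and confirmed > suspected:
        violations.append("confirmed_cases: exceeds suspected_cases")

    # Date logic: specimen collection cannot precede registration.
    registered = report.get("registration_date")
    collected = report.get("specimen_date")
    if registered is not None and collected is not None and collected < registered:
        violations.append("specimen_date: precedes registration_date")

    return violations
```

Keeping None and 0 separate in the data structure is what makes the zero-versus-missing rule checkable at all; if blanks are stored as zeros, the distinction is lost before any rule can run.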

Calibrate rules carefully: rules that trigger 40% of the time are obstacles, not controls.
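
One way to check calibration is to measure how often each rule fires across submitted reports. The sketch below assumes you can export, for each submission, the set of rule names that triggered; the input format and function names are hypothetical, and the 40% cutoff is simply the figure used above.

```python
from collections import Counter


def rule_trigger_rates(submissions: list[set[str]]) -> dict[str, float]:
    """Share of submissions in which each rule fired.

    submissions: one set of triggered rule names per submitted report
    (a hypothetical export format, not a DHIS2 structure).
    """
    total = len(submissions)
    if total == 0:
        return {}
    counts = Counter(rule for triggered in submissions for rule in triggered)
    return {rule: n / total for rule, n in counts.items()}


def noisy_rules(rates: dict[str, float], threshold: float = 0.40) -> list[str]:
    """Rules firing at or above the threshold, as candidates for recalibration."""
    return sorted(rule for rule, rate in rates.items() if rate >= threshold)
```

A rule that appears in the noisy list is not necessarily wrong, but it is worth reviewing whether its limits match what facilities of that size actually report.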

Layer 2: Reporting Completeness Monitoring

Completeness is a prerequisite for interpreting any other data quality metric. A surveillance system showing low case counts may reflect genuine low transmission or 40% missing reports. Without completeness data, these two interpretations are indistinguishable.

Define expected reporting units. Every facility expected to submit reports must be pre-defined in the system.
Set reporting periods and deadlines. Track whether reports arrive before or after the deadline as a timeliness metric.
Build the completeness dashboard. A single-value card showing percentage of expected reports received, colour-coded green/amber/red, should be the first view surveillance managers open each week.
Automate completeness reminders. Configure DHIS2's messaging system to send deadline reminders to facilities that have not yet reported. This simple step raised on-time reporting rates from approximately 71% to 89% in the WHO polio programmes within two reporting cycles.
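
The arithmetic behind the completeness card and the reminder list can be sketched as follows. This is an illustration only, not how DHIS2 computes its reporting rates: the function name, the input format, and the green/amber cutoffs of 90% and 80% are assumptions to be replaced with programme-specific values.

```python
from datetime import date


def completeness_summary(
    expected: set[str],
    received: dict[str, date],
    deadline: date,
    green: float = 90.0,   # assumed cutoff
    amber: float = 80.0,   # assumed cutoff
) -> dict:
    """Summarise completeness and timeliness for one reporting period.

    expected: IDs of facilities expected to report.
    received: facility ID -> date the report was submitted.
    """
    if not expected:
        raise ValueError("no expected reporting units defined")

    reported = {f for f in expected if f in received}
    on_time = {f for f in reported if received[f] <= deadline}

    completeness = 100 * len(reported) / len(expected)
    timeliness = 100 * len(on_time) / len(expected)

    if completeness >= green:
        colour = "green"
    elif completeness >= amber:
        colour = "amber"
    else:
        colour = "red"

    return {
        "completeness_pct": completeness,
        "timeliness_pct": timeliness,
        "colour": colour,
        # Facilities that still need a reminder.
        "not_reported": sorted(expected - reported),
    }
```

Both percentages use the number of expected reports as the denominator, which is why the list of expected reporting units has to be defined before either figure means anything.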

Layer 3: Data Quality Review Structure and Frequency

Structure data quality reviews at three levels:

Facility-level review (monthly): A facility supervisor checks the previous month's data for implausible values, unexplained trends, and completeness gaps. Findings are shared with the facility data entry officer within two weeks of the reporting deadline.
LGA-level review (quarterly): Compares reported data against source documents (facility registers, case investigation forms, laboratory log books). Produces a data quality score for each indicator in each facility.
State or national-level review (semi-annual): A cross-facility review that identifies systematic patterns, such as LGAs that consistently over-report or facilities that always report exactly at the threshold. These patterns indicate systemic problems requiring intervention beyond individual facility correction.

Track the data quality score for each indicator over time. If quality is improving, document why and replicate the approach. If it is declining, identify where and investigate the cause.

Layer 4: Data Quality Feedback Loops

A data quality review that produces a report but no feedback creates no improvement. Effective feedback loops include:

Report back to facilities, not just upward. When a review identifies errors at a facility, that facility should receive specific written feedback within two weeks, specifying which indicator, which period, what the issue is, and what corrective action is required.
Recognise good data quality. Positive recognition for facilities maintaining high completeness and accuracy over multiple cycles is often sufficient: acknowledgement in programme communications works without financial incentives.
Track corrections. Verify that requested corrections are made and that corrected data is accurate. A feedback loop that triggers corrections that are never verified is only half a loop.
Use findings to improve forms. When the same error occurs consistently across multiple facilities, the form or indicator definition needs revision, not just data correction.

The Data Quality Audit: A Field-Level Protocol

For programmes conducting formal Data Quality Audits (standard in WHO and USAID-funded programmes), the field-level protocol should cover:

Source document review: Compare DHIS2-reported data against facility registers and investigation forms. Calculate the discrepancy rate: the percentage of reported values that differ from source documents by more than ±10%.
Register completeness: Review the facility register for completeness of required fields. An incomplete register is a data quality failure upstream of DHIS2 entry.
Data flow verification: Confirm the chain from source document to DHIS2 entry. Who transfers data? How often? What verification exists before submission?
Cross-check with independent data sources: Compare DHIS2 case counts with laboratory records or hospital admission registers. Significant divergence indicates either a completeness failure or a data manipulation concern.
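
The discrepancy rate from the source document review can be computed as in the minimal sketch below. The function name and input format are hypothetical, and the handling of a source value of zero (where a relative difference is undefined) is an assumption rather than part of any standard protocol.

```python
def discrepancy_rate(pairs: list[tuple[float, float]], tolerance: float = 0.10) -> float:
    """Percentage of reported values differing from source by more than the tolerance.

    pairs: (value reported in DHIS2, value recounted from the source document).
    """
    if not pairs:
        raise ValueError("no values to compare")

    discrepant = 0
    for reported, source in pairs:
        if source == 0:
            # Relative difference is undefined; assume any non-zero
            # reported value counts as a discrepancy.
            differs = reported != 0
        else:
            differs = abs(reported - source) / source > tolerance
        if differs:
            discrepant += 1

    return 100 * discrepant / len(pairs)
```

For example, with recounted values of 100 throughout, reported values of 100, 120, and 95 give one discrepancy out of three, because only 120 differs by more than 10%.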

When Data Quality Is a Political Problem

Not all data quality problems are technical. Health workers under pressure to meet immunisation coverage targets may report above actual performance. Surveillance officers under pressure to show outbreak control may under-report cases. These are programme governance problems: the symptoms are detectable through data (implausible values, sudden improvements without field evidence), but the cause is a results framework that rewards reported performance over actual performance.

A programme that produces honest data on a struggling programme is further from impact than one producing honest data on a high-performing one, but it is closer to the evidence-based course correction that will eventually produce results. That is the core of accountability in M&E design.

Frequently Asked Questions

What causes poor data quality in DHIS2?

The most common causes are transcription errors during manual data entry, missing values from facilities that failed to report, duplicate records from staff entering the same report twice, and incorrect period assignment when data from one month is entered under another. System configuration errors, such as incorrect validation rules or miscoded organisation units, compound these problems at scale.

How do you check data quality in DHIS2?

DHIS2 has a built-in data quality module that runs validation rules, checks for outliers, and produces completeness and timeliness reports. Beyond the built-in tools, analysts use pivot tables and the data visualiser to compare expected versus reported values, identify implausible jumps between periods, and cross-tabulate indicators that should move together. External audits involve visiting reporting facilities and reconciling paper tally sheets against what was entered in DHIS2.
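
To make the idea of an outlier check concrete, the sketch below flags implausible values in a facility's time series using a modified z-score (based on the median and the median absolute deviation), with the conventional cutoff of 3.5. It illustrates the general technique only and is not a reproduction of DHIS2's built-in outlier detection; the function name is hypothetical.

```python
from statistics import median


def modified_z_outliers(values: list[float], cutoff: float = 3.5) -> list[int]:
    """Return the indices of values whose modified z-score exceeds the cutoff."""
    if not values:
        return []

    med = median(values)
    # Median absolute deviation: a spread measure that a single
    # extreme value cannot inflate the way it inflates a standard deviation.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the score is undefined.
        return []

    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > cutoff
    ]
```

A flagged value is a prompt for follow-up with the facility, not proof of an error: a genuine outbreak produces the same statistical signature as a data entry mistake.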

What is a DHIS2 validation rule?

A DHIS2 validation rule is a logical condition that compares two expressions, a left side and a right side, each built from one or more data values, and flags a violation when the condition is not met. A typical rule might check that the number of children who received a second polio dose does not exceed those who received a first dose. Validation rules run during data entry or in batch through the data quality module and produce a violations report for follow-up.

What is data completeness in DHIS2?

Data completeness in DHIS2 measures the percentage of expected reports that were actually submitted within a given period. A completeness rate below 80% is generally considered a data quality risk, because missing facilities create gaps that distort aggregate statistics. DHIS2 calculates completeness automatically when expected report counts are configured at each organisation unit level.
