International Epidemiology Databases to Evaluate AIDS

Shepherd BE, Yu C.

Pub Title:

Accounting for data errors discovered from an audit in multiple linear regression.

Pub Date:

Sep 30 2011

Journal Issue:

Biometrics 67(3)

Page Number:

1083-91


PubMed: 21281274
Pub PDF: 21281274.pdf

A data coordinating team performed onsite audits and discovered discrepancies between the data sent to the coordinating center and that recorded at sites. We present statistical methods for incorporating audit results into analyses. This can be thought of as a measurement error problem, where the distribution of errors is a mixture with a point mass at 0. If the error rate is nonzero, then even if the mean of the discrepancy between the reported and correct values of a predictor is 0, naive estimates of the association between two continuous variables will be biased. We consider scenarios where there are (1) errors in the predictor, (2) errors in the outcome, and (3) possibly correlated errors in the predictor and outcome. We show how to incorporate the error rate and magnitude, estimated from a random subset (the audited records), to compute unbiased estimates of association and proper confidence intervals. We then extend these results to multiple linear regression where multiple covariates may be incorrect in the database and the rate and magnitude of the errors may depend on study site. We study the finite sample properties of our estimators using simulations, discuss some practical considerations, and illustrate our methods with data from 2815 HIV-infected patients in Latin America, of whom 234 had their data audited using a sequential auditing plan.
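The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the estimator developed in the paper. It is a simple method-of-moments correction under assumptions chosen for illustration only: one predictor, errors that occur at a fixed rate, have mean zero, and are independent of the true predictor and the outcome, and a simple random audit sample in which both the reported and the correct values are seen. The function names and all parameter values (sample size, audit size, error rate, error standard deviation) are hypothetical.

```python
import random
import statistics


def slope(xs, ys):
    """Least squares slope of ys on xs for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, n_audit=2000, beta=1.0,
             error_rate=0.2, error_sd=2.0, seed=1):
    """Return (naive slope, audit-corrected slope, estimated error rate)."""
    rng = random.Random(seed)

    # True predictor and outcome; the true slope is beta.
    true_x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * x + rng.gauss(0, 1) for x in true_x]

    # Reported predictor: correct with probability 1 - error_rate,
    # otherwise shifted by a mean-zero error (a mixture with a point mass at 0).
    reported_x = [
        x + (rng.gauss(0, error_sd) if rng.random() < error_rate else 0.0)
        for x in true_x
    ]

    # Naive analysis ignores the errors and is attenuated toward 0.
    naive = slope(reported_x, y)

    # Audit a simple random subset, where the correct value is also observed.
    audit_idx = rng.sample(range(n), n_audit)
    diffs = [reported_x[i] - true_x[i] for i in audit_idx]
    est_rate = sum(d != 0 for d in diffs) / n_audit

    # Variance of the discrepancy (zeros included), assuming its mean is 0.
    err_var = statistics.fmean([d * d for d in diffs])

    # Moment correction: divide by the estimated reliability
    # (var(reported) - var(error)) / var(reported).
    var_reported = statistics.pvariance(reported_x)
    corrected = naive * var_reported / (var_reported - err_var)
    return naive, corrected, est_rate


if __name__ == "__main__":
    naive, corrected, est_rate = simulate()
    print(f"naive slope:          {naive:.3f}")
    print(f"corrected slope:      {corrected:.3f}")
    print(f"estimated error rate: {est_rate:.3f}")
```

With these settings the discrepancy has mean zero, yet the naive slope is pulled toward zero because the errors inflate the variance of the reported predictor. The corrected slope depends on how well the audited subset estimates the error variance, so it is noisier when few records are audited.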

© 2011, The International Biometric Society.

PMID: 21281274 [PubMed - indexed for MEDLINE]

PMCID: PMC3092800



Shepherd BE, Yu C. Accounting for data errors discovered from an audit in multiple linear regression. Biometrics. 2011 Sep;67(3):1083-91. doi: 10.1111/j.1541-0420.2010.01543.x. Epub 2011 Jan 31. PubMed PMID: 21281274; PubMed Central PMCID: PMC3092800.