- Accounting for Data Errors Discovered from an Audit in Multiple Linear Regression
- Summary: A data coordinating team performed onsite audits and discovered discrepancies between the data sent to the coordinating center and the data recorded at sites. We present statistical methods for incorporating audit results into analyses. This can be thought of as a measurement error problem in which the distribution of errors is a mixture with a point mass at 0. If the error rate is nonzero, then even if the mean of the discrepancy between the reported and correct values of a predictor is 0, naive estimates of the association between two continuous variables will be biased. We consider scenarios where there are (1) errors in the predictor, (2) errors in the outcome, and (3) possibly correlated errors in the predictor and outcome. We show how to incorporate the error rate and magnitude, estimated from a random subset (the audited records), to compute unbiased estimates of association and proper confidence intervals. We then extend these results to multiple linear regression where multiple covariates may be incorrect in the database and the rate and magnitude of the errors may depend on study site. We study the finite sample properties of our estimators using simulations, discuss some practical considerations, and illustrate our methods with data from 2815 HIV-infected patients in Latin America, of whom 234 had their data audited using a sequential auditing plan.
- An XML model of an enhanced data dictionary to facilitate the exchange of pre-existing clinical research data in international studies
- Pre-existing clinical research data sets exchanged in international epidemiology research often lack the elements needed to assess their suitability for use in multi-region meta-analyses or other clinical studies. While the missing information is generally known to local investigators, it is not contained in the files exchanged between sites. Instead, such content must be solicited by the study coordinating center through a series of lengthy phone and electronic communications: an informal process whose reproducibility and accuracy decay over time. This report describes a set of supplemental information needed to assess whether clinical research data from diverse research sites are truly comparable, and what metadata ("data about the data") should be preserved when a data set is archived for future use. We propose a structured Extensible Markup Language (XML) model that captures this information. The authors hope this model will be a first step towards preserving the metadata associated with clinical research data sets, thereby improving the quality of international data exchange, data archiving, and merged-data research using data collected in many different countries, languages and care settings.
- Preliminary findings on cancer incidence in HIV-infected persons from six countries in Central and South America and the Caribbean
- Cross-Sectional Analysis of Late HAART Initiation in Latin America and the Caribbean: Late Testers and Late Presenters.
- BACKGROUND: Starting HAART in a very advanced stage of disease is assumed to be the most prevalent form of initiation in HIV-infected subjects in developing countries. Data from Latin America and the Caribbean are still lacking. Our main objective was to determine the frequency, risk factors and trends over time for being a late HAART initiator (LHI) in this region. METHODOLOGY: Cross-sectional analysis of 9817 HIV-infected treatment-naïve patients initiating HAART at 6 sites (Argentina, Chile, Haiti, Honduras, Peru and Mexico) from October 1999 to July 2010. LHI had a CD4+ count ≤200 cells/mm³ prior to HAART. Late testers (LT) were those LHI who initiated HAART within 6 months of HIV diagnosis. Late presenters (LP) initiated after 6 months of diagnosis. Prevalence, risk factors and trends over time were analyzed. PRINCIPAL FINDINGS: Among subjects starting HAART (n = 9817) who had a baseline CD4+ count available (n = 8515), 76% were LHI: Argentina (56% [95% CI: 52-59]), Chile (80% [95% CI: 77-82]), Haiti (76% [95% CI: 74-77]), Honduras (91% [95% CI: 87-94]), Mexico (79% [95% CI: 75-83]), Peru (86% [95% CI: 84-88]). The proportion of LHI changed significantly over time at every site except Honduras (p ≤ 0.02; Honduras p = 0.7), with a tendency towards lower rates in recent years. Males had increased risk of LHI in Chile, Haiti, Peru, and in the combined site analyses (CSA). Older patients were more likely to be LHI in Argentina and Peru (OR 1.21 per +10 years of age, 95% CI: 1.02-1.45; OR 1.20, 95% CI: 1.02-1.43; respectively), but not in CSA (OR 1.07, 95% CI: 0.94-1.21). Higher education was associated with decreased risk of LHI in Chile (OR 0.92 per +1 year of education, 95% CI: 0.87-0.98), with similar trends in Mexico, Peru, and CSA. Among LHI with a date of HIV diagnosis available, 55% were LT and 45% were LP. CONCLUSION: LHI was highly prevalent in CCASAnet sites, mostly due to LT; the main associated risk factors were being male and older age. Earlier HIV diagnosis and earlier treatment initiation are needed to maximize benefits from HAART in the region.
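
The first abstract in this list notes that a nonzero error rate in a predictor biases the naive estimate of association even when the discrepancies average 0. The sketch below illustrates that claim with simulated data only. It is not the estimator from that paper: it uses a textbook method-of-moments correction and assumes errors independent of the true value and of the outcome. The function name `simulate_audit_correction` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_audit_correction(n=50_000, n_audit=10_000, beta1=2.0,
                              error_rate=0.3, error_sd=1.5, seed=0):
    """Return (naive_slope, corrected_slope) for one simulated data set.

    Assumed toy model (not taken from the paper):
      y = 1 + beta1 * x + noise
      w = x + s * u, with s ~ Bernoulli(error_rate), u ~ N(0, error_sd^2)
    so the error distribution is a mixture with a point mass at 0.
    """
    rng = np.random.default_rng(seed)

    x = rng.normal(0.0, 1.0, n)                      # correct predictor values
    y = 1.0 + beta1 * x + rng.normal(0.0, 1.0, n)    # outcome, recorded correctly
    has_error = rng.random(n) < error_rate           # which records are wrong
    w = x + has_error * rng.normal(0.0, error_sd, n) # values in the database

    # Naive slope: regress y on the error-prone database value w.
    var_w = np.var(w, ddof=1)
    naive = np.cov(w, y)[0, 1] / var_w

    # Audit a simple random subset; only there are both w and x observed.
    audit = rng.choice(n, size=n_audit, replace=False)
    discrepancy = w[audit] - x[audit]
    var_d = np.var(discrepancy, ddof=1)  # estimates error_rate * error_sd^2

    # Moment correction: var(w) = var(x) + var(d) under independent errors,
    # so dividing the covariance by (var_w - var_d) removes the attenuation.
    corrected = naive * var_w / (var_w - var_d)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = simulate_audit_correction()
    print(f"naive slope:     {naive:.3f}")
    print(f"corrected slope: {corrected:.3f}")
```

Under these assumptions the naive slope is pulled towards 0 by the factor var(x) / (var(x) + error_rate × error_sd²), while the corrected slope should land near the true value of 2. Errors in the outcome, correlated errors, site-dependent error rates, and valid confidence intervals are outside what this sketch covers.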