The balance test fallacy: Why you shouldn’t put p-values in Table 1

What is the purpose of Table 1? The purpose of Table 1 is to describe the groups in the study, not to perform inferential analysis or hypothesis testing. A statistically significant difference in a given characteristic between two groups does not mean that the groups are not comparable, or that the results will be biased. Conversely, a non-significant difference does not guarantee that the groups are comparable.

“Inferential measures such as standard errors and confidence intervals should not be used to describe the variability of characteristics, and significance tests should be avoided in descriptive tables. Also, P values are not an appropriate criterion for selecting which confounders to adjust for in analysis; even small differences in a confounder that has a strong effect on the outcome can be important.”

Vandenbroucke, J. P., Elm, E. V., Altman, D. G., Gøtzsche, P. C., Mulrow, C. D., Pocock, S. J., … & Strobe Initiative. (2007). Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Annals of internal medicine, 147(8), W-163.
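
The STROBE point about small differences in a strong confounder can be checked with simple arithmetic. The sketch below uses made-up numbers (every value is an assumption chosen for illustration, not data from any study): a confounder that raises outcome risk from 10% to 50%, present in 25% of one group and 20% of the other, and an exposure that has no effect at all.

```python
# Hypothetical numbers, chosen only for illustration.
risk_without_confounder = 0.10   # outcome risk when the confounder is absent
risk_with_confounder = 0.50      # outcome risk when the confounder is present

prevalence_exposed = 0.25        # confounder prevalence in the exposed group
prevalence_unexposed = 0.20      # confounder prevalence in the unexposed group


def crude_risk(prevalence: float) -> float:
    """Expected outcome risk in a group, assuming the exposure itself has no effect."""
    return prevalence * risk_with_confounder + (1 - prevalence) * risk_without_confounder


# Any difference here comes from the confounder alone, because the exposure is null.
spurious_risk_difference = crude_risk(prevalence_exposed) - crude_risk(prevalence_unexposed)
print(f"Crude risk difference with a null exposure: {spurious_risk_difference:.3f}")
```

Under these assumptions, the five-point difference in prevalence alone produces a crude risk difference of about two percentage points, whether or not a significance test would flag the imbalance.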

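Should there be p-values in Table 1 of randomized controlled trials (RCTs)? P-values were traditionally used to check for imbalance in baseline covariates in RCTs. However, this practice is increasingly being abandoned in favor of a qualitative description of baseline imbalances, or no formal assessment at all. Randomization tends to distribute known and unknown confounding factors evenly between the arms, so when it has been carried out properly, any baseline differences that remain are due to chance. A significance test of those differences therefore tests a null hypothesis that is already known to be true, and showing the resulting p-values can give a misleading picture of the differences.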

“The authors of this debate are obviously more (Baethge) or less enthusiastic (Stang), with the former advocating the presentation of CIPs [covariate imbalance p values], its careful use as a screening tool, and its interpretation within the context of each study, while the latter emphasizes the dangers of misuse. The aim of this debate was to further trigger the discussion of the role of NHST in biomedical research that uses randomization… For the detection and judgment of imbalances between the study groups, it remains important that descriptive statistics of the groups (categorical characteristics: percentage values; continuous characteristics: eg, mean values and SDs) are presented. Whether a baseline imbalance is meaningful or not depends on subject matter knowledge. For example, it is clinically relevant in a stroke prevention study if 30% diabetics are in one arm of the study and only 15% are diabetics in the other arm, regardless of the p value, as diabetes mellitus is a very relevant risk factor for stroke.”

Stang, A., & Baethge, C. (2018). Imbalance p values for baseline covariates in randomized controlled trials: a last resort for the use of p values? A pro and contra debate. Clinical epidemiology, 531-535.
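
To see why the p-value is a poor guide here, the following sketch takes the 30% versus 15% diabetes split from the quotation and tests it at three hypothetical arm sizes. The arm sizes and the helper `two_proportion_p_value` are assumptions for illustration; the test is a pooled two-proportion z-test, a normal approximation that is rough when counts are small.

```python
import math


def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the standard normal


# Same 30% vs 15% split at three hypothetical arm sizes.
p_values = {}
for n_per_arm in (20, 100, 400):
    diabetics_a = n_per_arm * 30 // 100
    diabetics_b = n_per_arm * 15 // 100
    p_values[n_per_arm] = two_proportion_p_value(diabetics_a, n_per_arm, diabetics_b, n_per_arm)
    print(f"{diabetics_a}/{n_per_arm} vs {diabetics_b}/{n_per_arm}: p = {p_values[n_per_arm]:.4f}")
```

The imbalance is identical in every row, yet the p-value is above 0.05 with 20 participants per arm and below it with 100, so the test tracks sample size rather than clinical relevance.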

Are there any alternatives to the use of hypothesis testing for assessing balance? Several alternatives have been proposed that avoid hypothesis testing and p-values, including tools that assess balance numerically and graphically.

“In any study where all observed covariates were not fully blocked ahead of time, balance should be checked routinely by comparing observed covariate differences between the treated and control groups. Any statistic that is used to evaluate balance should have two key features: (a) it should be a characteristic of the sample and not of some hypothetical population and (b) the sample size should not affect the value of the statistic.”

Imai, K., King, G., & Stuart, E. A. (2008). Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society Series A: Statistics in Society, 171(2), 481-502.
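
One commonly used statistic that fits this description reasonably well is the standardized mean difference (SMD): the difference in group means divided by a pooled standard deviation. What follows is a minimal sketch, assuming a hypothetical age covariate with invented values. It uses `statistics.pvariance`, which treats the data as the whole sample; the more common n − 1 variance gives slightly different values in small samples.

```python
import math
import statistics


def standardized_mean_difference(treated: list[float], control: list[float]) -> float:
    """Difference in means divided by the pooled standard deviation of the two groups."""
    pooled_sd = math.sqrt((statistics.pvariance(treated) + statistics.pvariance(control)) / 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical ages, invented for illustration.
age_treated = [52, 61, 58, 47, 66, 59, 63, 55]
age_control = [49, 57, 53, 45, 60, 51, 58, 50]

smd_small = standardized_mean_difference(age_treated, age_control)
# Replicating every observation ten times mimics a larger sample with the same distribution.
smd_large = standardized_mean_difference(age_treated * 10, age_control * 10)

print(f"SMD, 8 per group:  {smd_small:.3f}")
print(f"SMD, 80 per group: {smd_large:.3f}")
```

Both lines print the same value: the SMD describes the size of the imbalance in the sample at hand and does not shrink or grow simply because more participants were enrolled, which is not true of a p-value.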

“MatchIt contains several tools to assess balance numerically and graphically. The primary balance assessment function is summary.matchit(), which is called when using summary() on a MatchIt object and produces several tables of balance statistics before and after matching. plot.summary.matchit() generates a Love plot using R’s base graphics system containing the standardized mean differences resulting from a call to summary.matchit() and provides a nice way to display balance visually for inclusion in an article or report. plot.matchit() generates several plots that display different elements of covariate balance, including propensity score overlap and distribution plots of the covariates. These functions together form a suite that can be used to assess and report balance in a variety of ways.”

Marzieh Ghiasi (@marziehg), MSc is an MD/PhD epidemiology trainee at Michigan State University. Her current research focuses on the genetic epidemiology of gynecologic disease, particularly endometriosis. Her research background is in airborne infectious disease transmission and environmental health. She is passionate about promoting stronger medical education, particularly in epidemiology, biostatistics, and clinical research skills.
