Bad science: Statistical Parametric Mapping

In a previous blog post I showed that 3D-DXA cannot measure the cortical thickness, despite claims to the contrary by companies that are trying to sell this software. Nonetheless, several studies on cortical thickness maps have already been presented at conferences. Considering that they did not actually measure the thickness, how is it possible that they are still able to present significant positive results? I would like to illustrate this with a study by Robert Güerri-Fernández [1], the poster of which can be found here. 37 HIV patients (30 men and 7 women) with a median age of 38 were included in this study. Let us for now ignore the fact that one should not even attempt to reconstruct the bone of a 38-year-old male HIV patient using a statistical model that is constructed from elderly osteoporotic women.

In this study the thickness map was extracted from a 3D reconstruction at baseline and 1 year after treatment with tenofovir disoproxil fumarate. A population-based evaluation was then performed by assessing, at each location on the surface, the significance of the change with a paired t-test. This creates a map with patches where the authors claim there is a significant decrease in bone thickness. To understand why this is fundamentally wrong I will give a line from the book “Bad Pharma” by Ben Goldacre: “if you give yourself multiple chances of finding a positive result, but use statistical tests that assume you only had one go, you hugely increase your chances of getting a misleading false positive”.

When you do a significance test you calculate the chance of getting a result at least as extreme as yours from pure luck, assuming there is no real effect. The threshold is then usually set at 5%. When you do two tests, you have roughly double the chance of getting a false positive result. What you then have to do is halve your threshold to 2.5%. Correcting your significance level in this way, by dividing it by the number of tests, is called a Bonferroni correction. If you take thousands of samples all over the bone surface and you set a threshold as if you took only one sample (5%), you are bound to find some points with statistically significant changes, just by pure luck. These are the results presented by Robert Güerri-Fernández.
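To get a feel for the size of the problem, here is a minimal sketch in Python. It assumes the conventional 5% threshold, a hypothetical map of 5000 surface points (`N_POINTS` is my own number, not taken from the poster), and tests that are independent of each other, which is a simplification. It relies on the fact that p-values are uniformly distributed between 0 and 1 when there is no real effect.

```python
import random

ALPHA = 0.05      # conventional per-test significance threshold
N_POINTS = 5000   # hypothetical number of surface points (an assumption)


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive in n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Simulate a map where nothing changed at all: under the null hypothesis
# every p-value is just a uniform random number between 0 and 1.
rng = random.Random(0)
p_values = [rng.random() for _ in range(N_POINTS)]

# Points declared "significant" when each test is treated as the only one.
uncorrected_hits = sum(p < ALPHA for p in p_values)

# Points declared "significant" after a Bonferroni correction.
bonferroni_hits = sum(p < ALPHA / N_POINTS for p in p_values)

print("Chance of a false positive with 1 test: ", familywise_error_rate(ALPHA, 1))
print("Chance of a false positive with 2 tests:", familywise_error_rate(ALPHA, 2))
print("Chance with", N_POINTS, "tests:", familywise_error_rate(ALPHA, N_POINTS))
print("Uncorrected significant points:", uncorrected_hits)
print("Bonferroni significant points: ", bonferroni_hits)
```

With no real change anywhere, about 5% of the points (around 250 out of 5000) are expected to fall below the uncorrected threshold, which is more than enough to draw convincing-looking patches on a bone surface.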

Now, obviously there should still be value in doing a comparison of thickness maps. Neighboring points will be very similar and can perhaps be considered as one test, so a Bonferroni correction would greatly overcorrect the significance level. This is a topic that has been extensively researched in the field of neuroimaging, where large sets of brain scans are compared with each other, and it has been solved by some very smart mathematicians using random field theory. The mathematics is too complicated for most of us, but fortunately there are several free tools available that will do this so-called “statistical parametric mapping”. So the keyword is “Statistical Parametric Mapping”. If this technique is not used when assessing changes or differences in cortical maps, you should dismiss the study. In fact, don’t even look at it, to prevent it from unintentionally distorting your opinion of a treatment type. If you haven’t already looked at the poster of this study I linked to above: don’t!
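For readers who want to see what a correction that respects the similarity of neighboring points could look like, below is a rough sketch. Note that this is not random field theory and not what the statistical parametric mapping tools do internally. It is a sign-flipping permutation test on the maximum statistic, a simpler technique that tackles the same problem. Everything in it is an assumption for illustration: the data are simulated noise with no real change, the "surface" is a line of 100 points, and the function names are my own.

```python
import math
import random


def smooth(values, half_width):
    """Moving average, so that neighboring points become correlated."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half_width): i + half_width + 1]
        out.append(sum(window) / len(window))
    return out


def t_map(diffs):
    """Paired t-statistic (follow-up minus baseline) at every point."""
    n_subjects = len(diffs)
    stats = []
    for j in range(len(diffs[0])):
        column = [row[j] for row in diffs]
        mean = sum(column) / n_subjects
        var = sum((x - mean) ** 2 for x in column) / (n_subjects - 1)
        stats.append(mean / math.sqrt(var / n_subjects))
    return stats


def max_t_threshold(diffs, n_permutations, alpha, rng):
    """Threshold on |t| that keeps the map-wide false positive rate near alpha."""
    maxima = []
    for _ in range(n_permutations):
        flipped = []
        for row in diffs:
            # With no real change, the sign of a patient's whole
            # difference map is arbitrary, so flip it at random.
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        # Keep only the largest |t| found anywhere on the map.
        maxima.append(max(abs(t) for t in t_map(flipped)))
    maxima.sort()
    index = min(len(maxima) - 1, int((1.0 - alpha) * len(maxima)))
    return maxima[index]


rng = random.Random(1)
N_SUBJECTS = 37   # same number of patients as in the study
N_POINTS = 100    # hypothetical number of points along a line

# Simulated thickness differences: pure noise, smoothed along the line.
diffs = [smooth([rng.gauss(0.0, 1.0) for _ in range(N_POINTS)], 5)
         for _ in range(N_SUBJECTS)]

observed = t_map(diffs)
threshold = max_t_threshold(diffs, 200, 0.05, rng)
significant = [j for j, t in enumerate(observed) if abs(t) > threshold]

print("Map-wide threshold on |t|:", round(threshold, 2))
print("Points above the threshold:", significant)
```

Because the threshold is derived from the largest statistic on the whole map, it automatically accounts for how strongly neighboring points move together. It ends up stricter than the single-test threshold, but it does not punish you for every point as if each one were an independent test.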

  1. Robert Güerri-Fernández, Ludovic Humbert, Judit Villar-García, Roger Fonollà, Lucia Moro, Leo Mellibovsky, Xavier Nogues, Marta Trenchs-Rodríguez, Hernando Knobel, Adolfo Díez-Pérez. Analyzing the cortical and trabecular bone of tenofovir-treated HIV patients using 3D-DXA. ASBMR 2016 Annual Meeting.