Bad science: Part 1
In a recent scientific publication by Galgo Medical [1], Humbert et al. compared the cortical thickness extracted from 3D reconstructions of DXA images against the same thickness measurements from CT scans. They reported a mean absolute error of 0.31 mm. Does that sound good to you? Who knows. The truth is that this evaluation is meaningless. We do not know what accuracy is required for a diagnosis or to extract any meaningful information from a patient’s bone. Nor is it any indication that the software actually works. The best you can do is show that it is better than nothing. “Nothing” here means just an informed guess of the thickness. We could, for instance, simply assume an average cortical thickness. The question then becomes: does this method measure the cortical thickness better than just assuming an average cortical thickness? Surely such a complicated piece of software should do better than presuming that every individual has the population mean thickness distribution. Well, I’m sorry to tell you this, but no, it does not.
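To make the comparison against a mean estimator concrete, here is a minimal sketch in Python. The thickness values are invented purely for illustration and are not taken from any publication, and the function names are my own. The baseline predicts each subject's thickness as the mean of all the other subjects, so it never sees the value it is being scored on.

```python
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired measurements."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def leave_one_out_mean_baseline(reference):
    """Predict each subject's value as the mean of all *other* subjects."""
    total = sum(reference)
    n = len(reference)
    return [(total - r) / (n - 1) for r in reference]


# Hypothetical thickness values in mm, invented for illustration only.
ct_thickness = [1.8, 2.1, 2.4, 1.9, 2.3]      # reference measurements
method_thickness = [2.2, 1.9, 2.0, 2.3, 2.0]  # method under evaluation

baseline_thickness = leave_one_out_mean_baseline(ct_thickness)

mae_method = mean_absolute_error(method_thickness, ct_thickness)
mae_baseline = mean_absolute_error(baseline_thickness, ct_thickness)

# The number that matters is not mae_method by itself,
# but whether it is smaller than mae_baseline.
method_beats_baseline = mae_method < mae_baseline
print(f"method MAE:   {mae_method:.3f} mm")
print(f"baseline MAE: {mae_baseline:.3f} mm")
print(f"method beats baseline: {method_beats_baseline}")
```

A real evaluation would compare thickness maps location by location rather than one number per subject, but the principle is the same: an error figure only means something next to the error of the trivial guess.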
Ludovic Humbert previously applied the cortical thickness mapping software “Stradwin” to the reconstructions I obtained for my thesis, with the intention of publishing the results at a conference. When he was informed about the importance of this evaluation, he was indeed not able to show an improvement over the mean estimator. In e-mail communications he said this was an important evaluation that he should have done. He never published this work, partly because with this study he was trying to republish my experimental results without my approval.
When he started to commercialize the 3D-DXA software, he was clearly determined to also sell the cortical thickness mapping component. Knowing full well that this technology is no better than a mean estimator, he left this evaluation out of the above publication. I think few would disagree with me when I say this is scientific misconduct, made all the worse by his financial interest in this technology. I will discuss conflicts of interest in medical image analysis publications in a later blog post.
Unfortunately it does not end here. In this paper he referenced a previous publication of his [2] in which he evaluated the proposed technique for measuring the cortical thickness. It was evaluated by comparing the measurements from clinical CT with the measurements from micro-CT scans of the same subjects. This would be a correct way to evaluate the technique, if it did not have one very basic flaw. Let’s assume I want to make a model of the number of bicycles people have in the Netherlands. I go onto the street in Amsterdam and ask 100 people how many bicycles they own. It turns out that in total these people own 90 bicycles. My model is now 0.9 bicycles per person. To evaluate whether this model is correct, I then ask these same 100 people again how many bicycles they own and compare it with the 0.9 × 100 = 90 bicycles I get from applying my model. Wow, exactly the same! My model must be perfect! Of course this is not true. I should have asked a different group of people to evaluate my model. In general, you always perform the evaluation on a different dataset from the one on which you did the training. This is one of the basic principles of good science, and it is something Humbert et al. failed to do in this publication. Whether this was intentional or a mistake, both are equally disturbing.
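The bicycle example can be written out in a few lines of Python. This is only a sketch: the first group reproduces the 90 bicycles among 100 people from the example, while the second group is hypothetical and invented for illustration, as are the function names.

```python
def fit_mean_model(counts):
    """'Train' the model: average number of bicycles per person."""
    return sum(counts) / len(counts)


def total_error(model, counts):
    """Absolute difference between predicted and actual total bicycles."""
    predicted_total = model * len(counts)
    return abs(predicted_total - sum(counts))


# The 100 people from the example: together they own 90 bicycles.
training_group = [1] * 90 + [0] * 10

# A hypothetical second group of 100 people (invented numbers).
evaluation_group = [2] * 20 + [1] * 65 + [0] * 15

model = fit_mean_model(training_group)  # 0.9 bicycles per person

# Evaluating on the training group is circular: the error is zero
# by construction, whatever the quality of the model.
error_on_training_group = total_error(model, training_group)

# Only the error on people the model has never seen is informative.
error_on_evaluation_group = total_error(model, evaluation_group)

print(f"model: {model} bicycles per person")
print(f"error on training group:   {error_on_training_group}")
print(f"error on evaluation group: {error_on_evaluation_group}")
```

The zero error on the training group says nothing about the model. It would be zero for any group you first fit the model to, which is exactly why the evaluation data has to be kept separate.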
1. L. Humbert et al. 3D-DXA: Assessing the Femoral Shape, the Trabecular Macrostructure and the Cortex in 3D from DXA images. IEEE Trans Med Imaging. 2017 Jan;36(1):27-39.
2. L. Humbert et al. Technical Note: Cortical thickness and density estimation from clinical CT using a prior thickness-density relationship. Med Phys. 2016;43(4):1945-1954.